
Author lemburg
Recipients Arach, Arfrever, Huzaifa.Sidhpurwala, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, georg.brandl, gvanrossum, jcea, lemburg, pitrou, skrah, terry.reedy, v+python, vstinner
Date 2012-01-07.13:17:32
SpamBayes Score 0.0
Marked as misclassified No
Message-id <4F0845E2.5040304@egenix.com>
In-reply-to <CAO_YWRWerZ-cn_89wCdXu_oouiMTxRMphi0jgR-3GQE0cEGbUg@mail.gmail.com>
Content
Paul McMillan wrote:
> 
>> I'll upload a patch that demonstrates the collisions counting
>> strategy to show that detecting the problem is easy. Whether
>> just raising an exception is a good idea, is another issue.
> 
> I'm in cautious agreement that collision counting is a better
> strategy. The dict implementation performance would suffer from
> randomization.
> 
>> The dict implementation could then alter the hash parameter
>> and recreate the dict table in case the number of collisions
>> exceeds a certain limit, thereby actively taking action
>> instead of just relying on randomness solving the issue in
>> most cases.
> 
> This is clever. You basically neuter the attack as you notice it but
> everything else is business as usual. I'm concerned that this may end
> up being costly in some edge cases (e.g. look up how many collisions
> it takes to force the recreation, and then aim for just that many
> collisions many times). Unfortunately, each dict object has to
> discover for itself that it's full of offending hashes. Another
> approach would be to neuter the offending object by changing its hash,
> but this would require either returning multiple values, or fixing up
> existing dictionaries, neither of which seems feasible.

I ran some experiments with the collision counting patch and
could not trigger it in normal applications, not even in cases
that are documented in the dict implementation to have poor
collision resolution behavior (integers with zeros in the low bits).
The probability of having to deal with dictionaries that create
over a thousand collisions for one of the key objects in a
real life application appears to be very low.
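The counting strategy can be sketched in pure Python. This is a toy
open-addressing table, not the actual patch; the names COLLISION_LIMIT,
SimpleDict and TooManyCollisions are illustrative only, and the limit of
1000 is taken from the experiments described above:

```python
COLLISION_LIMIT = 1000  # illustrative threshold, per the experiments above

class TooManyCollisions(Exception):
    """Raised when one insert hits an implausible number of collisions."""

class SimpleDict:
    """Toy open-addressing table that counts probe collisions per insert."""

    def __init__(self, size=8):
        # size must be a power of two so that (hash & mask) works
        self.slots = [None] * size

    def insert(self, key, value):
        mask = len(self.slots) - 1
        i = hash(key) & mask
        collisions = 0
        while self.slots[i] is not None and self.slots[i][0] != key:
            collisions += 1
            if collisions > COLLISION_LIMIT:
                # A real implementation could instead re-parametrize the
                # hash function and rebuild the table at this point.
                raise TooManyCollisions(key)
            i = (i + 1) & mask  # linear probing, for simplicity
        self.slots[i] = (key, value)
```

Normal workloads never come near the limit, so the common path only pays
for one extra counter per insert; only a deliberately colliding key set
trips the exception (or, in the rebuild variant, the re-hash).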

Still, it may cause problems with existing applications in the
Python dot releases, so it's probably safer to add it there in a
disabled-by-default form (using an environment variable
to adjust the setting). For 3.3 it could be enabled by default,
and it would also make sense to allow customizing the limit
using a sys module setting.
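The opt-in configuration could look roughly like this. Note that the
variable name PYHASHCOLLISIONLIMIT and the default of 0 are purely
hypothetical; no such environment variable or sys setting exists:

```python
import os

# Hypothetical knob: 0 means "collision counting disabled", which is the
# proposed default for dot releases; 3.3 could default to a positive limit.
DEFAULT_COLLISION_LIMIT = 0

def get_collision_limit():
    """Read the (hypothetical) limit from the environment at startup."""
    value = os.environ.get("PYHASHCOLLISIONLIMIT", "")
    try:
        return int(value)
    except ValueError:
        return DEFAULT_COLLISION_LIMIT
```

A sys module setter (say, something along the lines of the existing
sys.setrecursionlimit()) would then let applications adjust the value
at runtime.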

The idea of adding a parameter to the hash method/slot, so that
objects provide a hash family instead of a fixed
unparametrized hash function, would probably have to be implemented
as an additional hash method, e.g. .__uhash__() and tp_uhash ("u"
for universal).

The builtin types should then grow such methods
in order to make hashing safe against such attacks. For objects
defined in 3rd party extensions, we would need to encourage
implementing the slot/method as well. If it's not implemented,
the dict implementation would have to fall back to raising an
exception.
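In Python terms the proposed protocol might look as follows. Both
__uhash__ and the universal_hash() helper are hypothetical; no such slot
exists in CPython, and this only illustrates the hash-family idea plus
the exception fallback for objects that don't implement it:

```python
class Key:
    """Example object implementing the hypothetical hash-family protocol."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Key) and self.value == other.value

    def __hash__(self):
        # The ordinary, unparametrized hash.
        return hash(self.value)

    def __uhash__(self, param):
        # One member of a hash family, selected by `param`; the dict
        # implementation could switch `param` after too many collisions.
        return hash((param, self.value))

def universal_hash(obj, param):
    """Sketch of what the dict implementation would do: use __uhash__
    when available, otherwise fall back to raising an exception."""
    meth = getattr(type(obj), "__uhash__", None)
    if meth is None:
        raise TypeError("object does not support parametrized hashing")
    return meth(obj, param)
```

Equal objects still hash equal for any fixed parameter, which is the
invariant the dict needs; changing the parameter merely moves the whole
table to a different member of the family.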

Please note that I'm just sketching things here. I don't have
time to work on a full-blown patch, just wanted to show what
I meant with the collision counting idea and demonstrate that
it actually works as intended.
History
Date User Action Args
2012-01-07 13:17:34lemburglinkissue13703 messages
2012-01-07 13:17:32lemburgcreate