Author lemburg
Recipients Arach, Arfrever, Huzaifa.Sidhpurwala, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, georg.brandl, gvanrossum, jcea, lemburg, pitrou, skrah, terry.reedy, v+python, vstinner
Date 2012-01-07.13:17:32
SpamBayes Score 0.0
Marked as misclassified No
Message-id <4F0845E2.5040304@egenix.com>
In-reply-to <CAO_YWRWerZ-cn_89wCdXu_oouiMTxRMphi0jgR-3GQE0cEGbUg@mail.gmail.com>
Content
Paul McMillan wrote:
> 
>> I'll upload a patch that demonstrates the collisions counting
>> strategy to show that detecting the problem is easy. Whether
>> just raising an exception is a good idea, is another issue.
> 
> I'm in cautious agreement that collision counting is a better
> strategy. The dict implementation performance would suffer from
> randomization.
> 
>> The dict implementation could then alter the hash parameter
>> and recreate the dict table in case the number of collisions
>> exceeds a certain limit, thereby actively taking action
>> instead of just relying on randomness solving the issue in
>> most cases.
> 
> This is clever. You basically neuter the attack as you notice it but
> everything else is business as usual. I'm concerned that this may end
> up being costly in some edge cases (e.g. look up how many collisions
> it takes to force the recreation, and then aim for just that many
> collisions many times). Unfortunately, each dict object has to
> discover for itself that it's full of offending hashes. Another
> approach would be to neuter the offending object by changing its hash,
> but this would require either returning multiple values, or fixing up
> existing dictionaries, neither of which seems feasible.

I ran some experiments with the collision counting patch and
could not trigger it in normal applications, not even in cases
that are documented in the dict implementation to have a poor
collision resolution behavior (integers with zeros the the low bits).
The probability of having to deal with dictionaries that create
over a thousand collisions for one of the key objects in a
real life application appears to be very very low.

Still, it may cause problems with existing applications for the
Python dot releases, so it's probably safer to add it in a
disabled-per-default form there (using an environment variable
to adjust the setting). For 3.3 it could be enabled per default
and it would also make sense to allow customizing the limit
using a sys module setting.

The idea with adding a parameter to the hash method/slot in order
to have objects provide a hash family function instead of a fixed
unparametrized hash function would probably have to be implemented
as additional hash method, e.g. .__uhash__() and tp_uhash ("u"
for universal).

The builtin types should then grow such methods
in order to make hashing safe against such attacks. For objects
defined in 3rd party extensions, we would need to encourage
implementing the slot/method as well. If it's not implemented,
the dict implementation would have to fallback to raising an
exception.

Please note that I'm just sketching things here. I don't have
time to work on a full-blown patch, just wanted to show what
I meant with the collision counting idea and demonstrate that
it actually works as intended.
History
Date User Action Args
2012-01-07 13:17:34lemburgsetrecipients: + lemburg, gvanrossum, barry, georg.brandl, terry.reedy, jcea, pitrou, vstinner, christian.heimes, benjamin.peterson, eric.araujo, Arfrever, v+python, alex, skrah, dmalcolm, Arach, Mark.Shannon, Zhiping.Deng, Huzaifa.Sidhpurwala, PaulMcMillan
2012-01-07 13:17:34lemburglinkissue13703 messages
2012-01-07 13:17:32lemburgcreate