Author PaulMcMillan
Recipients Arach, Arfrever, Huzaifa.Sidhpurwala, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, georg.brandl, gvanrossum, haypo, jcea, lemburg, merwok, pitrou, terry.reedy, v+python
Date 2012-01-06.20:56:40
SpamBayes Score 5.26179e-12
Marked as misclassified No
Message-id <CAO_YWRWerZ-cn_89wCdXu_oouiMTxRMphi0jgR-3GQE0cEGbUg@mail.gmail.com>
In-reply-to <4F06EDC3.7040600@egenix.com>
Content
> An attack can be based on trying to find many objects with the same
> hash value, or trying to find many objects that, as they get inserted
> into a dictionary, very often cause collisions due to the collision
> resolution algorithm not finding a free slot.

Yep. Allowing an attacker to produce very large dictionaries is also bad.

> if the application
> puts too much trust into large blobs of input data - which is
> the actual security issues we're trying to work around here...

To be very clear the issue is ANY large blob of data anywhere in the
application, not just on input. An attack could happen after whatever
transform your application runs on the data before returning it.

> I'll upload a patch that demonstrates the collisions counting
> strategy to show that detecting the problem is easy. Whether
> just raising an exception is a good idea, is another issue.

I'm in cautious agreement that collision counting is a better
strategy. The dict implementation performance would suffer from
randomization.

> The dict implementation could then alter the hash parameter
> and recreate the dict table in case the number of collisions
> exceeds a certain limit, thereby actively taking action
> instead of just relying on randomness solving the issue in
> most cases.

This is clever. You basically neuter the attack as you notice it but
everything else is business as usual. I'm concerned that this may end
up being costly in some edge cases (e.g. look up how many collisions
it takes to force the recreation, and then aim for just that many
collisions many times). Unfortunately, each dict object has to
discover for itself that it's full of offending hashes. Another
approach would be to neuter the offending object by changing its hash,
but this would require either returning multiple values, or fixing up
existing dictionaries, neither of which seems feasible.
History
Date User Action Args
2012-01-06 20:56:42PaulMcMillansetrecipients: + PaulMcMillan, lemburg, gvanrossum, barry, georg.brandl, terry.reedy, jcea, pitrou, haypo, christian.heimes, benjamin.peterson, merwok, Arfrever, v+python, alex, dmalcolm, Arach, Mark.Shannon, Zhiping.Deng, Huzaifa.Sidhpurwala
2012-01-06 20:56:41PaulMcMillanlinkissue13703 messages
2012-01-06 20:56:40PaulMcMillancreate