Message 150769 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	PaulMcMillan
Recipients	Arach, Arfrever, Huzaifa.Sidhpurwala, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, georg.brandl, gvanrossum, jcea, lemburg, pitrou, terry.reedy, v+python, vstinner
Date	2012-01-06.20:56:40
SpamBayes Score	5.261791e-12
Marked as misclassified	No
Message-id	<CAO_YWRWerZ-cn_89wCdXu_oouiMTxRMphi0jgR-3GQE0cEGbUg@mail.gmail.com>
In-reply-to	<4F06EDC3.7040600@egenix.com>

Content
> An attack can be based on trying to find many objects with the same > hash value, or trying to find many objects that, as they get inserted > into a dictionary, very often cause collisions due to the collision > resolution algorithm not finding a free slot. Yep. Allowing an attacker to produce very large dictionaries is also bad. > if the application > puts too much trust into large blobs of input data - which is > the actual security issues we're trying to work around here... To be very clear the issue is ANY large blob of data anywhere in the application, not just on input. An attack could happen after whatever transform your application runs on the data before returning it. > I'll upload a patch that demonstrates the collisions counting > strategy to show that detecting the problem is easy. Whether > just raising an exception is a good idea, is another issue. I'm in cautious agreement that collision counting is a better strategy. The dict implementation performance would suffer from randomization. > The dict implementation could then alter the hash parameter > and recreate the dict table in case the number of collisions > exceeds a certain limit, thereby actively taking action > instead of just relying on randomness solving the issue in > most cases. This is clever. You basically neuter the attack as you notice it but everything else is business as usual. I'm concerned that this may end up being costly in some edge cases (e.g. look up how many collisions it takes to force the recreation, and then aim for just that many collisions many times). Unfortunately, each dict object has to discover for itself that it's full of offending hashes. Another approach would be to neuter the offending object by changing its hash, but this would require either returning multiple values, or fixing up existing dictionaries, neither of which seems feasible.

> An attack can be based on trying to find many objects with the same
> hash value, or trying to find many objects that, as they get inserted
> into a dictionary, very often cause collisions due to the collision
> resolution algorithm not finding a free slot.

Yep. Allowing an attacker to produce very large dictionaries is also bad.

> if the application
> puts too much trust into large blobs of input data - which is
> the actual security issues we're trying to work around here...

To be very clear the issue is ANY large blob of data anywhere in the
application, not just on input. An attack could happen after whatever
transform your application runs on the data before returning it.

> I'll upload a patch that demonstrates the collisions counting
> strategy to show that detecting the problem is easy. Whether
> just raising an exception is a good idea, is another issue.

I'm in cautious agreement that collision counting is a better
strategy. The dict implementation performance would suffer from
randomization.

> The dict implementation could then alter the hash parameter
> and recreate the dict table in case the number of collisions
> exceeds a certain limit, thereby actively taking action
> instead of just relying on randomness solving the issue in
> most cases.

This is clever. You basically neuter the attack as you notice it but
everything else is business as usual. I'm concerned that this may end
up being costly in some edge cases (e.g. look up how many collisions
it takes to force the recreation, and then aim for just that many
collisions many times). Unfortunately, each dict object has to
discover for itself that it's full of offending hashes. Another
approach would be to neuter the offending object by changing its hash,
but this would require either returning multiple values, or fixing up
existing dictionaries, neither of which seems feasible.

History
Date	User	Action	Args
2012-01-06 20:56:42	PaulMcMillan	set	recipients: + PaulMcMillan, lemburg, gvanrossum, barry, georg.brandl, terry.reedy, jcea, pitrou, vstinner, christian.heimes, benjamin.peterson, eric.araujo, Arfrever, v+python, alex, dmalcolm, Arach, Mark.Shannon, Zhiping.Deng, Huzaifa.Sidhpurwala
2012-01-06 20:56:41	PaulMcMillan	link	issue13703 messages
2012-01-06 20:56:40	PaulMcMillan	create