
Author lemburg
Recipients Arach, Arfrever, Huzaifa.Sidhpurwala, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, georg.brandl, gvanrossum, gz, jcea, lemburg, pitrou, skrah, terry.reedy, tim.peters, v+python, vstinner, zbysz
Date 2012-01-11.16:03:18
Message-id <4F0DB2C1.6010308@egenix.com>
In-reply-to <1326293048.3531.6.camel@localhost.localdomain>
Content
Antoine Pitrou wrote:
> 
> Antoine Pitrou <pitrou@free.fr> added the comment:
> 
>> OTOH, the collision counting patch is very simple, doesn't have
>> the performance issues and provides real protection against the
>> attack.
> 
> I don't know about real protection: you can still slow down dict
> construction by 1000x (the number of allowed collisions per lookup),
> which can be enough combined with a brute-force DOS.

On my slow dev machine 1000 collisions run in around 22ms:

python2.7 -m timeit -n 100 "dict((x*(2**64 - 1), 1) for x in xrange(1, 1000))"
100 loops, best of 3: 22.4 msec per loop

Using this for a DOS attack would be rather noisy, much unlike
sending a single POST.
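
For reference, the keys in the timeit call above only work as a worst case
because, on a 64-bit CPython 2.x build, long hashes are effectively reduced
modulo 2**64 - 1, so every multiple of that value hashes alike. A quick check
of that assumption:

M = 2**64 - 1
print len(set(hash(x * M) for x in xrange(1, 1000)))   # prints 1: all keys collide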

Note that the choice of 1000 as the limit is rather arbitrary. I chose
it because it is high enough that it is very unlikely to be hit by an
application that is not written to trigger it, and low enough to still
provide good run-time behavior. Perhaps an even lower figure would be
better.
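
The counting itself is cheap, since the probing loop already visits each
colliding slot. A rough sketch of the idea in pure Python (this is not the
actual C patch; the probing is simplified to linear probing and the names
are made up):

class CollisionLimitError(Exception):
    pass

def insert_slot(table, key, value, mask, limit=1000):
    # Probe until we find the key or an empty slot; give up once we
    # have walked past 'limit' occupied slots for this single insert.
    i = hash(key) & mask
    collisions = 0
    while table[i] is not None and table[i][0] != key:
        collisions += 1
        if collisions > limit:
            raise CollisionLimitError("too many collisions for key %r" % (key,))
        i = (i + 1) & mask    # linear probing for illustration only
    table[i] = (key, value)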

> Also, how about false positives? Having legitimate programs break
> because of legitimate data would be a disaster.

Yes, which is why the patch should be disabled by default (using
an env var) in dot-releases. It's probably also a good idea to
make the limit configurable so it can be adjusted to one's needs.
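
Something along these lines could be read at interpreter startup (the
variable names here are made up for illustration; the patch would define
its own):

import os

# Hypothetical switches, names not taken from the actual patch:
collision_check = os.environ.get("PYTHONCOLLISIONCHECK", "0") != "0"
collision_limit = int(os.environ.get("PYTHONCOLLISIONLIMIT", "1000"))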

Still, it is *very* unlikely that you run into real data causing
more than 1000 collisions for a single insert.

For full protection, the universal hash method idea would have
to be implemented (adding a parameter to the hash methods so
that they can be parametrized). This would then allow switching
the dict to an alternative hash implementation that resolves the
collision problem, in case the implementation detects a high number
of collisions.
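
A rough sketch of what a parametrized string hash could look like (the
multiply/xor scheme and the names are only illustrative, not a proposed
API; the real change would happen at the C level):

def parametrized_hash(s, seed):
    # Same structure as the classic string hash, but keyed with a
    # per-dict seed: changing the seed changes which inputs collide,
    # so precomputed collision sets stop working.
    h = seed & 0xFFFFFFFFFFFFFFFF
    for c in s:
        h = ((h * 1000003) ^ ord(c)) & 0xFFFFFFFFFFFFFFFF
    return (h ^ len(s)) & 0xFFFFFFFFFFFFFFFF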