Author lemburg
Recipients Arach, Arfrever, Huzaifa.Sidhpurwala, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, fx5, georg.brandl, gvanrossum, gz, haypo, jcea, lemburg, mark.dickinson, merwok, neologix, pitrou, skrah, terry.reedy, tim.peters, v+python, zbysz
Date 2012-01-12.09:27:35
Message-id <4F0EA780.8020007@egenix.com>
In-reply-to <1326358404.42.0.384907026194.issue13703@psf.upfronthosting.co.za>
Frank Sievertsen wrote:
> 
> I don't want my software to stop working because someone managed to enter 1000 bad strings into it. Think of a software that handles names of customers or filenames. We don't want it to break completely just because someone entered a few clever names.

Collision counting is just a simple way to trigger an action. As I mentioned
in my proposal on this ticket, raising an exception is just one way to deal
with the problem in case excessive collisions are found. A better way is to
add a universal hash method, so that the dict can adapt to the data and
modify the hash functions for just that dict (without breaking other
dicts or changing the standard hash functions).
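Here is a minimal sketch of that per-dict adaptation. It assumes str keys and uses a hypothetical `AdaptiveStrDict` class, which is not CPython's dict. The threshold of 16 is also an arbitrary assumption. Each bucket is a chain. When an insertion makes a chain longer than `max_chain`, the table draws a new random base for a polynomial string hash, a standard almost-universal family, and rehashes only its own entries. Other dicts and the built-in hash() are unaffected. The guarantee assumes the attacker never learns the current base.

```python
import random

_P = (1 << 61) - 1  # Mersenne prime modulus for the polynomial string hash


class AdaptiveStrDict:
    """Hypothetical chained str -> value table that switches to a fresh,
    randomly chosen hash function when any bucket grows too long."""

    def __init__(self, max_chain=16):
        self.max_chain = max_chain  # collision threshold per bucket (assumed)
        self.reseeds = 0            # how often this table changed its hash
        self._size = 0
        self._new_seed()
        self._buckets = [[] for _ in range(8)]

    def _new_seed(self):
        # A random base a selects one member of the polynomial hash family.
        self._a = random.randrange(1, _P)

    def _index(self, key, nbuckets):
        # Horner evaluation of the polynomial with coefficients (byte + 1)
        # at x = a, mod P; the +1 stops leading zero bytes from vanishing.
        h = 0
        for b in key.encode("utf-8"):
            h = (h * self._a + b + 1) % _P
        return h % nbuckets

    def _rebuild(self, nbuckets):
        items = [item for bucket in self._buckets for item in bucket]
        while True:
            buckets = [[] for _ in range(nbuckets)]
            for key, value in items:
                buckets[self._index(key, nbuckets)].append((key, value))
            if max(len(b) for b in buckets) <= self.max_chain:
                self._buckets = buckets
                return
            # Still too many collisions: try another member of the family.
            self.reseeds += 1
            self._new_seed()

    def __setitem__(self, key, value):
        bucket = self._buckets[self._index(key, len(self._buckets))]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self._size += 1
        if self._size > len(self._buckets):  # keep load factor <= 1
            self._rebuild(2 * len(self._buckets))
        elif len(bucket) > self.max_chain:  # excessive collisions detected
            self.reseeds += 1
            self._new_seed()
            self._rebuild(len(self._buckets))

    def __getitem__(self, key):
        for k, v in self._buckets[self._index(key, len(self._buckets))]:
            if k == key:
                return v
        raise KeyError(key)

    def __len__(self):
        return self._size
```

Replacing the re-seed branch in `__setitem__` with a `raise` would give the exception-based variant instead. In both designs, collision counting is only the trigger.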

Note that raising an exception doesn't completely break your software.
It just signals a severe problem with the input data and a likely
attack on your software. As such, it's no different from turning on DoS
attack prevention in your router.

If you do get an exception, a web server will simply return a 500 error
and continue working normally.
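In a WSGI application, that "return a 500 and keep serving" behavior could look like the middleware below. This is a sketch. `CollisionLimitExceeded` is a hypothetical name for whatever exception a collision-limited dict would raise, and `with_collision_guard` is likewise illustrative.

```python
import sys


class CollisionLimitExceeded(Exception):
    """Hypothetical exception raised by a dict that detects excessive collisions."""


def with_collision_guard(app):
    """Hypothetical WSGI middleware: turn CollisionLimitExceeded into a 500."""
    def guarded(environ, start_response):
        try:
            return app(environ, start_response)
        except CollisionLimitExceeded:
            # Passing exc_info lets the server replace headers the app may
            # already have set (allowed by PEP 3333 before any body is sent).
            start_response("500 Internal Server Error",
                           [("Content-Type", "text/plain")],
                           sys.exc_info())
            return [b"Internal Server Error\n"]
    return guarded
```

This only catches errors raised while the wrapped app builds its response, not errors raised later while the server iterates the body. The server process itself keeps running either way.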

For other applications, you may see a failure notice in your logs. If
you're sure there is no way to attack the application using such data,
you can simply disable the feature to prevent these exceptions.

> Randomization fixes most of these problems.

See my list of issues with this approach (further up on this ticket).

> However, it breaks the steadiness of hash(X) between two runs of the same software. There's probably code out there that assumes that hash(X) always returns the same value: database- or serialization-modules, for example.
> 
> There might be good reasons to also have a steady hash function available. The broken code is hard to fix if no such function is available at all. Maybe it's possible to add a second, steady hash function again later?

This is one of the issues I mentioned.
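Code that persists hash values, such as the database and serialization modules mentioned in the quote, could instead derive a run-stable value from a digest in hashlib. That value is independent of whatever the built-in hash() does. This is a workaround sketch, not a proposed builtin. The helper name `stable_hash` is hypothetical, and blake2b with an 8-byte digest is an arbitrary choice.

```python
import hashlib


def stable_hash(text: str) -> int:
    """Hypothetical helper: a 64-bit hash of text that is identical across
    runs and machines, because it depends only on the UTF-8 bytes."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")
```

Unlike a randomized hash(), the result can be stored and compared across processes. The trade-off is speed, and it only covers str.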

> For the moment I think the best way is to turn on randomization of hash() by default, but to have a way to turn it off.
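For reference, CPython 3.3 and later randomize str and bytes hashing by default. They read the PYTHONHASHSEED environment variable to switch randomization off (value 0) or to pin a fixed seed. The sketch below checks this by computing hash() in fresh interpreters. `hash_in_fresh_interpreter` is a hypothetical helper name.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text: str, seed: str) -> int:
    """Return hash(text) as computed by a new Python process whose
    PYTHONHASHSEED is set to seed ("0" disables randomization)."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)
```

With a fixed seed, repeated runs agree. With PYTHONHASHSEED=random, or unset, each process normally gets a different value.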