Author Giovanni.Bajo
Recipients Arfrever, Giovanni.Bajo, PaulMcMillan, Vlado.Boza, alex, arigo, benjamin.peterson, camara, christian.heimes, dmalcolm, haypo, koniiiik, lemburg, mark.dickinson, serhiy.storchaka
Date 2012-11-07.13:16:39
Message-id <F2A503FA-E7EB-40FB-A7BB-37BEB4723269@gmail.com>
In-reply-to <509A4D37.2050308@egenix.com>
Content
On 07 Nov 2012, at 12:59, Marc-Andre Lemburg <report@bugs.python.org> wrote:

> 
> Marc-Andre Lemburg added the comment:
> 
> On 07.11.2012 12:55, Mark Dickinson wrote:
>> 
>> Mark Dickinson added the comment:
>> 
>> [MAL]
>>> I don't understand why we are only trying to fix the string problem
>>> and completely ignore other key types.
>> 
>> [Armin]
>>> estimating the risks of giving up on a valid query for a truly random
>>> hash, at an overestimated one billion queries per second ...
>> 
>> That's fine in principle, but if this gets extended to integers, note that our current integer hash is about as far from 'truly random' as you can get:
>> 
>>    Python 3.4.0a0 (default:f02555353544, Nov  4 2012, 11:50:12) 
>>    [GCC 4.2.1 (Apple Inc. build 5664)] on darwin
>>    Type "help", "copyright", "credits" or "license" for more information.
>>    >>> [hash(i) for i in range(20)]
>>    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
>> 
>> Moreover, it's going to be *very* hard to change the int hash while preserving the `x == y implies hash(x) == hash(y)` invariant across all the numeric types (int, float, complex, Decimal, Fraction, 3rd-party types that need to remain compatible).
> 
> Exactly. And that's why trying to find secure hash functions isn't
> going to solve the problem. Together with randomization they may
> make things better for strings, but they are no solution for numeric
> types, and they also don't allow detecting possible attacks on your
> systems.
> 
> But yeah, I'm repeating myself :-)
> 

I don't see how that follows. Python has several hash functions in its core, one of which is the string hash function. It is currently severely broken from a security standpoint, and strings are probably the most common key type for dictionaries in Python and the one most easily exploited in web frameworks.

If we can manage to fix the string hash function (e.g. through SipHash), we will be one step closer to mitigating the possible attacks.
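
To make the scope concrete, here is a minimal sketch, assuming an interpreter with hash randomization controlled by the PYTHONHASHSEED environment variable (Python 3.3 or later). It does not show SipHash itself; it only shows that the string hash depends on a per-process seed while the int hash does not. The helper name hash_in_fresh_interpreter is hypothetical.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(expr, seed):
    """Evaluate hash(<expr>) in a new interpreter started with PYTHONHASHSEED=seed.

    Hypothetical helper for illustration; not part of any proposed patch.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash(%s))" % expr], env=env
    )
    return int(out.decode().strip())


# The same string hashed under the same seed and under a different seed.
str_seed1 = hash_in_fresh_interpreter("'session_id'", 1)
str_seed1_again = hash_in_fresh_interpreter("'session_id'", 1)
str_seed2 = hash_in_fresh_interpreter("'session_id'", 2)

# The same small int hashed under two different seeds.
int_seed1 = hash_in_fresh_interpreter("12345", 1)
int_seed2 = hash_in_fresh_interpreter("12345", 2)

print("str, seed 1:", str_seed1, "| seed 2:", str_seed2)
print("int, seed 1:", int_seed1, "| seed 2:", int_seed2)
```

The string hash is reproducible for a fixed seed and changes with the seed, so strengthening the function behind it is a self-contained change. The int hash ignores the seed entirely.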

Handling collisions and mitigating attacks on numeric types is a totally different problem because it involves a totally different function. I suggest we keep separate discussions and separate bugs for it. Personally, I'm only interested in mitigating attacks on the string hash function.
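
As a small illustration of why the numeric case is a separate problem, the sketch below checks the invariant Mark quoted (`x == y implies hash(x) == hash(y)`) across the standard numeric types. It assumes CPython 3.2 or later; the variable names are only for illustration.

```python
from decimal import Decimal
from fractions import Fraction

# Five objects of different numeric types that all compare equal to 1.
equal_values = [1, 1.0, complex(1, 0), Decimal(1), Fraction(1, 1)]
all_equal_to_one = all(v == 1 for v in equal_values)

# Because they are equal, they must all share a single hash value.
hashes = {hash(v) for v in equal_values}

# Small non-negative ints hash to themselves, as in Mark's session.
small_int_hashes = [hash(i) for i in range(20)]

print(all_equal_to_one, hashes)
print(small_int_hashes)
```

Any change to the int hash would have to be mirrored in float, complex, Decimal, Fraction and compatible third-party types, whereas the string hash has no such cross-type constraint.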
-- 
Giovanni Bajo
History
Date User Action Args
2012-11-07 13:16:40 Giovanni.Bajo set recipients: + Giovanni.Bajo, lemburg, arigo, mark.dickinson, haypo, christian.heimes, benjamin.peterson, Arfrever, alex, dmalcolm, PaulMcMillan, serhiy.storchaka, Vlado.Boza, koniiiik, camara
2012-11-07 13:16:40 Giovanni.Bajo link issue14621 messages
2012-11-07 13:16:39 Giovanni.Bajo create