Message 151664 - Python tracker


Author	vstinner
Recipients	Arach, Arfrever, Huzaifa.Sidhpurwala, Jim.Jewett, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, eric.snow, fx5, georg.brandl, grahamd, gregory.p.smith, gvanrossum, gz, jcea, lemburg, mark.dickinson, neologix, pitrou, skrah, terry.reedy, tim.peters, v+python, vstinner, zbysz
Date	2012-01-20.01:11:23
Message-id	<CAMpsgwYbiLgUvVnm+Pgv0o_Y1JbLVgRDuC_KXb5PoA+6HATMKQ@mail.gmail.com>
In-reply-to	<4F18B762.1000608@egenix.com>

Content

> Since the hash function is known, it doesn't make things much
> harder. Without suffix you just need hash('') to find out what
> the prefix is. With suffix, two values are enough.

With my patch, hash('') always returns zero. I don't remember who asked
me to do that, but it avoids leaking the secret too easily :-) I wrote
some notes on how to compute the secret:
http://bugs.python.org/issue13703#msg150706

I don't see how to compute the secret, but that doesn't mean that it is
impossible :-) I suppose that you have to brute-force some bits, at
least if you only have repr(dict), which gives only (indirectly) the
lower bits of the hash.

> (things obviously get tricky once overflow kicks in)

Overflow is not a problem for hash(): the arithmetic simply wraps
modulo 2^32 (or 2^64), so if you know the string, you can run the
algorithm backward. To divide, multiply by the inverse of 1000003
mod 2^32 (or mod 2^64): 2021759595 and 16109806864799210091
respectively. So x/1000003 mod 2^32 = x*2021759595 mod 2^32.

See my invert_mod() function in:
https://bitbucket.org/haypo/misc/src/tip/python/mathfunc.py
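A minimal sketch of the "divide by 1000003" trick, for illustration only.
It is not the invert_mod() function from mathfunc.py: it uses Python's
built-in three-argument pow() (Python 3.8+) instead. The names
inverse_mod_pow2(), step() and unstep() are hypothetical, and step() is
assumed to be a simplified version of one round of the string hash loop
(multiply by 1000003, then XOR with the character code).

```python
MULT = 1000003  # multiplier used by the string hash


def inverse_mod_pow2(a, bits):
    """Return the inverse of the odd integer a modulo 2**bits."""
    # pow() with a negative exponent computes a modular inverse (Python 3.8+).
    return pow(a, -1, 1 << bits)


def step(x, ch, bits=32):
    """One assumed round of the hash loop: multiply, wrap, XOR the character."""
    mask = (1 << bits) - 1
    return ((MULT * x) & mask) ^ ch


def unstep(y, ch, bits=32):
    """Undo step(): XOR the character back, then multiply by the inverse."""
    mask = (1 << bits) - 1
    return ((y ^ ch) * inverse_mod_pow2(MULT, bits)) & mask


inv32 = inverse_mod_pow2(MULT, 32)
inv64 = inverse_mod_pow2(MULT, 64)
print(inv32, inv64)

# Running one round forward and then backward gives the original state.
state = 0xDEADBEEF
assert unstep(step(state, ord("a")), ord("a")) == state
```

Because 1000003 is odd, it is invertible modulo any power of two, so the
wraparound loses no information about the previous state.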

> With Victor's approach hash(0) would output the whole seed,
> but even if the seed is not known, creating an attack data
> set is trivial, since hash(x) = P ^ x ^ S.

I suppose that it would be too easy to compute the secret of a
randomized integer hash, so it is probably better to leave integer
hashes unchanged. Using different secrets for strings and integers
would not protect Python against an attack using only integers, but
integer keys are less common than string keys (especially in web
applications).
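To illustrate why hash(x) = P ^ x ^ S is weak, here is a toy sketch. The
function toy_int_hash() and the 64-bit width are assumptions made for the
example, not code from any patch. It shows that hash(0) reveals P ^ S,
which is all that is needed to predict every other hash, and that keys
differing only in their high bits share the same low hash bits whatever
the secret is.

```python
import secrets

MASK = (1 << 64) - 1  # assumed 64-bit hash width


def toy_int_hash(x, prefix, suffix):
    """Toy randomized integer hash of the form P ^ x ^ S (illustration only)."""
    return (prefix ^ x ^ suffix) & MASK


prefix = secrets.randbits(64)
suffix = secrets.randbits(64)

# hash(0) leaks the combined secret P ^ S.
leaked = toy_int_hash(0, prefix, suffix)
assert leaked == prefix ^ suffix

# With the leaked value, any other hash can be predicted.
assert toy_int_hash(12345, prefix, suffix) == leaked ^ 12345

# Even without the secret: keys that differ only above bit 16 always
# agree on the low 16 bits of their hash.
keys = [i << 16 for i in range(8)]
low_bits = {toy_int_hash(k, prefix, suffix) & 0xFFFF for k in keys}
assert len(low_bits) == 1
```

The last check does not depend on the random values at all, which is why
such an attack data set can be built without knowing the seed.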

Anyway, I changed my mind about randomized hashing: I now prefer
counting collisions :-)

History
Date	User	Action	Args
2012-01-20 01:11:24	vstinner	set	recipients: + vstinner, lemburg, gvanrossum, tim.peters, barry, georg.brandl, terry.reedy, gregory.p.smith, jcea, mark.dickinson, pitrou, christian.heimes, benjamin.peterson, eric.araujo, grahamd, Arfrever, v+python, alex, zbysz, skrah, dmalcolm, gz, neologix, Arach, Mark.Shannon, eric.snow, Zhiping.Deng, Huzaifa.Sidhpurwala, Jim.Jewett, PaulMcMillan, fx5
2012-01-20 01:11:24	vstinner	link	issue13703 messages
2012-01-20 01:11:23	vstinner	create