This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author vstinner
Recipients Arach, Arfrever, Huzaifa.Sidhpurwala, Jim.Jewett, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, eric.snow, fx5, georg.brandl, grahamd, gregory.p.smith, gvanrossum, gz, jcea, lemburg, mark.dickinson, neologix, pitrou, skrah, terry.reedy, tim.peters, v+python, vstinner, zbysz
Date 2012-01-20.01:11:23
SpamBayes Score 9.1858805e-09
Marked as misclassified No
Message-id <CAMpsgwYbiLgUvVnm+Pgv0o_Y1JbLVgRDuC_KXb5PoA+6HATMKQ@mail.gmail.com>
In-reply-to <4F18B762.1000608@egenix.com>
Content
> Since the hash function is known, it doesn't make things much
> harder. Without suffix you just need hash('') to find out what
> the prefix is. With suffix, two values are enough.

With my patch, hash('') always return zero. I don't remember who asked
me to do that, but it avoids to leak too easily the secret :-) I wrote
some info how to compute the secret:
http://bugs.python.org/issue13703#msg150706

I don't see how to compute the secret, but it doesn't mean that it is
impossible :-) I suppose that you have to brute force some bits, at
least if you only have repr(dict) which gives only (indirectly) the
lower bits of the hash.

> (things obviously get tricky once overflow kicks in)

hash() doesn't overflow: if you know the string, you can run the
algorithm backward. To divide, you can compute 1/1000003 mod 2^32 (or
mod 2^64): 2021759595 and 16109806864799210091. So x/1000003 mod 2^32
= x*2021759595 mod 2^32.

See my invert_mod() function of:
https://bitbucket.org/haypo/misc/src/tip/python/mathfunc.py

> With Victor's approach hash(0) would output the whole seed,
> but even if the seed is not known, creating an attack data
> set is trivial, since hash(x) = P ^ x ^ S.

I suppose that it would be too simple to compute the secret of a
randomized integer hash, so it is maybe better to leave them
unchanged. Using a different secret from strings and integer would not
protect Python against an attack only using integers, but integer keys
are less common than string keys (especially on web applications).

Anyway, I changed my mind about randomized hash: I now prefer counting
collisions :-)
History
Date User Action Args
2012-01-20 01:11:24vstinnersetrecipients: + vstinner, lemburg, gvanrossum, tim.peters, barry, georg.brandl, terry.reedy, gregory.p.smith, jcea, mark.dickinson, pitrou, christian.heimes, benjamin.peterson, eric.araujo, grahamd, Arfrever, v+python, alex, zbysz, skrah, dmalcolm, gz, neologix, Arach, Mark.Shannon, eric.snow, Zhiping.Deng, Huzaifa.Sidhpurwala, Jim.Jewett, PaulMcMillan, fx5
2012-01-20 01:11:24vstinnerlinkissue13703 messages
2012-01-20 01:11:23vstinnercreate