
Author vstinner
Recipients Arach, Arfrever, Huzaifa.Sidhpurwala, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, georg.brandl, gvanrossum, gz, jcea, lemburg, pitrou, skrah, terry.reedy, tim.peters, v+python, vstinner, zbysz
Date 2012-01-11.09:56:10
Message-id <CAMpsgwZraWfhT_bXn2F-BmYUWkPKf5Gg1evRrijffn2xmTXjzg@mail.gmail.com>
In-reply-to <4F0D5638.7090903@egenix.com>
Content
>  * it is exceedingly complex

Which part exactly? For hash(str), it just adds two extra XORs.
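
For reference, here is a rough pure-Python model of the change (the
names prefix/suffix and the 64-bit mask are illustrative, not the exact
C code from the patch): the secret is XORed in once before the loop and
once after it.

    # Pure-Python model of the randomized string hash: the per-process
    # secret (prefix, suffix) is mixed in with exactly two extra XORs.
    MASK = 2**64 - 1  # model a 64-bit hash value

    def randomized_hash(s, prefix, suffix):
        if not s:
            return 0
        x = prefix ^ (ord(s[0]) << 7)            # extra XOR #1
        for ch in s:
            x = ((1000003 * x) ^ ord(ch)) & MASK
        x ^= len(s)
        x ^= suffix                              # extra XOR #2
        return x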

>  * the method would need to be implemented for all hashable Python types

This was already discussed, and the conclusion was that only hash(str)
needs to be modified.

>  * it causes startup time to increase (you need urandom data for
>   every single hashable Python data type)

My patch reads 8 or 16 bytes from /dev/urandom, which doesn't block. Do
you have a benchmark showing a difference in startup time?

I didn't try my patch on Windows yet.
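
A minimal sketch of that startup read, written in Python as a model of
the C code (os.urandom() uses /dev/urandom on Linux; the 16-byte size
and the prefix/suffix split are assumptions for illustration):

    import os, struct

    def generate_hash_secret():
        # Read the secret once at startup; /dev/urandom does not block.
        data = os.urandom(16)
        # Split it into two 64-bit halves used as prefix and suffix.
        prefix, suffix = struct.unpack("<QQ", data)
        return prefix, suffix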

>  * it causes run-time to increase due to changes in the hash
>   algorithm (more operations in the tight loop)

I posted a micro-benchmark of hash(str) on python-dev: the overhead is
zero. Do you have numbers showing a measurable overhead?

>  * causes different processes in a multi-process setup to use different
>   hashes for the same object

Correct. If you need the same hashes across processes, you can disable
hash randomization (PYTHONHASHSEED=0) or use a fixed seed (e.g.
PYTHONHASHSEED=42).
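
A quick check of that behaviour, assuming the PYTHONHASHSEED semantics
described above: two separate interpreter processes started with the
same seed compute the same hash('abc').

    import os, subprocess, sys

    def hash_in_subprocess(seed):
        # Run hash('abc') in a fresh interpreter with a fixed seed.
        env = dict(os.environ, PYTHONHASHSEED=str(seed))
        out = subprocess.check_output(
            [sys.executable, "-c", "print(hash('abc'))"], env=env)
        return int(out)

    print(hash_in_subprocess(0) == hash_in_subprocess(0))    # True
    print(hash_in_subprocess(42) == hash_in_subprocess(42))  # True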

>  * doesn't appear to work well in embedded interpreters that
>   regularly restarted interpreters (AFAIK, some objects persist across
>   restarts and those will have wrong hash values in the newly started
>   instances)

test_capi runs _testembed, which restarts an embedded interpreter 3
times, and the test passes (with version 5 of my patch). If there is a
real problem, can you write a script that demonstrates it?

In an older version of my patch, the hash secret was recreated at each
initialization. I changed my patch to generate the secret only once.
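
A sketch of that fix, modeled in Python rather than in the C code of
the patch: repeated initializations reuse the existing secret instead
of regenerating it, so objects surviving an interpreter restart keep
consistent hashes.

    import os

    _hash_secret = None  # models the C-level global

    def init_hash_secret():
        # Generate the secret on the first initialization only;
        # later interpreter restarts reuse the same value.
        global _hash_secret
        if _hash_secret is None:
            _hash_secret = os.urandom(16)
        return _hash_secret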

> The most important issue, though, is that it doesn't really
> protect Python against the attack - it only makes it less
> likely that an adversary will find the init vector (or a way
> around having to find it via crypt analysis).

I agree that the patch is not perfect. As noted in the patch itself, it
only makes the attack more complex, and I consider that to be enough.

Perl has a simpler protection than the one proposed in my patch. Is
Perl vulnerable to the hash collision attack?