
Author vstinner
Recipients Arach, Arfrever, Huzaifa.Sidhpurwala, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, georg.brandl, gvanrossum, gz, jcea, lemburg, pitrou, skrah, terry.reedy, tim.peters, v+python, vstinner, zbysz
Date 2012-01-11.09:56:10
Message-id <CAMpsgwZraWfhT_bXn2F-BmYUWkPKf5Gg1evRrijffn2xmTXjzg@mail.gmail.com>
In-reply-to <4F0D5638.7090903@egenix.com>
Content
>  * it is exceedingly complex

Which part exactly? For hash(str), it just adds two extra XORs.
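
For reference, here is a rough pure-Python model of the change (the
names prefix/suffix and the 64-bit mask are illustrative, not the exact
C code from the patch): the secret is XORed in once before the loop and
once after it.

    # Pure-Python model of the randomized string hash: the per-process
    # secret (prefix, suffix) is mixed in with exactly two extra XORs.
    MASK = 2**64 - 1  # model a 64-bit hash value

    def randomized_hash(s, prefix, suffix):
        if not s:
            return 0
        x = prefix ^ (ord(s[0]) << 7)            # extra XOR #1
        for ch in s:
            x = ((1000003 * x) ^ ord(ch)) & MASK
        x ^= len(s)
        x ^= suffix                              # extra XOR #2
        return x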

>  * the method would need to be implemented for all hashable Python types

This was already discussed, and the conclusion was that only hash(str)
needs to be modified.

>  * it causes startup time to increase (you need urandom data for
>   every single hashable Python data type)

My patch reads 8 or 16 bytes from /dev/urandom, which doesn't block. Do
you have a benchmark showing a difference in startup time?

I didn't try my patch on Windows yet.
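
A minimal sketch of that startup read, written in Python as a model of
the C code (os.urandom() uses /dev/urandom on Linux; the 16-byte size
and the prefix/suffix split are assumptions for illustration):

    import os, struct

    def generate_hash_secret():
        # Read the secret once at startup; /dev/urandom does not block.
        data = os.urandom(16)
        # Split it into two 64-bit halves used as prefix and suffix.
        prefix, suffix = struct.unpack("<QQ", data)
        return prefix, suffix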

>  * it causes run-time to increase due to changes in the hash
>   algorithm (more operations in the tight loop)

I posted a micro-benchmark of hash(str) on python-dev: the overhead is
zero. Do you have numbers showing a measurable overhead?

>  * causes different processes in a multi-process setup to use different
>   hashes for the same object

Correct. If you need the same hashes across processes, you can disable
hash randomization (PYTHONHASHSEED=0) or use a fixed seed (e.g.
PYTHONHASHSEED=42).
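
A quick check of that behaviour, assuming the PYTHONHASHSEED semantics
described above: two separate interpreter processes started with the
same seed compute the same hash('abc').

    import os, subprocess, sys

    def hash_in_subprocess(seed):
        # Run hash('abc') in a fresh interpreter with a fixed seed.
        env = dict(os.environ, PYTHONHASHSEED=str(seed))
        out = subprocess.check_output(
            [sys.executable, "-c", "print(hash('abc'))"], env=env)
        return int(out)

    print(hash_in_subprocess(0) == hash_in_subprocess(0))    # True
    print(hash_in_subprocess(42) == hash_in_subprocess(42))  # True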

>  * doesn't appear to work well in embedded interpreters that
>   regularly restarted interpreters (AFAIK, some objects persist across
>   restarts and those will have wrong hash values in the newly started
>   instances)

test_capi runs _testembed, which restarts an embedded interpreter 3
times, and the test passes (with version 5 of my patch). If there is a
real problem, can you write a script that demonstrates it?

In an older version of my patch, the hash secret was recreated at each
initialization. I changed my patch to generate the secret only once.
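
A sketch of that fix, modeled in Python rather than in the C code of
the patch: repeated initializations reuse the existing secret instead
of regenerating it, so objects surviving an interpreter restart keep
consistent hashes.

    import os

    _hash_secret = None  # models the C-level global

    def init_hash_secret():
        # Generate the secret on the first initialization only;
        # later interpreter restarts reuse the same value.
        global _hash_secret
        if _hash_secret is None:
            _hash_secret = os.urandom(16)
        return _hash_secret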

> The most important issue, though, is that it doesn't really
> protect Python against the attack - it only makes it less
> likely that an adversary will find the init vector (or a way
> around having to find it via crypt analysis).

I agree that the patch is not perfect. As noted in the patch itself, it
only makes the attack more complex, and I consider that to be enough.

Perl has a simpler protection than the one proposed in my patch. Is
Perl vulnerable to the hash collision attack?