Message 150713 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	terry.reedy
Recipients	Arfrever, Huzaifa.Sidhpurwala, Mark.Shannon, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, eric.araujo, georg.brandl, gvanrossum, jcea, lemburg, pitrou, terry.reedy, v+python, vstinner
Date	2012-01-06.02:57:39
SpamBayes Score	6.938894e-15
Marked as misclassified	No
Message-id	<1325818660.63.0.112419628547.issue13703@psf.upfronthosting.co.za>
In-reply-to

Content
Those who use or advocate a simple randomized starting hash (Perl, Ruby, perhaps MS, and the CCC presenters) are presuming that the randomized hash values are kept private. Indeed, they should be (and the docs could note this) unless an attacker has direct access to the interpreter. An attacker who does, as in a Python programming class, can much more easily freeze the interpreter by 'accidentally' writing code equivalent to "while True: pass". I do not think we, as Python developers, should be concerned about esoteric timing attacks. They strike me as a site issue rather than a language issue. As I understand them, they require large numbers of probes coupled with responses based on the same hash function. So a site being so probed already has bit of a problem. And if hashing were randomized per process, and probes were randomly distributed among processes, and processes were periodically killed and restarted with new seeds, could such an attack get anywhere (besides the DOS effect of the probing)? The point of the CCC talk was that with one constant known hash, one could lock up a server for a long while with just one upload. So I think we should copy Perl and Ruby, do the easy thing, and add a random seed to 3.3 hashing, subject to keeping equality for equal numbers. Let whatever thereby fails, fail, and be updated. For prior versions, add an option for strings and perhaps numbers, and document that some tests will fail if enabled. We could also consider, for 3.3, making the output of hash() be different from the internal values used for dicts, perhaps by switching random seeds in hash(). So even if someone does return hash(x) values to potential attackers, they are not the values used in dicts. (This would require a slight change in the doc.)

Those who use or advocate a simple randomized starting hash (Perl, Ruby, perhaps MS, and the CCC presenters) are presuming that the randomized hash values are kept private. Indeed, they should be (and the docs could note this) unless an attacker has direct access to the interpreter. An attacker who does, as in a Python programming class, can much more easily freeze the interpreter by 'accidentally' writing code equivalent to "while True: pass".

I do not think we, as Python developers, should be concerned about esoteric timing attacks. They strike me as a site issue rather than a language issue. As I understand them, they require *large* numbers of probes coupled with responses based on the same hash function. So a site being so probed already has bit of a problem. And if hashing were randomized per process, and probes were randomly distributed among processes, and processes were periodically killed and restarted with new seeds, could such an attack get anywhere (besides the DOS effect of the probing)? The point of the CCC talk was that with one constant known hash, one could lock up a server for a long while with just one upload.

So I think we should copy Perl and Ruby, do the easy thing, and add a random seed to 3.3 hashing, subject to keeping equality for equal numbers. Let whatever thereby fails, fail, and be updated. For prior versions, add an option for strings and perhaps numbers, and document that some tests will fail if enabled.

We could also consider, for 3.3, making the output of hash() be different from the internal values used for dicts, perhaps by switching random seeds in hash(). So even if someone does return hash(x) values to potential attackers, they are not the values used in dicts. (This would require a slight change in the doc.)

History
Date	User	Action	Args
2012-01-06 02:57:40	terry.reedy	set	recipients: + terry.reedy, lemburg, gvanrossum, barry, georg.brandl, jcea, pitrou, vstinner, christian.heimes, benjamin.peterson, eric.araujo, Arfrever, v+python, alex, dmalcolm, Mark.Shannon, Zhiping.Deng, Huzaifa.Sidhpurwala, PaulMcMillan
2012-01-06 02:57:40	terry.reedy	set	messageid: <1325818660.63.0.112419628547.issue13703@psf.upfronthosting.co.za>
2012-01-06 02:57:40	terry.reedy	link	issue13703 messages
2012-01-06 02:57:39	terry.reedy	create