Author PaulMcMillan
Recipients Arfrever, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, georg.brandl, gvanrossum, jcea, pitrou, terry.reedy, vstinner
Date 2012-01-04.06:00:36
SpamBayes Score 0.0
Marked as misclassified No
Message-id <1325656838.58.0.858793240155.issue13703@psf.upfronthosting.co.za>
In-reply-to
Content
A couple of things here:

First, my proposed change is not cryptographically secure. There simply aren't any cryptographic hashing algorithms available that are in the performance class we need. My proposal does make the collision attack quite difficult to carry out, even if the raw output values from the hash are available to the attacker (they should not usually be).

I favor using the same algorithm between 32 and 64 bit builds for consistency of behavior. Developers would be startled to find that ordering stays consistent on a 64 bit build but varies on 32 bit builds. Additionally, the impracticality of attacking of 64 bit builds rests on the fact that these particular researchers didn't devise a way to do it. I'd hate to make this change and then have a clever mathematician publish some elegant point requiring us to go fix the problem all over again. 

I could be convinced either way on small strings. I like that they can't be used to attack the secret. At the same time, I worry that combining 2 different hashing routines into the same output space may introduce unexpected collisions and other difficult to debug edge-case conditions. It also means that the order of the hashes of long strings will vary while the order of short strings will not - another inconsistency which will encourage bugs.

Thank you Victor for the improvements to the python demonstration code. As Antoine said, it's only demo code, but better demo code is always good.

Antoine: That section of the manpage is referring to the overall security of a key generated using urandom. 256 bits is overkill for this application. We could take 256 bits and use them to generate a key using a cryptographically appropriate algorithm, but it's simpler to read more bits and use them directly as the key.

Additionally, that verbiage has been in the man page for urandom for quite some time (probably since the earliest version in the mid 90's). The PRNG has been improved since then.

Minimum length of r is a hard question. The shorter it is, the higher the correlation of the output. In my tests, 16kb was the amount necessary to generally do reasonably well on my test suite for randomness even with problematic input. Obviously our existing output isn't random, so it doesn't pass those tests at all. Using a fairly small value (4k) should not make the results much worse from a security perspective, but might be problematic from a collision/distribution standpoint. It's clear that we don't need cryptographically good randomness here, but passing the test suite is not a bad thing when considering the distribution.

When we settle on a C implementation, I'd like to run it through the smhasher set of tests to make sure we aren't making distribution worse, especially for very small values of r.
History
Date User Action Args
2012-01-04 06:00:38PaulMcMillansetrecipients: + PaulMcMillan, gvanrossum, barry, georg.brandl, terry.reedy, jcea, pitrou, vstinner, christian.heimes, benjamin.peterson, Arfrever, alex, dmalcolm, Zhiping.Deng
2012-01-04 06:00:38PaulMcMillansetmessageid: <1325656838.58.0.858793240155.issue13703@psf.upfronthosting.co.za>
2012-01-04 06:00:38PaulMcMillanlinkissue13703 messages
2012-01-04 06:00:36PaulMcMillancreate