Message 279753 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	Eric Appelt
Recipients	Eric Appelt, berker.peksag, martin.panter, rhettinger, serhiy.storchaka, tim.peters, vstinner
Date	2016-10-30.23:14:19
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1477869260.64.0.760616166482.issue26163@psf.upfronthosting.co.za>
In-reply-to

Content
Ugh... so I think I made a mental error (i.e. confused myself) in the last comment. The result looked a bit "to good to be true" and I think that there is at least one error in my thinking about the problem. I tried testing with the width set to 2 and then 1 as a check and noticed that even without "widening" the problem was still fixed and the hash distributions matched that of pseudorandom numbers. It turns out that just running the XORed result of the shuffled entry hashes through the hashing algorithm gets rid of any bit patterns picked up through the process. Currently there is an LCG that is used to disperse patterns but I don't think it really helps the problems arising in these tests: hash = hash * 69069U + 907133923UL; I'm not attaching any more plots as the result can be described in words, but when the LCG applied to the hash after XORing all entries is replaced with a hashing of the result using the standard python hashing algorithm, the test problem goes away. Specifically, when 128 distinct sets are hashed, the resulting hashes have a distribution of unique values across their last 7 digits matches the distribution from pseudorandom numbers. The fix is implemented in a small patch that I have attached.

Ugh... so I think I made a mental error (i.e. confused myself) in the last comment. The result looked a bit "to good to be true" and I think that there is at least one error in my thinking about the problem.

I tried testing with the width set to 2 and then 1 as a check and noticed that even without "widening" the problem was still fixed and the hash distributions matched that of pseudorandom numbers.

It turns out that just running the XORed result of the shuffled entry hashes through the hashing algorithm gets rid of any bit patterns picked up through the process. Currently there is an LCG that is used to disperse patterns but I don't think it really helps the problems arising in these tests:

    hash = hash * 69069U + 907133923UL;

I'm not attaching any more plots as the result can be described in words, but when the LCG applied to the hash after XORing all entries is replaced with a hashing of the result using the standard python hashing algorithm, the test problem goes away. Specifically, when 128 distinct sets are hashed, the resulting hashes have a distribution of unique values across their last 7 digits matches the distribution from pseudorandom numbers.

The fix is implemented in a small patch that I have attached.

History
Date	User	Action	Args
2016-10-30 23:14:20	Eric Appelt	set	recipients: + Eric Appelt, tim.peters, rhettinger, vstinner, berker.peksag, martin.panter, serhiy.storchaka
2016-10-30 23:14:20	Eric Appelt	set	messageid: <1477869260.64.0.760616166482.issue26163@psf.upfronthosting.co.za>
2016-10-30 23:14:20	Eric Appelt	link	issue26163 messages
2016-10-30 23:14:20	Eric Appelt	create