Author Eric Appelt
Recipients Eric Appelt, berker.peksag, martin.panter, rhettinger, serhiy.storchaka, tim.peters, vstinner
Date 2016-10-30.23:14:19
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1477869260.64.0.760616166482.issue26163@psf.upfronthosting.co.za>
In-reply-to
Content
Ugh... so I think I made a mental error (i.e. confused myself) in the last comment. The result looked a bit "to good to be true" and I think that there is at least one error in my thinking about the problem.

I tried testing with the width set to 2 and then 1 as a check and noticed that even without "widening" the problem was still fixed and the hash distributions matched that of pseudorandom numbers.

It turns out that just running the XORed result of the shuffled entry hashes through the hashing algorithm gets rid of any bit patterns picked up through the process. Currently there is an LCG that is used to disperse patterns but I don't think it really helps the problems arising in these tests:

    hash = hash * 69069U + 907133923UL;

I'm not attaching any more plots as the result can be described in words, but when the LCG applied to the hash after XORing all entries is replaced with a hashing of the result using the standard python hashing algorithm, the test problem goes away. Specifically, when 128 distinct sets are hashed, the resulting hashes have a distribution of unique values across their last 7 digits matches the distribution from pseudorandom numbers.

The fix is implemented in a small patch that I have attached.
History
Date User Action Args
2016-10-30 23:14:20Eric Appeltsetrecipients: + Eric Appelt, tim.peters, rhettinger, vstinner, berker.peksag, martin.panter, serhiy.storchaka
2016-10-30 23:14:20Eric Appeltsetmessageid: <1477869260.64.0.760616166482.issue26163@psf.upfronthosting.co.za>
2016-10-30 23:14:20Eric Appeltlinkissue26163 messages
2016-10-30 23:14:20Eric Appeltcreate