Author jdemeyer
Recipients eric.smith, jdemeyer, mark.dickinson, rhettinger, sir-sigurd, tim.peters
Date 2018-09-22.21:38:06
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1537652286.94.0.956365154283.issue34751@psf.upfronthosting.co.za>
In-reply-to
Content
> the made-up hacks Python used to worm around a class of gross flaws its prior DJBX33X approach suffered, taking DJBX33X out of its original context and applying it in an area it wasn't designed for.

But we know why the DJBX33A hash didn't work (nested tuples), so we can think of the best way to solve that. Python messes with the multiplier, which makes it quite a different hash. Surely, if you believe that the precise choice of multiplier matters a lot, then you should also agree that arbitrarily changing the multiplier in the loop is a bad idea.

My proposal instead is to keep the structure of the DJBX33A hash but change the hash of the individual items to be hashed. That's a much less invasive change to the known algorithm.

Finally, something that I haven't mentioned here: an additional advantage of my approach is that high-order bits become more important:

BEFORE:
>>> L = [n << 60 for n in range(100)]; T = [(a,b,c) for a in L for b in L for c in L]; len(set(hash(x) for x in T))
500000

AFTER:
>>> L = [n << 60 for n in range(100)]; T = [(a,b,c) for a in L for b in L for c in L]; len(set(hash(x) for x in T))
1000000

Again, I'm not claiming that this is a major issue. Just additional evidence that maybe my new hash might actually be slightly better than the existing hash.
History
Date User Action Args
2018-09-22 21:38:06jdemeyersetrecipients: + jdemeyer, tim.peters, rhettinger, mark.dickinson, eric.smith, sir-sigurd
2018-09-22 21:38:06jdemeyersetmessageid: <1537652286.94.0.956365154283.issue34751@psf.upfronthosting.co.za>
2018-09-22 21:38:06jdemeyerlinkissue34751 messages
2018-09-22 21:38:06jdemeyercreate