This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author tim.peters
Recipients eric.smith, jdemeyer, mark.dickinson, rhettinger, sir-sigurd, tim.peters
Date 2018-09-25.00:13:21
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1537834402.61.0.545547206417.issue34751@psf.upfronthosting.co.za>
In-reply-to
Content
> advantage of my approach is that high-order bits become more
> important:

I don't much care about high-order bits, beyond that we don't systematically _lose_ them.  The dict and set lookup routines have their own strategies for incorporating high-order bits into the probe sequence, and those routines are the primary reason hash() exists in Python.

So there's scant reason to try to let high bits influence low bits (but good reason _not_ to, because - as explained before - the low-order bits are already in great shape for the important tuple component types).

Something else appears to be going on in this specific example, which has a very touchy set of hash codes:

> BEFORE:
> >>> L = [n << 60 for n in range(100)]; T = [(a,b,c) for a in L for b in L for c in L]; len(set(hash(x) for x in T))
> 500000

Which is what I get too on (at least) a 64-bit build.

> AFTER:
> >>> L = [n << 60 for n in range(100)]; T = [(a,b,c) for a in L for b in L for c in L]; len(set(hash(x) for x in T))
> 1000000

Ditto.  However, I get exactly the same from the code I posted before, which has no right shifts at all.

More, if I take the code exactly as it exists today, and merely comment out the

    mult += (Py_hash_t)(82520UL + len + len);

line, I also get a million distinct hash codes.

Half of the hash codes have bit 2**60 set, and apart from those all the hash codes have no bits set higher than 2**6.  Any number of accidents could cause the 2**60 bits to wipe each other out, and merely changing the sequence of multipliers was demonstrably enough to stop that.

Indeed, merely changing 82520 to 82524 in the current code is enough to get a million unique hash codes in this case.
History
Date User Action Args
2018-09-25 00:13:22tim.peterssetrecipients: + tim.peters, rhettinger, mark.dickinson, eric.smith, jdemeyer, sir-sigurd
2018-09-25 00:13:22tim.peterssetmessageid: <1537834402.61.0.545547206417.issue34751@psf.upfronthosting.co.za>
2018-09-25 00:13:22tim.peterslinkissue34751 messages
2018-09-25 00:13:21tim.peterscreate