Message326194
Has anyone figured out the real source of the degeneration when mixing in negative integers? I have not. XOR always permutes the hash range - it's one-to-one. No possible outputs are lost, and XOR with a negative int isn't "obviously degenerate" in general:
>>> for i in range(-16, 17):
... print(i, i ^ -5)
-16 11
-15 10
-14 9
-13 8
-12 15
-11 14
-10 13
-9 12
-8 3
-7 2
-6 1
-5 0
-4 7
-3 6
-2 5
-1 4
0 -5
1 -6
2 -7
3 -8
4 -1
5 -2
6 -3
7 -4
8 -13
9 -14
10 -15
11 -16
12 -9
13 -10
14 -11
15 -12
16 -21
Clear enough? "Distinct integers in, distinct integers out", but with the twist that the sign bit flips. That's _seemingly_ minor to me. Yet it has massive effects on tuple hash distribution. Here's a function to show that:
def doit(cands, repeat=4):
from collections import Counter
from itertools import product
print(len(cands), "cands **", repeat, "=", len(cands)**repeat)
c1 = Counter(map(hash, product(cands, repeat=repeat)))
print(len(c1), "distinct hash codes")
c2 = Counter(c1.values())
for dups in sorted(c2):
print(dups, c2[dups])
Then an example we're proud of:
>>> doit(range(100))
100 cands ** 4 = 100000000
100000000 distinct hash codes
1 100000000
Much the same, mixing in negative ints, but _excluding_ the problematic -1 and -2 inputs:
>>> cands = list(range(-50, 51))
>>> cands.remove(-1) # hashes to -2
>>> cands.remove(-2) # for j odd, j ^ -2 == -j
>>> doit(cands)
99 cands ** 4 = 96059601
15736247 distinct hash codes
1 70827
2 1005882
3 228578
4 5000424
5 19728
6 1047762
8 8363046
What accounts for such a massive difference? It's not merely that we're using negative ints:
>>> doit(range(-101, -1)) # leave -2 in, for a change
100 cands ** 4 = 100000000
100000000 distinct hash codes
1 100000000
So, on their own, negative ints are as spectacularly well-behaved in this range as positive ints, and including -2 creates no problems.
I suspect it's related to that x ^ -(x + 1) == -1, but not in a way obvious enough to be ... well, obvious ;-) |
|
Date |
User |
Action |
Args |
2018-09-24 00:33:46 | tim.peters | set | recipients:
+ tim.peters, rhettinger, mark.dickinson, eric.smith, jdemeyer, sir-sigurd |
2018-09-24 00:33:46 | tim.peters | set | messageid: <1537749226.73.0.956365154283.issue34751@psf.upfronthosting.co.za> |
2018-09-24 00:33:46 | tim.peters | link | issue34751 messages |
2018-09-24 00:33:44 | tim.peters | create | |
|