Message326375
> Suppose that there is a hash collision, say hash((3, 3)) ==
> hash((-3, -3)) and you change the hashing algorithm to fix
> this collision.
There are _two_ hash functions at play in that collision: the tuple hash function, and the integer hash function. Regardless of whether the _tuple_ hash function does
t ^= t << shift;
or
t += t >> shift;
or anything else involving just `t`, that only directly affects the result of the _int_ hash function. Whose low order bits cannot be improved in general - they're already optimally spread out in the most important cases.
> If that change does NOT affect the lower N bits,
> then you still have a collision hash((3, 3)) % 2**N ==
> hash((-3, -3)) % 2**N. This is relevant because the dict
> implementation starts by taking the lower bits first. So
> ideally we want to avoid hash collisions in the lower
> bits too.
Which the int hash on its own already does a superb job of doing in the most important cases. If you feel you just have to mess with low-order bits, do it instead on the _tuple_ hash intermediate results, not on its inputs. Like
x += x >> 16;
after t is folded in to x. It's x you care about directly, not t. Damaging desirable properties in t to _indirectly_ gain something in x is too clever by half.
Mucking with t to avoid the nested-tuple catastrophes is worth it, but I prefer to do that with as little damage to t's desirable properties as reasonably possible.
> This may also be a reason to avoid the standard FNV
> multipliers: the 64-bit FNV multiplier was chosen with
> the full 64-bit hash range in mind. It was never meant
> to work well when truncated to some lower N bits.
Do you believe any other multiplier would work better toward that end? For any odd multiplier M, the last k bits of i*M are determined solely by the last k bits of i, and
[(last k bits of i*M) for i in range(anything, anything + 2**k)]
is a permutation of range(2**k). The high order bits of i have nothing to do with it, and any odd multiplier permutes the lower k bits over any stretch of 2**k consecutive multiplicands.
Extracting low-order bits is intended to be done by "xor folding" in FNV, and I _expect_ the same would be prudent for any other multiplier:
https://tools.ietf.org/html/draft-eastlake-fnv-15#page-7
The dict and set lookup functions already do something stronger than xor folding to bring high bits into play, but do so incrementally as the probe sequence unfolds. |
|
Date |
User |
Action |
Args |
2018-09-25 16:54:28 | tim.peters | set | recipients:
+ tim.peters, rhettinger, mark.dickinson, eric.smith, jdemeyer, sir-sigurd |
2018-09-25 16:54:28 | tim.peters | set | messageid: <1537894468.79.0.545547206417.issue34751@psf.upfronthosting.co.za> |
2018-09-25 16:54:28 | tim.peters | link | issue34751 messages |
2018-09-25 16:54:28 | tim.peters | create | |
|