Noting that the failure of high-order bits to propagate is one form of "avalanche" failure. A recent 64-bit hash, SeaHash, takes this approach which is said to have provably "perfect" avalanche behavior:
Py_uhash_t t = (Py_uhash_t)y;
t *= (Py_uhash_t)0x6eed0e9da4d94a4fULL;
t ^= (t >> 32) >> (t >> 60);
t *= (Py_uhash_t)0x6eed0e9da4d94a4fULL;
The giant constant is just a 63-bit "patternless" prime (for whatever reason, the author wants this transformation to be easily invertible, so a prime is necessary - this is NOT a "crypto" hash). The first multiply propagates low-order bits left. Then the next line uses high-order bits to change low-order bits. Extracting a variable shift count from the data itself is a clever idea taken from the PCG family of PRNGs - you have to work to contrive data where this doesn't "act kinda random". The last line then propagates the - now altered by the high-order bits - lower-order bits left again.
Followed by
x = (x * mult) + t;
this yields a tuple hash that passes all the tests I have. The only collisions are in the new tuple test, which suffers 14.
Better, add the "catastrophic" right-rotate
t = (t >> 3) | (t << 61);
at the start and it's still only the new tuple test that has a collision - it rises to 19 then, close to (but still under) its failure cutoff.
What I haven't tried: in context it would be nice to eliminate the second multiply by the giant constant. We're doing a multiply anyway to fold `t` into `x`, which will propagate the low-order bits left on the next iteration's `x * mult`. That would ruin SeaHash's provable guarantees, but I'm more concerned about saving some cycles than purity ;-) If that added enough collisions to push the second tuple test "not much" over the limit, I'd raise its limit.
Gonzo: "the real" SeaHash duplicates the code above and works on 4 inputs simultaneously, designed to keep a modern processor's instruction pipelines as busy as possible. I'd rather not bother (most tuples are short).
So this is principled and, if the SeaHash theory is right, immune to any simple kind of catastrophic failure. Is it worth it? I'd sleep better, yes ;-) Note that the first multiply by the giant constant can be active at the same time as the `x * mult` at the end, so I'm guessing the speed hit would be bearable.
There's no truly _cheap_ way to get good propagation from all bit positions. SeaHash has the fastest way to do that I've encountered. |