I should have spelled this out before: these are all permutations, so in general permuting the result space of `x * mult + y` (or any other permutation involving x and y) is exactly the same as not permuting it but applying a different permutation to y instead.
Specifically, the sequence:
x = x * mult + y (mod 2**N)
x = P(x) # P is any permutation of N-bit integers
is the same as (and this isn't "deep" - it's trivial):
x = x * mult + Q(y, x) (mod 2**N)
where Q(y, x) = P(x * mult + y) - x * mult (mod 2**N)
Q is a "richer" class of permutation though, because it's a different permutation for each fixed value of `x`, and uses multiplication to help disperse bits.
While it would take a lot of work to quantify this, in my experiments the tuple hash is significantly less touchy when permuting x after than when permuting y before, presumably because Q is richer.
The two tuple hash tests have many inputs whose tuple component hashes have very similar (even identical) bit patterns. There's only so much dirt-simple permutations (like "y ^= y << 1") can do to break the patterns. So, e.g., change a shift count, change the multiplier, ..., and at least one of those two tests fails depressingly often. Not spectacularly, but over the limit they allow. Q(y, x) does a far better job of magnifying small differences.
Something I haven't tried: use a richer permutation of y that _doesn't_ depend on x. For example, the frozenset hash has much the same underlying challenge, and uses this permutation to "blow up" small input differences:
static Py_uhash_t
_shuffle_bits(Py_uhash_t h)
{
return ((h ^ 89869747UL) ^ (h << 16)) * 3644798167UL;
}
But that's a lot more expensive than a single shift-xor, and the speed of tuple hashing is more important than of frozenset hashing. |