Author tim.peters
Recipients eric.smith, jdemeyer, mark.dickinson, rhettinger, sir-sigurd, tim.peters
Date 2018-09-27.18:55:59
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1538074559.23.0.545547206417.issue34751@psf.upfronthosting.co.za>
In-reply-to
Content
I should have spelled this out before:  these are all permutations, so in general permuting the result space of `x * mult + y` (or any other permutation involving x and y) is exactly the same as not permuting it but applying a different permutation to y instead.

Specifically, the sequence:

    x = x * mult + y  (mod 2**N)
    x = P(x)  # P is any permutation of N-bit integers

is the same as (and this isn't "deep" - it's trivial):

    x = x * mult + Q(y, x)  (mod 2**N)
    
where Q(y, x) = P(x * mult + y) - x * mult  (mod 2**N)

Q is a "richer" class of permutation though, because it's a different permutation for each fixed value of `x`, and uses multiplication to help disperse bits.

While it would take a lot of work to quantify this, in my experiments the tuple hash is significantly less touchy when permuting x after than when permuting y before, presumably because Q is richer.

The two tuple hash tests have many inputs whose tuple component hashes have very similar (even identical) bit patterns.  There's only so much dirt-simple permutations (like "y ^= y << 1") can do to break the patterns.  So, e.g., change a shift count, change the multiplier, ..., and at least one of those two tests fails depressingly often.  Not spectacularly, but over the limit they allow.  Q(y, x) does a far better job of magnifying small differences.

Something I haven't tried:  use a richer permutation of y that _doesn't_ depend on x.  For example, the frozenset hash has much the same underlying challenge, and uses this permutation to "blow up" small input differences:

static Py_uhash_t
_shuffle_bits(Py_uhash_t h)
{
    return ((h ^ 89869747UL) ^ (h << 16)) * 3644798167UL;
}

But that's a lot more expensive than a single shift-xor, and the speed of tuple hashing is more important than of frozenset hashing.
History
Date User Action Args
2018-09-27 18:55:59tim.peterssetrecipients: + tim.peters, rhettinger, mark.dickinson, eric.smith, jdemeyer, sir-sigurd
2018-09-27 18:55:59tim.peterssetmessageid: <1538074559.23.0.545547206417.issue34751@psf.upfronthosting.co.za>
2018-09-27 18:55:59tim.peterslinkissue34751 messages
2018-09-27 18:55:59tim.peterscreate