Message 326583 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	tim.peters
Recipients	eric.smith, jdemeyer, mark.dickinson, rhettinger, sir-sigurd, tim.peters
Date	2018-09-27.18:55:59
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1538074559.23.0.545547206417.issue34751@psf.upfronthosting.co.za>
In-reply-to

Content
I should have spelled this out before: these are all permutations, so in general permuting the result space of `x * mult + y` (or any other permutation involving x and y) is exactly the same as not permuting it but applying a different permutation to y instead. Specifically, the sequence: x = x * mult + y (mod 2*N) x = P(x) # P is any permutation of N-bit integers is the same as (and this isn't "deep" - it's trivial): x = x mult + Q(y, x) (mod 2*N) where Q(y, x) = P(x mult + y) - x * mult (mod 2*N) Q is a "richer" class of permutation though, because it's a different permutation for each fixed value of `x`, and uses multiplication to help disperse bits. While it would take a lot of work to quantify this, in my experiments the tuple hash is significantly less touchy when permuting x after than when permuting y before, presumably because Q is richer. The two tuple hash tests have many inputs whose tuple component hashes have very similar (even identical) bit patterns. There's only so much dirt-simple permutations (like "y ^= y << 1") can do to break the patterns. So, e.g., change a shift count, change the multiplier, ..., and at least one of those two tests fails depressingly often. Not spectacularly, but over the limit they allow. Q(y, x) does a far better job of magnifying small differences. Something I haven't tried: use a richer permutation of y that _doesn't_ depend on x. For example, the frozenset hash has much the same underlying challenge, and uses this permutation to "blow up" small input differences: static Py_uhash_t _shuffle_bits(Py_uhash_t h) { return ((h ^ 89869747UL) ^ (h << 16)) 3644798167UL; } But that's a lot more expensive than a single shift-xor, and the speed of tuple hashing is more important than of frozenset hashing.

I should have spelled this out before:  these are all permutations, so in general permuting the result space of `x * mult + y` (or any other permutation involving x and y) is exactly the same as not permuting it but applying a different permutation to y instead.

Specifically, the sequence:

    x = x * mult + y  (mod 2**N)
    x = P(x)  # P is any permutation of N-bit integers

is the same as (and this isn't "deep" - it's trivial):

    x = x * mult + Q(y, x)  (mod 2**N)
    
where Q(y, x) = P(x * mult + y) - x * mult  (mod 2**N)

Q is a "richer" class of permutation though, because it's a different permutation for each fixed value of `x`, and uses multiplication to help disperse bits.

While it would take a lot of work to quantify this, in my experiments the tuple hash is significantly less touchy when permuting x after than when permuting y before, presumably because Q is richer.

The two tuple hash tests have many inputs whose tuple component hashes have very similar (even identical) bit patterns.  There's only so much dirt-simple permutations (like "y ^= y << 1") can do to break the patterns.  So, e.g., change a shift count, change the multiplier, ..., and at least one of those two tests fails depressingly often.  Not spectacularly, but over the limit they allow.  Q(y, x) does a far better job of magnifying small differences.

Something I haven't tried:  use a richer permutation of y that _doesn't_ depend on x.  For example, the frozenset hash has much the same underlying challenge, and uses this permutation to "blow up" small input differences:

static Py_uhash_t
_shuffle_bits(Py_uhash_t h)
{
    return ((h ^ 89869747UL) ^ (h << 16)) * 3644798167UL;
}

But that's a lot more expensive than a single shift-xor, and the speed of tuple hashing is more important than of frozenset hashing.

History
Date	User	Action	Args
2018-09-27 18:55:59	tim.peters	set	recipients: + tim.peters, rhettinger, mark.dickinson, eric.smith, jdemeyer, sir-sigurd
2018-09-27 18:55:59	tim.peters	set	messageid: <1538074559.23.0.545547206417.issue34751@psf.upfronthosting.co.za>
2018-09-27 18:55:59	tim.peters	link	issue34751 messages
2018-09-27 18:55:59	tim.peters	create