This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author tim.peters
Recipients eric.smith, jdemeyer, mark.dickinson, rhettinger, sir-sigurd, tim.peters
Date 2018-10-01.18:47:34
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <>
If SeaHash is interesting to us (it is to me), I suggest dropping our DJB/FNV combining part entirely and using a purer form, like so:

    Py_uhash_t t = (Py_uhash_t)y;
    x ^= t ^ (t << 1);
    x *= (Py_uhash_t)0x6eed0e9da4d94a4fULL;
    x ^= (x >> 32) >> (x >> 60);
    x *= (Py_uhash_t)0x6eed0e9da4d94a4fULL;

The only collisions with that were 14 in the new tuple test.  Rotate t right by 3 first, and that drops to 13.  Much better than utter catastrophe ;-)

100% pure SeaHash does

    x ^= t;
at the start first, instead of `t ^ (t << 1)` on the RHS.  That fails the second tuple test, with 98 collisions.  Not catastrophic, just "too many".  But there's only so much we can expect from a cheap non-crypto-strength hash.  `t ^ (t << 1)` is a pragmatic tweek to get rid of masses of uselessly repeated sign bits, which do occur in realistic tuples (mixing positive and negative ints).  Adding that tweak shouldn't harm any of SeaHash's provable global properties, since `t ^ (t << 1)` is just a permutation of the input space.

Different constants would be needed for a 32-bit version (best guesses, untried:  a 31-bit random prime; s/32/16/; s/60/29/).

I don't yet know about potential SeaHash licensing issues.  It's part of Rust:
Date User Action Args
2018-10-01 18:47:34tim.peterssetrecipients: + tim.peters, rhettinger, mark.dickinson, eric.smith, jdemeyer, sir-sigurd
2018-10-01 18:47:34tim.peterssetmessageid: <>
2018-10-01 18:47:34tim.peterslinkissue34751 messages
2018-10-01 18:47:34tim.peterscreate