Author tim.peters
Recipients eric.smith, jdemeyer, mark.dickinson, rhettinger, sir-sigurd, tim.peters
Date 2018-10-06.22:24:15
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1538864655.8.0.545547206417.issue34751@psf.upfronthosting.co.za>
In-reply-to
Content
Thinking about the way xxHash works prompted me to try this initialization change:

    x = 0x345678UL + (Py_uhash_t)len; // adding in the length is new

That was a pure win in my tests, slashing the number of collisions in the new tuple test, 32-bit build, from 51 to 17.  In the 64-bit build, cut from 11 to 10, but when looking at the high 32 hash bits instead from 27 to 15.  There were also small improvements in other 32-bit build collision stats.

Full-blown xxHash (& SeaHash, and murmurhash, and ...) also fold in the length, but not inside their core loops.  It's useful info!

But they fold it in _after_ their core loops, not before.  They're apparently worried about this:  if their internal state ever reaches 0, that's a fixed point so long as all remaining incoming data is zeroes (in Python, e.g., picture adding one or more trailing integer or float zeroes or ..., which hash to 0).  Folding in the length at the end snaps it slightly out of zeroland.

I don't really care about that.  Folding in the length at the start quickly leads to large differences across iterations for tuples of different lengths (thanks to repeated mulitiplication and "propagate right" steps), which is especially helpful in things like the nested tuple tests where there's almost as much variation in tuple lengths as there is in the few little integers bottom-level tuples contain.
History
Date User Action Args
2018-10-06 22:24:15tim.peterssetrecipients: + tim.peters, rhettinger, mark.dickinson, eric.smith, jdemeyer, sir-sigurd
2018-10-06 22:24:15tim.peterssetmessageid: <1538864655.8.0.545547206417.issue34751@psf.upfronthosting.co.za>
2018-10-06 22:24:15tim.peterslinkissue34751 messages
2018-10-06 22:24:15tim.peterscreate