Author tim.peters
Recipients eric.smith, jdemeyer, mark.dickinson, rhettinger, sir-sigurd, tim.peters
Date 2018-09-24.04:16:30
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1537762591.54.0.956365154283.issue34751@psf.upfronthosting.co.za>
In-reply-to
Content
FYI, using this for the guts of the tuple hash works well on everything we've discussed.  In particular, no collisions in the current test_tuple hash test, and none either in the cases mixing negative and positive little ints.  This all remains so using the current multiplier, or "the standard" FNV-1a multipliers 16777619UL (32 bit) and 1099511628211UL (64 bit).  I've done no other tests:

while (--len >= 0) {
        y = PyObject_Hash(*p++);
        if (y == -1)
            return -1;
		Py_uhash_t t = (Py_uhash_t)y;
		t ^= t << 7;
		x = (x ^ t) * mult;
}

Note that the multiplier doesn't change anymore.  The twist is adding

		t ^= t << 7;

to _permute_ the native hash space (stuff like "t += t >> 16" is a many-to-one function, not a permutation).  Other than that, it's exactly FNV-1a.  I don't know that 7 is especially magical - it's just the first thing I tried.

In the absence of a real analysis, the intuition is simply that "t ^= t << 7" will clear masses of leading sign bits when hashing "small" negative integers.  For whatever reason(s), they appear to be the cause of the troubles.

However, since that line of code permutes the space, exactly the same statistics can be provoked by some other inputs.
History
Date User Action Args
2018-09-24 04:16:31tim.peterssetrecipients: + tim.peters, rhettinger, mark.dickinson, eric.smith, jdemeyer, sir-sigurd
2018-09-24 04:16:31tim.peterssetmessageid: <1537762591.54.0.956365154283.issue34751@psf.upfronthosting.co.za>
2018-09-24 04:16:31tim.peterslinkissue34751 messages
2018-09-24 04:16:30tim.peterscreate