Message 326200 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	tim.peters
Recipients	eric.smith, jdemeyer, mark.dickinson, rhettinger, sir-sigurd, tim.peters
Date	2018-09-24.04:16:30
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1537762591.54.0.956365154283.issue34751@psf.upfronthosting.co.za>
In-reply-to

Content
FYI, using this for the guts of the tuple hash works well on everything we've discussed. In particular, no collisions in the current test_tuple hash test, and none either in the cases mixing negative and positive little ints. This all remains so using the current multiplier, or "the standard" FNV-1a multipliers 16777619UL (32 bit) and 1099511628211UL (64 bit). I've done no other tests: while (--len >= 0) { y = PyObject_Hash(p++); if (y == -1) return -1; Py_uhash_t t = (Py_uhash_t)y; t ^= t << 7; x = (x ^ t) mult; } Note that the multiplier doesn't change anymore. The twist is adding t ^= t << 7; to _permute_ the native hash space (stuff like "t += t >> 16" is a many-to-one function, not a permutation). Other than that, it's exactly FNV-1a. I don't know that 7 is especially magical - it's just the first thing I tried. In the absence of a real analysis, the intuition is simply that "t ^= t << 7" will clear masses of leading sign bits when hashing "small" negative integers. For whatever reason(s), they appear to be the cause of the troubles. However, since that line of code permutes the space, exactly the same statistics can be provoked by some other inputs.

FYI, using this for the guts of the tuple hash works well on everything we've discussed.  In particular, no collisions in the current test_tuple hash test, and none either in the cases mixing negative and positive little ints.  This all remains so using the current multiplier, or "the standard" FNV-1a multipliers 16777619UL (32 bit) and 1099511628211UL (64 bit).  I've done no other tests:

while (--len >= 0) {
        y = PyObject_Hash(*p++);
        if (y == -1)
            return -1;
		Py_uhash_t t = (Py_uhash_t)y;
		t ^= t << 7;
		x = (x ^ t) * mult;
}

Note that the multiplier doesn't change anymore.  The twist is adding

		t ^= t << 7;

to _permute_ the native hash space (stuff like "t += t >> 16" is a many-to-one function, not a permutation).  Other than that, it's exactly FNV-1a.  I don't know that 7 is especially magical - it's just the first thing I tried.

In the absence of a real analysis, the intuition is simply that "t ^= t << 7" will clear masses of leading sign bits when hashing "small" negative integers.  For whatever reason(s), they appear to be the cause of the troubles.

However, since that line of code permutes the space, exactly the same statistics can be provoked by some other inputs.

History
Date	User	Action	Args
2018-09-24 04:16:31	tim.peters	set	recipients: + tim.peters, rhettinger, mark.dickinson, eric.smith, jdemeyer, sir-sigurd
2018-09-24 04:16:31	tim.peters	set	messageid: <1537762591.54.0.956365154283.issue34751@psf.upfronthosting.co.za>
2018-09-24 04:16:31	tim.peters	link	issue34751 messages
2018-09-24 04:16:30	tim.peters	create