Message 327038 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	jdemeyer
Recipients	eric.smith, jdemeyer, mark.dickinson, rhettinger, sir-sigurd, tim.peters
Date	2018-10-04.06:06:09
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1538633169.89.0.545547206417.issue34751@psf.upfronthosting.co.za>
In-reply-to

Content
> I've posted several SeaHash cores that suffer no collisions at all in any of our tests (including across every "bad example" in these 100+ messages), except for "the new" tuple test. Which it also passed, most recently with 7 collisions. That was under 64-bit builds, though, and from what follows I figure you're only looking at 32-bit builds for now. Note that I'm always considering parametrized versions of the hash functions that I'm testing. I'm replacing the fixed multiplier (all algorithms mentioned here have such a thing) by a random multiplier which is 3 mod 8. I keep the other constants. This allows me to look at the probability of passing the testsuite. That says a lot more than a simple yes/no answer for a single test. You don't want to pass the testsuite by random chance, you want to pass the testsuite because you have a high probability of passing it. And I'm testing 32-bit hashes indeed since we need to support that anyway and the probability of collisions is high enough to get interesting statistical data. For SeaHash, the probability of passing my new tuple test was only around 55%. For xxHash, this was about 85%. Adding some input mangling improved both scores, but the xxHash variant was still better than SeaHash.

> I've posted several SeaHash cores that suffer no collisions at all in any of our tests (including across every "bad example" in these 100+ messages), except for "the new" tuple test.  Which it also passed, most recently with 7 collisions.  That was under 64-bit builds, though, and from what follows I figure you're only looking at 32-bit builds for now.

Note that I'm always considering parametrized versions of the hash functions that I'm testing. I'm replacing the fixed multiplier (all algorithms mentioned here have such a thing) by a random multiplier which is 3 mod 8. I keep the other constants. This allows me to look at the *probability* of passing the testsuite. That says a lot more than a simple yes/no answer for a single test. You don't want to pass the testsuite by random chance, you want to pass the testsuite because you have a high probability of passing it.

And I'm testing 32-bit hashes indeed since we need to support that anyway and the probability of collisions is high enough to get interesting statistical data.

For SeaHash, the probability of passing my new tuple test was only around 55%. For xxHash, this was about 85%. Adding some input mangling improved both scores, but the xxHash variant was still better than SeaHash.

History
Date	User	Action	Args
2018-10-04 06:06:09	jdemeyer	set	recipients: + jdemeyer, tim.peters, rhettinger, mark.dickinson, eric.smith, sir-sigurd
2018-10-04 06:06:09	jdemeyer	set	messageid: <1538633169.89.0.545547206417.issue34751@psf.upfronthosting.co.za>
2018-10-04 06:06:09	jdemeyer	link	issue34751 messages
2018-10-04 06:06:09	jdemeyer	create