Message 326119 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	tim.peters
Recipients	eric.smith, jdemeyer, mark.dickinson, rhettinger, sir-sigurd, tim.peters
Date	2018-09-23.00:16:29
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1537661790.93.0.956365154283.issue34751@psf.upfronthosting.co.za>
In-reply-to

Content
I strive not to believe anything in the absence of evidence ;-) FNV-1a supplanted Bernstein's scheme in many projects because it works better. Indeed, Python itself used FNV for string hashing before the security wonks got exercised over collision attacks. It worked great. But "better" depends on what you value. For example, FNV-1a has far better "avalanche" behavior than Bernstein[1]. Please note that I don't _object_ to your code! It may work fine. Or it may not in some cases. The problem is that we have no way to tell. There's no theory here, just misleading appeals to experience with the algorithms in contexts they were _intended_ to be used in. Nobody expected some patterns of nested tuples to fail catastrophically before, and nobody expected mixed-sign integers to lead to poor (but not catastrophic) behavior after the former was "repaired". Nobody now expects the next 10 catastrophes either. We can only rely on contrived tests and bug reports. Python's current scheme is unbeatable on that count, because it's the only scheme for which over a decade of experience _in context_ exists. Which experience says there are no known catastophically bad real cases. Which is worth a whole lot in a project that's so widely used now. That said, the "repair" before was at least as much pure hackery as your proposed hackery, and - I agree - completely undercut the _theory_ FNV was based on (although, to be fair, I doubt anyone else _knew_ they were re-inventing a damaged FNV-1a at the time). So ... since FNV-1a is in fact better in its intended context than the Bernstein one-liners in the same context, how about adding your t += (t >> (4 * sizeof(t))); hack to the _current_ code (& delete the code changing the multiplier)? Then we could switch to the "official" FNV 32-bit and 64-bit multipliers too, and know at least that "almost everything" about the resulting algorithm is known to work exceedingly well in its original context. [1] https://sites.google.com/site/murmurhash/avalanche

I strive not to believe anything in the absence of evidence ;-)

FNV-1a supplanted Bernstein's scheme in many projects because it works better. Indeed, Python itself used FNV for string hashing before the security wonks got exercised over collision attacks. It worked great. But "better" depends on what you value. For example, FNV-1a has far better "avalanche" behavior than Bernstein[1].

Please note that I don't _object_ to your code! It may work fine. Or it may not in some cases. The problem is that we have no way to tell. There's no theory here, just misleading appeals to experience with the algorithms in contexts they were _intended_ to be used in. Nobody expected some patterns of nested tuples to fail catastrophically before, and nobody expected mixed-sign integers to lead to poor (but not catastrophic) behavior after the former was "repaired". Nobody now expects the next 10 catastrophes either.

We can only rely on contrived tests and bug reports. Python's current scheme is unbeatable on that count, because it's the only scheme for which over a decade of experience _in context_ exists.

Which experience says there are no known catastophically bad real cases. Which is worth a whole lot in a project that's so widely used now.

That said, the "repair" before was at least as much pure hackery as your proposed hackery, and - I agree - completely undercut the _theory_ FNV was based on (although, to be fair, I doubt anyone else _knew_ they were re-inventing a damaged FNV-1a at the time).

So ... since FNV-1a is in fact better in its intended context than the Bernstein one-liners in the same context, how about adding your

t += (t >> (4 * sizeof(t)));

hack to the _current_ code (& delete the code changing the multiplier)? Then we could switch to the "official" FNV 32-bit and 64-bit multipliers too, and know at least that "almost everything" about the resulting algorithm is known to work exceedingly well in its original context.

[1] https://sites.google.com/site/murmurhash/avalanche

History
Date	User	Action	Args
2018-09-23 00:16:31	tim.peters	set	recipients: + tim.peters, rhettinger, mark.dickinson, eric.smith, jdemeyer, sir-sigurd
2018-09-23 00:16:30	tim.peters	set	messageid: <1537661790.93.0.956365154283.issue34751@psf.upfronthosting.co.za>
2018-09-23 00:16:30	tim.peters	link	issue34751 messages
2018-09-23 00:16:29	tim.peters	create