Message 326110 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	tim.peters
Recipients	eric.smith, jdemeyer, mark.dickinson, rhettinger, sir-sigurd, tim.peters
Date	2018-09-22.19:26:30
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1537644391.14.0.956365154283.issue34751@psf.upfronthosting.co.za>
In-reply-to

Content
So you don't know of any directly relevant research either. "Offhand I can't see anything wrong" is better than nothing, but very far from "and we know it will be OK because [see references 1 and 2]". That Bernstein's DJBX33A has been widely used inspires no confidence whatsoever. As already explained, Python _did_ use DJBX33X (with a different multiplier), and it systematically screwed up catastrophically in some nested tuple cases already spelled out. Bernstein himself moved to DJBX33X (using xor instead of addition), and that would have screwed up too on a mix of smallish integers of both signs. What Python does now, except for changing the multiplier, is essentially version 1a of the _also_ very widely used - but more recent - Fowler/Noll/Vo hash family[1]: # Version 1a, recommended over version 1 (which does * before ^). hash = offset_basis for each octet_of_data to be hashed hash = hash xor octet_of_data hash = hash * FNV_prime return hash They did extensive studies, and very deliberately use a prime multiplier subject to a number of other constraints. Not based on "offhand I can't see why not", but based on analysis and running extensive tests. But, same as with Bernstein's variants (which predate FNV's work on picking "good" multipliers), they were developed in the context of hashing sequences of unsigned 8-bit integers. There's no a priori reason I can see to expect them to work well in contexts other than that. Indeed, FNV puts special constraints on the last 8 bits of the primes they pick, presumably because they're only concerned with hashing sequences of 8-bit values. FYI, for 32-bit hashes they use multiplier 16777619, and for 64-bit 1099511628211. In the absence of coherent analysis directly relevant to what Python actually needs here, we're all flying blind. What we do have is over a decade of positive real-world experience with the made-up hacks Python used to worm around a class of gross flaws its prior DJBX33X approach suffered, taking DJBX33X out of its original context and applying it in an area it wasn't designed for. Which made it close to FNV-1a, but also in a context _that_ wasn't designed for. However, it _is_ structurally close to FNV-1a, and using prime multipliers was a big deal to them. "But offhand I don't see why" isn't a good enough reason to abandon that. To the contrary, in the absence of _proof_ that it doesn't actually matter, I'd be happiest using the same multipliers (given above) and initial values "the standard" FNV-1a algorithms use. [1] http://www.isthe.com/chongo/tech/comp/fnv/index.html#FNV-1a

So you don't know of any directly relevant research either. "Offhand I can't see anything wrong" is better than nothing, but very far from "and we know it will be OK because [see references 1 and 2]".

That Bernstein's DJBX33A has been widely used inspires no confidence whatsoever. As already explained, Python _did_ use DJBX33X (with a different multiplier), and it systematically screwed up catastrophically in some nested tuple cases already spelled out. Bernstein himself moved to DJBX33X (using xor instead of addition), and that would have screwed up too on a mix of smallish integers of both signs.

What Python does now, except for changing the multiplier, is essentially version 1a of the _also_ very widely used - but more recent - Fowler/Noll/Vo hash family[1]:

# Version 1a, recommended over version 1 (which does * before ^).
hash = offset_basis
for each octet_of_data to be hashed
hash = hash xor octet_of_data
hash = hash * FNV_prime
return hash

They did extensive studies, and very deliberately use a prime multiplier subject to a number of other constraints. Not based on "offhand I can't see why not", but based on analysis and running extensive tests.

But, same as with Bernstein's variants (which predate FNV's work on picking "good" multipliers), they were developed in the context of hashing sequences of unsigned 8-bit integers. There's no a priori reason I can see to expect them to work well in contexts other than that. Indeed, FNV puts special constraints on the last 8 bits of the primes they pick, presumably because they're only concerned with hashing sequences of 8-bit values.

FYI, for 32-bit hashes they use multiplier 16777619, and for 64-bit 1099511628211.

In the absence of coherent analysis directly relevant to what Python actually needs here, we're all flying blind. What we do have is over a decade of positive real-world experience with the made-up hacks Python used to worm around a class of gross flaws its prior DJBX33X approach suffered, taking DJBX33X out of its original context and applying it in an area it wasn't designed for. Which made it close to FNV-1a, but also in a context _that_ wasn't designed for.

However, it _is_ structurally close to FNV-1a, and using prime multipliers was a big deal to them. "But offhand I don't see why" isn't a good enough reason to abandon that. To the contrary, in the absence of _proof_ that it doesn't actually matter, I'd be happiest using the same multipliers (given above) and initial values "the standard" FNV-1a algorithms use.

[1] http://www.isthe.com/chongo/tech/comp/fnv/index.html#FNV-1a

History
Date	User	Action	Args
2018-09-22 19:26:31	tim.peters	set	recipients: + tim.peters, rhettinger, mark.dickinson, eric.smith, jdemeyer, sir-sigurd
2018-09-22 19:26:31	tim.peters	set	messageid: <1537644391.14.0.956365154283.issue34751@psf.upfronthosting.co.za>
2018-09-22 19:26:31	tim.peters	link	issue34751 messages
2018-09-22 19:26:30	tim.peters	create