This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author tim.peters
Recipients eric.smith, jdemeyer, mark.dickinson, rhettinger, sir-sigurd, tim.peters
Date 2018-09-29.06:03:25
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1538201006.51.0.545547206417.issue34751@psf.upfronthosting.co.za>
In-reply-to
Content
Jeroen, thanks for helping us fly slightly less blind! ;-) It's a lot of work.

I'd say you may as well pick a prime.  It's folklore, but "a reason" is that you've discovered that regular bit patterns in multipliers can hurt, and sticking to primes eliminates worlds of simple & subtle bit patterns.

Another bit of folklore:  pick a multiplier such that for each byte, neither it nor its complement are close in value to any other byte of the multiplier.  Which may seem more valuable for byte-oriented hashes, yet the byte-oriented FNV violates it spectacularly:  all FNV multipliers have _no_ bits set higher than 2**8 except for their leading bit.  So the longer the FNV multiplier, the more all-0 bytes it contains.

Which appears to be a deliberate choice to limit how quickly each new input byte propagates to other lower-order state bits across iterations.  The DJB33 algorithms accomplish that by using a tiny multiplier (relative to the output hash width) instead.

As I explained in other msgs here, treating tuple component hashes as sequences of bytes seems to deliver excellent results with the unaltered byte-oriented versions of FNV-1[a] and DJBX33[AX] - too good for anything in my test collection to detect anything even suspicious, let alone "a problem" (although it would be easy to contrive problem cases for 33[AX] - 33 is way "too small" to avoid collisions even across tuples with a single component).

So part of our challenge appears to me to be that we're trying instead to produce a hash from inputs of the _same_ bit width.  DJB/FNV were designed to produce hashes of width at least 4 times their inputs' bit widths.  Each input has a relatively tiny effect on the internal state for them.  For us, each input can change the entire internal state.
History
Date User Action Args
2018-09-29 06:03:26tim.peterssetrecipients: + tim.peters, rhettinger, mark.dickinson, eric.smith, jdemeyer, sir-sigurd
2018-09-29 06:03:26tim.peterssetmessageid: <1538201006.51.0.545547206417.issue34751@psf.upfronthosting.co.za>
2018-09-29 06:03:26tim.peterslinkissue34751 messages
2018-09-29 06:03:25tim.peterscreate