Author jdemeyer
Recipients jdemeyer, rhettinger
Date 2018-09-20.15:45:11
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1537458311.74.0.956365154283.issue34751@psf.upfronthosting.co.za>
In-reply-to
Content
> Nor is it hard to come up with collisions for most simple hash functions.

The Bernstein hash algorithm is a simple algorithm for which it can be proven mathematically that collisions cannot be generated if the multiplier is unknown. That is an objective improvement over the current tuple hash. Of course, in practice the multiplier is not secret, but it suffices that it is random enough.

> Addition also has a relationship to multiplication.

Of course, but not in a way that can be used to generate hash collisions. In fact, the simple relation between addition and multiplication makes this an actual provable mathematical statement.

> That is more suited to character data and small hash ranges (as opposed to 64-bit).

Maybe you are thinking of the multiplier 33 which is classically used for character data? If you replace the multiplier 33 by a larger number like _PyHASH_MULTIPLIER == 1000003 (ideally it would be a truly random number), it works fine for arbitrary sequences and arbitrary bit lengths.

> The important thing is that it has passed our testing

That's a silly argument. If it passed testing, it is only because the testing is insufficient.

But really: what I am proposing is a better hash without obvious collisions which won't be slower than the existing hash. Why would you be against that? Why should we "roll our own" hash instead of using a known good algorithm?
History
Date User Action Args
2018-09-20 15:45:11jdemeyersetrecipients: + jdemeyer, rhettinger
2018-09-20 15:45:11jdemeyersetmessageid: <1537458311.74.0.956365154283.issue34751@psf.upfronthosting.co.za>
2018-09-20 15:45:11jdemeyerlinkissue34751 messages
2018-09-20 15:45:11jdemeyercreate