Author rhettinger
Recipients rhettinger
Date 2015-03-19.19:57:42
Message-id <1426795062.72.0.102774381262.issue23712@psf.upfronthosting.co.za>
In-reply-to
Content
This tracker item is for a thought experiment I'm running where I can collect the thoughts and discussions in one place.  It is not an active proposal for inclusion in Python.

The idea is to greatly speed up set/dict lookups of unicode values by skipping the exact comparison when both objects are exact unicode instances (not subclasses) and their 64-bit hash values are known to match.
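To make the idea concrete, here is a minimal pure-Python sketch of a lookup that trusts a hash match for exact str keys. It is only an illustration: the class name `HashOnlyStrSet`, the fixed table size, and the linear probing are assumptions made for this example and do not reflect how CPython's set or dict is actually implemented.

```python
class HashOnlyStrSet:
    """Toy open-addressing set that trusts hash equality for exact str keys.

    Hypothetical illustration only; not CPython's implementation.
    """

    def __init__(self, size=64):
        # size must be a power of two; this sketch never resizes the table.
        self._slots = [None] * size

    def _probe(self, h):
        # Simple linear probing (an assumption; CPython probes differently).
        mask = len(self._slots) - 1
        i = h & mask
        for _ in range(len(self._slots)):
            yield i
            i = (i + 1) & mask

    @staticmethod
    def _matches(slot, h, key):
        stored_hash, stored_key = slot
        if stored_hash != h:
            return False
        if type(stored_key) is str and type(key) is str:
            # The shortcut under discussion: both keys are exact str
            # instances and their hashes match, so skip the == comparison.
            return True
        # Any other type (including str subclasses) gets the full comparison.
        return stored_key == key

    def add(self, key):
        h = hash(key)
        for i in self._probe(h):
            slot = self._slots[i]
            if slot is None:
                self._slots[i] = (h, key)
                return
            if self._matches(slot, h, key):
                return
        raise RuntimeError("table full")

    def __contains__(self, key):
        h = hash(key)
        for i in self._probe(h):
            slot = self._slots[i]
            if slot is None:
                return False
            if self._matches(slot, h, key):
                return True
        return False


s = HashOnlyStrSet()
s.add("spam")
s.add(42)
```

With this sketch, `"spam" in s` never calls `str.__eq__`; a different string would be reported as present only if its hash happened to collide with the stored one.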

Given SipHash and hash randomization, we get a 1 in 2**64 chance of a false positive (which is better than the error rate for non-ECC DRAM itself).
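For a sense of scale, the arithmetic below estimates how long it would take to expect a single false positive. The comparison rate of one billion per second is an assumed figure chosen for illustration, not a measurement, and the estimate assumes hashes of distinct strings are uniformly distributed over 64 bits.

```python
# Chance that two distinct strings share a 64-bit hash, assuming uniformity.
p_false_positive = 1 / 2**64

# Assumed workload (hypothetical): comparisons of distinct strings per second.
comparisons_per_second = 10**9

# Expected number of comparisons before the first false positive is 1/p.
seconds_to_first_false_positive = (1 / p_false_positive) / comparisons_per_second

seconds_per_year = 365.25 * 24 * 60 * 60
years_to_first_false_positive = seconds_to_first_false_positive / seconds_per_year

print(f"{years_to_first_false_positive:.0f} years")
```

Under these assumptions the expected wait is on the order of several centuries of continuous lookups.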

However, since the siphash isn't cryptographically secure, presumably a malicious chooser of keys could generate a false positive on purpose.

This technique is currently used by Git and Mercurial, which use hash values for file and version graphs without checking for an exact match (because a false positive is vanishingly rare).

The Python test suite passes, as do the test suites for a number of packages I have installed.
History
Date User Action Args
2015-03-19 19:57:42 rhettinger set recipients: + rhettinger
2015-03-19 19:57:42 rhettinger set messageid: <1426795062.72.0.102774381262.issue23712@psf.upfronthosting.co.za>
2015-03-19 19:57:42 rhettinger link issue23712 messages
2015-03-19 19:57:42 rhettinger create