Message 238552 - Python tracker


Author	rhettinger
Recipients	rhettinger
Date	2015-03-19.19:57:42
Message-id	<1426795062.72.0.102774381262.issue23712@psf.upfronthosting.co.za>
In-reply-to

Content
This tracker item is for a thought experiment I'm running; it gives me one place to collect the thoughts and discussions. It is not an active proposal for inclusion in Python.

The idea is to greatly speed up set/dict lookups of unicode values by skipping the exact comparison when both keys are exactly of the unicode type (not a subclass) and their 64-bit hash values are known to match.
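To make the idea concrete, here is a minimal pure-Python sketch. It is not the C code in CPython's set or dict implementation: `ToySet`, its `trust_hash` flag, and the `eq_calls` counter are hypothetical names invented for illustration, and the table uses plain linear probing with no resizing. It only shows where the exact comparison would be skipped.

```python
class ToySet:
    """Toy open-addressing set; NOT CPython's real implementation."""

    def __init__(self, trust_hash, size=64):
        self.trust_hash = trust_hash      # True enables the experiment
        self._slots = [None] * size       # each slot is None or (hash, key)
        self._used = 0
        self.eq_calls = 0                 # full == comparisons performed

    def _matches(self, slot, h, key):
        stored_hash, stored_key = slot
        if stored_hash != h:
            return False                  # different hash: cannot be equal
        if stored_key is key:
            return True                   # same object: trivially equal
        if self.trust_hash and type(key) is str and type(stored_key) is str:
            return True                   # the experiment: skip the exact comparison
        self.eq_calls += 1
        return stored_key == key

    def _probe(self, key):
        """Return (index, found) using linear probing."""
        h = hash(key)
        i = h % len(self._slots)
        while self._slots[i] is not None:
            if self._matches(self._slots[i], h, key):
                return i, True
            i = (i + 1) % len(self._slots)
        return i, False

    def add(self, key):
        if self._used >= len(self._slots) - 1:
            raise OverflowError("toy table is full (no resizing in this sketch)")
        i, found = self._probe(key)
        if not found:
            self._slots[i] = (hash(key), key)
            self._used += 1

    def __contains__(self, key):
        return self._probe(key)[1]


probe = "".join(["al", "pha"])            # equal to "alpha" but a distinct object
for trust in (False, True):
    s = ToySet(trust_hash=trust)
    s.add("alpha")
    print(trust, probe in s, s.eq_calls)
# False True 1
# True True 0
```

With `trust_hash=True` the lookup answers from the hash alone, so the saving is the one string comparison per successful lookup; subclasses of `str` still take the slow path because they may override `__eq__`.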

Given siphash and hash randomization, we get roughly a 1 in 2**64 chance of a false positive for any given pair of distinct strings (which is better than the error rate of non-ECC DRAM itself).
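The arithmetic behind that figure can be checked with a few lines. This is a rough sketch that assumes hash values behave like independent, uniformly random 64-bit numbers; the helper names are hypothetical. It computes the per-pair false-positive probability and a union bound on the chance that any two of n distinct keys share a hash.

```python
import sys
from fractions import Fraction

HASH_BITS = 64  # assumption: 64-bit build; sys.hash_info.width reports the real value


def pair_false_positive(bits=HASH_BITS):
    """Chance that two distinct keys get the same hash, assuming uniform hashes."""
    return Fraction(1, 2 ** bits)


def any_collision_bound(n, bits=HASH_BITS):
    """Union bound on the chance that any pair among n distinct keys collides."""
    pairs = n * (n - 1) // 2
    return min(Fraction(1), Fraction(pairs, 2 ** bits))


# Hash width and string-hash algorithm of the running interpreter.
print(sys.hash_info.width, sys.hash_info.algorithm)
for n in (10**3, 10**6, 10**9):
    print(n, float(any_collision_bound(n)))
```

The per-pair figure is what matters for a single lookup; the bound grows quadratically with the number of keys held in one container, which is worth keeping in mind for very large sets.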

However, since siphash isn't cryptographically secure, a malicious chooser of keys could presumably generate a false positive on purpose.
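What such a false positive would look like can be shown with a deliberately weak hash. The sketch below does not attack siphash. It uses a hypothetical 16-bit stand-in (`weak_hash`, built from truncated BLAKE2b) so that a colliding pair can be found by brute force in a fraction of a second, then compares a hash-only membership test with an exact one.

```python
import hashlib


def weak_hash(s):
    """Hypothetical 16-bit stand-in hash, so collisions are cheap to find."""
    digest = hashlib.blake2b(s.encode("utf-8"), digest_size=2).digest()
    return int.from_bytes(digest, "big")


def find_collision():
    """Return two distinct strings with the same weak_hash value."""
    seen = {}
    i = 0
    while True:                       # pigeonhole: ends within 2**16 + 1 keys
        key = f"key{i}"
        h = weak_hash(key)
        if h in seen:
            return seen[h], key
        seen[h] = key
        i += 1


stored, forged = find_collision()
hash_only = {weak_hash(stored)}       # membership decided by hash alone
exact = {stored}                      # membership decided by hash and equality

print(weak_hash(forged) in hash_only)  # True: false positive
print(forged in exact)                 # False: the exact comparison catches it
```

With a 64-bit keyed hash the same search is far more expensive and the attacker does not know the per-process key, but the failure mode, if reached, is the same.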

This technique is currently used by git and mercurial, which use hash values for file and version graphs without checking for an exact match (because the chance of a false positive is vanishingly small).

The Python test suite passes, as do the test suites for a number of packages I have installed.

History
Date	User	Action	Args
2015-03-19 19:57:42	rhettinger	set	recipients: + rhettinger
2015-03-19 19:57:42	rhettinger	set	messageid: <1426795062.72.0.102774381262.issue23712@psf.upfronthosting.co.za>
2015-03-19 19:57:42	rhettinger	link	issue23712 messages
2015-03-19 19:57:42	rhettinger	create