Author josh.r
Recipients dstufft, josh.r, rhettinger
Date 2015-03-20.00:47:47
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <>
Assuming Siphash is in fact cryptographically secure (in the sense you can't choose a desired result hash with better odds that brute force), and it appears to be so, with a keyspace of 64 bits, if it's evenly distributed (which a cryptographically secure hash should be), that implies to even have a 1% chance of any two keys in a set colliding, you'd need over 2**29 keys ( you can plug in your own numbers for the conditional calculation here ). Even at the one ten thousandth of a percent collision threshold, you're talking about a set of 6 million strings to find even one pair that match.

I'd still be leery of using such an approach for general purpose sets and dicts, where they could conceivably contain enough entries to pose a risk (vanishingly small, but not "heat death of the universe" small). But for Python implementation dictionaries (module, nested scope, class, and instance dictionaries), where we're talking about maybe a thousand attributes in extreme cases, which are almost never under the control of an "attacker" in any event, I could definitely see a low risk win. If you're assuming a dictionary with less than 10,000 keys, that's a hit would be literally one in a trillion; under a hundred and you're below one in a quadrillion chance, which I think is safe enough. If you wanted to make it "safe" you could conceivably use an approach that changed algorithms up front, depending on the size of the dictionary; less than a hundred entries, use hash only lookup, above 100, use "safe" lookup.
Date User Action Args
2015-03-20 00:47:47josh.rsetrecipients: + josh.r, rhettinger, dstufft
2015-03-20 00:47:47josh.rsetmessageid: <>
2015-03-20 00:47:47josh.rlinkissue23712 messages
2015-03-20 00:47:47josh.rcreate