This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author tim.peters
Recipients Dmitry Rubanovich, methane, rhettinger, serhiy.storchaka, tim.peters, xiang.zhang
Date 2017-06-18.04:25:53
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1497759954.9.0.107726647919.issue30671@psf.upfronthosting.co.za>
In-reply-to
Content
The attached hashsim.py pretty much convinces me there's nothing left to be gained here.  It shows that the current scheme essentially achieves the performance of theoretical "uniform hashing" for string keys, for both successful and unsuccessful searches, as measured by the average number of probes needed.  Uniform hashing is the head scheme in which, for each key, the probe sequence is chosen uniformly at random from the set of all table slot permutations.  Its advantage is that it's feasible to analyze - its disadvantage is that it can't actually be sanely implemented ;-)

Back when the current scheme was implemented, string keys sometimes behaved "better than random", for reasons explained in the code comments.  The string hash now is more like a high-quality PRNG, so the performance of uniform hashing is the best that can be hoped for.  Dicts with int keys can still enjoy better-than-random performance.

Back then I was using a 32-bit box, and I'm pleased to see that PERTURB_SHIFT=5 still appears to work well on a 64-bit box (note:  to a first approximation you can emulate a 32-bit box on a 64-bit box by setting M64 in the code to a 32-bit mask).  But the choice appears far less touchy on a 64-bit box, and it may help a little to increase PERTURB_SHIFT.  Not so much in the average case, but for pathological cases, like int keys all of which have a lot of trailing zeroes.  Before the bits that differ get shifted down far enough to matter, they all follow the same probe sequence.  Increasing PERTURB_SHIFT would reduce the number of iterations needed before that happens.  But I'd wait until someone complains before churning the code just for that ;-)
History
Date User Action Args
2017-06-18 04:25:54tim.peterssetrecipients: + tim.peters, rhettinger, methane, serhiy.storchaka, xiang.zhang, Dmitry Rubanovich
2017-06-18 04:25:54tim.peterssetmessageid: <1497759954.9.0.107726647919.issue30671@psf.upfronthosting.co.za>
2017-06-18 04:25:54tim.peterslinkissue30671 messages
2017-06-18 04:25:54tim.peterscreate