Author Dmitry Rubanovich
Recipients Dmitry Rubanovich, methane, rhettinger, serhiy.storchaka, xiang.zhang
Date 2017-06-15.07:25:26
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1497511526.4.0.248351219099.issue30671@psf.upfronthosting.co.za>
In-reply-to
Content
lookdict_index() (and the rest of the files in dictobject.c) are using unnecessarily complicated perturb mechanism.  And, in fact, it's slower than the simpler case.

Instead of this:

for (size_t perturb = hash;;) {
    perturb >>= PERTURB_SHIFT;
    i = mask & ((i << 2) + i + perturb + 1);
....

it should do this:

for (size_t perturb = hash;;) {
    i = mask & ((i << 1) + perturb + 1);
    perturb >>= PERTURB_SHIFT;
....

This would not only save an instruction (a minor issue), but it would also reduce collisions.

I've attached a file which calculates frequencies of collisions for demonstration purposes.  It shows that the calculation, as it stands right now, does not create a 1-1 mapping even on the 1st iteration through the loop.  Moving PERTURB_SHIFT to the line before the calculation does reduce the density of the collision space.  But using the calculation, which I proposed, eliminates collisions on the 1st iteration completely and reduces it on most subsequent iterations.
History
Date User Action Args
2017-06-15 07:25:26Dmitry Rubanovichsetrecipients: + Dmitry Rubanovich, rhettinger, methane, serhiy.storchaka, xiang.zhang
2017-06-15 07:25:26Dmitry Rubanovichsetmessageid: <1497511526.4.0.248351219099.issue30671@psf.upfronthosting.co.za>
2017-06-15 07:25:26Dmitry Rubanovichlinkissue30671 messages
2017-06-15 07:25:26Dmitry Rubanovichcreate