This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author methane
Recipients methane
Date 2016-09-19.04:25:11
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1474259113.17.0.570923386086.issue28201@psf.upfronthosting.co.za>
In-reply-to
Content
Current perturb shift code is like following:

    for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
        i = mask & ((i << 2) + i + perturb + 1);

This loop is start after first conflict. It means perturb == hash for first conflict.

The purpose of perturb shift is avoid long conflict chain when keys more
than two have hashes their lower-bit is same. So first perturb should be hash >> PERTURB_SHIFT.

example: Consider about ma_keys == 16 and keys are [1, 17, 33, 49, 65].
Current perturb
1. hash(1) & (16-1) == 1; 1 uses ix==1 slot.
2. hash(17) & (16-1) == 1; ix==1 conflicts; Next ix is mask & (3 + 17 + 1) == 5; use ix==5 slot.
3. hash(33) & (16-1) == 1; ix==1 conflicts; Next ix is mask & (3 + 33 + 1) == 5; ix==5 conflicts; ...

When first perturb = hash >> PERTURB_SHIFT:
1. hash(1) & (16-1) == 1; 1 uses ix==1 slot.
2. hash(17) & (16-1) == 1; ix==1 conflicts; Next ix is mask & (3 + (17>>5) + 1) == 4; use ix==4 slot.
3. hash(33) & (16-1) == 1; ix==1 conflicts; Next ix is mask & (3 + (33>>5) + 1) == 5; use ix==5 slot.


While it's difficult to see performance difference from benchmark, this should be decrease possibility of 2nd conflict.
History
Date User Action Args
2016-09-19 04:25:14methanesetrecipients: + methane
2016-09-19 04:25:13methanesetmessageid: <1474259113.17.0.570923386086.issue28201@psf.upfronthosting.co.za>
2016-09-19 04:25:12methanelinkissue28201 messages
2016-09-19 04:25:11methanecreate