classification
Title: dict: perturb shift should be done when first conflict
Type: performance Stage: patch review
Components: Interpreter Core Versions: Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: inada.naoki Nosy List: inada.naoki, josh.r, python-dev, rhettinger, serhiy.storchaka, tim.peters
Priority: normal Keywords: patch

Created on 2016-09-19 04:25 by inada.naoki, last changed 2017-03-31 16:36 by dstufft. This issue is now closed.

Files
File name Uploaded Description Edit
dict-perturb-shift.patch inada.naoki, 2016-09-20 17:09 review
dict-perturb-shift.patch inada.naoki, 2016-09-21 09:22 smaller diff review
dict-perturb-shift2.patch inada.naoki, 2016-10-05 04:42 review
dict-perturb-shift3.patch inada.naoki, 2016-10-06 05:14 review
dict-perturb-shift4.patch inada.naoki, 2016-10-06 05:20 review
Pull Requests
URL Status Linked Edit
PR 552 closed dstufft, 2017-03-31 16:36
Messages (14)
msg276936 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-09-19 04:25
Current perturb shift code is like following:

    for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
        i = mask & ((i << 2) + i + perturb + 1);

This loop is start after first conflict. It means perturb == hash for first conflict.

The purpose of perturb shift is avoid long conflict chain when keys more
than two have hashes their lower-bit is same. So first perturb should be hash >> PERTURB_SHIFT.

example: Consider about ma_keys == 16 and keys are [1, 17, 33, 49, 65].
Current perturb
1. hash(1) & (16-1) == 1; 1 uses ix==1 slot.
2. hash(17) & (16-1) == 1; ix==1 conflicts; Next ix is mask & (3 + 17 + 1) == 5; use ix==5 slot.
3. hash(33) & (16-1) == 1; ix==1 conflicts; Next ix is mask & (3 + 33 + 1) == 5; ix==5 conflicts; ...

When first perturb = hash >> PERTURB_SHIFT:
1. hash(1) & (16-1) == 1; 1 uses ix==1 slot.
2. hash(17) & (16-1) == 1; ix==1 conflicts; Next ix is mask & (3 + (17>>5) + 1) == 4; use ix==4 slot.
3. hash(33) & (16-1) == 1; ix==1 conflicts; Next ix is mask & (3 + (33>>5) + 1) == 5; use ix==5 slot.


While it's difficult to see performance difference from benchmark, this should be decrease possibility of 2nd conflict.
msg276937 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-09-19 04:39
Good catch!  I agree - and I wrote this code to begin with, so my opinion should count ;-)
msg276943 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-09-19 04:52
I agree and my opinion counts even more because I long ago made this change for setobjects ;-)
msg277088 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2016-09-21 04:18
General comment on the patch: I believe per PEP7, we're still sticking to ANSI C (aka C89), and specifically, "all declarations must be at the top of a block (not necessarily at the top of function".

The patch assumes lax standards compliance (or C99+ compliance), declaring variables in the for loop initializer section and midway through blocks. This should be changed to declare all variables at the top of blocks, and not in for loop initializer sections.
msg277089 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2016-09-21 04:21
Removing those unrelated changes looks like it would dramatically reduce the churn too, making review easier.
msg277105 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-09-21 07:31
josh.r:

>  I believe per PEP7, we're still sticking to ANSI C (aka C89), and specifically, "all declarations must be at the top of a block (not necessarily at the top of function".

Python 3.6 branch allows some C99 features.
https://www.python.org/dev/peps/pep-0007/#c-dialect

> Removing those unrelated changes looks like it would dramatically reduce the churn too, making review easier.

While I think refactoring around changes is practical [1], I agree that I made too much change.  I'll reduce diff size.

[1] Changing without refactoring is hard sometimes. Whole file refactoring
    without actual change may cause many conflicts against other patches.
msg277708 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-09-29 17:05
Could anyone review the patch?
I'm starting to #28199.  But it must conflict with this patch.
msg278101 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-10-05 04:42
Fixed conflict with current 3.6 branch, and added NEWS entry.
msg278170 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-10-06 04:33
While I think this patch is safe, I need LGTM from another committer
because I'm new committer.

Could Tim or Raymond review the patch before 3.6beta2?
msg278171 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-10-06 05:04
LGTM!  Ship it :-)
msg278172 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-06 05:13
Added comments on Rietveld.
msg278175 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-06 06:17
LGTM. But why you moved the declaration of perturb?
msg278176 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-10-06 06:23
New changeset cf2778fd7acb by INADA Naoki in branch '3.6':
Issue #28201: Dict reduces possibility of 2nd conflict in hash table.
https://hg.python.org/cpython/rev/cf2778fd7acb

New changeset 80b01cd94a63 by INADA Naoki in branch 'default':
Issue #28201: Dict reduces possibility of 2nd conflict in hash table.
https://hg.python.org/cpython/rev/80b01cd94a63
msg278177 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-10-06 06:34
Thank you, Tim and Serhiy.  My first commit has been pushed now!

Serhiy:
Since I prefer putting variable declaration near it's usage, and
PEP 7 permits it since Python 3.6.
History
Date User Action Args
2017-03-31 16:36:39dstufftsetpull_requests: + pull_request1110
2016-10-06 06:38:04inada.naokisetstatus: open -> closed
resolution: fixed
2016-10-06 06:34:23inada.naokisetmessages: + msg278177
2016-10-06 06:23:27python-devsetnosy: + python-dev
messages: + msg278176
2016-10-06 06:17:03serhiy.storchakasetmessages: + msg278175
2016-10-06 05:20:30inada.naokisetfiles: + dict-perturb-shift4.patch
2016-10-06 05:14:40inada.naokisetfiles: + dict-perturb-shift3.patch
2016-10-06 05:13:58serhiy.storchakasetmessages: + msg278172
2016-10-06 05:04:54tim.peterssetmessages: + msg278171
2016-10-06 04:33:03inada.naokisetmessages: + msg278170
2016-10-05 04:42:08inada.naokisetfiles: + dict-perturb-shift2.patch

messages: + msg278101
2016-10-01 08:34:31inada.naokisettype: performance
stage: patch review
2016-09-29 17:05:16inada.naokisetassignee: inada.naoki
messages: + msg277708
2016-09-21 09:22:37inada.naokisetfiles: + dict-perturb-shift.patch
2016-09-21 07:31:01inada.naokisetmessages: + msg277105
2016-09-21 04:21:50josh.rsetmessages: + msg277089
2016-09-21 04:18:59josh.rsetnosy: + josh.r
messages: + msg277088
2016-09-20 17:09:27inada.naokisetfiles: + dict-perturb-shift.patch
keywords: + patch
2016-09-19 04:52:37rhettingersetmessages: + msg276943
2016-09-19 04:39:13tim.peterssetnosy: + tim.peters
messages: + msg276937
2016-09-19 04:32:35serhiy.storchakasetnosy: + rhettinger, serhiy.storchaka
2016-09-19 04:25:13inada.naokicreate