This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Improve dict iteration
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: cheryl.sabella, methane, rhettinger, serhiy.storchaka
Priority: low Keywords: patch

Created on 2017-01-08 09:50 by rhettinger, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
dict_shorter_iteration.diff rhettinger, 2017-01-08 09:50 First few cases
dict_shorter_iteration.diff serhiy.storchaka, 2017-01-08 12:38 Regenerated for review review
Pull Requests
URL Status Linked Edit
PR 11900 merged cheryl.sabella, 2019-02-16 21:27
Messages (11)
msg284970 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-01-08 09:50
I haven't had much of a chance to comb through all the code it detail, but there are least a few places that loop over the whole entry table (size is dk_nentries) when fewer iterations would suffice (stop once mp->ma_used entries have been seen).  Since the table is usually dense packed to the left, this will give fewer loops and fewer memory accesses.
msg284974 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-08 10:40
I'm wondering why new dict implementation don't keep the array of items compact? Original Raymond's implementation did not preserve the order after deletions, but saved items in an array without gaps. This could simplify and speed up an iteration (no need to check values for NULL, and needed to iterate fewer elements), and could get rid of reallocations in often mutated dicts. I haven't found clear explanation of this.
msg284988 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2017-01-08 12:24
> I'm wondering why new dict implementation don't keep the array of items compact? Original Raymond's implementation did not preserve the order after deletions, but saved items in an array without gaps. This could simplify and speed up an iteration (no need to check values for NULL, and needed to iterate fewer elements), and could get rid of reallocations in often mutated dicts. I haven't found clear explanation of this.

As far as I remember, I choose it because:

1. easy to explain, easy to discussion.

"keep insertion order" is easier than "keep insertion order unless deletion".

I want to start discussion based on most simple behavior.
But my patch was reviewed by core developers in sprint, when right before 3.6b1.

2. I want to share same implementation with OrderedDict, like PyPy.

3. Make patch compact.

If we compact dk_entries when deletion, we need another dk_indices compaction.
Before sprint, my patch wasn't reviewed for months.  I was afraid that my patch wasn't reviewed by 3.6b1.


For now, "namespace dict keeps insertion order" is language spec.
`del` shouldn't break insertion order.
msg284989 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2017-01-08 12:31
patch LGTM.

> Since the table is usually dense packed to the left, this will give fewer loops and fewer memory accesses.

Since dk_nentries is "right end of active entries", this patch doesn't
affect for "dence packed to the left" dict.
But it can affect when dk_entries is sparce, and there are many deleted entries on the right end.
msg284990 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2017-01-08 12:45
While this isn't bugfix, I'm +1 to commit this to 3.6 branch to reduce
future maintenance cost.
msg284991 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-01-08 12:55
In general the idea LGTM. It it is slightly dangerous if dict structure is broken. j is incremented only when value != NULL, and if ma_used is not correct, this can cause reading out of the end of an array. Of course this should never happen. But if there is a chance to add some assertions or additional checks in debug mode, it would be nice. Unless this complicate the code to much.

The i counter is not used in some loops. It can be eliminated.
msg304824 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-23 17:06
Do you mind to create a pull request Raymond?
msg304855 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-10-24 03:20
This doesn't meet our criteria for backports.
msg335713 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2019-02-16 20:13
I can convert this to a pull request if there is still interest in the change.
msg335716 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-02-16 20:46
> I can convert this to a pull request if there is still interest in the change.

Yes, please do.
msg339488 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-04-05 10:08
New changeset f66e336f455b5a6bb0ca857d61c43be410d0df13 by Inada Naoki (Cheryl Sabella) in branch 'master':
bpo-29202: improve dict iteration (GH-11900)
https://github.com/python/cpython/commit/f66e336f455b5a6bb0ca857d61c43be410d0df13
History
Date User Action Args
2022-04-11 14:58:41adminsetgithub: 73388
2019-04-05 10:09:00methanesetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2019-04-05 10:08:52methanesetmessages: + msg339488
2019-02-16 21:27:18cheryl.sabellasetpull_requests: + pull_request11927
2019-02-16 20:46:14rhettingersetmessages: + msg335716
2019-02-16 20:13:01cheryl.sabellasetnosy: + cheryl.sabella

messages: + msg335713
versions: + Python 3.8
2017-10-24 03:20:17rhettingersetassignee: rhettinger -> serhiy.storchaka
messages: + msg304855
versions: - Python 3.6
2017-10-23 17:06:34serhiy.storchakasetmessages: + msg304824
2017-01-08 12:55:55serhiy.storchakasetassignee: serhiy.storchaka -> rhettinger
messages: + msg284991
stage: patch review
2017-01-08 12:45:02methanesetmessages: + msg284990
2017-01-08 12:38:41serhiy.storchakasetfiles: + dict_shorter_iteration.diff
2017-01-08 12:31:45methanesetmessages: + msg284989
2017-01-08 12:24:20methanesetmessages: + msg284988
2017-01-08 10:40:07serhiy.storchakasetmessages: + msg284974
2017-01-08 09:50:48rhettingersetpriority: normal -> low
2017-01-08 09:50:36rhettingercreate