classification
Title: Add a version number to dict keys.
Type: performance Stage: resolved
Components: Versions:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Mark.Shannon Nosy List: Mark.Shannon, brandtbucher, erlendaasland, methane, pablogsal, vstinner
Priority: release blocker Keywords: patch

Created on 2021-05-21 15:38 by Mark.Shannon, last changed 2021-08-02 13:54 by pablogsal. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 26333 merged Mark.Shannon, 2021-05-24 14:45
PR 26432 merged Mark.Shannon, 2021-05-28 14:19
PR 26440 merged pablogsal, 2021-05-28 23:01
PR 27542 merged Mark.Shannon, 2021-08-02 09:55
Messages (18)
msg394120 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-05-21 15:38
Add a version number to dict keys.

PEP 509 added a version number to dicts. Unfortunately this is no use for optimizing attribute loads and store on instances.
We need to know whether the keys are as expected, not the dict as that is likely to be different each time.

We can add a 32 bit version number and actually reduce memory use by taking advantage of the redundancy in the rest of the keys object.
msg394121 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-05-21 15:42
The memory saving comes from converting:

    Py_ssize_t dk_size;
    dict_lookup_func dk_lookup;

to:

   uint8_t dk_log2_size;
   uint8_t dk_loopup_kind; /* Only 3 possible values */
   uint32_t dk_version;

which saves 8 bytes on a 64 bit machine (no saving on a 32 bit machine).
msg394187 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-05-22 19:06
> Add a version number to dict keys.

Is the version on the key or on the value? Does the version change if a value is changed like d[1]="a"; d[1]="b"?
msg394252 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-05-24 14:39
The version is on the dict keys. It changes if the dict keys object changes in a meaningful way.

So, no it doesn't change if a value is changed.
Nor would it change if the keys were shared and a new key-value pair is added to a dict, but that key was already in the shared keys.
msg394253 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-05-24 14:51
I don't understand how such "key version" would be useful. Can you please elaborate?
msg394255 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-05-24 15:35
http://theses.gla.ac.uk/2975/1/2011shannonphd.pdf page 128.

It means we don't need to cache a pointer to the keys, just the version number.
The version number is half the size (for 64 bit machines) and using it means that we don't leak keys.
msg394278 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-05-25 00:06
Can't you please explain which kind of optimizations do you want to implement using such dict key version?
msg394301 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-05-25 08:09
For class dict, there is a method cache.
For instance dict, there is a LOAD_ATTR optimization.
https://github.com/python/cpython/blob/b11a951f16f0603d98de24fee5c023df83ea552c/Python/ceval.c#L3458

So I want to see the performance gain of the optimization using dk_version before adding dk_version.

Additionally, I am not sure why 32bit is enough for dk_version. It is another reason I want to see the optimization before adding dk_version.
msg394326 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-05-25 11:13
Which optimizations?

LOAD_GLOBAL:
Using a keys version instead of a whole dict version means that LOAD_GLOBAL won't leak references. It also means that we can (in the future) remove the PEP 509 version and save 8 bytes per dict.

LOAD_ATTR:
_PyDict_GetItemHint() still has to do quite a lot of work compared to a version check.
The hint approach can't quickly tell us whether a name is not in a dictionary, which is needed for optimizing non-descriptor class attributes.

LOAD_METHOD:
Because functions are non-overriding descriptors we need to quickly check that the instance does not have an attribute shadowing the method.


Why is 32 bits enough?

Because the version is reset to zero, whenever the dict keys changes, and only set to non-zero when we explicitly ask for it when optimizing. 4 billion optimization events is a lot.
It can't overflow, it just becomes useless when we reach UINT_MAX.
Using 64 bits would just waste memory.


Overall, versioning the dictionary's keys is more useful and more compact than versioning the dictionary as a whole.
msg394644 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-05-28 08:54
New changeset f8a95df84bcedebc0aa7132b3d1a4e8f000914bc by Mark Shannon in branch 'main':
bpo-44206: Add a version number to dictionary keys (GH-26333)
https://github.com/python/cpython/commit/f8a95df84bcedebc0aa7132b3d1a4e8f000914bc
msg394702 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-05-28 23:14
New changeset 1a672a5908736e44a8a25a99b3a116b085f12aa8 by Pablo Galindo in branch 'main':
bpo-44206: Fix compiler warnings in dictobject.c (GH-26440)
https://github.com/python/cpython/commit/1a672a5908736e44a8a25a99b3a116b085f12aa8
msg394783 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-05-31 00:28
Can we close this issue?
msg398677 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-08-01 02:54
Reopening this and marking as a release blocker. Seems that a buildbot has found some cases where an assert fails:

https://buildbot.python.org/all/#/builders/562/builds/110/steps/5/logs/stdio

..python: Objects/dictobject.c:350: dictkeys_set_index: Assertion `keys->dk_version == 0' failed.
Fatal Python error: Aborted
Current thread 0x00007f7dba340740 (most recent call first):
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/test/test_importlib/test_api.py", line 290 in test_reload_location_changed
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/unittest/case.py", line 549 in _callTestMethod
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/unittest/case.py", line 592 in run
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/unittest/case.py", line 652 in __call__
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/unittest/suite.py", line 122 in run
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/unittest/suite.py", line 122 in run
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/unittest/suite.py", line 122 in run
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/unittest/suite.py", line 122 in run
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/unittest/suite.py", line 122 in run
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/unittest/suite.py", line 84 in __call__
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/unittest/runner.py", line 176 in run
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/test/support/__init__.py", line 979 in _run_suite
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/test/support/__init__.py", line 1104 in run_unittest
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/test/libregrtest/runtest.py", line 261 in _test_module
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/test/libregrtest/refleak.py", line 90 in dash_R
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/test/libregrtest/runtest.py", line 295 in _runtest_inner2
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/test/libregrtest/runtest.py", line 335 in _runtest_inner
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/test/libregrtest/runtest.py", line 215 in _runtest
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/test/libregrtest/runtest.py", line 245 in runtest
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/test/libregrtest/main.py", line 334 in rerun_failed_tests
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/test/libregrtest/main.py", line 712 in _main
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/test/libregrtest/main.py", line 655 in main
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/test/libregrtest/main.py", line 733 in main
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/test/__main__.py", line 2 in <module>
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/runpy.py", line 86 in _run_code
  File "/home/buildbot/buildarea/3.x.cstratak-RHEL7-x86_64.refleak/build/Lib/runpy.py", line 196 in _run_module_as_main
msg398735 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-08-02 09:17
Pablo,

There is another failure on that buildbot: test_inspect fails with
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mark/repos/cpython/Lib/inspect.py", line 1154, in walktree
    classes.sort(key=attrgetter('__module__', '__name__'))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '<' not supported between instances of 'str' and 'module'


I can reproduce that failure with a debug build and either of
./python -m test -R 3:3  test_inspect
./python -m test test_inspect -F

Known issue or should I file a new issue? This issue is also present in 3.10.
msg398740 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-08-02 09:45
> Known issue or should I file a new issue? This issue is also present in 3.10.

Can you open a different issue, please?
msg398746 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-08-02 09:56
Opened https://bugs.python.org/issue44808
msg398770 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-08-02 13:54
New changeset e06ae75e16dbf46b6626b6a41b62391ea1fdb319 by Mark Shannon in branch 'main':
bpo-44206: Make sure that dict-keys's version is set to zero when value is popped (GH-27542)
https://github.com/python/cpython/commit/e06ae75e16dbf46b6626b6a41b62391ea1fdb319
msg398771 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-08-02 13:54
Closing for now unless we see the error again
History
Date User Action Args
2021-08-02 13:54:55pablogsalsetstatus: open -> closed
resolution: fixed
messages: + msg398771
2021-08-02 13:54:40pablogsalsetmessages: + msg398770
2021-08-02 09:56:11pablogsalsetmessages: + msg398746
stage: patch review -> resolved
2021-08-02 09:55:33Mark.Shannonsetstage: resolved -> patch review
pull_requests: + pull_request26051
2021-08-02 09:45:55pablogsalsetmessages: + msg398740
2021-08-02 09:17:16Mark.Shannonsetmessages: + msg398735
2021-08-01 02:54:30pablogsalsetpriority: normal -> release blocker
status: closed -> open
resolution: fixed -> (no value)
messages: + msg398677
2021-06-09 13:29:00Mark.Shannonsetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2021-05-31 00:28:30methanesetmessages: + msg394783
2021-05-28 23:14:40pablogsalsetmessages: + msg394702
2021-05-28 23:01:06pablogsalsetnosy: + pablogsal
pull_requests: + pull_request25032
2021-05-28 14:19:01Mark.Shannonsetpull_requests: + pull_request25027
2021-05-28 08:54:25Mark.Shannonsetmessages: + msg394644
2021-05-25 11:13:20Mark.Shannonsetmessages: + msg394326
2021-05-25 08:09:51methanesetmessages: + msg394301
2021-05-25 00:06:01vstinnersetmessages: + msg394278
2021-05-24 15:53:20erlendaaslandsetnosy: + erlendaasland
2021-05-24 15:35:11Mark.Shannonsetmessages: + msg394255
2021-05-24 14:51:15vstinnersetmessages: + msg394253
2021-05-24 14:45:32Mark.Shannonsetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request24925
2021-05-24 14:39:54Mark.Shannonsetmessages: + msg394252
2021-05-22 19:06:04vstinnersetmessages: + msg394187
2021-05-21 21:50:16brandtbuchersetnosy: + brandtbucher
2021-05-21 15:42:19Mark.Shannonsetmessages: + msg394121
2021-05-21 15:38:13Mark.Shannoncreate