This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Resize dict on del/pop
Type: enhancement Stage: patch review
Components: Interpreter Core Versions: Python 3.8
process
Status: open Resolution:
Dependencies: 33205 Superseder:
Assigned To: Nosy List: gvanrossum, methane, rhettinger, serhiy.storchaka, tim.peters, xgdomingo, yselivanov
Priority: normal Keywords: patch

Created on 2018-01-22 16:37 by yselivanov, last changed 2022-04-11 14:58 by admin.

Pull Requests
URL Status Linked Edit
PR 5297 closed methane, 2018-01-24 12:57
PR 5300 closed methane, 2018-01-24 15:07
Messages (9)
msg310427 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-01-22 16:37
We should investigate whether we want dicts to compact themselves on del/pop operations.  Currently we have to rely on workarounds to have compactable dict.copy (see issue 31179) etc.
msg310433 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-22 16:57
Note: It was recently discussed if "del dict[key]" should keep the insertion order. If I understood correctly: yes, the order must be preserved on deletion.

https://mail.python.org/pipermail/python-dev/2017-November/150142.html
msg310548 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2018-01-24 01:53
For insert/pop loop, reduce table size aggressively on pop may cause
performance regression.  So reducing size should be conservative.

So my opinion is:

* When dict size become 0, make the dict shared-empty, like dict.clear()
* When (dict size < dk_size/8), call insertion_resize()
msg310570 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-24 09:29
I agree that an heuristic is needed to decide when a dict should be compacted.

> * When (dict size < dk_size/8), call insertion_resize()

In bpo-31179, I suggested to Yury to use 2/3 ratio... to avoid integer overflow :-) He first used 80%, but I dislike using the FPU in the dictobject.c. I'm not sure of the cost of switching from integers to floats, and more generally I hate rounding issues, so I prefer to use regular integers ;-)

+           (3) if 'mp' is non-compact ('del' operation does not resize dicts),
+               do fast-copy only if it has at most 1/3 non-used keys.
+
+           The last condition (3) is important to guard against a pathalogical
+           case when a large dict is almost emptied with multiple del/pop
+           operations and copied after that.  In cases like this, we defer to
+           PyDict_Merge, which produces a compacted copy.

By the way, if dict automatically compacts itself automatically, do we still need Yury's test "is the dict compact"?
msg310571 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-24 09:31
This issue is a matter of CPU vs memory tradeoff. It reminds me the PEP 393: str type doesn't make tradeoff anymore, CPython guarantee that str always use the most efficient storage (least memory) for characters.

But here the tradeoff is different. A dict is mutable, whereas str is immutable ;-)
msg310604 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2018-01-24 16:02
> * When dict size become 0, make the dict shared-empty, like dict.clear()

This will cause significant performance regression for `dict[a]=None; del dict[a]` loop.
del/pop shouldn't do clear().

> * When (dict size < dk_size/8), call insertion_resize()

This is bad too.
When ma_used=127 and dk_size=1024, new size will be 1024!
It's because current GROWTH_RATE is used*2 + size/2.

This GROWTH_RATE is set in issue17563.
We should understand it before changing anything.
msg310607 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-24 16:20
"This will cause significant performance regression for `dict[a]=None;
del dict[a]` loop. del/pop shouldn't do clear()."

Should we make sure that all dicts have at least MINSIZE entries?
msg310649 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2018-01-25 00:46
> Should we make sure that all dicts have at least MINSIZE entries?

I don't think so.  I think "allocate on first insert" is good idea for dict.clear()
And I think it's good idea for new dict too:
https://github.com/python/cpython/pull/1080
msg310654 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2018-01-25 02:07
I think I understand #17563, and I should fix GROWTH_RATE.

Before compact-ordered dict, we can avoid resizing in
"the number of deletions is on a par with the number of insertions."
scenario, by large GROWTH_RATE.
That's because new entry can reuse dummy entries.

But in compact-ordered dict, we can't do that.
We need resizing always, and resize is much faster than legacy dict.

I think GROWTH_RATE should be ma_used*3 for now.
History
Date User Action Args
2022-04-11 14:58:56adminsetgithub: 76804
2020-08-04 16:02:32vstinnersetnosy: - vstinner
2020-08-04 04:34:20rhettingersetnosy: + tim.peters, rhettinger
2020-08-04 03:34:42gvanrossumsetnosy: + gvanrossum
2018-04-02 11:56:14methanesetdependencies: + GROWTH_RATE prevents dict shrinking
2018-01-31 04:23:23xgdomingosetnosy: + xgdomingo
2018-01-25 02:07:15methanesetmessages: + msg310654
2018-01-25 00:46:17methanesetmessages: + msg310649
2018-01-24 16:20:32vstinnersetmessages: + msg310607
2018-01-24 16:02:09methanesetmessages: + msg310604
2018-01-24 15:07:04methanesetpull_requests: + pull_request5147
2018-01-24 12:57:18methanesetkeywords: + patch
stage: patch review
pull_requests: + pull_request5144
2018-01-24 09:31:37vstinnersetmessages: + msg310571
2018-01-24 09:29:40vstinnersetmessages: + msg310570
2018-01-24 01:53:52methanesetmessages: + msg310548
2018-01-22 16:57:06vstinnersetmessages: + msg310433
2018-01-22 16:37:38yselivanovcreate