Author inada.naoki
Recipients Mark.Shannon, Yury.Selivanov, eric.snow, inada.naoki, rhettinger, vstinner
Date 2018-04-02.11:55:50
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1522670151.04.0.467229070634.issue33205@psf.upfronthosting.co.za>
In-reply-to
Content
GROWTH_RATE is changed from (used*2) to (used*2 + dk_size/2) in #17563, at Python 3.4.
It was for avoid resizing dict for massive del/insert use case, by increasing possibility of overwriting DUMMY entry.

From Python 3.6, there are no DUMMY entry.  While there are dummy keys, we resize (repack) when entries are full.
So there are no benefit from "possibility of overwriting dummy entry".

(Of course, there are still benefit from slow repacking rate for new dict.
So I don't propose to change it back to (used*2), but (used*3).)


This GROWTH_RATE prevents dict is shrinked in insertion_resize().

For example, consider this dict:

>>> d = dict.fromkeys(range(10900))
>>> len(d)
10900
>>> sys.getsizeof(d)
295008
>>> for i in range(10900):
...     del d[i]
...
>>> len(d)
0
>>> sys.getsizeof(d)
295008

`del d[i]` doesn't shrink the dict.  This is another issue (#32623).

Current dk_size is 16384 and entries length is dk_size * 2 // 3 = 10922.
So dictresize will called when next 923 entries are added.

New dk_size is round_up_to_power_of_two(922 + 16384/2) = 16384.
So dict is not shrinked!

>>> for i in range(923):
...     d[i] = i
...
>>> sys.getsizeof(d)
295008

round_up_to_power_of_two(used + dk_size/2) means dict is shrinked only when used == 0.

I propose changing GROWTH_RATE again to `used*3` from `used*2 + dk_size/2`

In case of dict growing without deletion, dk_size is doubled for each resize as current behavior.
When there are deletion, dk_size is growing aggressively than Python 3.3 (used*2 -> used*3).  And it allows dict shrinking after massive deletions.

>>> import sys
>>> d = dict.fromkeys(range(10900))
>>> sys.getsizeof(d)
295008
>>> for i in range(10900):
...     del d[i]
...
>>> len(d)
0
>>> sys.getsizeof(d)
295008
>>> for i in range(923):
...     d[i] = i
...
>>> sys.getsizeof(d)
36968


I want to backport this change to Python 3.7.  While it's beta3, "dict can't be shrinked unless empty" behavior looks resource usage bug to me.
History
Date User Action Args
2018-04-02 11:55:51inada.naokisetrecipients: + inada.naoki, rhettinger, vstinner, Yury.Selivanov, Mark.Shannon, eric.snow
2018-04-02 11:55:51inada.naokisetmessageid: <1522670151.04.0.467229070634.issue33205@psf.upfronthosting.co.za>
2018-04-02 11:55:50inada.naokilinkissue33205 messages
2018-04-02 11:55:50inada.naokicreate