New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GROWTH_RATE prevents dict shrinking #77386
Comments
GROWTH_RATE is changed from (used*2) to (used*2 + dk_size/2) in bpo-17563, at Python 3.4. From Python 3.6, there are no DUMMY entry. While there are dummy keys, we resize (repack) when entries are full. (Of course, there are still benefit from slow repacking rate for new dict. This GROWTH_RATE prevents dict is shrinked in insertion_resize(). For example, consider this dict: >>> d = dict.fromkeys(range(10900))
>>> len(d)
10900
>>> sys.getsizeof(d)
295008
>>> for i in range(10900):
... del d[i]
...
>>> len(d)
0
>>> sys.getsizeof(d)
295008
Current dk_size is 16384 and entries length is dk_size * 2 // 3 = 10922. New dk_size is round_up_to_power_of_two(922 + 16384/2) = 16384. >>> for i in range(923):
... d[i] = i
...
>>> sys.getsizeof(d)
295008 round_up_to_power_of_two(used + dk_size/2) means dict is shrinked only when used == 0. I propose changing GROWTH_RATE again to In case of dict growing without deletion, dk_size is doubled for each resize as current behavior. >>> import sys
>>> d = dict.fromkeys(range(10900))
>>> sys.getsizeof(d)
295008
>>> for i in range(10900):
... del d[i]
...
>>> len(d)
0
>>> sys.getsizeof(d)
295008
>>> for i in range(923):
... d[i] = i
...
>>> sys.getsizeof(d)
36968 I want to backport this change to Python 3.7. While it's beta3, "dict can't be shrinked unless empty" behavior looks resource usage bug to me. |
# cap2.json is master (used*2 + dk_size/2)
Faster (1):
Benchmark hidden because not significant (5): rand_access(size=2), rand_access(size=5), rand_access(size=100), rand_access(size=200), rand_access(size=500) |
@Mark.Shannon, @rhettinger How do you think this? |
The capacity of the dict is 2/3 of its hashtable size: dk_usable < 2/3 * dk_size. Currently the dict grows if dk_usable > 1/4 * dk_size, and preserves the size if dk_usable < 1/4 * dk_size. Note that it it can grow twice if dk_usable > 1/2 * dk_size. With the proposed change the dict will grow only if dk_usable > 1/3 * dk_size, preserve the size if 1/6 * dk_size < dk_usable < 1/3 * dk_size, and shrink if dk_usable < 1/6 * dk_size. After growing once it will no need to grow again until the number of item be increased. This LGTM. |
Should this have been "filled*3" rather than "used*3"? The intent was to give a larger resize to dict that had a lot of dummy entries and a smaller resize to dicts without deletions. |
We do not have filled since Python 3.6. I chose
For example, when current dk_size == 16 and USABLE_FRACTION(dk_size) == 10, new dk_size is:
As you can see, dict is more sparse when there is dummy than when there is no dummy, except used=5/dummy=5 case. There may be a small room for improvement, especially for |
used*3
(GH-6350) #6503Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: