Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GROWTH_RATE prevents dict shrinking #77386

Closed
methane opened this issue Apr 2, 2018 · 8 comments
Closed

GROWTH_RATE prevents dict shrinking #77386

methane opened this issue Apr 2, 2018 · 8 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@methane
Copy link
Member

methane commented Apr 2, 2018

BPO 33205
Nosy @rhettinger, @methane, @markshannon, @ericsnowcurrently, @serhiy-storchaka, @miss-islington, @brandtbucher
PRs
  • bpo-33205: dict: Change GROWTH_RATE to (used*3) #6350
  • [3.7] bpo-33205: dict: Change GROWTH_RATE to used*3 (GH-6350) #6503
  • Files
  • dict_rand.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2018-04-02.11:55:50.970>
    labels = ['interpreter-core', '3.7', '3.8', 'performance']
    title = 'GROWTH_RATE prevents dict shrinking'
    updated_at = <Date 2022-01-26.18:08:32.273>
    user = 'https://github.com/methane'

    bugs.python.org fields:

    activity = <Date 2022-01-26.18:08:32.273>
    actor = 'brandtbucher'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2018-04-02.11:55:50.970>
    creator = 'methane'
    dependencies = []
    files = ['47512']
    hgrepos = []
    issue_num = 33205
    keywords = ['patch']
    message_count = 8.0
    messages = ['314806', '314807', '314977', '315351', '315380', '315409', '411534', '411716']
    nosy_count = 8.0
    nosy_names = ['rhettinger', 'methane', 'Yury.Selivanov', 'Mark.Shannon', 'eric.snow', 'serhiy.storchaka', 'miss-islington', 'brandtbucher']
    pr_nums = ['6350', '6503']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'open'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue33205'
    versions = ['Python 3.7', 'Python 3.8']

    @methane
    Copy link
    Member Author

    methane commented Apr 2, 2018

    GROWTH_RATE is changed from (used*2) to (used*2 + dk_size/2) in bpo-17563, at Python 3.4.
    It was for avoid resizing dict for massive del/insert use case, by increasing possibility of overwriting DUMMY entry.

    From Python 3.6, there are no DUMMY entry. While there are dummy keys, we resize (repack) when entries are full.
    So there are no benefit from "possibility of overwriting dummy entry".

    (Of course, there are still benefit from slow repacking rate for new dict.
    So I don't propose to change it back to (used*2), but (used*3).)

    This GROWTH_RATE prevents dict is shrinked in insertion_resize().

    For example, consider this dict:

    >>> d = dict.fromkeys(range(10900))
    >>> len(d)
    10900
    >>> sys.getsizeof(d)
    295008
    >>> for i in range(10900):
    ...     del d[i]
    ...
    >>> len(d)
    0
    >>> sys.getsizeof(d)
    295008

    del d[i] doesn't shrink the dict. This is another issue (bpo-32623).

    Current dk_size is 16384 and entries length is dk_size * 2 // 3 = 10922.
    So dictresize will called when next 923 entries are added.

    New dk_size is round_up_to_power_of_two(922 + 16384/2) = 16384.
    So dict is not shrinked!

    >>> for i in range(923):
    ...     d[i] = i
    ...
    >>> sys.getsizeof(d)
    295008

    round_up_to_power_of_two(used + dk_size/2) means dict is shrinked only when used == 0.

    I propose changing GROWTH_RATE again to used*3 from used*2 + dk_size/2

    In case of dict growing without deletion, dk_size is doubled for each resize as current behavior.
    When there are deletion, dk_size is growing aggressively than Python 3.3 (used*2 -> used*3). And it allows dict shrinking after massive deletions.

    >>> import sys
    >>> d = dict.fromkeys(range(10900))
    >>> sys.getsizeof(d)
    295008
    >>> for i in range(10900):
    ...     del d[i]
    ...
    >>> len(d)
    0
    >>> sys.getsizeof(d)
    295008
    >>> for i in range(923):
    ...     d[i] = i
    ...
    >>> sys.getsizeof(d)
    36968

    I want to backport this change to Python 3.7. While it's beta3, "dict can't be shrinked unless empty" behavior looks resource usage bug to me.

    @methane methane added 3.7 (EOL) end of life 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Apr 2, 2018
    @methane
    Copy link
    Member Author

    methane commented Apr 2, 2018

    # cap2.json is master (used*2 + dk_size/2)
    # used3.json is patched (used*3)
    $ ./python-master -m perf compare_to cap2.json used3.json -G
    Slower (2):

    • rand_access(size=20): 2.67 ms +- 0.01 ms -> 2.80 ms +- 0.04 ms: 1.05x slower (+5%)
    • rand_access(size=10): 2.70 ms +- 0.03 ms -> 2.80 ms +- 0.04 ms: 1.04x slower (+4%)

    Faster (1):

    • rand_access(size=50): 2.76 ms +- 0.06 ms -> 2.74 ms +- 0.02 ms: 1.01x faster (-1%)

    Benchmark hidden because not significant (5): rand_access(size=2), rand_access(size=5), rand_access(size=100), rand_access(size=200), rand_access(size=500)

    @methane
    Copy link
    Member Author

    methane commented Apr 5, 2018

    @Mark.Shannon, @rhettinger How do you think this?

    @serhiy-storchaka
    Copy link
    Member

    The capacity of the dict is 2/3 of its hashtable size: dk_usable < 2/3 * dk_size.

    Currently the dict grows if dk_usable > 1/4 * dk_size, and preserves the size if dk_usable < 1/4 * dk_size. Note that it it can grow twice if dk_usable > 1/2 * dk_size.

    With the proposed change the dict will grow only if dk_usable > 1/3 * dk_size, preserve the size if 1/6 * dk_size < dk_usable < 1/3 * dk_size, and shrink if dk_usable < 1/6 * dk_size. After growing once it will no need to grow again until the number of item be increased.

    This LGTM.

    @methane
    Copy link
    Member Author

    methane commented Apr 17, 2018

    New changeset 5fbc511 by INADA Naoki in branch 'master':
    bpo-33205: dict: Change GROWTH_RATE to used*3 (GH-6350)
    5fbc511

    @miss-islington
    Copy link
    Contributor

    New changeset 902bb62 by Miss Islington (bot) in branch '3.7':
    bpo-33205: dict: Change GROWTH_RATE to used*3 (GH-6350)
    902bb62

    @methane methane closed this as completed Apr 17, 2018
    @rhettinger
    Copy link
    Contributor

    Should this have been "filled*3" rather than "used*3"?

    The intent was to give a larger resize to dict that had a lot of dummy entries and a smaller resize to dicts without deletions.

    @rhettinger rhettinger reopened this Jan 26, 2022
    @methane
    Copy link
    Member Author

    methane commented Jan 26, 2022

    We do not have filled since Python 3.6.
    There is a dk_nentries instead. But when insertion_resize() is called, dk_nentries is equal to USABLE_FRACTION(dk_size) (dk_size is 1 << dk_log2_size for now). So it is different from filled in the old dict.

    I chose dk_used*3 as GROWTH_RATE because it reserves more spaces when there are dummies than when there is no dummy, as I described in the same comment:

    In case of dict growing without deletion, dk_size is doubled for each resize as current behavior.
    When there are deletion, dk_size is growing aggressively than Python 3.3 (used*2 -> used*3). And it allows dict shrinking after massive deletions.

    For example, when current dk_size == 16 and USABLE_FRACTION(dk_size) == 10, new dk_size is:

    • used = 10 (dummy=0) -> 32 (31.25%)
    • used = 9 (dummy=1) -> 32 (28.125%)
      (snip)
    • used = 6 (dummy=4) -> 32 (18.75%)
    • used = 5 (dummy=5) -> 16 (31.25%)
    • used = 4 (dummy=6) -> 16 (25%)
      (snip)
    • used = 2 (dummy=8) -> 8 (25%)

    As you can see, dict is more sparse when there is dummy than when there is no dummy, except used=5/dummy=5 case.

    There may be a small room for improvement, especially for used=5/dummy=5 case. But I am not sure it is worth enough to use more complex GROWTH_RATE than used*3.
    Any good idea?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @methane methane closed this as completed May 6, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants