Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excessive resizing of dicts when used as a cache #61763

Closed
markshannon opened this issue Mar 27, 2013 · 2 comments
Closed

Excessive resizing of dicts when used as a cache #61763

markshannon opened this issue Mar 27, 2013 · 2 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@markshannon
Copy link
Member

BPO 17563
Nosy @rhettinger, @vstinner, @markshannon, @ericsnowcurrently
Files
  • resize.patch: Patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2013-05-17.10:29:11.087>
    created_at = <Date 2013-03-27.21:33:59.598>
    labels = ['interpreter-core', 'performance']
    title = 'Excessive resizing of dicts when used as a cache'
    updated_at = <Date 2013-05-17.10:29:11.086>
    user = 'https://github.com/markshannon'

    bugs.python.org fields:

    activity = <Date 2013-05-17.10:29:11.086>
    actor = 'rhettinger'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2013-05-17.10:29:11.087>
    closer = 'rhettinger'
    components = ['Interpreter Core']
    creation = <Date 2013-03-27.21:33:59.598>
    creator = 'Mark.Shannon'
    dependencies = []
    files = ['29598']
    hgrepos = []
    issue_num = 17563
    keywords = ['patch']
    message_count = 2.0
    messages = ['185378', '189443']
    nosy_count = 5.0
    nosy_names = ['rhettinger', 'vstinner', 'Mark.Shannon', 'python-dev', 'eric.snow']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue17563'
    versions = ['Python 3.3']

    @markshannon
    Copy link
    Member Author

    If a dict is used a cache, e.g. in functools.lru_cache,
    the reduced resize factor in 3.3 can cause excessive resizing.
    This can lead to a significant performance regression.

    When the the number of deletions and insertions is roughly in balance
    the reduced head room in the dict (compare to 3.2) causes a large increase in the number of resizes.

    The reason for this above-linear increase is that with fewer dummy keys, the chance of a dummy being overwritten is reduced *and* is there is less overhead as well.
    A dictionary with 128 items will have a capacity of 256 and only 43 dummy keys. If it had a capacity of 512 (as it would have done in 3.2) then it will have 214 keys, making a resize at least 10 times less frequent.

    Changing the growth function from round_up_to_power_2(used*2) to
    round_up_to_power_2(used*2+capacity/2) the desirable property of only doubling in size when growing can be preserved, yet ensuring sufficient overhead when used as a cache.

    Consider a dict which grows to n items and then remains that size, with frequent deletions and insertions, using the proposed growth function:

    Items Capacity Steady state Capacity
    on reaching n capacity under 3.2
    2 8 8 8
    4 8 16 16
    6 16 32 32
    8 16 32 32
    10 16 64 64
    12 32 64 64
    15 32 64 64
    20 32 128 128
    30 64 128 128
    50 128 256 256
    80 128 512 512
    128 256 512 512

    Thanks to Raymond Hettinger for bringing this to my attention.

    Patch attached.

    @markshannon markshannon added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Mar 27, 2013
    @rhettinger rhettinger self-assigned this Mar 28, 2013
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 17, 2013

    New changeset cd2457463eeb by Raymond Hettinger in branch '3.3':
    Issue bpo-17563: Fix dict resize performance regression.
    http://hg.python.org/cpython/rev/cd2457463eeb

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants