Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resize dict on del/pop #76804

Open
1st1 opened this issue Jan 22, 2018 · 9 comments
Open

Resize dict on del/pop #76804

1st1 opened this issue Jan 22, 2018 · 9 comments
Labels
3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@1st1
Copy link
Member

1st1 commented Jan 22, 2018

BPO 32623
Nosy @gvanrossum, @tim-one, @rhettinger, @methane, @serhiy-storchaka, @1st1, @xgid
PRs
  • bpo-32623: Resize dict after del/pop #5297
  • bpo-32623: Resize dict after del/pop #5300
  • Dependencies
  • bpo-33205: GROWTH_RATE prevents dict shrinking
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2018-01-22.16:37:38.096>
    labels = ['interpreter-core', 'type-feature', '3.8']
    title = 'Resize dict on del/pop'
    updated_at = <Date 2020-08-04.16:02:32.429>
    user = 'https://github.com/1st1'

    bugs.python.org fields:

    activity = <Date 2020-08-04.16:02:32.429>
    actor = 'vstinner'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2018-01-22.16:37:38.096>
    creator = 'yselivanov'
    dependencies = ['33205']
    files = []
    hgrepos = []
    issue_num = 32623
    keywords = ['patch']
    message_count = 9.0
    messages = ['310427', '310433', '310548', '310570', '310571', '310604', '310607', '310649', '310654']
    nosy_count = 7.0
    nosy_names = ['gvanrossum', 'tim.peters', 'rhettinger', 'methane', 'serhiy.storchaka', 'yselivanov', 'xgdomingo']
    pr_nums = ['5297', '5300']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue32623'
    versions = ['Python 3.8']

    @1st1
    Copy link
    Member Author

    1st1 commented Jan 22, 2018

    We should investigate whether we want dicts to compact themselves on del/pop operations. Currently we have to rely on workarounds to have compactable dict.copy (see bpo-31179) etc.

    @1st1 1st1 added 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Jan 22, 2018
    @vstinner
    Copy link
    Member

    Note: It was recently discussed if "del dict[key]" should keep the insertion order. If I understood correctly: yes, the order must be preserved on deletion.

    https://mail.python.org/pipermail/python-dev/2017-November/150142.html

    @methane
    Copy link
    Member

    methane commented Jan 24, 2018

    For insert/pop loop, reduce table size aggressively on pop may cause
    performance regression. So reducing size should be conservative.

    So my opinion is:

    • When dict size become 0, make the dict shared-empty, like dict.clear()
    • When (dict size < dk_size/8), call insertion_resize()

    @vstinner
    Copy link
    Member

    I agree that an heuristic is needed to decide when a dict should be compacted.

    • When (dict size < dk_size/8), call insertion_resize()

    In bpo-31179, I suggested to Yury to use 2/3 ratio... to avoid integer overflow :-) He first used 80%, but I dislike using the FPU in the dictobject.c. I'm not sure of the cost of switching from integers to floats, and more generally I hate rounding issues, so I prefer to use regular integers ;-)

    + (3) if 'mp' is non-compact ('del' operation does not resize dicts),
    + do fast-copy only if it has at most 1/3 non-used keys.
    +
    + The last condition (3) is important to guard against a pathalogical
    + case when a large dict is almost emptied with multiple del/pop
    + operations and copied after that. In cases like this, we defer to
    + PyDict_Merge, which produces a compacted copy.

    By the way, if dict automatically compacts itself automatically, do we still need Yury's test "is the dict compact"?

    @vstinner
    Copy link
    Member

    This issue is a matter of CPU vs memory tradeoff. It reminds me the PEP-393: str type doesn't make tradeoff anymore, CPython guarantee that str always use the most efficient storage (least memory) for characters.

    But here the tradeoff is different. A dict is mutable, whereas str is immutable ;-)

    @methane
    Copy link
    Member

    methane commented Jan 24, 2018

    • When dict size become 0, make the dict shared-empty, like dict.clear()

    This will cause significant performance regression for dict[a]=None; del dict[a] loop.
    del/pop shouldn't do clear().

    • When (dict size < dk_size/8), call insertion_resize()

    This is bad too.
    When ma_used=127 and dk_size=1024, new size will be 1024!
    It's because current GROWTH_RATE is used*2 + size/2.

    This GROWTH_RATE is set in bpo-17563.
    We should understand it before changing anything.

    @vstinner
    Copy link
    Member

    "This will cause significant performance regression for dict[a]=None; del dict[a] loop. del/pop shouldn't do clear()."

    Should we make sure that all dicts have at least MINSIZE entries?

    @methane
    Copy link
    Member

    methane commented Jan 25, 2018

    Should we make sure that all dicts have at least MINSIZE entries?

    I don't think so. I think "allocate on first insert" is good idea for dict.clear()
    And I think it's good idea for new dict too:
    #1080

    @methane
    Copy link
    Member

    methane commented Jan 25, 2018

    I think I understand bpo-17563, and I should fix GROWTH_RATE.

    Before compact-ordered dict, we can avoid resizing in
    "the number of deletions is on a par with the number of insertions."
    scenario, by large GROWTH_RATE.
    That's because new entry can reuse dummy entries.

    But in compact-ordered dict, we can't do that.
    We need resizing always, and resize is much faster than legacy dict.

    I think GROWTH_RATE should be ma_used*3 for now.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants