Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve list() pre-sizing for inputs with known lengths #77415

Closed
rhettinger opened this issue Apr 5, 2018 · 14 comments
Closed

Improve list() pre-sizing for inputs with known lengths #77415

rhettinger opened this issue Apr 5, 2018 · 14 comments
Labels
3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@rhettinger
Copy link
Contributor

BPO 33234
Nosy @tim-one, @rhettinger, @vstinner, @ericsnowcurrently, @serhiy-storchaka, @pablogsal, @miss-islington
PRs
  • bpo-33234 Improve list() pre-sizing for inputs with known lengths #6493
  • bpo-33234 Improve list() pre-sizing for inputs with known lengths (no __length_hint__)  #9846
  • bpo-33234: Add exact allocation optimization to lists in What's New #10200
  • bpo-33234: Simplify list_preallocate_exact() #11220
  • bpo-33234: Add another attribution in Whatsnew #11899
  • bpo-26828: Add __length_hint__() to builtin map iterator #14432
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-12-29.22:33:35.758>
    created_at = <Date 2018-04-05.21:46:53.136>
    labels = ['interpreter-core', '3.8', 'performance']
    title = 'Improve list() pre-sizing for inputs with known lengths'
    updated_at = <Date 2020-03-07.09:22:57.549>
    user = 'https://github.com/rhettinger'

    bugs.python.org fields:

    activity = <Date 2020-03-07.09:22:57.549>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-12-29.22:33:35.758>
    closer = 'pablogsal'
    components = ['Interpreter Core']
    creation = <Date 2018-04-05.21:46:53.136>
    creator = 'rhettinger'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 33234
    keywords = ['patch']
    message_count = 14.0
    messages = ['315006', '315412', '315420', '315940', '315941', '315973', '328738', '328741', '328748', '328762', '332737', '335717', '363187', '363587']
    nosy_count = 7.0
    nosy_names = ['tim.peters', 'rhettinger', 'vstinner', 'eric.snow', 'serhiy.storchaka', 'pablogsal', 'miss-islington']
    pr_nums = ['6493', '9846', '10200', '11220', '11899', '14432']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue33234'
    versions = ['Python 3.8']

    @rhettinger
    Copy link
    Contributor Author

    The list() constructor isn't taking full advantage of known input lengths or length hints. Ideally, it should pre-size and not over-allocate when the input size is known or can be reasonably estimated.

    Python 3.8.0a0 (heads/master:091e95e900, Apr  5 2018, 09:48:33)
    [Clang 9.1.0 (clang-902.0.39.1)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from sys import getsizeof
    >>> getsizeof([0] * 10)
    144
    >>> getsizeof(list([0] * 10))
    200
    >>> getsizeof(list(range(10)))
    200

    @rhettinger rhettinger added 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Apr 5, 2018
    @serhiy-storchaka
    Copy link
    Member

    Calling PyObject_LengthHint() adds an overhead. It is significant for short sequences. I work on a patch that reduces it. PR 6493 adds the second call of PyObject_LengthHint() and increases the overhead.

    As for this issue, in-place repeat overallocates too.

    >>> a = [0]; a *= 10; getsizeof(a)
    200

    I think it would be better to make it not preallocating.

    And maybe it would be worth to avoid overallocating if newsize > allocated + allocated/8 or something like.

    @pablogsal
    Copy link
    Member

    Microbenchmarked with @vstinner's perf module:

    python -m perf timeit "list([0]*10)" -o new.json

    checkout master and build

    python -m perf timeit "list([0]*10)" -o old.json
    python -m perf compare_to new.json old.json
    Mean +- std dev: [new] 241 ns +- 3 ns -> [old] 243 ns +- 18 ns: 1.01x slower (+1%)
    Not significant!

    @vstinner
    Copy link
    Member

    @vstinner
    Copy link
    Member

    Should we shrink the list of the length hint was way too big? For example, if the length hint was 100 but the final list of only 10. Should we shrink in that case?

    I'm asking because it seems like Pablo's PR doesn't shrink the list in that case.

    I wasn't sure so I checked, del shrinks the list if needed:

    >>> x=list(range(1000))
    >>> import sys
    >>> sys.getsizeof(x)
    9112
    >>> del x[1:]
    >>> sys.getsizeof(x)
    96

    @pablogsal
    Copy link
    Member

    @serhiy.storchaka @rhettinger @vstinner Should we better make the pre-allocation if the length of the iterable is known (so we call PyObject_Length and not PyObject_LengthHint)? This will keep the logic simpler (so not shrinking if PyObject_LengthHint gives more than the real length) and it will not be as expensive as calling PyObject_LengthHint.

    @pablogsal
    Copy link
    Member

    New changeset 372d705 by Pablo Galindo in branch 'master':
    bpo-33234 Improve list() pre-sizing for inputs with known lengths (GH-9846)
    372d705

    @vstinner
    Copy link
    Member

    bpo-33234 Improve list() pre-sizing for inputs with known lengths (GH-9846)

    Oh. Can you please document the optimization in the following doc as well?
    https://docs.python.org/dev/whatsnew/3.8.html#optimizations

    @pablogsal
    Copy link
    Member

    Oh. Can you please document the optimization in the following doc as well?

    Sure! Opened #10200 to address that. :)

    @pablogsal
    Copy link
    Member

    New changeset c61e229 by Pablo Galindo in branch 'master':
    bpo-33234: Add exact allocation optimization to lists in What's New (GH-10200)
    c61e229

    @pablogsal
    Copy link
    Member

    New changeset 0e5f771 by Pablo Galindo (Sergey Fedoseev) in branch 'master':
    bpo-33234: Simplify list_preallocate_exact() (GH-11220)
    0e5f771

    @miss-islington
    Copy link
    Contributor

    New changeset e182318 by Miss Islington (bot) (Raymond Hettinger) in branch 'master':
    bpo-33234: Add another attribution in Whatsnew (GH-11899)
    e182318

    @ericsnowcurrently
    Copy link
    Member

    Possible backward incompatibility caused by this issue: bpo-39829

    @vstinner
    Copy link
    Member

    vstinner commented Mar 7, 2020

    See also bpo-14126: "Speed up list comprehensions by preallocating the list where possible".

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants