
use small object allocator for dict key storage #67789

Closed

jtaylor mannequin opened this issue Mar 7, 2015 · 24 comments
Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage)

@jtaylor (Mannequin) commented Mar 7, 2015

BPO 23601
Nosy @malemburg, @tim-one, @rhettinger, @pitrou, @scoder, @vstinner, @markshannon, @serhiy-storchaka
Files
  • 0001-use-small-object-allocator-for-keys-object.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2016-01-31.16:57:34.492>
    created_at = <Date 2015-03-07.11:24:41.023>
    labels = ['interpreter-core', 'performance']
    title = 'use small object allocator for dict key storage'
    updated_at = <Date 2016-01-31.17:48:49.215>
    user = 'https://bugs.python.org/jtaylor'

    bugs.python.org fields:

    activity = <Date 2016-01-31.17:48:49.215>
    actor = 'vstinner'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2016-01-31.16:57:34.492>
    closer = 'rhettinger'
    components = ['Interpreter Core']
    creation = <Date 2015-03-07.11:24:41.023>
    creator = 'jtaylor'
    dependencies = []
    files = ['38371']
    hgrepos = []
    issue_num = 23601
    keywords = ['patch']
    message_count = 24.0
    messages = ['237439', '237442', '237444', '237445', '237448', '237473', '237477', '246520', '246522', '246526', '246544', '246546', '246571', '246603', '246791', '246822', '246825', '246829', '259217', '259218', '259236', '259284', '259286', '259291']
    nosy_count = 10.0
    nosy_names = ['lemburg', 'tim.peters', 'rhettinger', 'pitrou', 'scoder', 'vstinner', 'Mark.Shannon', 'python-dev', 'jtaylor', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue23601'
    versions = ['Python 3.6']

    @jtaylor (Mannequin Author) commented Mar 7, 2015

    Dictionary creation spends a not-insignificant amount of time in malloc allocating keys objects. Python has a nice small object allocator that avoids much of this overhead and falls back to malloc for larger allocations.
    Is there a reason the dictionary does not use that allocator for its keys objects?
    Doing so, e.g. via the attached (incomplete) patch, improves small-dict creation performance by about 15%.

    import timeit
    print(timeit.repeat("dict(a=5, b=2)"))

    with the change:
    [0.42825599999923725, 0.4272580000015296, 0.4362329999985377]
    without:
    [0.5160610000002634, 0.5181720000000496, 0.518421999999191]

    Or is there something I am overlooking, and the use of PyMem_Malloc instead of PyObject_Malloc is an intentional design decision?
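
The reported timings can be sanity-checked directly. A minimal sketch (the absolute numbers are machine-dependent, so only the ratio between the two runs is meaningful):

```python
# Compare the best (minimum) timing of the patched and unpatched runs quoted
# above, as the timeit docs recommend. The ratio works out to roughly a 17%
# reduction in small-dict creation time, in line with the ~15% reported.
unpatched = [0.5160610000002634, 0.5181720000000496, 0.518421999999191]
patched = [0.42825599999923725, 0.4272580000015296, 0.4362329999985377]

speedup = min(unpatched) / min(patched)
print(f"{speedup:.2f}x faster")
```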

    @jtaylor jtaylor mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Mar 7, 2015
    @serhiy-storchaka (Member)

    See also bpo-16465.

    @pitrou (Member) commented Mar 7, 2015

    Interesting. I can reproduce the speedup on 64-bit Linux (Ubuntu 14.10).

    @pitrou pitrou added the performance Performance or resource usage label Mar 7, 2015
    @serhiy-storchaka (Member)

    $ ./python -m timeit "dict(a=5, b=2)"
    Unpatched:         100000 loops, best of 3: 2.5 usec per loop
    issue16465 patch: 1000000 loops, best of 3: 1.87 usec per loop
    issue23601 patch: 1000000 loops, best of 3: 1.98 usec per loop

    @malemburg (Member)

    There seem to be quite a few other places where this simple optimization could make sense as well, perhaps even going as far
    as converting all uses of PyMem_MALLOC to PyObject_MALLOC.

    There was a time when the small object allocator did not return
    free arenas to the system allocator, but those days are long gone.

    The only detail to watch out for is not mixing the malloc/free APIs. In particular, memory which is allocated in the interpreter and then deallocated elsewhere in extensions needs to continue using the PyMem_* APIs.

    @markshannon (Member)

    I don't remember why PyMem_Malloc rather than PyObject_MALLOC was used, it may have been inherited from the allocation of dict tables in the earlier implementation.

    My only concern is that the benchmark only tests performance for very small dictionaries. The small object allocator is limited to 512 bytes before it falls back on malloc, so for larger dict keys the speed up would vanish.

    A benchmark with a range of sizes would be more convincing, if only to show that it is no slower for larger dicts.
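
A sketch of the kind of size-range benchmark being asked for (the sizes and names here are illustrative, not from the patch): time dict construction at several sizes so that both the pymalloc fast path (requests up to 512 bytes) and the malloc fallback get exercised, then run it on a patched and an unpatched build and compare per-size numbers.

```python
import timeit

# Time dict construction across a range of sizes. The sizes are chosen to
# straddle pymalloc's 512-byte small-request threshold for the keys object,
# so the fallback-to-malloc path is covered as well as the fast path.
results = {}
for n in (2, 8, 64, 512, 4096):
    setup = f"items = [(str(i), i) for i in range({n})]"
    results[n] = min(timeit.repeat("dict(items)", setup=setup,
                                   repeat=3, number=1000))

for n, t in results.items():
    print(f"{n:5d} keys: {t:.4f}s per 1000 dicts")
```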

    @jtaylor (Mannequin Author) commented Mar 7, 2015

    PyObject_Malloc just calls malloc above the threshold so there is no problem for larger dicts.
    For larger dicts the performance of malloc is also irrelevant as the time will be spent elsewhere.

    @markshannon (Member)

    I still think that this is a good idea, but I would like to see a small speed test for large objects. Just to be sure that it is no slower.

    @jtaylor (Mannequin Author) commented Jul 9, 2015

    Large objects just take the `if size > 512: return malloc(size)` path, so there is no reason they should be slower.
    Also, for large objects allocation speed does not matter as much.
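
As a toy model (Python pseudocode, not CPython's actual C source) of the dispatch being described, assuming CPython's 512-byte small-request threshold:

```python
SMALL_REQUEST_THRESHOLD = 512  # pymalloc's cutoff in CPython

def allocate(size: int) -> str:
    """Toy model: report which allocator a PyObject_Malloc request would hit."""
    if size <= SMALL_REQUEST_THRESHOLD:
        return "pymalloc pool"   # small objects: served from per-size-class pools
    return "system malloc"       # large requests are forwarded straight to malloc

print(allocate(64))    # small keys object -> pymalloc fast path
print(allocate(4096))  # large keys table -> same malloc path as before the patch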

    @markshannon (Member)

    Indeed there is no *obvious* reason why they should be slower.
    But when it comes to optimisation, *never* trust your (or anyone else's) intuition.

    Running a simple check is always worth the effort.

    @scoder (Contributor) commented Jul 10, 2015

    It's generally worth running the benchmark suite for this kind of optimisation. Being mostly Python code, it should benefit quite clearly from dictionary improvements, but it should also give an idea of how much of an improvement actual Python code (and not just micro-benchmarks) can show. And it can help detecting unexpected regressions that would not necessarily be revealed by micro-benchmarks.

    https://hg.python.org/benchmarks/

    And I'm with Mark: when it comes to performance optimisations, repeating even a firm intuition doesn't save us from validating that this intuition actually matches reality. Anything that seems obvious at first sight may still be proven wrong by benchmarks, and has often enough been so in the past.

    @jtaylor (Mannequin Author) commented Jul 10, 2015

    Your benchmarks are not affected by this change; see the other issue. They are also not representative of every workload out there.

    I can at least see the argument for why you didn't want to put the other variant of this change in, as it made the code a tiny bit more complicated, but I do not understand the reluctance for this variant. It doesn't change the complexity of the code one bit.
    If you doubt the performance of Python's own small object allocator, maybe Python should stop using it altogether?

    @scoder (Contributor) commented Jul 10, 2015

    Your benchmarks are not affected by this change see the other issue.

    Then you ran them, I assume? Could you show the output here that you got?

    They are also not representative of every workload out there.

    Certainly not, but they do provide a certain variety of workloads that
    reduce the likelihood of not spotting a regression that this change might
    introduce.

    Note that people seem to be rather in favour of your proposed change. If
    you can show that there are no visible drawbacks, one of them will most
    likely apply it for you.

    @jtaylor (Mannequin Author) commented Jul 11, 2015

    OK, I ran it again, but note that the machine was in use the whole time, so the results may not mean much.

    python perf.py -r -b default /tmp/normal/bin/python3 /tmp/opt/bin/python3

    Min: 0.399279 -> 0.376527: 1.06x faster
    Avg: 0.410819 -> 0.383315: 1.07x faster
    Significant (t=49.29)
    Stddev: 0.00450 -> 0.00330: 1.3631x smaller

    ### etree_iterparse ###
    Min: 0.639638 -> 0.630989: 1.01x faster
    Avg: 0.658744 -> 0.641842: 1.03x faster
    Significant (t=14.82)
    Stddev: 0.00959 -> 0.00617: 1.5557x smaller

    ### etree_parse ###
    Min: 0.433050 -> 0.377830: 1.15x faster
    Avg: 0.444014 -> 0.389695: 1.14x faster
    Significant (t=43.28)
    Stddev: 0.01010 -> 0.00745: 1.3570x smaller

    ### tornado_http ###
    Min: 0.335834 -> 0.326492: 1.03x faster
    Avg: 0.346100 -> 0.334186: 1.04x faster
    Significant (t=13.66)
    Stddev: 0.01024 -> 0.00689: 1.4864x smaller

    The following not significant results are hidden, use -v to show them:
    2to3, django_v2, etree_process, fastpickle, fastunpickle, json_dump_v2, json_load, nbody, regex_v8.

    /tmp$ /tmp/normal/bin/python3 -c 'import timeit; print(timeit.repeat("dict(a=5, b=2)"))'
    [0.5112445619997743, 0.5141109460000735, 0.5185121280010208]
    /tmp$ /tmp/opt/bin/python3 -c 'import timeit; print(timeit.repeat("dict(a=5, b=2)"))'
    [0.4426167189994885, 0.4465744609988178, 0.4467797579982289]

    @scoder (Contributor) commented Jul 16, 2015

    Benchmark results look good to me (although a header line is missing) and seem to match my expectations. Sounds like we should allow this change.

    @markshannon (Member)

    +1 from me.

    @serhiy-storchaka (Member)

    If use small object allocator for dict key storage, why not use it for dict value storage?

    @markshannon (Member)

    Yes, but that shouldn't block this issue.
    I've opened bpo-24648 instead.

    @jtaylor (Mannequin Author) commented Jan 29, 2016

    Ping. This has been sitting for 4 years and two Python releases. It's about time this stupidly simple thing gets merged.

    @tim-one (Member) commented Jan 29, 2016

    +1 from me. Julian, you have the patience of a saint ;-)

    @rhettinger (Contributor)

    If there are no objections, I'll apply the patch tomorrow.

    @rhettinger rhettinger self-assigned this Jan 30, 2016
    @python-dev (Mannequin) commented Jan 31, 2016

    New changeset 425fec4a79c7 by Raymond Hettinger in branch 'default':
    Issue bpo-23601: Use small object allocator for dict key objects
    https://hg.python.org/cpython/rev/425fec4a79c7

    @rhettinger (Contributor)

    Thanks for the patch and for your patience.

    @vstinner (Member)

    Nice change. I opened the issue bpo-26249 to continue the investigation.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022