Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUILD_MAP_UNPACK_WITH_CALL is slow #71545

Closed
serprex mannequin opened this issue Jun 21, 2016 · 17 comments
Closed

BUILD_MAP_UNPACK_WITH_CALL is slow #71545

serprex mannequin opened this issue Jun 21, 2016 · 17 comments
Assignees
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@serprex
Copy link
Mannequin

serprex mannequin commented Jun 21, 2016

BPO 27358
Nosy @rhettinger, @vstinner, @berkerpeksag, @serhiy-storchaka, @Vgr255, @serprex
PRs
  • [Do Not Merge] Convert Misc/NEWS so that it is managed by towncrier #552
  • Files
  • mapaca.patch
  • mapaca2.patch
  • faster_build_map_unpack_with_call.patch
  • faster_build_map_unpack_with_call2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2016-10-02.08:39:02.607>
    created_at = <Date 2016-06-21.01:46:01.118>
    labels = ['interpreter-core', '3.7', 'performance']
    title = 'BUILD_MAP_UNPACK_WITH_CALL is slow'
    updated_at = <Date 2017-03-31.16:36:38.606>
    user = 'https://github.com/serprex'

    bugs.python.org fields:

    activity = <Date 2017-03-31.16:36:38.606>
    actor = 'dstufft'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2016-10-02.08:39:02.607>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core']
    creation = <Date 2016-06-21.01:46:01.118>
    creator = 'Demur Rumed'
    dependencies = []
    files = ['43494', '43498', '44540', '44799']
    hgrepos = []
    issue_num = 27358
    keywords = ['patch']
    message_count = 17.0
    messages = ['268951', '268967', '268989', '268990', '268992', '268997', '273680', '275710', '277279', '277283', '277285', '277287', '277297', '277859', '277866', '277877', '277878']
    nosy_count = 7.0
    nosy_names = ['rhettinger', 'vstinner', 'python-dev', 'berker.peksag', 'serhiy.storchaka', 'abarry', 'Demur Rumed']
    pr_nums = ['552']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue27358'
    versions = ['Python 3.6', 'Python 3.7']

    @serprex
    Copy link
    Mannequin Author

    serprex mannequin commented Jun 21, 2016

    BUILD_MAP_UNPACK_WITH_CALL is _really_ slow, wasting much of its time asserting that keys are non overlapping. This patch optimizes a fast path for distinct dicts, especially useful for bpo-27213 where BUILD_MAP_UNPACK_WITH_CALL is generated for a single **kw rather than needing **kw1,**kw2 to hit this slow opcode

    This patch tracks size of dictionary, if size doesn't increase by same size as the dict we updated sum with, then we scan to find where collision is. Further optimization can be done by scanning size of dicts to preallocate dictionary

    Microbenchmark:
    from timeit import timeit
    def f(x):return x
    timeit(lambda:f(
    {'a':2},**{'b':2}))

    Unpatched takes ~15s
    Patched takes ~5s

    @serprex serprex mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Jun 21, 2016
    @serhiy-storchaka
    Copy link
    Member

    The problem is that var-keyword argument can be not a dict, but general mapping. This optimization is applicable only if it is a dict.

    @serprex
    Copy link
    Mannequin Author

    serprex mannequin commented Jun 21, 2016

    mapaca2 heavy handily deals with the must-work-with-all-mappings by converting any non dict mappings on the stack with dicts when with_call is true

    I'm not sure if it'd be better to seek a less opcode centric fix-- ie introduce a method to dictobject which returns None if no collision occurs otherwise it returns the first key which collides and stops updating at that point. PyDict_DistinctUpdate

    @serprex
    Copy link
    Mannequin Author

    serprex mannequin commented Jun 21, 2016

    (returning None wouldn't work because that may be the key, but something like returning the dict itself (ie an unhashable) or keeping this as a C API & returning NULL would work)

    @Vgr255
    Copy link
    Mannequin

    Vgr255 mannequin commented Jun 21, 2016

    dict subclasses can be hashable - it's a very bad idea, but not guarded against. If you were to do this, I'd suggest going the exception way, à la StopIteration.

    I wonder how feasible it would be for e.g. dict.__setitem__ to check if a key already exists; it would be a special path that's not taken normally, but filling the mapping on call enables it, and it raises if it sees a duplicate. I think that inserting the check in there would reduce complexity and probably have a negligible impact on performance. I don't know if that's a good idea, just throwing this out here.

    @serhiy-storchaka
    Copy link
    Member

    I think this kills the optimization effect for non-dicts.

    See on PyDict_Merge(). It takes the boolean parameter that controls the behavior in case of matching keys. I think the best would be to rename it to say _PyDict_MergeEx(), extend the boolean parameter to ternary parameter, and raise an exception if it is in the third state and matching keys are found. PyDict_Merge() would be implemented as a simple wrapper around _PyDict_MergeEx(). We should check wherever this affects the performance of dict.update().

    @vstinner
    Copy link
    Member

    See also issue bpo-27845: "Optimize update_keyword_args() function".

    @serhiy-storchaka
    Copy link
    Member

    Here is a patch that gets rid of calculating intersections. I didn't make benchmarking still.

    @serhiy-storchaka
    Copy link
    Member

    The side effect of the last patch is fixing a regression in 3.6 and a bug in 3.5 (bpo-28257).

    @serhiy-storchaka
    Copy link
    Member

    Microbenchmarks:

    $ ./python -m timeit -s "def f(**kw): pass" -s "b = {'b': 2}" -- "f(a=1, **b)"
    Unpatched:  100000 loops, best of 3: 7.64 usec per loop
    Patched:    100000 loops, best of 3: 3.14 usec per loop
    
    $ ./python -m timeit -s "def f(**kw): pass" -s "a = {'a': 1}; b = {'b': 2}" -- "f(**a, **b)"
    Unpatched:  100000 loops, best of 3: 6.93 usec per loop
    Patched:    100000 loops, best of 3: 2.66 usec per loop
    
    $ ./python -m timeit -s "def f(a=None, b=None): pass" -s "b = {'b': 2}" -- "f(a=1, **b)"
    Unpatched:  100000 loops, best of 3: 7.27 usec per loop
    Patched:    100000 loops, best of 3: 2.83 usec per loop
    
    $ ./python -m timeit -s "def f(a=None, b=None): pass" -s "a = {'a': 1}; b = {'b': 2}" -- "f(**a, **b)"
    Unpatched:  100000 loops, best of 3: 6.47 usec per loop
    Patched:    100000 loops, best of 3: 2.31 usec per loop

    @vstinner
    Copy link
    Member

    $ ./python -m timeit -s "def f(**kw): pass" -s "b = {'b': 2}" -- "f(a=1, **b)"
    Unpatched: 100000 loops, best of 3: 7.64 usec per loop
    Patched: 100000 loops, best of 3: 3.14 usec per loop

    Impressive numbers but... is it a perf regression compared to Python 3.5 (and even 2.7)? Or a perf speedup compared to Python 3.5?

    @serhiy-storchaka
    Copy link
    Member

    Yes, it is expected perf regression compared to Python 3.5 and 2.7 when pass keyword arguments and single var-keyword argument (because 3.6 uses BUILD_MAP_UNPACK_WITH_CALL, while 2.7 and 3.5 don't need it for this case). In case of multiple var-keyword arguments (all versions need BUILD_MAP_UNPACK_WITH_CALL) even unpatched 3.6 is faster than 3.5.

    $ ./python -m timeit -s "def f(**kw): pass" -s "b = {'b': 2}" -- "f(a=1, **b)"
    Python 2.7:            100000 loops, best of 3: 2.21 usec per loop
    Python 3.5:            100000 loops, best of 3: 4.31 usec per loop
    Python 3.6 unpatched:  100000 loops, best of 3: 7.64 usec per loop
    Python 3.6 patched:    100000 loops, best of 3: 3.14 usec per loop
    
    $ ./python -m timeit -s "def f(**kw): pass" -s "a = {'a': 1}; b = {'b': 2}" -- "f(**a, **b)"
    Python 3.5:            100000 loops, best of 3: 11.6 usec per loop
    Python 3.6 unpatched:  100000 loops, best of 3: 6.93 usec per loop
    Python 3.6 patched:    100000 loops, best of 3: 2.66 usec per loop
    
    $ ./python -m timeit -s "def f(a=None, b=None): pass" -s "b = {'b': 2}" -- "f(a=1, **b)"
    Python 2.7:            100000 loops, best of 3: 1.97 usec per loop
    Python 3.5:            100000 loops, best of 3: 3.75 usec per loop
    Python 3.6 unpatched:  100000 loops, best of 3: 7.27 usec per loop
    Python 3.6 patched:    100000 loops, best of 3: 2.83 usec per loop
    
    $ ./python -m timeit -s "def f(a=None, b=None): pass" -s "a = {'a': 1}; b = {'b': 2}" -- "f(**a, **b)"
    Python 3.5:            100000 loops, best of 3: 10.6 usec per loop
    Python 3.6 unpatched:  100000 loops, best of 3: 6.47 usec per loop
    Python 3.6 patched:    100000 loops, best of 3: 2.31 usec per loop

    @serhiy-storchaka
    Copy link
    Member

    Updated patch addresses Victor's comments.

    @serhiy-storchaka serhiy-storchaka added the 3.7 (EOL) end of life label Oct 2, 2016
    @serhiy-storchaka serhiy-storchaka self-assigned this Oct 2, 2016
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 2, 2016

    New changeset 148172f75d43 by Serhiy Storchaka in branch '3.6':
    Issue bpo-27358: Optimized merging var-keyword arguments and improved error
    https://hg.python.org/cpython/rev/148172f75d43

    New changeset 489ad68b35c0 by Serhiy Storchaka in branch 'default':
    Issue bpo-27358: Optimized merging var-keyword arguments and improved error
    https://hg.python.org/cpython/rev/489ad68b35c0

    New changeset ec84d815e90f by Serhiy Storchaka in branch '3.5':
    Issue bpo-27358: Backported tests.
    https://hg.python.org/cpython/rev/ec84d815e90f

    New changeset 29a658d47ae8 by Serhiy Storchaka in branch '2.7':
    Issue bpo-27358: Backported tests.
    https://hg.python.org/cpython/rev/29a658d47ae8

    @berkerpeksag
    Copy link
    Member

    test_unpack_ex fails on several buildbots:

    http://buildbot.python.org/all/builders/x86-64%20Ubuntu%2015.10%20Skylake%20CPU%203.6/builds/120/steps/test/logs/stdio

    test test_unpack_ex failed
    **********************************************************************
    File "/home/buildbot/buildarea/3.6.intel-ubuntu-skylake/build/Lib/test/test_unpack_ex.py", line ?, in test.test_unpack_ex.__test__.doctests
    Failed example:
        {**1}
    Expected:
        Traceback (most recent call last):
        ...
        TypeError: 'int' object is not a mapping
    Got:
        Traceback (most recent call last):
          File "/home/buildbot/buildarea/3.6.intel-ubuntu-skylake/build/Lib/doctest.py", line 1330, in __run
            compileflags, 1), test.globs)
          File "<doctest test.test_unpack_ex.__test__.doctests[38]>", line 1, in <module>
            {**1}
        TypeError: 'int' object is not a mapping1
    **********************************************************************
    File "/home/buildbot/buildarea/3.6.intel-ubuntu-skylake/build/Lib/test/test_unpack_ex.py", line ?, in test.test_unpack_ex.__test__.doctests
    Failed example:
        {**[]}
    Expected:
        Traceback (most recent call last):
        ...
        TypeError: 'list' object is not a mapping
    Got:
        Traceback (most recent call last):
          File "/home/buildbot/buildarea/3.6.intel-ubuntu-skylake/build/Lib/doctest.py", line 1330, in __run
            compileflags, 1), test.globs)
          File "<doctest test.test_unpack_ex.__test__.doctests[39]>", line 1, in <module>
            {**[]}
        TypeError: 'list' object is not a mapping1
    **********************************************************************
    1 items had failures:
       2 of  85 in test.test_unpack_ex.__test__.doctests
    ***Test Failed*** 2 failures.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 2, 2016

    New changeset e2d4e077cfb2 by Berker Peksag in branch '3.6':
    Issue bpo-27358: Fix typo in error message
    https://hg.python.org/cpython/rev/e2d4e077cfb2

    New changeset 9abb316f1593 by Berker Peksag in branch 'default':
    Issue bpo-27358: Merge from 3.6
    https://hg.python.org/cpython/rev/9abb316f1593

    @serhiy-storchaka
    Copy link
    Member

    Thanks Berker!

    This was temporary change for debugging. I forgot to remove it.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants