Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use PEP 590 vectorcall to speed up calls to range(), list() and dict() #81388

Closed
markshannon opened this issue Jun 9, 2019 · 31 comments
Closed
Labels
3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@markshannon
Copy link
Member

BPO 37207
Nosy @vstinner, @phsilva, @encukou, @methane, @ambv, @markshannon, @jdemeyer, @corona10, @miss-islington
PRs
  • bpo-37207: Use PEP 590 vectorcall to speed up range(), list() and dict() by about 30% #13930
  • bpo-37207: enable vectorcall for type.__call__ #14588
  • bpo-37207: Use vectorcall for range() #18464
  • bpo-37207: Use vertorcall for list() #18928
  • bpo-37207: Use PEP 590 vectorcall to speed up tuple() #18936
  • bpo-37207: Add _PyArg_NoKwnames helper function. #18980
  • bpo-37207: Use _PyArg_CheckPositional for tuple vectorcall #18986
  • bpo-37207: Use PEP 590 vectorcall to speed up set() #19019
  • bpo-37207: Use PEP 590 vectorcall to speed up frozenset() #19053
  • bpo-37207: Use PEP 590 vectorcall to speed up dict() #19280
  • bpo-37207: Update whatsnews for 3.9 #21337
  • [3.9] bpo-37207: Update whatsnews for 3.9 (GH-21337) #21347
  • [3.9] bpo-37207: Update whatsnews for 3.9 (GH-21337) #21350
  • Files
  • bench_dict_empty.py
  • bench_dict_kwnames.py
  • bench_dict_update.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-04-05.05:15:46.334>
    created_at = <Date 2019-06-09.09:23:47.228>
    labels = ['interpreter-core', '3.9', 'performance']
    title = 'Use PEP 590 vectorcall to speed up calls to range(), list() and dict()'
    updated_at = <Date 2020-07-06.13:32:13.147>
    user = 'https://github.com/markshannon'

    bugs.python.org fields:

    activity = <Date 2020-07-06.13:32:13.147>
    actor = 'corona10'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-04-05.05:15:46.334>
    closer = 'corona10'
    components = ['Interpreter Core']
    creation = <Date 2019-06-09.09:23:47.228>
    creator = 'Mark.Shannon'
    dependencies = []
    files = ['49018', '49019', '49020']
    hgrepos = []
    issue_num = 37207
    keywords = ['patch']
    message_count = 31.0
    messages = ['345077', '347272', '347336', '349809', '352133', '362219', '364095', '364322', '364324', '364340', '364428', '364447', '364538', '364808', '365307', '365309', '365385', '365387', '365448', '365452', '365488', '365489', '365490', '365491', '365545', '365546', '365553', '365811', '365907', '373095', '373116']
    nosy_count = 9.0
    nosy_names = ['vstinner', 'phsilva', 'petr.viktorin', 'methane', 'lukasz.langa', 'Mark.Shannon', 'jdemeyer', 'corona10', 'miss-islington']
    pr_nums = ['13930', '14588', '18464', '18928', '18936', '18980', '18986', '19019', '19053', '19280', '21337', '21347', '21350']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue37207'
    versions = ['Python 3.9']

    @markshannon
    Copy link
    Member Author

    PEP-590 allows us the short circuit the __new__, __init__ slow path for commonly created builtin types.
    As an initial step, we can speed up calls to range, list and dict by about 30%.
    See https://gist.github.com/markshannon/5cef3a74369391f6ef937d52cca9bfc8

    @markshannon markshannon added type-feature A feature request or enhancement interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Jun 9, 2019
    @methane
    Copy link
    Member

    methane commented Jul 4, 2019

    Can we call tp_call instead of vectorcall when kwargs is not empty?

    cpython/Objects/call.c

    Lines 209 to 219 in 7f41c8e

    /* Convert arguments & call */
    PyObject *const *args;
    PyObject *kwnames;
    args = _PyStack_UnpackDict(_PyTuple_ITEMS(tuple), nargs, kwargs, &kwnames);
    if (args == NULL) {
    return NULL;
    }
    PyObject *result = func(callable, args,
    nargs | PY_VECTORCALL_ARGUMENTS_OFFSET, kwnames);
    _PyStack_UnpackDict_Free(args, nargs, kwnames);
    return result;

    For example, dict_init may be faster than dict_vectorcall when d2 = dict(**d1).

    @jdemeyer
    Copy link
    Contributor

    jdemeyer commented Jul 5, 2019

    One thing that keeps bothering me when using vectorcall for type.__call__ is that we would have two completely independent code paths for constructing an object: the new one using vectorcall and the old one using tp_call, which in turn calls tp_new and tp_init.

    In typical vectorcall usages, there is no need to support the old way any longer: we can set tp_call = PyVectorcall_Call and that's it. But for "type", we still need to support tp_new and tp_init because there may be C code out there that calls tp_new/tp_init directly. To give one concrete example: collections.defaultdict calls PyDict_Type.tp_init

    One solution is to keep the old code for tp_new/tp_init. This is what Mark did in PR 13930. But this leads to duplication of functionality and is therefore error-prone (different code paths may have subtly different behaviour).

    Since we don't want to break Python code calling dict.__new__ or dict.__init__, not implementing those is not an option. But to be compatible with the vectorcall signature, ideally we want to implement __init__ using METH_FASTCALL, so __init__ would need to be a normal method instead of a slot wrapper of tp_init (similar to Python classes). This would work, but it needs some support in typeobject.c

    @miss-islington
    Copy link
    Contributor

    New changeset 37806f4 by Miss Islington (bot) (Jeroen Demeyer) in branch 'master':
    bpo-37207: enable vectorcall for type.__call__ (GH-14588)
    37806f4

    @methane
    Copy link
    Member

    methane commented Sep 12, 2019

    $ ./python -m pyperf timeit --compare-to ./python-master 'dict()'
    python-master: ..................... 89.9 ns +- 1.2 ns
    python: ..................... 72.5 ns +- 1.6 ns

    Mean +- std dev: [python-master] 89.9 ns +- 1.2 ns -> [python] 72.5 ns +- 1.6 ns: 1.24x faster (-19%)

    $ ./python -m pyperf timeit --compare-to ./python-master -s 'import string; a=dict.fromkeys(string.ascii_lowercase); b=dict.fromkeys(string.ascii_uppercase)' -- 'dict(a, **b)'
    python-master: ..................... 1.41 us +- 0.04 us
    python: ..................... 1.53 us +- 0.04 us

    Mean +- std dev: [python-master] 1.41 us +- 0.04 us -> [python] 1.53 us +- 0.04 us: 1.09x slower (+9%)

    ---

    There is some overhead in old dict merging idiom. But it seems reasonable compared to the benefit. LGTM.

    @miss-islington
    Copy link
    Contributor

    New changeset 6e35da9 by Petr Viktorin in branch 'master':
    bpo-37207: Use vectorcall for range() (GH-18464)
    6e35da9

    @vstinner
    Copy link
    Member

    New changeset 9ee88cd by Dong-hee Na in branch 'master':
    bpo-37207: Use PEP-590 vectorcall to speed up tuple() (GH-18936)
    9ee88cd

    @vstinner
    Copy link
    Member

    New changeset c98f87f by Dong-hee Na in branch 'master':
    bpo-37207: Use _PyArg_CheckPositional() for tuple vectorcall (GH-18986)
    c98f87f

    @vstinner
    Copy link
    Member

    New changeset 87ec86c by Dong-hee Na in branch 'master':
    bpo-37207: Add _PyArg_NoKwnames() helper function (GH-18980)
    87ec86c

    @vstinner
    Copy link
    Member

    New changeset 6ff79f6 by Dong-hee Na in branch 'master':
    bpo-37207: Use PEP-590 vectorcall to speed up set() constructor (GH-19019)
    6ff79f6

    @corona10
    Copy link
    Member

    Victor,

    frozenset is the last basic builtin collection which is not applied to this improvement yet.
    frozenset also show similar performance improvement by using vectorcall

    pyperf compare_to master.json bpo-37207.json
    Mean +- std dev: [master] 2.26 us +- 0.06 us -> [bpo-37207] 2.06 us +- 0.05 us: 1.09x faster (-9%)

    What I mean is that vectorcall should not be used for everything

    I definitely agree with this opinion. So I ask your opinion before submit the patch.
    frozenset is not frequently used than the list/set/dict.
    but frozenset is also the basic builtin collection, IMHO it is okay to apply vectorcall.

    What do you think?

    @vstinner
    Copy link
    Member

    What do you think?

    I would prefer to see a PR to give my opinion :)

    @vstinner
    Copy link
    Member

    New changeset 1c60567 by Dong-hee Na in branch 'master':
    bpo-37207: Use PEP-590 vectorcall to speed up frozenset() (GH-19053)
    1c60567

    @vstinner
    Copy link
    Member

    Remaining issue: optimize list(iterable), PR 18928. I reviewed the PR and I'm waiting for Petr.

    @vstinner
    Copy link
    Member

    New changeset ce10554 by Petr Viktorin in branch 'master':
    bpo-37207: Use vectorcall for list() (GH-18928)
    ce10554

    @vstinner
    Copy link
    Member

    All PRs are now merged. Thanks to everybody who was involved in this issue. It's a nice speedup which is always good to take ;-)

    @vstinner vstinner added the 3.9 only security fixes label Mar 30, 2020
    @vstinner vstinner added performance Performance or resource usage and removed type-feature A feature request or enhancement labels Mar 30, 2020
    @encukou
    Copy link
    Member

    encukou commented Mar 31, 2020

    The change to dict() was not covered by the smaller PRs.
    That one will need more thought, but AFAIK it wasn't yet rejected.

    @encukou encukou reopened this Mar 31, 2020
    @vstinner
    Copy link
    Member

    Oh sorry, I missed the dict.

    @corona10
    Copy link
    Member

    corona10 commented Apr 1, 2020

    @vstinner @petr.viktorin

    I 'd like to experiment dict vector call and finalize the work.
    Can I proceed it?

    @encukou
    Copy link
    Member

    encukou commented Apr 1, 2020

    Definitely!

    @corona10
    Copy link
    Member

    corona10 commented Apr 1, 2020

    +------------------+-------------------+-----------------------------+
    | Benchmark | master-dict-empty | bpo-37207-dict-empty |
    +==================+===================+=============================+
    | bench dict empty | 502 ns | 443 ns: 1.13x faster (-12%) |
    +------------------+-------------------+-----------------------------+

    +------------------+--------------------+-----------------------------+
    | Benchmark | master-dict-update | bpo-37207-dict-update |
    +==================+====================+=============================+
    | bench dict empty | 497 ns | 425 ns: 1.17x faster (-15%) |
    +------------------+--------------------+-----------------------------+

    +--------------------+---------------------+-----------------------------+
    | Benchmark | master-dict-kwnames | bpo-37207-dict-kwnames |
    +====================+=====================+=============================+
    | bench dict kwnames | 1.38 us | 917 ns: 1.51x faster (-34%) |
    +--------------------+---------------------+-----------------------------+

    @corona10
    Copy link
    Member

    corona10 commented Apr 1, 2020

    @vstinner @petr.viktorin

    Looks like benchmark showing very impressive result.
    Can I submit the patch?

    @encukou
    Copy link
    Member

    encukou commented Apr 1, 2020

    Can I submit the patch?

    Yes!

    If you think a patch is ready for review, just submit it. There's not much we can comment on before we see the code :)

    (I hope that doesn't contradict what your mentor says...)

    @vstinner
    Copy link
    Member

    vstinner commented Apr 1, 2020

    When I designed the FASTCALL calling convention, I experimented a new tp_fastcall slot to PyTypeObject to optimize __call__() method: bpo-29259.

    Results on the pyperformance benchmark suite were not really convincing and I had technical issues (decide if tp_call or tp_fastcall should be called, handle ABI compatibility and backward compatibility, etc.). I decided to give up on this idea.

    I'm happy to see that PEP-590 managed to find its way into Python internals and actually make Python faster ;-)

    @vstinner
    Copy link
    Member

    vstinner commented Apr 2, 2020

    New changeset e27916b by Dong-hee Na in branch 'master':
    bpo-37207: Use PEP-590 vectorcall to speed up dict() (GH-19280)
    e27916b

    @vstinner
    Copy link
    Member

    vstinner commented Apr 2, 2020

    Can we now close this issue? Or does someone plan to push further optimizations. Maybe new issues can be opened for next optimizations?

    @vstinner
    Copy link
    Member

    vstinner commented Apr 2, 2020

    When I designed the FASTCALL calling convention, I experimented a new tp_fastcall slot to PyTypeObject to optimize __call__() method: bpo-29259.

    Ah, by the way, I also made an attempt to use the FASTCALL calling convention for tp_new and tp_init: bpo-29358. Again, the speedup wasn't obvious and the implementation was quite complicated with many corner cases. So I gave up on this one. It didn't seem to be really worth it.

    @corona10
    Copy link
    Member

    corona10 commented Apr 5, 2020

    IMHO, we can close this PR.

    Summary:
    The PEP-590 vectorcall is applied to list, tuple, dict, set, frozenset and range

    If someone wants to apply PEP-590 to other cases.
    Please open a new issue for it!

    Thank you, Mark, Jeroen, Petr and everyone who works for this issue.

    @corona10 corona10 closed this as completed Apr 5, 2020
    @encukou
    Copy link
    Member

    encukou commented Apr 7, 2020

    As discussed briefly in Mark's PR, benchmarks like this are now slower:

        ret = dict(**{'a': 2, 'b': 4, 'c': 6, 'd': 8})

    Python 3.8: Mean +- std dev: 281 ns +- 9 ns
    master: Mean +- std dev: 456 ns +- 14 ns

    @ambv
    Copy link
    Contributor

    ambv commented Jul 6, 2020

    New changeset b4a9263 by Dong-hee Na in branch 'master':
    bpo-37207: Update whatsnews for 3.9 (GH-21337)
    b4a9263

    @corona10
    Copy link
    Member

    corona10 commented Jul 6, 2020

    New changeset 97558d6 by Dong-hee Na in branch '3.9':
    [3.9] bpo-37207: Update whatsnews for 3.9 (GH-21337)
    97558d6

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    8 participants