Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use FASTCALL for collections.deque methods: index, insert, rotate #73638

Closed
vstinner opened this issue Feb 5, 2017 · 11 comments
Closed

Use FASTCALL for collections.deque methods: index, insert, rotate #73638

vstinner opened this issue Feb 5, 2017 · 11 comments
Assignees
Labels
3.7 (EOL) end of life performance Performance or resource usage

Comments

@vstinner
Copy link
Member

vstinner commented Feb 5, 2017

BPO 29452
Nosy @rhettinger, @vstinner, @serhiy-storchaka
Files
  • deque.patch
  • deque-2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2017-02-06.15:10:37.902>
    created_at = <Date 2017-02-05.10:09:27.482>
    labels = ['3.7', 'performance']
    title = 'Use FASTCALL for collections.deque methods: index, insert, rotate'
    updated_at = <Date 2017-02-06.16:00:26.933>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2017-02-06.16:00:26.933>
    actor = 'python-dev'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2017-02-06.15:10:37.902>
    closer = 'vstinner'
    components = []
    creation = <Date 2017-02-05.10:09:27.482>
    creator = 'vstinner'
    dependencies = []
    files = ['46523', '46533']
    hgrepos = []
    issue_num = 29452
    keywords = ['patch']
    message_count = 11.0
    messages = ['287044', '287046', '287068', '287083', '287091', '287100', '287117', '287140', '287141', '287142', '287147']
    nosy_count = 4.0
    nosy_names = ['rhettinger', 'vstinner', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue29452'
    versions = ['Python 3.7']

    @vstinner
    Copy link
    Member Author

    vstinner commented Feb 5, 2017

    Attached patch changes index(), insert() and rotate() functions of the collections.deque type to use FASTCALL calling convention. I chose to only modify these functions since they use METH_VARARGS which requires to create a temporary tuple, whereas other functions use METH_NOARGS or METH_O which is already fast ;-)

    I know that Raymond, maintainer of the collections module, is not a big fan of Argument Clinic ;-) So I wrote the minimum change and chose to not use Argument Clinic yet. By the way, the index() method has the signature "D.index(value, [start, [stop]])" which is not supported by Argument Clinic yet: see issue bpo-29299. For these reasons, I propose to wait to convert collections.deque to Argument Clinic, it can be done later.

    Ok, now the cool part: it makes these methods faster ;-)

    • d.rotate(): 1.10x faster
    • d.rotate(1): 1.24x faster
    • d.insert(): 1.18x faster
    • d.index(): 1.24x faster
    $ ./python -m perf timeit -s 'import collections; d=collections.deque()' 'd.rotate()' --compare-to=../default-ref/python 

    Median +- std dev: [ref] 70.5 ns +- 0.9 ns -> [patch] 64.2 ns +- 0.3 ns: 1.10x faster (-9%)

    $ ./python -m perf timeit -s 'import collections; d=collections.deque()' 'd.rotate(1)' --compare-to=../default-ref/python

    Median +- std dev: [ref] 107 ns +- 1 ns -> [patch] 86.2 ns +- 1.1 ns: 1.24x faster (-20%)

    $ ./python -m perf timeit -s 'import collections' 'd=collections.deque(); d.insert(0, None); d.insert(1, None); d.insert(2, None); d.insert(3, None); d.insert(4, None)' --compare-to=../default-ref/python -p3

    Median +- std dev: [ref] 699 ns +- 6 ns -> [patch] 591 ns +- 5 ns: 1.18x faster (-15%)

    $ ./python -m perf timeit -s 'import collections; d=collections.deque((None,))' 'd.index(None)' --compare-to=../default-ref/python 

    Median +- std dev: [ref] 115 ns +- 1 ns -> [patch] 92.5 ns +- 0.8 ns: 1.24x faster (-19%)

    @vstinner vstinner added 3.7 (EOL) end of life performance Performance or resource usage labels Feb 5, 2017
    @serhiy-storchaka
    Copy link
    Member

    I think _PyArg_NoStackKeywords() should be called before _PyArg_ParseStack(), otherwise this can cause not correct error messages.

    @vstinner
    Copy link
    Member Author

    vstinner commented Feb 6, 2017

    deque-2.patch calls _PyArg_NoStackKeywords() before _PyArg_ParseStack().

    @serhiy-storchaka
    Copy link
    Member

    The patch is simple and technically it looks correct, but I'm not a fan of using FASTCALL outside of Argument Clinic at this stage. The API still can be changed. Fortunately only three deque methods could have a benefit from using FASTCALL.

    @rhettinger
    Copy link
    Contributor

    Over this looks good. Just one other minor tweak (one that has served me well elsewhere) would be to bypass the cross-module function call with a cheap (near zero cost) register variable test:

        if (kwnames != NULL && !_PyArg_NoStackKeywords("rotate", kwnames)) {
            return NULL;
        }

    @serhiy-storchaka
    Copy link
    Member

    This already was discussed in bpo-26822. bpo-29460 makes _PyArg_NoStackKeywords() cheaper.

    @vstinner
    Copy link
    Member Author

    vstinner commented Feb 6, 2017

    Over this looks good. Just one other minor tweak (one that has served me well elsewhere) would be to bypass the cross-module function call with a cheap (near zero cost) register variable test: (...)

    This has just been optimized by Serhiy, change 82d1c8d15e18.

    So, is deque-2.patch good now?

    @rhettinger
    Copy link
    Contributor

    Yes, go ahead and apply.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Feb 6, 2017

    New changeset 1c048539200c by Victor Stinner in branch 'default':
    Optimize deque index, insert and rotate() methods
    https://hg.python.org/cpython/rev/1c048539200c

    @vstinner
    Copy link
    Member Author

    vstinner commented Feb 6, 2017

    Raymond: "Yes, go ahead and apply."

    Great, done. Thanks for the reviews Serhiy and Raymond.

    As I wrote, you can consider to use Argument Clinic later, but there is no urgency for that ;-) I close the issue.

    @vstinner vstinner closed this as completed Feb 6, 2017
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Feb 6, 2017

    New changeset 55949f988dc1b943796d9852cc4d588c58cc4255 by Victor Stinner in branch 'master':
    Optimize deque index, insert and rotate() methods
    55949f9

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants