classification
Title: Use FASTCALL for collections.deque methods: index, insert, rotate
Type: performance Stage: resolved
Components: Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: haypo, python-dev, rhettinger, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2017-02-05 10:09 by haypo, last changed 2017-02-06 16:00 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
deque.patch haypo, 2017-02-05 10:09 review
deque-2.patch haypo, 2017-02-06 00:00 review
Messages (11)
msg287044 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-02-05 10:09
Attached patch changes index(), insert() and rotate() functions of the collections.deque type to use FASTCALL calling convention. I chose to only modify these functions since they use METH_VARARGS which requires to create a temporary tuple, whereas other functions use METH_NOARGS or METH_O which is already fast ;-)

I know that Raymond, maintainer of the collections module, is not a big fan of Argument Clinic ;-) So I wrote the minimum change and chose to not use Argument Clinic yet. By the way, the index() method has the signature "D.index(value, [start, [stop]])" which is not supported by Argument Clinic yet: see issue #29299. For these reasons, I propose to wait to convert collections.deque to Argument Clinic, it can be done later.
 
Ok, now the cool part: it makes these methods faster ;-)

* d.rotate(): 1.10x faster
* d.rotate(1): 1.24x faster
* d.insert(): 1.18x faster
* d.index(): 1.24x faster

$ ./python -m perf timeit -s 'import collections; d=collections.deque()' 'd.rotate()' --compare-to=../default-ref/python 

Median +- std dev: [ref] 70.5 ns +- 0.9 ns -> [patch] 64.2 ns +- 0.3 ns: 1.10x faster (-9%)

$ ./python -m perf timeit -s 'import collections; d=collections.deque()' 'd.rotate(1)' --compare-to=../default-ref/python

Median +- std dev: [ref] 107 ns +- 1 ns -> [patch] 86.2 ns +- 1.1 ns: 1.24x faster (-20%)

$ ./python -m perf timeit -s 'import collections' 'd=collections.deque(); d.insert(0, None); d.insert(1, None); d.insert(2, None); d.insert(3, None); d.insert(4, None)' --compare-to=../default-ref/python -p3

Median +- std dev: [ref] 699 ns +- 6 ns -> [patch] 591 ns +- 5 ns: 1.18x faster (-15%)

$ ./python -m perf timeit -s 'import collections; d=collections.deque((None,))' 'd.index(None)' --compare-to=../default-ref/python 

Median +- std dev: [ref] 115 ns +- 1 ns -> [patch] 92.5 ns +- 0.8 ns: 1.24x faster (-19%)
msg287046 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-05 12:30
I think _PyArg_NoStackKeywords() should be called before _PyArg_ParseStack(), otherwise this can cause not correct error messages.
msg287068 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-02-06 00:00
deque-2.patch calls _PyArg_NoStackKeywords() before _PyArg_ParseStack().
msg287083 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-06 06:45
The patch is simple and technically it looks correct, but I'm not a fan of using FASTCALL outside of Argument Clinic at this stage. The API still can be changed. Fortunately only three deque methods could have a benefit from using FASTCALL.
msg287091 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-02-06 07:32
Over this looks good.  Just one other minor tweak (one that has served me well elsewhere) would be to bypass the cross-module function call with a cheap (near zero cost) register variable test:

    if (kwnames != NULL && !_PyArg_NoStackKeywords("rotate", kwnames)) {
        return NULL;
    }
msg287100 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-06 08:12
This already was discussed in issue26822. Issue29460 makes _PyArg_NoStackKeywords() cheaper.
msg287117 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-02-06 10:21
> Over this looks good.  Just one other minor tweak (one that has served me well elsewhere) would be to bypass the cross-module function call with a cheap (near zero cost) register variable test: (...)

This has just been optimized by Serhiy, change 82d1c8d15e18.

So, is deque-2.patch good now?
msg287140 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-02-06 14:57
Yes, go ahead and apply.
msg287141 - (view) Author: Roundup Robot (python-dev) Date: 2017-02-06 15:08
New changeset 1c048539200c by Victor Stinner in branch 'default':
Optimize deque index, insert and rotate() methods
https://hg.python.org/cpython/rev/1c048539200c
msg287142 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-02-06 15:10
Raymond: "Yes, go ahead and apply."

Great, done. Thanks for the reviews Serhiy and Raymond.

As I wrote, you can consider to use Argument Clinic later, but there is no urgency for that ;-) I close the issue.
msg287147 - (view) Author: Roundup Robot (python-dev) Date: 2017-02-06 16:00
New changeset 55949f988dc1b943796d9852cc4d588c58cc4255 by Victor Stinner in branch 'master':
Optimize deque index, insert and rotate() methods
https://github.com/python/cpython/commit/55949f988dc1b943796d9852cc4d588c58cc4255
History
Date User Action Args
2017-02-06 16:00:26python-devsetmessages: + msg287147
2017-02-06 15:10:37hayposetstatus: open -> closed
resolution: fixed
messages: + msg287142

stage: resolved
2017-02-06 15:08:07python-devsetnosy: + python-dev
messages: + msg287141
2017-02-06 14:57:07rhettingersetmessages: + msg287140
2017-02-06 10:21:29hayposetmessages: + msg287117
2017-02-06 08:12:45serhiy.storchakasetmessages: + msg287100
2017-02-06 07:32:56rhettingersetmessages: + msg287091
2017-02-06 06:45:55serhiy.storchakasetmessages: + msg287083
2017-02-06 00:00:59hayposetfiles: + deque-2.patch

messages: + msg287068
2017-02-05 12:30:26serhiy.storchakasetmessages: + msg287046
2017-02-05 10:10:01hayposetassignee: rhettinger
2017-02-05 10:09:27haypocreate