classification
Title: Use PEP 590 vectorcall to speed up calls to range(), list() and dict()
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Mark.Shannon, corona10, jdemeyer, lukasz.langa, methane, miss-islington, petr.viktorin, phsilva, vstinner
Priority: normal Keywords: patch

Created on 2019-06-09 09:23 by Mark.Shannon, last changed 2020-07-06 13:32 by corona10. This issue is now closed.

Files
File name Uploaded Description
bench_dict_empty.py corona10, 2020-04-01 15:39
bench_dict_kwnames.py corona10, 2020-04-01 15:39
bench_dict_update.py corona10, 2020-04-01 15:39
Pull Requests
URL Status Linked
PR 13930 closed Mark.Shannon, 2019-06-09 09:40
PR 14588 merged jdemeyer, 2019-07-04 13:44
PR 18464 merged petr.viktorin, 2020-02-11 16:37
PR 18928 merged petr.viktorin, 2020-03-11 15:11
PR 18936 merged corona10, 2020-03-11 17:42
PR 18980 merged corona10, 2020-03-13 16:56
PR 18986 merged corona10, 2020-03-14 08:26
PR 19019 merged corona10, 2020-03-15 14:27
PR 19053 merged corona10, 2020-03-18 01:34
PR 19280 merged corona10, 2020-04-01 15:57
PR 21337 merged corona10, 2020-07-05 15:20
PR 21347 closed miss-islington, 2020-07-06 11:22
PR 21350 merged corona10, 2020-07-06 12:59
Messages (31)
msg345077 - Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2019-06-09 09:23
PEP 590 allows us to short-circuit the __new__/__init__ slow path for commonly created builtin types.
As an initial step, we can speed up calls to range, list and dict by about 30%.
See https://gist.github.com/markshannon/5cef3a74369391f6ef937d52cca9bfc8
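The slow path referred to here is the generic type.__call__ (the tp_call slot): it calls __new__ (tp_new) and then, if the result is an instance of the type, __init__ (tp_init), with the arguments packed into a tuple (plus a dict for keywords) that both slots receive. The pure-Python model below is a simplified sketch of that sequence, not CPython's code; generic_type_call is a hypothetical name, and the real C function (type_call in Objects/typeobject.c) also handles special cases such as type(x) with one argument. A type-specific vectorcall function can instead build the object in one step from the argument array.

```python
def generic_type_call(cls, *args, **kwargs):
    """Simplified pure-Python model of type.__call__ (tp_call).

    Hypothetical helper for illustration only; CPython implements this in C
    and handles extra special cases that are omitted here.
    """
    # Step 1: tp_new creates the object and receives all the arguments.
    obj = cls.__new__(cls, *args, **kwargs)
    # Step 2: tp_init only runs if __new__ returned an instance of cls.
    if isinstance(obj, cls):
        type(obj).__init__(obj, *args, **kwargs)
    return obj


# For list and dict, __new__ ignores the arguments and __init__ fills the object.
print(generic_type_call(list, range(3)))          # [0, 1, 2]
print(generic_type_call(dict, [("a", 1)], b=2))   # {'a': 1, 'b': 2}
```

Every argument goes through two separate slot calls in this model, which is the work a dedicated vectorcall constructor for range, list or dict avoids.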
msg347272 - Author: Inada Naoki (methane) * (Python committer) Date: 2019-07-04 11:11
Can we call tp_call instead of vectorcall when kwargs is not empty?
https://github.com/python/cpython/blob/7f41c8e0dd237d1f3f0a1d2ba2f3ee4e4bd400a7/Objects/call.c#L209-L219

For example, dict_init may be faster than dict_vectorcall when `d2 = dict(**d1)`.
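Under PEP 590, a vectorcall function receives positional and keyword argument values in one flat array, with the keyword names in a separate kwnames tuple; the last len(kwnames) values belong to those names. When a call such as dict(**d1) arrives with a keyword dict, that dict is roughly speaking flattened into array + kwnames for vectorcall, and a callee that needs a dict again (as dict's update logic does) has to rebuild one, whereas tp_call receives the dict as-is. The sketch below models both conversions in Python; unpack_dict_to_vectorcall and split_vectorcall_args are hypothetical names, not CPython APIs, and a Python list stands in for the C argument array.

```python
def unpack_dict_to_vectorcall(positional, kwargs):
    """Hypothetical model: flatten (args tuple, kwargs dict) into vectorcall form.

    Returns (args, nargs, kwnames); keyword values follow the positional ones,
    and kwnames is None when there are no keyword arguments.
    """
    kwnames = tuple(kwargs)
    flat = list(positional) + list(kwargs.values())
    return flat, len(positional), kwnames or None


def split_vectorcall_args(args, nargs, kwnames):
    """Hypothetical model: recover (positional tuple, keyword dict) in the callee."""
    positional = tuple(args[:nargs])
    if not kwnames:
        return positional, {}
    # Rebuilding a dict here is the extra work a dict-consuming callee pays for.
    keywords = dict(zip(kwnames, args[nargs:]))
    return positional, keywords


# dict(**d1): the dict is flattened for vectorcall, then rebuilt by the callee.
d1 = {"a": 1, "b": 2}
flat, nargs, kwnames = unpack_dict_to_vectorcall((), d1)
print(flat, nargs, kwnames)                          # [1, 2] 0 ('a', 'b')
print(split_vectorcall_args(flat, nargs, kwnames))   # ((), {'a': 1, 'b': 2})
```

The round trip dict, then array plus kwnames, then dict again is why falling back to tp_call when keyword arguments are present can be cheaper for a callee like dict.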
msg347336 - Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-07-05 12:31
One thing that keeps bothering me when using vectorcall for type.__call__ is that we would have two completely independent code paths for constructing an object: the new one using vectorcall and the old one using tp_call, which in turn calls tp_new and tp_init.

In typical vectorcall usages, there is no need to support the old way any longer: we can set tp_call = PyVectorcall_Call and that's it. But for "type", we still need to support tp_new and tp_init because there may be C code out there that calls tp_new/tp_init directly. To give one concrete example: collections.defaultdict calls PyDict_Type.tp_init.

One solution is to keep the old code for tp_new/tp_init. This is what Mark did in PR 13930. But this leads to duplication of functionality and is therefore error-prone (different code paths may have subtly different behaviour).

Since we don't want to break Python code calling dict.__new__ or dict.__init__, not implementing those is not an option. But to be compatible with the vectorcall signature, ideally we want to implement __init__ using METH_FASTCALL, so __init__ would need to be a normal method instead of a slot wrapper of tp_init (similar to Python classes). This would work, but it needs some support in typeobject.c.
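The compatibility constraint described above can be exercised from Python: code may call dict.__new__ and dict.__init__ directly, subclasses rely on super().__init__(), and collections.defaultdict forwards everything after default_factory to dict's initializer. The snippet below checks those paths; any vectorcall fast path for dict() has to leave their behaviour unchanged. CountingDict is a purely illustrative class, not something from the issue.

```python
from collections import defaultdict


class CountingDict(dict):
    """Illustrative subclass that relies on dict.__init__ via super()."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.init_calls = 1


# Calling __new__ and __init__ directly from Python must keep working.
d = dict.__new__(dict)              # a new, empty dict; __init__ not run yet
dict.__init__(d, [("a", 1)], b=2)
print(d)                            # {'a': 1, 'b': 2}

# Subclasses still go through __new__ and then their own __init__.
c = CountingDict(x=1)
print(c, c.init_calls)              # {'x': 1} 1

# defaultdict hands the remaining arguments to dict's initializer.
dd = defaultdict(list, {"k": [1]}, extra=[2])
print(dict(dd), dd.default_factory)  # {'k': [1], 'extra': [2]} <class 'list'>
```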
msg349809 - Author: miss-islington (miss-islington) Date: 2019-08-15 15:49
New changeset 37806f404f57b234902f0c8de9a04647ad01b7f1 by Miss Islington (bot) (Jeroen Demeyer) in branch 'master':
bpo-37207: enable vectorcall for type.__call__ (GH-14588)
https://github.com/python/cpython/commit/37806f404f57b234902f0c8de9a04647ad01b7f1
msg352133 - Author: Inada Naoki (methane) * (Python committer) Date: 2019-09-12 11:48
$ ./python -m pyperf timeit --compare-to ./python-master 'dict()'
python-master: ..................... 89.9 ns +- 1.2 ns
python: ..................... 72.5 ns +- 1.6 ns

Mean +- std dev: [python-master] 89.9 ns +- 1.2 ns -> [python] 72.5 ns +- 1.6 ns: 1.24x faster (-19%)

$ ./python -m pyperf timeit --compare-to ./python-master -s 'import string; a=dict.fromkeys(string.ascii_lowercase); b=dict.fromkeys(string.ascii_uppercase)' -- 'dict(a, **b)'
python-master: ..................... 1.41 us +- 0.04 us
python: ..................... 1.53 us +- 0.04 us

Mean +- std dev: [python-master] 1.41 us +- 0.04 us -> [python] 1.53 us +- 0.04 us: 1.09x slower (+9%)

---

There is some overhead in the old dict-merging idiom, but it seems reasonable compared to the benefit. LGTM.
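When pyperf is not available, the two measurements above can be approximated with the standard-library timeit module. This is only a rough sketch: timeit does not do pyperf's calibration, worker processes or statistics, and absolute numbers depend on the machine and build, so only the ratio between two interpreters running the same script is meaningful. The helper name per_call_ns and the iteration counts are assumptions of this sketch.

```python
import timeit

# Same setup as the pyperf command above.
SETUP = (
    "import string; "
    "a = dict.fromkeys(string.ascii_lowercase); "
    "b = dict.fromkeys(string.ascii_uppercase)"
)


def per_call_ns(stmt, setup="pass", number=100_000, repeat=5):
    """Best-of-`repeat` time per execution of `stmt`, in nanoseconds."""
    timings = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(timings) / number * 1e9


if __name__ == "__main__":
    # Run this script under both interpreters and compare the two outputs.
    print(f"dict():        {per_call_ns('dict()'):8.1f} ns")
    print(f"dict(a, **b):  {per_call_ns('dict(a, **b)', SETUP):8.1f} ns")
```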
msg362219 - Author: miss-islington (miss-islington) Date: 2020-02-18 15:13
New changeset 6e35da976370e7c2e028165c65d7d7d42772a71f by Petr Viktorin in branch 'master':
bpo-37207: Use vectorcall for range() (GH-18464)
https://github.com/python/cpython/commit/6e35da976370e7c2e028165c65d7d7d42772a71f
msg364095 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-13 13:57
New changeset 9ee88cde1abf7f274cc55a0571b1c2cdb1263743 by Dong-hee Na in branch 'master':
bpo-37207: Use PEP 590 vectorcall to speed up tuple() (GH-18936)
https://github.com/python/cpython/commit/9ee88cde1abf7f274cc55a0571b1c2cdb1263743
msg364322 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-16 14:04
New changeset c98f87fc330eb40fbcff627dfc50958785a44f35 by Dong-hee Na in branch 'master':
bpo-37207: Use _PyArg_CheckPositional() for tuple vectorcall (GH-18986)
https://github.com/python/cpython/commit/c98f87fc330eb40fbcff627dfc50958785a44f35
msg364324 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-16 14:06
New changeset 87ec86c425a5cd3ad41b831b54c0ce1a0c363f4b by Dong-hee Na in branch 'master':
bpo-37207: Add _PyArg_NoKwnames() helper function (GH-18980)
https://github.com/python/cpython/commit/87ec86c425a5cd3ad41b831b54c0ce1a0c363f4b
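The changesets above use two helpers for the up-front argument validation of the new vectorcall constructors: _PyArg_NoKwnames() rejects keyword arguments and _PyArg_CheckPositional() checks the positional count. The Python below is only a rough model of that logic under those assumptions; no_kwnames, check_positional and tuple_vectorcall_model are hypothetical names, and their error messages are approximations rather than CPython's exact wording. The final loop shows the observable behaviour of the real builtins.

```python
def no_kwnames(name, kwnames):
    """Hypothetical model of _PyArg_NoKwnames(): fail if keyword names were passed."""
    if kwnames:
        raise TypeError(f"{name}() takes no keyword arguments")
    return True


def check_positional(name, nargs, min_args, max_args):
    """Hypothetical model of _PyArg_CheckPositional(): validate the positional count."""
    if nargs < min_args:
        raise TypeError(f"{name} expected at least {min_args} arguments, got {nargs}")
    if nargs > max_args:
        raise TypeError(f"{name} expected at most {max_args} arguments, got {nargs}")
    return True


def tuple_vectorcall_model(args, kwnames=None):
    """Sketch of the checks a tuple() vectorcall does before building the result."""
    no_kwnames("tuple", kwnames)
    check_positional("tuple", len(args), 0, 1)
    return tuple(args[0]) if args else ()


print(tuple_vectorcall_model(["abc"]))  # ('a', 'b', 'c')

# The real constructors that got vectorcall paths here all reject keywords.
for ctor in (list, tuple, set, frozenset, range):
    try:
        ctor(x=1)
    except TypeError as exc:
        print(f"{ctor.__name__}: {exc}")
```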
msg364340 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-16 17:17
New changeset 6ff79f65820031b219622faea8425edaec9a43f3 by Dong-hee Na in branch 'master':
bpo-37207: Use PEP 590 vectorcall to speed up set() constructor (GH-19019)
https://github.com/python/cpython/commit/6ff79f65820031b219622faea8425edaec9a43f3
msg364428 - Author: Dong-hee Na (corona10) * (Python committer) Date: 2020-03-17 13:58
Victor,

frozenset is the last basic builtin collection to which this improvement has not been applied yet.
frozenset also shows a similar performance improvement with vectorcall:

pyperf compare_to master.json bpo-37207.json
Mean +- std dev: [master] 2.26 us +- 0.06 us -> [bpo-37207] 2.06 us +- 0.05 us: 1.09x faster (-9%) 

> What I mean is that vectorcall should not be used for everything

I definitely agree with this opinion, so I'm asking for yours before submitting the patch.
frozenset is used less frequently than list/set/dict,
but it is also a basic builtin collection, so IMHO it is okay to apply vectorcall to it.

What do you think?
msg364447 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-17 16:55
> What do you think?

I would prefer to see a PR to give my opinion :)
msg364538 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-18 17:30
New changeset 1c60567b9a4c8f77e730de9d22690d8e68d7e5f6 by Dong-hee Na in branch 'master':
bpo-37207: Use PEP 590 vectorcall to speed up frozenset() (GH-19053)
https://github.com/python/cpython/commit/1c60567b9a4c8f77e730de9d22690d8e68d7e5f6
msg364808 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-22 16:03
Remaining issue: optimize list(iterable), PR 18928. I reviewed the PR and I'm waiting for Petr.
msg365307 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-30 12:16
New changeset ce105541f8ebcf2dffcadedfdeffdb698a0edb44 by Petr Viktorin in branch 'master':
bpo-37207: Use vectorcall for list() (GH-18928)
https://github.com/python/cpython/commit/ce105541f8ebcf2dffcadedfdeffdb698a0edb44
msg365309 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-30 12:18
All PRs are now merged. Thanks to everybody who was involved in this issue. It's a nice speedup which is always good to take ;-)
msg365385 - Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2020-03-31 12:43
The change to dict() was not covered by the smaller PRs.
That one will need more thought, but AFAIK it wasn't yet rejected.
msg365387 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-31 12:44
Oh sorry, I missed the dict.
msg365448 - Author: Dong-hee Na (corona10) * (Python committer) Date: 2020-04-01 03:24
@vstinner @petr.viktorin

I'd like to experiment with vectorcall for dict and finalize the work.
Can I proceed with it?
msg365452 - Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2020-04-01 08:24
Definitely!
msg365488 - Author: Dong-hee Na (corona10) * (Python committer) Date: 2020-04-01 15:39
+------------------+-------------------+-----------------------------+
| Benchmark        | master-dict-empty | bpo-37207-dict-empty        |
+==================+===================+=============================+
| bench dict empty | 502 ns            | 443 ns: 1.13x faster (-12%) |
+------------------+-------------------+-----------------------------+

+-------------------+--------------------+-----------------------------+
| Benchmark         | master-dict-update | bpo-37207-dict-update       |
+===================+====================+=============================+
| bench dict update | 497 ns             | 425 ns: 1.17x faster (-15%) |
+-------------------+--------------------+-----------------------------+

+--------------------+---------------------+-----------------------------+
| Benchmark          | master-dict-kwnames | bpo-37207-dict-kwnames      |
+====================+=====================+=============================+
| bench dict kwnames | 1.38 us             | 917 ns: 1.51x faster (-34%) |
+--------------------+---------------------+-----------------------------+
msg365489 - Author: Dong-hee Na (corona10) * (Python committer) Date: 2020-04-01 15:40
@vstinner @petr.viktorin

The benchmarks show very impressive results.
Can I submit the patch?
msg365490 - Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2020-04-01 15:45
> Can I submit the patch?

Yes!

If you think a patch is ready for review, just submit it. There's not much we can comment on before we see the code :)

(I hope that doesn't contradict what your mentor says...)
msg365491 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-01 15:48
When I designed the FASTCALL calling convention, I experimented with adding a new tp_fastcall slot to PyTypeObject to optimize the __call__() method: bpo-29259.

Results on the pyperformance benchmark suite were not really convincing, and I ran into technical issues (deciding whether tp_call or tp_fastcall should be called, handling ABI and backward compatibility, etc.). I decided to give up on this idea.

I'm happy to see that PEP 590 managed to find its way into Python internals and actually make Python faster ;-)
msg365545 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-02 00:55
New changeset e27916b1fc0364e3627438df48550c16f0b80b82 by Dong-hee Na in branch 'master':
bpo-37207: Use PEP 590 vectorcall to speed up dict() (GH-19280)
https://github.com/python/cpython/commit/e27916b1fc0364e3627438df48550c16f0b80b82
msg365546 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-02 00:56
Can we now close this issue? Or does someone plan to push further optimizations? Maybe new issues can be opened for the next optimizations?
msg365553 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-02 01:31
> When I designed the FASTCALL calling convention, I experimented with adding a new tp_fastcall slot to PyTypeObject to optimize the __call__() method: bpo-29259.

Ah, by the way, I also made an attempt to use the FASTCALL calling convention for tp_new and tp_init: bpo-29358. Again, the speedup wasn't obvious and the implementation was quite complicated with many corner cases. So I gave up on this one. It didn't seem to be really worth it.
msg365811 - Author: Dong-hee Na (corona10) * (Python committer) Date: 2020-04-05 05:15
IMHO, we can close this issue.

Summary:
PEP 590 vectorcall has been applied to list, tuple, dict, set, frozenset and range.

If someone wants to apply PEP 590 to other cases, please open a new issue for it!

Thank you, Mark, Jeroen, Petr and everyone who worked on this issue.
msg365907 - Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2020-04-07 14:44
As discussed briefly in Mark's PR, benchmarks like this are now slower:

    ret = dict(**{'a': 2, 'b': 4, 'c': 6, 'd': 8})


Python 3.8: Mean +- std dev: 281 ns +- 9 ns
    master: Mean +- std dev: 456 ns +- 14 ns
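A way to reproduce this comparison with the standard library, and to set dict(**d) beside spellings that build an equal copy without keyword unpacking, is sketched below with timeit. The alternatives (dict(d), d.copy(), {**d}) are choices of this sketch rather than something discussed in the issue, the helper name best_ns is hypothetical, and timings vary by build, so run the same script under 3.8 and under the newer build and compare the ratios.

```python
import timeit

d = {"a": 2, "b": 4, "c": 6, "d": 8}

# Statements that build an equal copy of d; only the first uses ** unpacking.
STATEMENTS = ["dict(**d)", "dict(d)", "d.copy()", "{**d}"]


def best_ns(stmt, number=100_000, repeat=5):
    """Best-of-`repeat` time per execution of `stmt`, in nanoseconds."""
    timings = timeit.repeat(stmt, globals={"d": d}, number=number, repeat=repeat)
    return min(timings) / number * 1e9


if __name__ == "__main__":
    for stmt in STATEMENTS:
        assert eval(stmt, {"d": d}) == d  # all spellings agree for string keys
        print(f"{stmt:10s} {best_ns(stmt):7.1f} ns")
```

The spellings are not fully interchangeable: dict(**mapping) only accepts string keys, so dict(**{1: "one"}) raises TypeError while dict({1: "one"}) does not.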
msg373095 - Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2020-07-06 11:22
New changeset b4a9263708cc67c98c4d53b16933f6e5dd07990f by Dong-hee Na in branch 'master':
bpo-37207: Update whatsnews for 3.9 (GH-21337)
https://github.com/python/cpython/commit/b4a9263708cc67c98c4d53b16933f6e5dd07990f
msg373116 - Author: Dong-hee Na (corona10) * (Python committer) Date: 2020-07-06 13:32
New changeset 97558d6b08a656eae209d49b206f703cee0359a2 by Dong-hee Na in branch '3.9':
[3.9] bpo-37207: Update whatsnews for 3.9 (GH-21337)
https://github.com/python/cpython/commit/97558d6b08a656eae209d49b206f703cee0359a2
History
Date                 User            Action  Args
2020-07-06 13:32:13  corona10        set     messages: + msg373116
2020-07-06 12:59:40  corona10        set     pull_requests: + pull_request20496
2020-07-06 11:22:31  miss-islington  set     pull_requests: + pull_request20494
2020-07-06 11:22:11  lukasz.langa    set     nosy: + lukasz.langa; messages: + msg373095
2020-07-05 15:20:26  corona10        set     pull_requests: + pull_request20485
2020-04-07 14:44:27  petr.viktorin   set     messages: + msg365907
2020-04-05 05:15:46  corona10        set     status: open -> closed; resolution: fixed; messages: + msg365811; stage: patch review -> resolved
2020-04-02 01:31:24  vstinner        set     messages: + msg365553
2020-04-02 00:56:53  vstinner        set     messages: + msg365546
2020-04-02 00:55:47  vstinner        set     messages: + msg365545
2020-04-01 15:57:53  corona10        set     stage: resolved -> patch review; pull_requests: + pull_request18637
2020-04-01 15:48:51  vstinner        set     messages: + msg365491
2020-04-01 15:45:57  petr.viktorin   set     messages: + msg365490
2020-04-01 15:40:56  corona10        set     messages: + msg365489
2020-04-01 15:39:44  corona10        set     files: + bench_dict_update.py
2020-04-01 15:39:37  corona10        set     files: + bench_dict_kwnames.py
2020-04-01 15:39:29  corona10        set     files: + bench_dict_empty.py
2020-04-01 15:39:16  corona10        set     messages: + msg365488
2020-04-01 08:24:07  petr.viktorin   set     messages: + msg365452
2020-04-01 03:24:32  corona10        set     messages: + msg365448
2020-03-31 12:44:38  vstinner        set     resolution: fixed -> (no value); messages: + msg365387
2020-03-31 12:43:46  petr.viktorin   set     status: closed -> open; messages: + msg365385
2020-03-30 12:18:57  vstinner        set     status: open -> closed; versions: + Python 3.9; type: enhancement -> performance; messages: + msg365309; resolution: fixed; stage: patch review -> resolved
2020-03-30 12:16:25  vstinner        set     messages: + msg365307
2020-03-22 16:03:11  vstinner        set     messages: + msg364808
2020-03-22 04:57:09  phsilva         set     nosy: + phsilva
2020-03-18 17:30:53  vstinner        set     messages: + msg364538
2020-03-18 01:34:59  corona10        set     pull_requests: + pull_request18404
2020-03-17 16:55:43  vstinner        set     messages: + msg364447
2020-03-17 13:58:06  corona10        set     messages: + msg364428
2020-03-16 17:17:41  vstinner        set     messages: + msg364340
2020-03-16 14:06:32  vstinner        set     messages: + msg364324
2020-03-16 14:04:24  vstinner        set     messages: + msg364322
2020-03-15 14:27:55  corona10        set     pull_requests: + pull_request18368
2020-03-14 08:26:35  corona10        set     pull_requests: + pull_request18334
2020-03-13 16:56:57  corona10        set     pull_requests: + pull_request18328
2020-03-13 13:57:15  vstinner        set     nosy: + vstinner; messages: + msg364095
2020-03-11 17:42:50  corona10        set     nosy: + corona10; pull_requests: + pull_request18288
2020-03-11 15:11:38  petr.viktorin   set     nosy: + petr.viktorin; pull_requests: + pull_request18280
2020-02-18 15:13:24  miss-islington  set     messages: + msg362219
2020-02-11 16:37:44  petr.viktorin   set     pull_requests: + pull_request17837
2019-09-12 11:48:13  methane         set     messages: + msg352133
2019-08-15 15:49:52  miss-islington  set     nosy: + miss-islington; messages: + msg349809
2019-07-05 12:31:43  jdemeyer        set     nosy: + jdemeyer; messages: + msg347336
2019-07-04 13:44:15  jdemeyer        set     pull_requests: + pull_request14406
2019-07-04 11:11:10  methane         set     nosy: + methane; messages: + msg347272
2019-06-09 09:40:27  Mark.Shannon    set     keywords: + patch; stage: patch review; pull_requests: + pull_request13796
2019-06-09 09:23:47  Mark.Shannon    create