This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Title: Optimize wrapper descriptors using FASTCALL
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.7
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: methane, pitrou, rhettinger, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2017-09-21 13:25 by vstinner, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 3685 closed vstinner, 2017-09-21 13:27
Messages (8)
msg302692 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-09-21 13:25
Attached pull request adds a fastpath for wrapper descriptors to use the FASTCALL calling convention. It's a follow up of bpo-31410 and all my work on FASTCALL during Python 3.6 and 3.7 development cycles.


./python -m perf timeit -s 'import array; obj=array.array("b"); wrap=array.array.__len__' 'wrap(obj)'


haypo@selma$ ./python -m perf compare_to ref.json patch.json 
Mean +- std dev: [ref] 59.2 ns +- 0.6 ns -> [patch] 28.2 ns +- 0.9 ns: 2.10x faster (-52%)

It removes 31 nanoseconds on such very fast C function, array_length().

Attached PR is still a work-in-progress. First I would like to know if it's worth it because working on polishing the actual code.
msg302697 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-09-21 15:44
> First I would like to know if it's worth it

I don't know. What's the point of optimizing `array.array.__len__(obj)`? People usually call `len(obj)` for that...
msg302709 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-09-21 16:46
> I don't know. What's the point of optimizing `array.array.__len__(obj)`? People usually call `len(obj)` for that...

Right, type.method(self) is less than common than self.method().

I looked at the stdlib. I found that the following method are called using wrapper descriptors:

* object.__repr__(self)
* object.__getattribute__(self, name)
* _asyncio.Future.__del__(self)
* int.__str__(obj)
* etc.

But it seems like such calls are rare compared to other kinds of function calls.


By the way, _PyMethodDescr_FastCallKeywords() is only called from call_function() in Python/ceval.c. It's not used in Objects/call.c. Maybe we should use it there as well? It seems like this is a question about tracing. But maybe we can copy/paste the code from call_function()?
msg302710 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-09-21 16:52
It optimizes the same cases as bpo-31410. bpo-31410 removed a half of the overhead for wrapper descriptors, Victor's patch removes the remaining half. Actually it makes calling the descriptor faster than calling the corresponding builtin! But bpo-31410 changes were simple, just few lines, and PR 3685 looks much more complex.

Are non-fast wrappers still needed? If the patch replaces the code instead of adding it this would decrease its cost. AFAIK the only descriptors that should support non-fast calling convention are __new__, __init__ and __call__.

As for practicality, this change should slightly speed up the code that directly calls __eq__, __lt__. For example classes decorated with total_ordering. Victor, can you provide microbenchmarks with more practical code?
msg302717 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-09-21 19:40
This seems like a straight-forward win.  I don't think we really need to see microbenchmarks before going forward with this one.
msg302724 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-09-21 21:09
The most complex slots are __new__, __init__ and __call__ because they
accept keywords. It's hard to design a calling convention which optimize
all cases because of the backward compatibility. The risk is to convert a
dict to args + kwnames (tuple) and then back to a dict: two expensive and
useless conversions.

It's my 3rd or 4th attempt to optimize new/init/call :-) My previous
attempt was to add new tp_fastXXX slots to types, but again thr backward
compatibilty was a major pain.

Here the scope is much small and backward compatibilty shouldn't affect us.

I will try to find a way to optimize these slots as well and complete my
msg302733 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-09-21 22:01
These slots are the only slots that accept keyword arguments and arbitrary number of positional parameters. Other slots can use fast calls.
msg306865 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-11-24 01:10
It seems like these specific descriptors are rare, so the added complexity is not worth it. I close my issue as rejected.
Date User Action Args
2022-04-11 14:58:52adminsetgithub: 75724
2017-11-24 01:10:38vstinnersetstatus: open -> closed
resolution: rejected
messages: + msg306865

stage: patch review -> resolved
2017-09-21 22:01:12serhiy.storchakasetmessages: + msg302733
2017-09-21 21:09:36vstinnersetmessages: + msg302724
2017-09-21 19:40:37rhettingersetnosy: + rhettinger
messages: + msg302717
2017-09-21 16:52:11serhiy.storchakasetmessages: + msg302710
2017-09-21 16:46:42vstinnersetmessages: + msg302709
2017-09-21 15:44:16pitrousetnosy: + pitrou
messages: + msg302697
2017-09-21 13:27:17vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request3674
2017-09-21 13:25:31vstinnercreate