New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tp_fastnew and tp_fastinit to PyTypeObject, 15-20% faster object instanciation #73544
Comments
After bpo-29259 "Add tp_fastcall to PyTypeObject: support FASTCALL calling convention for all callable objects", the two last slots which still use the (args: tuple, kwargs: dict) calling convention are tp_new and tp_init, two major slots to instanciate objects. I implemented tp_fastnew/tp_fastinit on top of the issue bpo-29259 pull request (tp_fastcall). The implementation is a WIP, it's just complete enough to start benchmarking the code. Example of benchmarks on the two types currently optimized in my WIP fast_init branch, list and _asyncio.Future: haypo@smithers$ ./python -m perf timeit -s 'List=list' 'List((1,2,3))' --duplicate=100 --compare-to=../default-ref/python haypo@smithers$ ./python -m perf timeit -s 'import _asyncio, asyncio; Future=_asyncio.Future; loop=asyncio.get_event_loop()' 'Future(loop=loop)' --compare-to=../default-ref/python The speedup of tp_fastnew + tp_fastinit is between 1.16x faster and 1.28x faster. The question is now if it is worth it. Warning: The code is not fully optimized and is likely to have subtle bugs. The pull request is not ready for a review, but you may take a look if you would like to have an idea of the required changes. The most tricky changes are made in typeobject.c to support backward compatibility (call tp_new if tp_fastnew is not set) and stable API (support Python 3.6 PyTypeObject without the two slots). Note: tp_fastnew and tp_fastinit slots are the expected end of my large FASTCALL optimization project. |
Serhiy Storchaka: "Calling _PyStack_AsDict() with non-string or non-unique keywords means that the var-keyword argument was first unpacked to arrays of keyword names and values and then converted back to a dict. Seems something is done non-efficiently." (msg286159 of the issue bpo-29360) Python code from test_dict: dict(**invalid) Bytecode:
Call stack:
Oh right, there are two conversions using a temporary FASTCALL format for (keyword) arguments. This code was not upstream yet, it comes from the pull request of this issue. Maybe we need two flavors of type_call(): type_call(args: tuple, kwargs: tuple) if tp_fastinit isn't set, type_fastcall(stack, nargs, kwnames) (FASTCALL) if tp_fastinit is set. But it seems that the logic should be implemented in PyObject_Call() and _PyObject_FastCallDict(), it cannot be implemented in type_call(), it's too late. For best performances (avoid any kind of conversion and avoid any temporary object), we should implement something like that. |
Great. Can you share the Richards benchmark results? It specifically stresses class instantiation, so we should be able to see the improvement. |
So just to be clear, the issue with non-string or non-unique keywords is purely about performance, right, not correctness/non-crashiness? Non-string keywords, while technically accepted by CPython, are at best barely legal by the language standard. The language standard for calls specifies that even using the **expression syntax "expression must evaluate to a mapping, the contents of which are treated as additional keyword arguments" and keywords_arguments only allows identifier=expression, where identifier is a legal variable name (and therefore, a str). If misuse of ** unpacking is slower, but still works, and correct usage is significantly faster, I wouldn't consider an inability to fix the performance for the misuse case a blocking issue; nice to have, but making common, good code faster is worth making rare, bad code slower. |
First benchmark result on the fast_init compared to the default branch. Warning: the patch is still a WIP, there are maybe performance issues (in this case, this benchmark should help to identify them).
Faster (19):
Benchmark hidden because not significant (34): (...) Reminder: The fast_init branch is based on the issue bpo-29259 (tp_fastcall), so the result combines tp_fastcall and tp_fastnew/fastinit. |
I implemented more optimizations:
As a result, create an _asyncio.Future is now 1.45x faster: haypo@smithers$ ./python -m perf timeit -s 'import _asyncio, asyncio; Future=_asyncio.Future; loop=asyncio.get_event_loop()' 'Future(loop=loop)' --compare-to=../default-ref/python Yury> Great. Can you share the Richards benchmark results? It specifically stresses class instantiation, so we should be able to see the improvement. While bm_richards.py doesn't call _PyStack_AsTuple() anyone in the benchmark (it seems like it full uses FASTCALL), it seems like there is simply zero impact on performance :-( $ ./python -m perf compare_to richards_ref.json richards_fastinit.json
Benchmark hidden because not significant (1): richards |
New changeset 8feb6a5ce4c6 by Victor Stinner in branch 'default': |
While they are obvious speedup on microbenchmarks, it doesn't speedup "macro" benchmarks from performance as much as I expected. The changes required in typeobject.c to support "tp_new or tp_fastnew" and "tp_init or tp_fastinit" are large and very complex. I'm not sure that my code is correct. update_one_slot(), add_operators() and PyType_Ready() functions are complex beast, sadly almost not documented, except such comment...
How am I supposed to be confident in front of such coment :-D I expect that calling a functions (tp_call) is a more common operation than instanciating a new object (tp_new + tp_init). So I don't think that the overall change is worth it. For all these reaons, I close the issue as REJECTED. |
Serhiy Storchaka:
Oh, thanks for the pointers! Now I understand much better these bugs. I'm quite sure that they are still flaws in this code when a type is modified after PyType_Ready(), like sysmodule.c::
But each time I had to dig into typeobject.c, my head is going to explode :-D I may try to understand one more time, but not today ;-) |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: