This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author vstinner
Recipients methane, serhiy.storchaka, vstinner
Date 2017-02-07.22:42:43
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1486507364.36.0.535614769835.issue29465@psf.upfronthosting.co.za>
In-reply-to
Content
pyobject_fastcall-4.patch combines a lot of changes. I wrote it to experiment _PyObject_FastCall().

I will rewrite these changes as a patch serie with smaller changes.

The design of PyObject_FastCall*() is to be short and simple.

Changes:

* Add _PyObject_FastCall(), _PyFunction_FastCall(), _PyCFunction_FastCall(): similar to the _PyObject_FastCallKeywords() but without keyword arguments. Without keyword arguments, the checks on keyword arguments can be removed, and it allows to reduce the stack usage: one less parameter to C functions.

* Move the slow path out of _PyObject_FastCallDict() and _PyObject_FastCallKeywords() in a new subfunction which is tagged with _Py_NO_INLINE to reduce the stack consumption of _PyObject_FastCallDict() and _PyObject_FastCallKeywords() in their fast paths, which are the most code paths.

* Move all "call" functions into a new Objects/call.c function. This change should help code placement to enhance the usage of the CPU caches (especially the CPU L1 instruction cache). In my long experience with benchmarking last year, I notice huge performance differences caused by code placement. See my blog post:
https://haypo.github.io/analysis-python-performance-issue.html
Sadly, _Py_HOT_FUNCTION didn't fix the issue completely.

* _PyObject_FastCallDict() and _PyObject_FastCallKeywords() "call" _PyObject_FastCall(). In fact, _PyObject_FastCall() is inlined manually in these functions, against, to minimize the stack usage. Similar change in _PyCFunction_FastCallDict() and PyCFunction_Call().

* Since the slow path is moved to a subfunction, I removed _Py_NO_INLINE from _PyStack_AsTuple() to allow the compiler to inline it if it wants to ;-) Since the stack usage is better with the patch, it seems like this strange has no negative impact on the stack usage.

* Optimize PyMethodDescr_Type (call _PyMethodDescr_FastCallKeywords()) in _PyObject_FastCallKeywords()

* I moved Py_EnterRecursiveCall() closer to the final function call, to simplify error handling. It would be nice to fix the issue #29306 before :-/

* I had to copy/paste the null_error() function from Objects/abstract.c. I don't think that it's worth it to add a _PyErr_NullError() function shared by abstract.c and call.c. Compilers are even able to merge duplicated functions ;-)

* A few other changes.
History
Date User Action Args
2017-02-07 22:42:44vstinnersetrecipients: + vstinner, methane, serhiy.storchaka
2017-02-07 22:42:44vstinnersetmessageid: <1486507364.36.0.535614769835.issue29465@psf.upfronthosting.co.za>
2017-02-07 22:42:44vstinnerlinkissue29465 messages
2017-02-07 22:42:43vstinnercreate