Message263995
Changes in my current implementation, ad4a53ed1fbf.diff.
The good thing is that all changes are internal (really?). Even if you don't modify your C extensions (nor your Python code), you should benefit from the new fast call in *a lot* of cases.
IMHO the trickiest part is the changes to PyTypeObject. Is it OK to add a new tp_fastcall slot? Should we add even more slots using the fast call convention, like tp_fastnew and tp_fastinit? How should we handle type inheritance with these slots?
(*) Add 2 new public functions:
PyObject* PyObject_CallNoArg(PyObject *func);
PyObject* PyObject_CallArg1(PyObject *func, PyObject *arg);
(*) Add 1 new private function:
PyObject* _PyObject_FastCall(PyObject *func, PyObject **stack, int na, int nk);
_PyObject_FastCall() is the root of the new feature.
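To make the (stack, na, nk) signature more concrete, here is a toy Python model of the calling convention. It assumes the stack layout mirrors the one used in ceval.c: na positional arguments followed by nk (key, value) pairs. That layout and the name fast_call are assumptions for illustration, not taken from the patch. The model rebuilds a tuple and a dict, so it only shows the layout, not the speedup.

```python
def fast_call(func, stack, na, nk):
    """Toy model of a (stack, na, nk) call: NOT the real C implementation.

    Assumed layout: stack[0:na] holds the positional arguments, followed by
    nk pairs (key, value) stored flat, so len(stack) == na + 2 * nk.
    """
    if len(stack) != na + 2 * nk:
        raise ValueError("stack size does not match na + 2 * nk")
    args = stack[:na]
    kwargs = {}
    for i in range(nk):
        key = stack[na + 2 * i]        # keyword name
        value = stack[na + 2 * i + 1]  # keyword value
        kwargs[key] = value
    return func(*args, **kwargs)


# One positional argument and one keyword pair: sorted([3, 1, 2], reverse=True)
result = fast_call(sorted, [[3, 1, 2], "reverse", True], 1, 1)
```

The real function would read the arguments directly from the C array instead of unpacking them, which is where the temporary tuple and dict are avoided.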
(*) type: add a new "tp_fastcall" field to the PyTypeObject structure.
It's unclear to me how inheritance is handled here. Maybe it's simply broken, but that's strange because it looks like it works :-) Maybe it's very rare that tp_call is overridden in a child class?
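One way to probe the inheritance question from the Python side is a small check that an overridden __call__ always wins over the parent's call slot. This is only a sketch of such a check: the class names are made up, and functools.partial is used as an arbitrary example of a type whose call slot is implemented in C.

```python
import functools


class Base:
    def __call__(self, *args):
        return ("base", args)


class Child(Base):
    # Overrides the call slot inherited from Base.
    def __call__(self, *args):
        return ("child", args)


class Plain(Base):
    # No override: must inherit Base.__call__.
    pass


class LoudPartial(functools.partial):
    # Subclass of a C type that overrides the C-level call slot.
    def __call__(self, *args, **kwargs):
        return ("overridden", super().__call__(*args, **kwargs))


def check_call_inheritance():
    """Return the results of calling each instance once."""
    return {
        "child": Child()(1, 2),
        "plain": Plain()(1, 2),
        "partial": LoudPartial(int, "7")(),
    }
```

If a fast call slot were inherited without checking whether tp_call was overridden, the "child" or "partial" result would come from the parent implementation instead.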
TODO: maybe reuse the "tp_call" field? (risk of major backward incompatibility...)
(*) slots: add a new "fastwrapper" field to the wrapperbase structure. Add a fast wrapper to all slots (really all? I should check).
I don't think that consumers of the C API are affected by this change, or maybe only a few projects.
TODO: maybe remove "fastwrapper" and reuse the "wrapper" field? (low risk of backward incompatibility?)
(*) Implement fast call for Python functions (_PyFunction_FastCall) and C functions (PyCFunction_FastCall)
(*) Add a new METH_FASTCALL calling convention for C functions. Right now, it is used for 4 builtin functions: sorted(), getattr(), iter(), next().
Argument Clinic should be modified to emit C code using this new fast calling convention.
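To compare an unpatched and a patched interpreter on the four builtins listed above, a rough timeit-based micro-benchmark could look like this. The function name, the statements and the iteration count are arbitrary choices for illustration; the numbers it prints depend on the machine and build, so no expected output is given.

```python
import timeit


def time_builtin_calls(number=100_000):
    """Return the average time per call (in seconds) for a few cheap calls."""
    statements = {
        "sorted": "sorted(data)",
        "getattr": "getattr(obj, 'real')",
        "iter": "iter(data)",
        "next": "next(it, None)",
    }
    # Tiny inputs, so that call overhead dominates the measured time.
    setup = "data = (); obj = 1; it = iter(())"
    results = {}
    for name, stmt in statements.items():
        total = timeit.timeit(stmt, setup=setup, number=number)
        results[name] = total / number
    return results


if __name__ == "__main__":
    for name, seconds in time_builtin_calls().items():
        print(f"{name}: {seconds * 1e9:.1f} ns per call")
```

Running the same script with both builds gives a first idea of the gain, but a serious comparison needs repeated runs and a quiet machine.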
(*) Implement fast call in the following functions (types):
- method()
- method_descriptor()
- wrapper_descriptor()
- method_wrapper()
- operator.itemgetter => used by collections.namedtuple to get an item by its name
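For readers wondering why operator.itemgetter matters here: in the namedtuple implementation this message refers to, each field is a property wrapping an itemgetter, so every attribute access is a call to an itemgetter object. The class below is a hand-written stand-in for that pattern, not the code generated by collections.namedtuple, and newer Python versions may implement fields differently.

```python
from operator import itemgetter


class Point(tuple):
    """Hand-written stand-in for a namedtuple with fields x and y."""

    __slots__ = ()

    def __new__(cls, x, y):
        return tuple.__new__(cls, (x, y))

    # Each attribute access calls itemgetter(i)(self), i.e. self[i].
    x = property(itemgetter(0))
    y = property(itemgetter(1))


p = Point(3, 4)
coords = (p.x, p.y)
```

Since p.x goes through a call to the itemgetter object, making that call cheaper speeds up every access to a field by name.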
(*) Modify PyObject_Call*() functions to use the fast call internally. "tp_fastcall" is preferred over "tp_call" (FIXME: is it really useful to do that?).
The following functions are able to avoid temporary tuple/dict without having to modify the code calling them:
- PyObject_CallFunction()
- PyObject_CallMethod(), _PyObject_CallMethodId()
- PyObject_CallFunctionObjArgs(), PyObject_CallMethodObjArgs()
It's not required to modify code using these functions to use the 3 new shiny functions (PyObject_CallNoArg, PyObject_CallArg1, _PyObject_FastCall). For example, replacing PyObject_CallFunctionObjArgs(func, NULL) with PyObject_CallNoArg(func) is just a micro-optimization: the tuple is already avoided. But PyObject_CallNoArg() should use less C stack memory and be a "little bit" faster.
(*) Add new helpers: new Include/pystack.h file, Py_VaBuildStack(), etc.
Please ignore unrelated changes.

Date | User | Action | Args
2016-04-22 11:10:16 | vstinner | set | recipients: + vstinner, rhettinger, larry, serhiy.storchaka, yselivanov
2016-04-22 11:10:16 | vstinner | set | messageid: <1461323416.21.0.672697562575.issue26814@psf.upfronthosting.co.za>
2016-04-22 11:10:16 | vstinner | link | issue26814 messages
2016-04-22 11:10:16 | vstinner | create |