Message286314
"While I feel your work is great, performance benefit seems very small,
compared complexity of this patch."
I have to agree. I spent a lot of time benchmarking these tp_fast* changes. While one or two benchmarks are faster, that's not really the case for the others.
I also agree about the complexity. In Python 3.6, most FASTCALL changes were internal. For example, PyObject_CallFunctionObjArgs() now uses FASTCALL internally, without callers of the API having to be modified. I tried to use _PyObject_FastCallDict/Keywords() only in the few places where the speedup was significant.
The main visible change of Python 3.6 FASTCALL is the new METH_FASTCALL calling convention for C functions. Your change modifying print() to use METH_FASTCALL has a significant impact on the telco benchmark, without any drawback. I tested further changes to use METH_FASTCALL in the struct and decimal modules, and they optimize telco even more.
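For anyone who wants a quick feel for the per-call overhead these calling conventions target, here is a minimal sketch using only the stdlib timeit module. It is not the telco benchmark and not the tooling used for the numbers in this issue; the helper name time_call and the choice of calls (struct.pack, Decimal, print to an in-memory buffer) are just my assumptions for illustration. Results only mean something when comparing two builds of the interpreter on the same machine.

```python
import io
import struct
import timeit
from decimal import Decimal


def time_call(stmt, number=200_000, repeat=5):
    """Return the best observed time per call of `stmt`, in nanoseconds.

    Hypothetical helper: takes the minimum over `repeat` runs to reduce
    noise from the rest of the system.
    """
    namespace = {
        "struct": struct,
        "Decimal": Decimal,
        "buf": io.StringIO(),  # sink so that print() does no real I/O
    }
    timer = timeit.Timer(stmt, globals=namespace)
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e9


if __name__ == "__main__":
    # Short calls into C code, where argument passing is a visible
    # fraction of the total cost.
    statements = [
        "struct.pack('<i', 1)",
        "Decimal(1)",
        "print('x', file=buf)",
    ]
    for stmt in statements:
        print(f"{stmt:28s} {time_call(stmt):8.1f} ns per call")
```

Run the script once with each build and compare the numbers line by line; differences of a few percent are within the noise of this kind of micro-benchmark.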
To continue the optimization work, I guess that using METH_FASTCALL in more cases, using Argument Clinic whenever possible, would have a more concrete and measurable impact on performance than this big tp_fastcall patch.
But I'm not ready to abandon the whole approach yet, so I'm changing the status to Pending. I may come back in one or two months to check whether I missed anything obvious that would unlock even more optimizations ;-)
Date | User | Action | Args
2017-01-26 14:08:49 | vstinner | set | recipients: + vstinner, methane, python-dev, serhiy.storchaka
2017-01-26 14:08:49 | vstinner | set | messageid: <1485439729.14.0.66968452831.issue29259@psf.upfronthosting.co.za>
2017-01-26 14:08:49 | vstinner | link | issue29259 messages
2017-01-26 14:08:48 | vstinner | create |