Author vstinner
Recipients methane, serhiy.storchaka, vstinner
Date 2017-01-13.12:33:20
Message-id <1484310801.82.0.291994874073.issue29259@psf.upfronthosting.co.za>
Content
I started to work on FASTCALL because I dislike the "cached tuple" hack used in some performance-critical code: the hack causes various kinds of tricky but severe issues (it can lead to a segfault).

Thanks to tp_fastcall, it becomes possible to drop the "cached tuple" hack from property_descr_get() *and* keep good performance.

First, a benchmark to show the performance gain of using "cached tuple". I modified property_descr_get() to use Python 3.4 code which doesn't have the optimization:

$ ./python -m perf compare_to py34.json ref.json 
Median +- std dev: [py34] 75.0 ns +- 1.7 ns -> [ref] 50.0 ns +- 0.9 ns: 1.50x faster (-33%)
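
The message doesn't include the benchmark script itself. As a rough sketch only, a microbenchmark of this kind could time an attribute read that goes through property_descr_get(). The Point class and the bench() helper below are hypothetical names, and the stdlib timeit module is used here instead of perf to keep the example self-contained:

```python
import operator
import timeit


class Point(tuple):
    # Hypothetical tuple subclass: x is a property whose getter is
    # operator.itemgetter(0), so reading p.x calls the getter through
    # the property descriptor (property_descr_get() in CPython).
    __slots__ = ()
    x = property(operator.itemgetter(0))


p = Point((1, 2))


def bench(number=1_000_000, repeat=5):
    """Return the best observed time per p.x access, in nanoseconds."""
    timer = timeit.Timer("p.x", globals={"p": p})
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e9


if __name__ == "__main__":
    print(f"p.x: {bench():.1f} ns per access")
```

Absolute numbers from such a script depend on the machine and the build, so only comparisons between two builds measured the same way are meaningful.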

It's MUCH faster, good job. But it requires complex and fragile code. OK, let's look at operator.itemgetter() supporting tp_fastcall, with Python modified to use tp_fastcall and without the "cached tuple" hack:

$ ./python -m perf compare_to ref.json fastcall_wrapper.json 
Median +- std dev: [ref] 50.0 ns +- 0.9 ns -> [fastcall_wrapper] 48.2 ns +- 1.5 ns: 1.04x faster (-4%)

It's a little bit faster, but that's not the point. The point is that it isn't slower and it doesn't require modifying C code to benefit from the optimization! Just to be clear, another benchmark result on property_descr_get() without the "cached tuple" hack, without fastcall (py34) and with fastcall ("fastcall_wrapper"):

$ ./python -m perf compare_to py34.json fastcall_wrapper.json 
Median +- std dev: [py34] 75.0 ns +- 1.7 ns -> [fastcall_wrapper] 48.2 ns +- 1.5 ns: 1.56x faster (-36%)
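
For readers checking the figures: the "x faster (-N%)" summaries above are consistent with a plain ratio of the two medians. The small sketch below assumes that this is how the summary is derived; the summarize() helper is a hypothetical name, not part of perf:

```python
def summarize(old_ns, new_ns):
    """Format a speedup as 'R.RRx faster (-N%)' from two medians in ns."""
    ratio = old_ns / new_ns                      # how many times faster
    percent = (new_ns - old_ns) / old_ns * 100   # relative change in time
    return f"{ratio:.2f}x faster ({percent:+.0f}%)"


# Medians taken from the three comparisons in this message.
print(summarize(75.0, 50.0))   # py34 -> ref
print(summarize(50.0, 48.2))   # ref -> fastcall_wrapper
print(summarize(75.0, 48.2))   # py34 -> fastcall_wrapper
```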

Summary:

* tp_fastcall makes it possible to remove the "cached tuple" hack, which will fix severe issues in corner cases
* tp_fastcall makes existing code faster for free. I mean, builtin types should be modified to support tp_fastcall, but almost all code *calling* these types doesn't need any change.