Message280097
When analyzing results of Python performance benchmarks, I noticed that call_method was 70% slower (!) between revisions 83877018ef97 (Oct 18) and 3e073e7b4460 (Oct 22), including these revisions, on the speed-python server.
On these revisions, the CPU L1 instruction cache is less efficient: 8% cache misses, whereas it was only 0.06% before and after these revisions.
Since the two mentioned revisions have no obvious impact on the call_method() benchmark, I understand that the performance difference is caused by a different layout of the machine code, maybe the exact location of functions.
IMO the best solution to such a compilation issue is to use PGO compilation. Problem: PGO doesn't work on Ubuntu 14.04, the OS used by speed-python (the server running benchmarks for http://speed.python.org/).
I propose to manually mark the "hot" functions using the GCC __attribute__((hot)) function attribute:
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
(search for "hot")
Attached patch adds Py_HOT_FUNCTION and decorates the following functions:
* _PyEval_EvalFrameDefault()
* PyFrame_New()
* call_function()
* lookdict_unicode_nodummy()
* _PyFunction_FastCall()
* frame_dealloc()
These functions are the top 6 according to the Linux perf tool when running the call_simple benchmark of the performance project:
32,66%: _PyEval_EvalFrameDefault
13,09%: PyFrame_New
12,78%: call_function
12,24%: lookdict_unicode_nodummy
9,85%: _PyFunction_FastCall
8,47%: frame_dealloc
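
To illustrate the idea, here is a minimal sketch of how a macro like Py_HOT_FUNCTION could be defined and applied. It is not copied from the attached patch: the version check (GCC 4.3 or newer, where the "hot" attribute is available) and the sum_values() function are assumptions made only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical definition: expand to the GCC "hot" attribute when the
   compiler is GCC 4.3 or newer, and to nothing on other compilers. */
#if defined(__GNUC__) \
    && ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
#  define Py_HOT_FUNCTION __attribute__((hot))
#else
#  define Py_HOT_FUNCTION
#endif

/* Illustrative function only, not part of CPython. The attribute asks
   GCC to optimize it more aggressively and to group it with other hot
   functions in the generated code. */
Py_HOT_FUNCTION long
sum_values(const int *values, size_t count)
{
    long total = 0;
    size_t i;

    assert(values != NULL || count == 0);
    for (i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}
```

The attribute does not change what the function computes. It only gives the compiler a hint about optimization and code placement, so any effect has to be confirmed by measuring, for example by comparing L1 instruction cache misses before and after.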
Date | User | Action | Args
2016-11-05 00:29:04 | vstinner | set | recipients: + vstinner
2016-11-05 00:29:04 | vstinner | set | messageid: <1478305744.67.0.856588872906.issue28618@psf.upfronthosting.co.za>
2016-11-05 00:29:04 | vstinner | link | issue28618 messages
2016-11-05 00:29:04 | vstinner | create |