
Author vstinner
Recipients vstinner
Date 2016-11-05.00:29:03
When analyzing the results of Python performance benchmarks, I noticed that call_method was 70% slower (!) on the speed-python server for all revisions from 83877018ef97 (Oct 18) to 3e073e7b4460 (Oct 22), inclusive.

On these revisions, the CPU L1 instruction cache is less efficient: 8% cache misses, whereas it was only 0.06% before and after these revisions.

Since the two mentioned revisions have no obvious impact on the call_method() benchmark, I understand that the performance difference is caused by a different layout of the machine code, maybe the exact location of functions.

IMO the best solution to such a compilation issue is to use PGO compilation. Problem: PGO doesn't work on Ubuntu 14.04, the OS used by speed-python (the server running benchmarks for

I propose to manually mark the "hot" functions with the GCC __attribute__((hot)) function attribute:
(search for "hot")

Attached patch adds Py_HOT_FUNCTION and decorates the following functions:

* _PyEval_EvalFrameDefault()
* PyFrame_New()
* call_function()
* lookdict_unicode_nodummy()
* _PyFunction_FastCall()
* frame_dealloc()

These functions are the top 6 according to the Linux perf tool when running the call_simple benchmark of the performance project:

32,66%: _PyEval_EvalFrameDefault
13,09%: PyFrame_New
12,78%: call_function
12,24%: lookdict_unicode_nodummy
 9,85%: _PyFunction_FastCall
 8,47%: frame_dealloc
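
For illustration, a macro like Py_HOT_FUNCTION could be defined roughly as follows. This is only a minimal sketch: the definition in the attached patch may differ, and sum_to() is a hypothetical stand-in for a hot function, not CPython code. The macro expands to nothing on compilers that do not define __GNUC__, so the annotated code still builds there.

```c
/* Assumed definition: the macro in the attached patch may differ. */
#if defined(__GNUC__)
#  define Py_HOT_FUNCTION __attribute__((hot))
#else
#  define Py_HOT_FUNCTION  /* no-op on compilers without the attribute */
#endif

/* Hypothetical hot function, used only to show where the macro goes.
 * With GCC, functions marked "hot" are optimized more aggressively and
 * are typically grouped in a dedicated text subsection, which may
 * improve instruction cache locality. */
Py_HOT_FUNCTION long
sum_to(long n)
{
    long total = 0;
    for (long i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}
```

The attribute is only a hint to the compiler: it does not change what the function computes, only how and where its machine code is emitted.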