Message357856
The Fedora packaging has been modified to compile libpython with the -fno-semantic-interposition flag: it makes Python up to 1.3x faster without touching a single line of C code! See the pyperformance results:
https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup#Benefit_to_Fedora
The main drawback is that -fno-semantic-interposition prevents overriding libpython symbols (at least for calls made from within libpython itself) with a custom library preloaded via LD_PRELOAD, for example to override the PyErr_Occurred() function.
We (the authors of the Fedora change) could not find any use case for LD_PRELOAD with libpython.
To be honest, I found *one* user in the last 10 years who used LD_PRELOAD to track memory allocations in Python 2.7. This use case is no longer relevant in Python 3 thanks to PEP 445, which provides a supported C API to override Python memory allocators or to install hooks on them. Moreover, tracemalloc is a nice way to track memory allocations.
Is anyone aware of any special use of LD_PRELOAD with libpython?
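For illustration, here is a minimal sketch of the kind of LD_PRELOAD shim this drawback affects. It assumes Linux with glibc; the shim_calls counter is hypothetical and only there to show that the shim ran. The shim defines its own PyErr_Occurred(), counts calls, and forwards to the next definition found by dlsym(RTLD_NEXT, ...), which would be libpython's when Python is loaded. PyObject is declared as an opaque type so the sketch compiles without Python.h.

```c
#define _GNU_SOURCE          /* needed for RTLD_NEXT */
#include <dlfcn.h>
#include <stddef.h>

/* Opaque declaration: the shim only passes the pointer through. */
typedef struct _object PyObject;

/* Hypothetical counter: how many calls reached the shim. */
static unsigned long shim_calls = 0;

/* Same name and signature as the libpython function, so the dynamic
   loader resolves callers to this definition when it is preloaded. */
PyObject *PyErr_Occurred(void)
{
    static PyObject *(*real_fn)(void) = NULL;
    static int resolved = 0;

    shim_calls++;
    if (!resolved) {
        /* Look up the next definition in load order (libpython's). */
        void *sym = dlsym(RTLD_NEXT, "PyErr_Occurred");
        real_fn = (PyObject *(*)(void))sym;
        resolved = 1;
    }
    /* Without libpython loaded, there is no next definition: report "no error". */
    return real_fn != NULL ? real_fn() : NULL;
}
```

Such a shim could be built with something like gcc -shared -fPIC -o shim.so shim.c (adding -ldl on glibc older than 2.34) and loaded with LD_PRELOAD=./shim.so python3 on a build where python is dynamically linked to libpython. Calls coming from extension modules would still reach the shim, but calls made inside a libpython built with -fno-semantic-interposition may be direct or inlined and bypass it. The lazy lookup is not thread-safe; that is tolerable in a sketch since PyErr_Occurred() is called with the GIL held.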
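The PEP 445 approach replaces symbol interposition with an explicit allocator table: you save the current allocator, then install a wrapper that chains to it. The sketch below shows that pattern in plain C so it compiles without Python.h. MemAllocator is a hypothetical stand-in with the same shape as PEP 445's PyMemAllocatorEx (ctx, malloc, calloc, realloc, free); in real code the save and install steps would be PyMem_GetAllocator() and PyMem_SetAllocator() with a domain such as PYMEM_DOMAIN_RAW. HookState and the counters are also illustrative names.

```c
#include <stdlib.h>

/* Hypothetical stand-in with the same fields as PEP 445's PyMemAllocatorEx. */
typedef struct {
    void *ctx;
    void *(*malloc)(void *ctx, size_t size);
    void *(*calloc)(void *ctx, size_t nelem, size_t elsize);
    void *(*realloc)(void *ctx, void *ptr, size_t new_size);
    void (*free)(void *ctx, void *ptr);
} MemAllocator;

/* Base allocator backed by libc (plays the role of the original allocator). */
static void *raw_malloc(void *ctx, size_t size) { (void)ctx; return malloc(size); }
static void *raw_calloc(void *ctx, size_t n, size_t sz) { (void)ctx; return calloc(n, sz); }
static void *raw_realloc(void *ctx, void *p, size_t sz) { (void)ctx; return realloc(p, sz); }
static void raw_free(void *ctx, void *p) { (void)ctx; free(p); }

static const MemAllocator raw_allocator = {
    NULL, raw_malloc, raw_calloc, raw_realloc, raw_free
};

/* Hook state: the previous allocator plus some statistics. */
typedef struct {
    MemAllocator orig;   /* allocator that was installed before the hook */
    size_t n_alloc;      /* successful malloc/calloc calls */
    size_t n_free;       /* free calls on non-NULL pointers */
} HookState;

static void *hook_malloc(void *ctx, size_t size)
{
    HookState *h = ctx;
    void *p = h->orig.malloc(h->orig.ctx, size);
    if (p != NULL) h->n_alloc++;
    return p;
}

static void *hook_calloc(void *ctx, size_t nelem, size_t elsize)
{
    HookState *h = ctx;
    void *p = h->orig.calloc(h->orig.ctx, nelem, elsize);
    if (p != NULL) h->n_alloc++;
    return p;
}

static void *hook_realloc(void *ctx, void *ptr, size_t new_size)
{
    HookState *h = ctx;
    return h->orig.realloc(h->orig.ctx, ptr, new_size);
}

static void hook_free(void *ctx, void *ptr)
{
    HookState *h = ctx;
    if (ptr != NULL) h->n_free++;
    h->orig.free(h->orig.ctx, ptr);
}

/* Save the current allocator (cf. PyMem_GetAllocator), then install a
   wrapper that chains to it (cf. PyMem_SetAllocator). */
static void install_hook(MemAllocator *current, HookState *state)
{
    state->orig = *current;
    state->n_alloc = 0;
    state->n_free = 0;
    MemAllocator hook = {state, hook_malloc, hook_calloc, hook_realloc, hook_free};
    *current = hook;
}
```

Because every allocation goes through the table's function pointers, hooks keep working no matter how libpython itself was compiled, which is why this use case no longer needs LD_PRELOAD.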
To be clear: -fno-semantic-interposition only impacts libpython. All other libraries still respect LD_PRELOAD. For example, it is still possible to override glibc malloc/free.
Why does -fno-semantic-interposition make Python faster? There are multiple reasons. First of all, libpython makes a lot of calls to its own functions. Like really a lot, especially in the hot code paths. Without -fno-semantic-interposition, the compiler must assume that any exported libpython function may be replaced ("interposed") at load time, so calls from libpython to its own functions go through an indirection: the "Procedure Linkage Table" (PLT) on Linux. For the same reason, the compiler cannot inline these functions, which has a major impact on performance (missed optimizations). In short, even with PGO and LTO, libpython function calls pay two performance "penalties":
* indirect function calls (PLT)
* no inlining
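The two penalties can be emulated in plain C. This is only a sketch of the idea, not what the dynamic loader literally does: a function pointer stands in for the GOT slot that a PLT call jumps through, and simulate_preload() stands in for LD_PRELOAD providing another definition of the same symbol. All names are hypothetical.

```c
/* Hypothetical stand-ins for a libpython function and a preloaded override. */
static int original_impl(int x) { return x + 1; }
static int preloaded_impl(int x) { return x * 100; }

/* Emulated GOT slot: the loader fills it in (at load time or on first call),
   and an interposing library can make it point to another definition. */
static int (*got_slot)(int) = original_impl;

/* Interposable call: the target is only known at run time, so the compiler
   must emit an indirect call and cannot inline original_impl() here. */
static int call_interposable(int x) { return got_slot(x); }

/* What -fno-semantic-interposition allows: the compiler assumes the local
   definition is the one used, so it can call it directly or inline it. */
static int call_direct(int x) { return original_impl(x); }

/* Emulate LD_PRELOAD supplying a different definition of the symbol. */
static void simulate_preload(void) { got_slot = preloaded_impl; }
```

After simulate_preload(), call_interposable() sees the override while call_direct() still uses the original definition: that is the behavior change traded for direct calls and inlining.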
I'm comparing the performance of a "statically linked Python" (Debian/Ubuntu choice: no ./configure --enable-shared, so python is not linked to libpython) with a "dynamically linked Python" (Fedora choice: ./configure --enable-shared, so python is dynamically linked to libpython).
With -fno-semantic-interposition, function calls are direct and can be inlined when appropriate. You don't have to trust me, look at pyperformance benchmark results ;-)
When Python is built with ./configure --enable-shared (libpython), the main() function of the "python" binary is a single function call, nothing more:
#include "Python.h"
int main(int argc, char **argv)
{ return Py_BytesMain(argc, argv); }
So 100% of the time is spent in libpython.
For a longer rationale, see the accepted Fedora change:
https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup
Date | User | Action | Args
2019-12-05 16:00:31 | vstinner | set | recipients: + vstinner, methane, serhiy.storchaka, pablogsal
2019-12-05 16:00:31 | vstinner | set | messageid: <1575561631.45.0.417730594345.issue38980@roundup.psfhosted.org>
2019-12-05 16:00:31 | vstinner | link | issue38980 messages
2019-12-05 16:00:30 | vstinner | create |