Author vstinner
Recipients methane, pablogsal, serhiy.storchaka, vstinner
Date 2019-12-05.16:00:30
Message-id <1575561631.45.0.417730594345.issue38980@roundup.psfhosted.org>
Content
The Fedora packaging has been modified to compile libpython with the -fno-semantic-interposition flag: it makes Python up to 1.3x faster without touching a single line of the C code! See the pyperformance results:
https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup#Benefit_to_Fedora

The main drawback is that -fno-semantic-interposition prevents overriding libpython symbols with a custom library preloaded via LD_PRELOAD, for example to override the PyErr_Occurred() function.

We (the authors of the Fedora change) failed to find any use case relying on LD_PRELOAD to override libpython symbols.

To be honest, I found *one* user in the last 10 years who used LD_PRELOAD to track memory allocations in Python 2.7. This use case is no longer relevant in Python 3: PEP 445 (Python 3.4) provides a supported C API to replace the Python memory allocators or to install hooks on them. Moreover, tracemalloc is a nice way to track memory allocations.

Is anyone aware of any special use of LD_PRELOAD for libpython?

To be clear: -fno-semantic-interposition only impacts libpython. All other libraries still respect LD_PRELOAD. For example, it is still possible to override glibc malloc/free.

Why does -fno-semantic-interposition make Python faster? There are multiple reasons. First of all, libpython makes a lot of function calls to itself, really a lot, especially in the hot code paths. Without -fno-semantic-interposition, calls from libpython to its own exported functions have to go through "interposition": for example the "Procedure Linkage Table" (PLT) indirection on Linux. Interposition also prevents function inlining, which has a major impact on performance (missed optimizations). In short, even with PGO and LTO, libpython function calls pay two performance "penalties":

* indirect function calls (PLT)
* no inlining
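To make these two penalties concrete, here is a minimal sketch that is not taken from libpython; all function names are hypothetical. It assumes GCC and a shared library built with "gcc -O2 -fPIC -shared". Both loops compute the same result; only the visibility of the called function differs.

```c
/* Hypothetical library code, assumed to be built with GCC as
   "gcc -O2 -fPIC -shared". */

/* Default visibility: the symbol is exported, so without
   -fno-semantic-interposition GCC must assume another library (for
   example one loaded with LD_PRELOAD) may provide its own lib_add_one().
   Calls from inside the library then go through the PLT and GCC does
   not inline this function into its callers. */
int lib_add_one(int x)
{
    return x + 1;
}

/* Hidden visibility: the symbol is not exported, so it cannot be
   interposed and GCC may call it directly or inline it. */
__attribute__((visibility("hidden"))) int lib_add_one_hidden(int x)
{
    return x + 1;
}

/* Calls the exported function: one interposable call per iteration
   unless the library is built with -fno-semantic-interposition. */
int lib_sum_exported(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += lib_add_one(i);
    return total;
}

/* Calls the hidden function: eligible for direct calls and inlining. */
int lib_sum_hidden(int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += lib_add_one_hidden(i);
    return total;
}
```

On x86-64 Linux, comparing the assembly produced by "gcc -O2 -fPIC -S" with and without -fno-semantic-interposition should show a "call lib_add_one@PLT" in lib_sum_exported only in the first case.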

I'm comparing the performance of a "statically linked Python" (Debian/Ubuntu choice: ./configure without --enable-shared, so python is not linked to libpython) with a "dynamically linked Python" (Fedora choice: ./configure --enable-shared, so python is dynamically linked to libpython).

With -fno-semantic-interposition, function calls are direct and can be inlined when appropriate. You don't have to trust me, look at the pyperformance benchmark results ;-)

When Python is built with ./configure --enable-shared (libpython), the "python" program is a single function call and that's all:

#include <Python.h>

/* Minimal main program: everything is loaded from libpython */
int main(int argc, char **argv)
{
    return Py_BytesMain(argc, argv);
}

So 100% of the time is spent in libpython.

For a longer rationale, see the accepted Fedora change:
https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup