classification
Title: Compile libpython with -fno-semantic-interposition
Type: performance Stage: resolved
Components: Build Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: David Filiatrault, ammar2, corona10, cstratak, gregory.p.smith, hroncok, methane, pablogsal, petr.viktorin, serhiy.storchaka, shihai1991, tianon, vstinner
Priority: normal Keywords: patch

Created on 2019-12-05 16:00 by vstinner, last changed 2020-10-27 02:07 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked
PR 22862 merged pablogsal, 2020-10-21 17:30
PR 22892 merged petr.viktorin, 2020-10-22 14:26
Messages (22)
msg357856 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-05 16:00
The Fedora packaging has been modified to compile libpython with the -fno-semantic-interposition flag: it makes Python up to 1.3x faster without having to touch a single line of the C code! See pyperformance results:
https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup#Benefit_to_Fedora

The main drawback is that -fno-semantic-interposition prevents overriding Python symbols (for calls made from inside libpython itself) with a custom library preloaded via LD_PRELOAD, for example overriding the PyErr_Occurred() function.

We (the authors of the Fedora change) failed to find any use case for LD_PRELOAD with libpython.

To be honest, I found *one* user in the last 10 years who used LD_PRELOAD to track memory allocations in Python 2.7. This use case is no longer relevant in Python 3 thanks to PEP 445, which provides a supported C API to override Python memory allocators or to install hooks on them. Moreover, tracemalloc is a nice way to track memory allocations.

Is anyone aware of any special use of LD_PRELOAD for libpython?

To be clear: -fno-semantic-interposition only impacts libpython. All other libraries still respect LD_PRELOAD. For example, it is still possible to override glibc malloc/free.

Why does -fno-semantic-interposition make Python faster? There are multiple reasons. First of all, libpython makes a lot of function calls to itself, really a lot, especially in the hot code paths. Without -fno-semantic-interposition, calls from libpython to its own exported functions have to go through "interposition": for example, the "Procedure Linkage Table" (PLT) indirection on Linux. This also prevents function inlining, which has a major impact on performance (a missed optimization). In short, even with PGO and LTO, libpython function calls pay two performance "penalties":

* indirect function calls (PLT)
* no inlining

I'm comparing Python performance of "statically linked Python" (Debian/Ubuntu choice: don't use ./configure --enable-shared, python is not linked to libpython) to "dynamically linked Python" (Fedora choice: use "./configure --enable-shared", python is dynamically linked to libpython).

With -fno-semantic-interposition, function calls are direct and can be inlined when appropriate. You don't have to trust me, look at pyperformance benchmark results ;-)

When using ./configure --enable-shared (libpython), the "python" program is just a single function call:

#include "Python.h"

int main(int argc, char **argv)
{ return Py_BytesMain(argc, argv); }

So essentially 100% of the run time is spent in libpython.

For a longer rationale, see the accepted Fedora change:
https://fedoraproject.org/wiki/Changes/PythonNoSemanticInterpositionSpeedup
msg357858 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-05 16:02
Maybe we need to offer a way to *opt out* from -fno-semantic-interposition. For example, ./configure --with-interposition. The default would be --without-interposition.
msg357860 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-05 16:13
I have seen people using LD_PRELOAD to interpose some auditing functions that can modify the actual call into libpython, or to interpose faster versions of some functions or to collect metrics (although there are better ways).

If we do this by default, these use cases will break once functions are inlined.
msg357861 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-05 16:17
> If we do this by default, these use cases will break once functions are inlined.

Could these users use a "./configure --with-interposition --enable-shared" build?
msg357862 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-05 16:21
> Could these users use a "./configure --with-interposition --enable-shared" build?

Sure, but the problem is the default value, no?

Maybe it should only be default when using --with-optimizations
msg357888 - (view) Author: Charalampos Stratakis (cstratak) * Date: 2019-12-05 21:05
> Maybe it should only be default when using --with-optimizations

I think it will add to the complexity of the --with-optimizations flag which already implies PGO and LTO.

Maybe an opt-in flag would be better IMHO.
msg357889 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-05 21:08
> I think it will add to the complexity of the --with-optimizations flag which already implies PGO and LTO.

That is why I was suggesting it: --with-optimizations for me means "activate everything that you can to make python faster".
msg357890 - (view) Author: Ammar Askar (ammar2) * (Python committer) Date: 2019-12-05 21:23
Just for a quick datapoint: llvm/clang do this by default and you need an explicit `-fsemantic-interposition` to disable it http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html

It seems to me that the performance gains here really outweigh any weird usage of LD_PRELOAD.
msg357891 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-05 21:53
> It seems to me that the performance gains here really outweigh any weird usage of LD_PRELOAD.

I am very convinced of this assertion, but other users may not be. I think the discussion is about how to provide/activate the option in the least intrusive way, without breaking too many use cases.

To be honest, I think it would be very rare for users to use LD_PRELOAD in this way, so I am fine if we activate it by default. But I still think it would be good to discuss these cases and take them into consideration :)
msg357905 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-12-06 08:48
In the case of malloc, every piece of code that allocates memory needs to use malloc/calloc/realloc. This is the official and only way to allocate memory. But we do not guarantee that the Python core uses only the public C API like PyErr_Occurred(). It can use a lower-level, more efficient but less safe C API internally. It can replace the function with a macro which accesses internal structures directly (when compiling the core only). And this is actually the case. Overriding public C API functions does not always have an effect on the core.

So I think that adding -fno-semantic-interposition will likely not break many things which were not broken before.

But this should be discussed on Python-Dev. I am sure some C API functions are intended to be overridden.
msg357926 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-06 17:31
> In the case of malloc, every piece of code that allocates memory needs to use malloc/calloc/realloc. This is the official and only way to allocate memory. But we do not guarantee that the Python core uses only the public C API like PyErr_Occurred(). It can use a lower-level, more efficient but less safe C API internally. It can replace the function with a macro which accesses internal structures directly (when compiling the core only). And this is actually the case. Overriding public C API functions does not always have an effect on the core.

To confirm what you said: if we take the specific example of PyErr_Occurred(), I recently added a new _PyErr_Occurred() function which is declared as a static inline function. _PyErr_Occurred() cannot be overridden.

static inline PyObject* _PyErr_Occurred(PyThreadState *tstate)
{
    assert(tstate != NULL);
    return tstate->curexc_type;
}
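Serhiy's point can be illustrated with a small self-contained model. It is only a sketch: the struct and function names are hypothetical and merely mimic the shape of PyErr_Occurred() and the _PyErr_Occurred() helper quoted above, and the function pointer stands in for an LD_PRELOAD override of the exported function. Core code that calls the static inline helper never sees the override, with or without -fno-semantic-interposition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for PyThreadState. */
typedef struct {
    const char *curexc_type;   /* NULL when no exception is set */
} model_tstate;

/* Internal helper modelled on _PyErr_Occurred(): static inline, so each
   caller in the "core" gets its own copy and nothing can interpose it. */
static inline const char *model_err_occurred_internal(model_tstate *tstate)
{
    assert(tstate != NULL);
    return tstate->curexc_type;
}

/* Public API modelled on PyErr_Occurred(): an exported function that an
   LD_PRELOAD library could replace. The pointer models that replacement. */
static const char *public_err_occurred_impl(model_tstate *tstate)
{
    return model_err_occurred_internal(tstate);
}
static const char *(*model_public_err_occurred)(model_tstate *) =
    public_err_occurred_impl;

/* Core code uses the internal helper, not the public function. */
int core_has_error(model_tstate *tstate)
{
    return model_err_occurred_internal(tstate) != NULL;
}

/* Extension code goes through the public (overridable) entry point. */
int extension_has_error(model_tstate *tstate)
{
    return model_public_err_occurred(tstate) != NULL;
}

/* Hypothetical override that hides every exception. */
static const char *override_never_error(model_tstate *tstate)
{
    (void)tstate;
    return NULL;
}

void install_override(void)
{
    model_public_err_occurred = override_never_error;
}
```

After `install_override()`, `core_has_error()` still reports the exception while `extension_has_error()` no longer does: an override of the public function only affects callers that actually go through it.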
msg358684 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-19 15:15
Pablo:
> I have seen people using LD_PRELOAD (...) to interpose faster versions of some functions or to collect metrics (although there are better ways).

IMHO if someone has to go that far into "hacking" Python, they should recompile Python with specific options. I'm not sure that using LD_PRELOAD to get "faster versions of some functions" is the best approach in terms of performance, but I expect it to be convenient :-)


Charalampos:
> I think it will add to the complexity of the --with-optimizations flag which already implies PGO and LTO.

It doesn't enable LTO, only PGO :-) We had to disable LTO because of multiple compiler bugs over the last few years.


Serhiy:
> I am sure some C API functions are intended to be overridden.

Is it a theoretical use case, or are you aware of such a use case being used in the wild?


Ammar Askar:
> Just for a quick datapoint: llvm/clang do this by default and you need an explicit `-fsemantic-interposition` to disable it http://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html

Oh, that's really interesting, thanks!
msg372687 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-30 12:45
We wrote an article about -fno-semantic-interposition flag that we use with GCC on RHEL8 and Fedora:
https://developers.redhat.com/blog/2020/06/25/red-hat-enterprise-linux-8-2-brings-faster-python-3-8-run-speeds/
"Enabling this flag disables semantic interposition, which can increase run speed by as much as 30%."

In short, the flag allows the compiler to inline code, and thus perform further optimizations, when Python is built with --enable-shared.
msg372877 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-07-02 20:05
Yes this should become part of --with-optimizations when building on a platform using a compiler that (a) supports it and (b) where it matters.

If this is only relevant to --enable-shared builds (not the default), I'd assume we should also make it conditional on that.

I never use --enable-shared builds myself.
msg372879 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-07-02 20:06
And to echo others: do not worry about LD_PRELOAD users trying to override internals. That is not a supported use case. It is always a hack. Anyone using it knows this.
msg379184 - (view) Author: Ammar Askar (ammar2) * (Python committer) Date: 2020-10-21 02:50
Hey Victor, should we try to land this in Python 3.10? 

Given that no one has raised any big concerns aside from LD_PRELOAD-based hacks, and that clang already has this as the default, I think it's relatively safe to make it the default for --with-optimizations.
msg379188 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-21 03:58
+1
msg379225 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-10-21 17:37
Victor is on vacation for some weeks, so I am creating a PR to push this forward.
msg379257 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-10-21 21:46
New changeset b451b0e9a772f009f4161f7a46476190d0d17ac1 by Pablo Galindo in branch 'master':
bpo-38980: Add -fno-semantic-interposition when building with optimizations (GH-22862)
https://github.com/python/cpython/commit/b451b0e9a772f009f4161f7a46476190d0d17ac1
msg379293 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2020-10-22 14:04
I was too eager in reviewing this :(
It turns out `-fno-semantic-interposition` is only supported since GCC 5.3, so [builds fail on older GCC](https://buildbot.python.org/all/#/builders/96/builds/216).

I'm researching how to make this conditional in autotools.
msg379310 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2020-10-22 16:12
New changeset c6d7e82d19c091af698d4e4b3623648e259843e3 by Petr Viktorin in branch 'master':
bpo-38980: Only apply -fno-semantic-interposition if available (GH-22892)
https://github.com/python/cpython/commit/c6d7e82d19c091af698d4e4b3623648e259843e3
msg379712 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-10-27 02:07
Since Fedora and RHEL started building Python with -fno-semantic-interposition, we have not received any user bug report about the LD_PRELOAD use case. IMO we can safely assume that no user relies on LD_PRELOAD to override libpython symbols.

Thanks for implementing the feature Pablo and Petr!
History
Date User Action Args
2020-10-27 02:07:22 vstinner set messages: + msg379712
2020-10-22 16:12:29 petr.viktorin set status: open -> closed
    stage: patch review -> resolved
2020-10-22 16:12:01 petr.viktorin set messages: + msg379310
2020-10-22 14:26:38 petr.viktorin set stage: resolved -> patch review
    pull_requests: + pull_request21824
2020-10-22 14:04:12 petr.viktorin set status: closed -> open
    nosy: + petr.viktorin
    messages: + msg379293
2020-10-21 21:47:06 pablogsal set status: open -> closed
    resolution: fixed
    stage: patch review -> resolved
2020-10-21 21:46:56 pablogsal set messages: + msg379257
2020-10-21 17:37:14 pablogsal set messages: + msg379225
2020-10-21 17:30:01 pablogsal set keywords: + patch
    stage: patch review
    pull_requests: + pull_request21804
2020-10-21 03:58:04 methane set messages: + msg379188
    versions: + Python 3.10, - Python 3.9
2020-10-21 02:50:11 ammar2 set messages: + msg379184
2020-07-02 20:06:53 gregory.p.smith set messages: + msg372879
2020-07-02 20:05:45 gregory.p.smith set nosy: + gregory.p.smith
    messages: + msg372877
2020-07-01 18:19:04 tianon set nosy: + tianon
2020-06-30 14:17:24 corona10 set nosy: + corona10
2020-06-30 14:03:44 shihai1991 set nosy: + shihai1991
2020-06-30 12:45:35 vstinner set messages: + msg372687
2020-01-28 19:21:11 David Filiatrault set nosy: + David Filiatrault
2019-12-19 15:15:21 vstinner set messages: + msg358684
2019-12-06 17:31:06 vstinner set messages: + msg357926
2019-12-06 08:48:39 serhiy.storchaka set messages: + msg357905
2019-12-05 21:53:17 pablogsal set messages: + msg357891
2019-12-05 21:23:06 ammar2 set nosy: + ammar2
    messages: + msg357890
2019-12-05 21:08:43 pablogsal set messages: + msg357889
2019-12-05 21:05:01 cstratak set nosy: + cstratak
    messages: + msg357888
2019-12-05 16:21:59 pablogsal set messages: + msg357862
2019-12-05 16:17:59 vstinner set messages: + msg357861
2019-12-05 16:13:47 pablogsal set messages: + msg357860
2019-12-05 16:06:11 hroncok set nosy: + hroncok
2019-12-05 16:02:37 vstinner set messages: + msg357858
2019-12-05 16:00:31 vstinner create