classification
Title: SOABI on Linux does not distinguish between GNU libc and musl libc
Type: behavior Stage: patch review
Components: Interpreter Core Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, Henry Schreiner, christian.heimes, eric.araujo, h-vetinari, ncopa, olivierlefloch, piro, tianon, uranusjr
Priority: normal Keywords: patch

Created on 2021-02-03 09:26 by ncopa, last changed 2021-11-24 17:23 by Henry Schreiner.

Pull Requests
URL Status Linked Edit
PR 24502 open ncopa, 2021-02-10 17:52
Messages (14)
msg386183 - Author: Natanael Copa (ncopa) Date: 2021-02-03 09:26
The SOABI does not make any difference between GNU libc and musl libc.

Using official docker images:

# debian build with GNU libc
$ docker run --rm python:slim python -c  'import sysconfig;print(sysconfig.get_config_var("SOABI"))'
cpython-39-x86_64-linux-gnu

# alpine build with musl libc
$ docker run --rm python:alpine python -c  'import sysconfig;print(sysconfig.get_config_var("SOABI"))'
cpython-39-x86_64-linux-gnu


Both end with `-gnu`, while one would expect the musl build to end with `-musl`.

This affects the extension suffix:

$ docker run --rm python:slim python-config --extension-suffix
.cpython-39-x86_64-linux-gnu.so

$ docker run --rm python:alpine python-config --extension-suffix
.cpython-39-x86_64-linux-gnu.so

This in turn affects pre-compiled binary wheels: binary modules built with musl libc get mixed up with the GNU libc modules because both use the -gnu.so suffix.
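
The same values can also be read from inside the interpreter. The following is a minimal sketch that assumes only the standard-library `sysconfig` and `importlib.machinery` modules; the helper name `describe_extension_abi` is made up for illustration and is not part of CPython.

```python
import importlib.machinery
import sysconfig


def describe_extension_abi():
    """Collect the values that determine extension module file names."""
    return {
        # ABI tag, e.g. "cpython-39-x86_64-linux-gnu" on the images above
        "SOABI": sysconfig.get_config_var("SOABI"),
        # Full suffix used when building extension modules
        "EXT_SUFFIX": sysconfig.get_config_var("EXT_SUFFIX"),
        # Every suffix the import system accepts for extension modules
        "accepted": list(importlib.machinery.EXTENSION_SUFFIXES),
    }


if __name__ == "__main__":
    for key, value in describe_extension_abi().items():
        print(f"{key}: {value}")
```

Running this in both containers should make it easy to compare the two builds side by side.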

The source of the problem is that the `configure.ac` file assumes that every defined(__linux__) platform is -gnu when detecting the PLATFORM_TRIPLET.

```
...
#if defined(__ANDROID__)
    # Android is not a multiarch system.
#elif defined(__linux__)
# if defined(__x86_64__) && defined(__LP64__)
        x86_64-linux-gnu
# elif defined(__x86_64__) && defined(__ILP32__)
        x86_64-linux-gnux32
# elif defined(__i386__)
...
```

So when building Python with musl libc, PLATFORM_TRIPLET is always set to `*-linux-gnu`.

Output from a configure run on a musl system:
```
...
checking for a sed that does not truncate output... /bin/sed
checking for --with-cxx-main=<compiler>... no
checking for the platform triplet based on compiler characteristics... x86_64-linux-gnu
...
```

A first step in fixing this would be to make sure that we only set -gnu when __GLIBC__ is defined:
```diff
diff --git a/configure.ac b/configure.ac
index 1f5a008388..1b4690c90f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -726,7 +726,7 @@ cat >> conftest.c <<EOF
 #undef unix
 #if defined(__ANDROID__)
     # Android is not a multiarch system.
-#elif defined(__linux__)
+#elif defined(__linux__) && defined (__GLIBC__)
 # if defined(__x86_64__) && defined(__LP64__)
         x86_64-linux-gnu
```

But that would make builds with musl fail with "unknown platform triplet".

I am not sure what the proper fix would be, but one way would be to extract the suffix from `$CPP -dumpmachine`.
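
To illustrate the `-dumpmachine` idea, here is a minimal sketch of how the libc suffix could be pulled out of such a string. It assumes the machine string has already been obtained from the compiler, and the function names `libc_suffix` and `multiarch_triplet` are hypothetical. Whether dropping the vendor field yields exactly what `configure.ac` needs is also an assumption.

```python
def _split_machine(machine):
    """Split a GNU-style machine string into its dash-separated fields."""
    parts = machine.strip().split("-")
    if len(parts) < 3:
        raise ValueError(f"not a platform triplet: {machine!r}")
    return parts


def libc_suffix(machine):
    """Return the last field, e.g. "gnu" or "musl"."""
    return _split_machine(machine)[-1]


def multiarch_triplet(machine):
    """Drop the vendor field of a four-part string, giving arch-os-libc."""
    parts = _split_machine(machine)
    if len(parts) == 4:
        arch, _vendor, kernel, libc = parts
        parts = [arch, kernel, libc]
    return "-".join(parts)
```

For example, `multiarch_triplet("x86_64-pc-linux-musl")` gives `x86_64-linux-musl`, while a three-part string such as `x86_64-linux-gnu` is returned unchanged.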
msg386184 - Author: Christian Heimes (christian.heimes) (Python committer) Date: 2021-02-03 09:47
The suffix "-gnu" does not stand for "glibc".

The triplet defines the calling convention. For example x86_64-linux-gnu means x86_64 / AMD64 CPU architecture, Linux, with standard GNU / GCC calling convention. Other calling conventions are "x86_64-linux-gnux32" for X32 on AMD64 and "arm-linux-gnueabihf" for 32bit ARM with extended ABI and hardware float support.

The triplets are standardized beyond Python. Debian's multiarch page lists and explains a large number of triplets: https://wiki.debian.org/Multiarch/Tuples
msg386186 - Author: Natanael Copa (ncopa) Date: 2021-02-03 10:41
Does this mean that the SOABI should be the same for python built with musl libc and GNU libc?

They are not really ABI compatible.
msg386192 - Author: Christian Heimes (christian.heimes) (Python committer) Date: 2021-02-03 11:02
SOABI basically contains the CPU architecture and Kernel ABI. The libc ABI is yet another dimension that is not encoded in the shared library ABI. 

The libc ABI is more complex than just glibc or musl. You need to include the ABI version of all core components. For example manylinux2014 defines the ABI for glibc as GLIBC_2.17, CXXABI_1.3.7, CXXABI_TM_1, GLIBCXX_3.4.19, GCC_4.8.0.

As a rule of thumb, a SOABI like ".cpython-39-x86_64-linux-gnu.so" only works on the current host. You cannot safely move the file to another host or bump the SO version of any library, unless you ensure that the ABIs of all libraries are compatible.
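
As a rough illustration of the version dimension, the sketch below checks only the glibc part of the manylinux2014 baseline (2.17) using the standard-library `platform.libc_ver()`. The helper names are hypothetical, the check ignores the CXXABI, GLIBCXX and GCC symbol versions listed above, and `platform.libc_ver()` may return empty strings when it cannot identify the libc.

```python
import platform


def parse_version(text):
    """Turn "2.17" into (2, 17); return None for empty or non-numeric input."""
    try:
        return tuple(int(piece) for piece in text.split("."))
    except ValueError:
        return None


def meets_glibc_baseline(lib, version, minimum=(2, 17)):
    """True only if the libc is glibc at or above the given version."""
    parsed = parse_version(version)
    return lib == "glibc" and parsed is not None and parsed >= minimum


if __name__ == "__main__":
    lib, version = platform.libc_ver()
    print(lib or "unknown", version or "unknown",
          meets_glibc_baseline(lib, version))
```

A result of False here does not say which libc is in use, only that it was not recognised as a sufficiently new glibc.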
msg386194 - Author: Natanael Copa (ncopa) Date: 2021-02-03 11:03
The referenced https://wiki.debian.org/Multiarch/Tuples doc says:

> we require unique identifiers for each architecture that identifies an incompatible set of libraries that we want to be co-installed.

Since GNU libc and musl libc are not ABI compatible, they cannot share the same unique identifier. I think replacing -gnu with -musl makes sense.
msg386196 - Author: Christian Heimes (christian.heimes) (Python committer) Date: 2021-02-03 11:21
Do you have glibc and musl installed side by side?
msg386197 - Author: Natanael Copa (ncopa) Date: 2021-02-03 11:53
> Do you have glibc and musl installed side by side?

No. But there is nothing preventing me from having the musl libc runtime installed in parallel with glibc.

/lib/libc.so.6
/lib/libc.musl-x86_64.so.1

And it is not uncommon for people to copy libc.so.6 (with friends) to their Alpine docker images to run both in the same container. Whether that is a good idea is another discussion.


I do understand that full ABI compatibility also may involve libc ABI version, but I think that is a slightly different problem. Newer versions of glibc and musl libc are backwards compatible. You can expect a binary built with old libc version to run with new libc. But you cannot expect a binary built with musl libc to run with gnu libc.

gcc recognizes -linux-musl as a valid platform tuple that differs from -linux-gnu:
https://github.com/gcc-mirror/gcc/blob/master/gcc/config/t-musl

The standard autotools' config.guess[1] also recognizes -musl as a different platform.

  $ ./config.guess 
  x86_64-pc-linux-musl

[1]: https://github.com/python/cpython/blob/12d0a7642fc552fa17b1608fe135306cddec5f4e/config.guess#L158

So I think it makes sense to treat *-linux-musl as a different platform than *-linux-gnu.

If you still insist that this is only about calling convention and not platform, then I think you should at least clarify that in the configure.ac script to avoid confusion:

  sed -i -e 's/PLATFORM_TRIPLET/CALLING_CONVENTION_TRIPLET/g' -e 's/platform triplet/calling convention triplet/g' configure.ac
msg386710 - Author: Christian Heimes (christian.heimes) (Python committer) Date: 2021-02-09 11:52
I stand corrected. The last element in the platform triplet does seem to indicate libc.

Is there any formal definition of the platform triplet or is it defined by GCC's reference implementation? A quick search didn't reveal any decisive results.

The next steps here would be:
- document the platform triplet with regard to musl libc (and potentially other libcs like uclibc and embedded newlib)
- buildbot with Alpine musl

Let's continue this on https://discuss.python.org/t/wheels-for-musl-alpine/7084
msg386722 - Author: Natanael Copa (ncopa) Date: 2021-02-09 14:37
This mentions some examples for musl triplets/tuples:
https://wiki.musl-libc.org/getting-started.html

It points to https://github.com/richfelker/musl-cross-make/blob/master/README.md#supported-targets which I think is the best documentation. (Rich Felker is the author and lead developer of musl libc).

I'm not sure it's worth spending time on uClibc, which does not seem to have had any commits at all since 2015. https://git.uclibc.org/uClibc/log/

We will work on our side on a buildbot with Alpine Linux.

Thank you for being understanding.
msg386739 - Author: Arfrever Frehtes Taifersar Arahesis (Arfrever) (Python triager) Date: 2021-02-09 19:26
While the original uClibc is not maintained, its fork called uClibc-ng is supposedly binary-compatible with uClibc and is still somewhat maintained:
https://uclibc-ng.org/
https://repo.or.cz/w/uclibc-ng.git

https://www.uclibc.org/news.html also says:
"While uClibc releases are on hold, you may use uClibc-ng"

The libc part of the platform tuple on uClibc-ng systems is expected to be "uclibc".
msg386797 - Author: Natanael Copa (ncopa) Date: 2021-02-10 18:09
I created a PR for this https://github.com/python/cpython/pull/24502
msg405471 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2021-11-01 21:24
If this is deemed a bugfix, the PR should be backported.
msg406858 - Author: Tzu-ping Chung (uranusjr) Date: 2021-11-23 16:57
Can anyone provide examples showing that it’s not an option to continue using the “technically incorrect” `-gnu` suffix on 3.9 and 3.10? From what I understand, the suffix technically works (as in the module will load correctly), it just fails to distinguish the ABI in the name.

If that’s correct, I feel “being able to distinguish between modules built against musl and glibc” should be a feature request and only implemented for 3.11+, while versions 3.10 and prior continue to use `-gnu`. This will also provide a simpler way out of the wheel compatibility problem; projects can distribute different wheels for 3.10 (or lower) and 3.11 (or higher): the former wheel continues to contain `-gnu`-suffixed modules, and the latter contains only `-musl`-suffixed ones.
msg406939 - Author: Henry Schreiner (Henry Schreiner) Date: 2021-11-24 17:23
We had a call and have a potential path forward. Quick summary:

* Add a patch on top of the current patch to make CPython look for `-gnu` on top of `-musl` for Alpine 3.15 and 3.14. Reverting the patch would break every Alpine wheel previously locally compiled (like NumPy) and would require rebuilding all shipped packages that depend on Python.
* Revert the patch for CPython 3.10 in Alpine 3.16, due mid next year.
* Take the existing patch (PR 24502) targeting upstream CPython 3.11 and change search to include `abi3-gnu` on musl after looking for `abi3-musl`. The ability to install both binaries into a single folder would be a new "feature" of CPython 3.11.
* Optionally this could be checked and normalized by auditwheel (like changing `-musl` to `-gnu` on 3.9) if desired. ABI3 wheels targeting <3.11 could be normalized to `-gnu`.

How does that sound?
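
The fallback order in the first bullet could be sketched roughly as follows. This is only an illustration of the lookup idea, not how the Alpine patch or PR 24502 implements it; the function name `candidate_suffixes` is hypothetical, and the abi3 handling from the third bullet is not modelled.

```python
def candidate_suffixes(soabi):
    """List extension file suffixes to try, most specific first.

    For a SOABI ending in "-musl", the older "-gnu" spelling is added as
    a fallback so that modules built before the rename are still found.
    """
    suffixes = [f".{soabi}.so"]
    if soabi.endswith("-musl"):
        legacy = soabi[: -len("-musl")] + "-gnu"
        suffixes.append(f".{legacy}.so")
    return suffixes
```

With this sketch, `candidate_suffixes("cpython-39-x86_64-linux-musl")` yields the `-musl` suffix followed by the `-gnu` one, while a `-gnu` SOABI yields only itself.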
History
Date | User | Action | Args
2021-11-24 17:23:42 | Henry Schreiner | set | nosy: + Henry Schreiner; messages: + msg406939
2021-11-23 16:57:20 | uranusjr | set | nosy: + uranusjr; messages: + msg406858
2021-11-23 12:15:21 | piro | set | nosy: + piro
2021-11-01 21:24:19 | eric.araujo | set | versions: + Python 3.9, Python 3.10, Python 3.11; nosy: + eric.araujo; messages: + msg405471; type: behavior
2021-02-10 18:09:55 | ncopa | set | messages: + msg386797
2021-02-10 17:52:30 | ncopa | set | keywords: + patch; stage: patch review; pull_requests: + pull_request23292
2021-02-10 11:10:23 | h-vetinari | set | nosy: + h-vetinari
2021-02-09 19:26:49 | Arfrever | set | nosy: + Arfrever; messages: + msg386739
2021-02-09 18:38:34 | olivierlefloch | set | nosy: + olivierlefloch
2021-02-09 16:08:56 | tianon | set | nosy: + tianon
2021-02-09 14:37:52 | ncopa | set | messages: + msg386722
2021-02-09 11:52:28 | christian.heimes | set | messages: + msg386710
2021-02-03 11:53:09 | ncopa | set | messages: + msg386197
2021-02-03 11:21:34 | christian.heimes | set | messages: + msg386196
2021-02-03 11:03:39 | ncopa | set | messages: + msg386194
2021-02-03 11:02:22 | christian.heimes | set | messages: + msg386192
2021-02-03 10:41:15 | ncopa | set | messages: + msg386186
2021-02-03 09:47:54 | christian.heimes | set | nosy: + christian.heimes; messages: + msg386184
2021-02-03 09:26:13 | ncopa | create |