classification
Title: MacOS: Python binaries not portable between Catalina and Big Sur
Type: behavior Stage: needs patch
Components: ctypes, macOS Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: bergkvist, ned.deily, ronaldoussoren
Priority: normal Keywords:

Created on 2021-07-20 21:56 by bergkvist, last changed 2021-07-22 18:12 by bergkvist.

Pull Requests
URL Status Linked Edit
PR 27251 open bergkvist, 2021-07-20 21:56
Messages (7)
msg397916 - (view) Author: Tobias Bergkvist (bergkvist) * Date: 2021-07-20 21:56
Python binaries compiled on either Big Sur or Catalina and then moved to the other macOS version do not work as expected when code depends on ctypes.util.find_library.

Example symptom of this issue: https://github.com/jupyterlab/jupyterlab/issues/9863
I have personally hit this when using Python from nixpkgs, since the nixpkgs Python was built on Catalina and I'm running Big Sur.


Scenario 1: Compile on Catalina, copy binaries to Big Sur, and call ctypes.util.find_library('c'):
Python 3.11.0a0 (heads/main:635bfe8162, Jul 19 2021, 08:09:05) [Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from ctypes.util import find_library; print(find_library('c'))
None

Scenario 2: Compile on Big Sur, copy binaries to Catalina, and call ctypes.util.find_library('c'):
Python 3.11.0a0 (heads/main:635bfe8162, Jul 19 2021, 08:28:48) [Clang 12.0.5 (clang-1205.0.22.11)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from ctypes.util import find_library; print(find_library('c'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.11/ctypes/__init__.py", line 8, in <module>
    from _ctypes import Union, Structure, Array
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ImportError: dlopen(/usr/local/lib/python3.11/lib-dynload/_ctypes.cpython-311-darwin.so, 2): Symbol not found: __dyld_shared_cache_contains_path
  Referenced from: /usr/local/lib/python3.11/lib-dynload/_ctypes.cpython-311-darwin.so (which was built for Mac OS X 11.4)
  Expected in: /usr/lib/libSystem.B.dylib
 in /usr/local/lib/python3.11/lib-dynload/_ctypes.cpython-311-darwin.so
msg397926 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2021-07-21 07:07
The problem with moving from Catalina to Big Sur is a known issue, AFAIK there's an open issue for this.

The problem is that Big Sur moved system libraries into a big blob (which Apple calls the shared library cache). ctypes uses an API that's new in macOS 11 to check if a library is in that cache, but only when compiled with the macOS 11 SDK or later, as the API is not available in earlier SDKs.
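
To make the effect visible, here is a minimal sketch (not part of the original report; the helper name `library_status` is hypothetical) that compares what `find_library` returns with whether that result exists as a file. On Big Sur with a correctly built Python, a system library would be expected to be found even though no file is present at the returned path.

```python
import os
from ctypes.util import find_library


def library_status(name):
    """Report what find_library returns for `name` and whether it is a file on disk.

    `library_status` is a hypothetical helper used only for illustration.
    """
    found = find_library(name)
    # On Linux, find_library returns a bare file name (for example "libc.so.6")
    # rather than a path, so the existence check only applies to absolute paths.
    on_disk = bool(found) and os.path.isabs(found) and os.path.exists(found)
    return {"name": name, "found": found, "on_disk": on_disk}


if __name__ == "__main__":
    print(library_status("c"))
```

A result with a non-None `found` and `on_disk` set to False is the situation the shared cache creates; a `found` of None on Big Sur matches Scenario 1 above.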

Moving from Big Sur to an earlier version should work fine, but only if you set the deployment target correctly during the build. This is how the "universal2" installers on python.org are built.
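
One way to check which deployment target a given interpreter was built with is sketched below. This is an illustration only: `build_vs_runtime` is a hypothetical helper, and it relies on the `MACOSX_DEPLOYMENT_TARGET` config variable, which is only set for macOS builds.

```python
import platform
import sysconfig


def build_vs_runtime():
    """Return the deployment target Python was built with and the macOS version it runs on.

    `build_vs_runtime` is a hypothetical helper used only for illustration.
    """
    return {
        # None when Python was not built for macOS.
        "deployment_target": sysconfig.get_config_var("MACOSX_DEPLOYMENT_TARGET"),
        # Empty string when not running on macOS.
        "running_on": platform.mac_ver()[0],
    }


if __name__ == "__main__":
    print(build_vs_runtime())
```

If the deployment target is newer than the system the binary is copied to, failures like the one in Scenario 2 are to be expected.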
msg397927 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2021-07-21 07:55
Anyways, the solution for "build on older macOS version, deploy to Big Sur" is to dynamically look for the relevant API (``_dyld_shared_cache_contains_path``) and use it when available. That applies only to this scenario: the current code path using explicit weak linking should be kept for those building with a recent SDK (cleaner code, better error reporting).

This should be a fairly easy patch, but I don't know when I'll get around to looking into this further.

Alternatively, we could require that Python is built using the macOS 11 SDK (or later) when targeting Big Sur.

I'm dropping 3.8 from the list of versions because it is in "security fix only" mode and won't receive a patch for this. IIRC 3.8 also doesn't support Big Sur in the first place; we've only backported Big Sur support to 3.9.
msg397930 - (view) Author: Tobias Bergkvist (bergkvist) * Date: 2021-07-21 08:38
An alternative to using _dyld_shared_cache_contains_path is to use dlopen to check for library existence (which is what Apple recommends in their change notes: https://developer.apple.com/documentation/macos-release-notes/macos-big-sur-11_0_1-release-notes).

> New in macOS Big Sur 11.0.1, the system ships with a built-in dynamic linker cache of all system-provided libraries. As part of this change, copies of dynamic libraries are no longer present on the filesystem. Code that attempts to check for dynamic library presence by looking for a file at a path or enumerating a directory will fail. Instead, check for library presence by attempting to dlopen() the path, which will correctly check for the library in the cache. (62986286)

I have created a PR which changes the current find_library from using _dyld_shared_cache_contains_path to using dlopen. It passes all of the existing find_library tests:
https://github.com/python/cpython/pull/27251

There might be downsides to using dlopen (performance?) or something else I haven't considered. The huge upside, however, is that the function is available on basically all Unix systems.
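
For illustration, a dlopen-based existence check can be sketched from Python, since `ctypes.CDLL` calls dlopen on Unix systems. This is not the code in the PR; `can_dlopen` is a hypothetical helper.

```python
import ctypes


def can_dlopen(path):
    """Return True if the dynamic loader can open `path`.

    `can_dlopen` is a hypothetical helper used only for illustration.
    Note that a successful probe leaves the library loaded in this process.
    """
    try:
        ctypes.CDLL(path)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print(can_dlopen("/usr/lib/libc.dylib"))
```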
msg397979 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2021-07-22 07:20
The disadvantage of using dlopen is that this function has side effects, and those can affect program behaviour.  Because of this I'm against switching to using dlopen to probe for libraries.
msg397984 - (view) Author: Tobias Bergkvist (bergkvist) * Date: 2021-07-22 12:43
You are absolutely right! From the man page of dlopen(3) on macOS Big Sur:

> dlopen() examines the mach-o file specified by path. If the file is compatible with the current process and has not already been loaded into the current process, it is loaded and linked.  After being linked, if it contains any initializer functions, they are called, before dlopen() returns. dlopen() can load dynamic libraries and bundles.  It returns a handle that can be used with dlsym() and dlclose(). A second call to dlopen() with the same path will return the same handle, but the internal reference count for the handle will be incremented. Therefore all dlopen() calls should be balanced with a dlclose() call.

Essentially, if the shared library contains initializer functions that have some kind of side-effects, dlopen will also trigger these side-effects.

Basic example:
```
// mylib.c
#include <stdio.h>
void myinit(void) {
    printf("Hello from mylib\n");
}
__attribute__((section("__DATA,__mod_init_func"))) typeof(myinit) *__init = myinit;
```

---
```
// main.c
#include <dlfcn.h>
#include <stdio.h>
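int main(void) {
    /* Loading the library runs its initializer, which prints the greeting. */
    void* handle = dlopen("./mylib.dyld", RTLD_LAZY);
    if (handle == NULL) {
        printf("Failed to load: %s\n", dlerror());
        return 1;
    }
    dlclose(handle);  /* balance the dlopen() call, as the man page advises */
    return 0;
}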
```

$ clang mylib.c -shared -o mylib.dyld
$ clang main.c -o main
$ ./main
Hello from mylib
msg397990 - (view) Author: Tobias Bergkvist (bergkvist) * Date: 2021-07-22 18:12
Okay, I decided to look into how I could do dynamic loading as you suggested.

Here is a POC (executable) for using _dyld_shared_cache_contains_path when available:

```
#include <stdbool.h>  /* bool, false */
#include <stdio.h>
#include <dlfcn.h>

void* libsystemb_handle;
typedef bool (*_dyld_shared_cache_contains_path_f)(const char* path);
_dyld_shared_cache_contains_path_f _dyld_shared_cache_contains_path;

bool _dyld_shared_cache_contains_path_fallback(const char* name) {
    return false;
}

__attribute__((constructor)) void load_libsystemb(void) {
    if (
        (libsystemb_handle = dlopen("/usr/lib/libSystem.B.dylib", RTLD_LAZY)) == NULL ||
        (_dyld_shared_cache_contains_path = dlsym(libsystemb_handle, "_dyld_shared_cache_contains_path")) == NULL
    ) {
        _dyld_shared_cache_contains_path = _dyld_shared_cache_contains_path_fallback;
    }
}

__attribute__((destructor)) void unload_libsystemb(void) {
    if (libsystemb_handle != NULL) {
        dlclose(libsystemb_handle);
    }
}

int main(int argc, char ** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <library path>\n", argv[0]);
        return 1;
    }
    printf("Library exists in cache: %d\n", _dyld_shared_cache_contains_path(argv[1]));
    return 0;
}
```

When _dyld_shared_cache_contains_path cannot be loaded, a fallback function that always returns false is used instead. If there is no cache, the (nonexistent) cache cannot contain whatever path you pass it.

The constructor function is called when the Python extension is loaded, ensuring that _dyld_shared_cache_contains_path is defined and callable. I've read that C extension modules cannot be autoreloaded (https://ipython.org/ipython-doc/3/config/extensions/autoreload.html), so this might mean there is no need for a destructor? Instead, the OS would handle cleanup once the process exits?

This could be compiled on either macOS Catalina or Big Sur and run without problems on the other macOS version.

Regarding the "explicit weak linking" when building on macOS Big Sur and later: wouldn't this mean that a Big Sur build wouldn't work on Catalina?

Would you be positive towards a PR with the approach I demonstrated here?
History
Date User Action Args
2021-07-22 18:12:56 bergkvist set messages: + msg397990
2021-07-22 12:43:27 bergkvist set messages: + msg397984
2021-07-22 07:20:01 ronaldoussoren set messages: + msg397979
2021-07-21 08:38:50 bergkvist set messages: + msg397930
2021-07-21 07:55:25 ronaldoussoren set stage: needs patch; messages: + msg397927; versions: - Python 3.8
2021-07-21 07:07:25 ronaldoussoren set messages: + msg397926
2021-07-20 21:59:55 zach.ware set nosy: + ronaldoussoren, ned.deily; components: + macOS
2021-07-20 21:56:46 bergkvist create