This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Importing the same extension module under multiple names breaks non-reinitialisable extension modules
Type: behavior Stage:
Components: Extension Modules Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, Dormouse759, amaury.forgeotdarc, asvetlov, brett.cannon, eric.snow, eudoxos, gregory.p.smith, ncoghlan, petr.viktorin, pitrou, r.david.murray, scoder, twouters, vstinner
Priority: normal Keywords:

Created on 2018-02-28 18:51 by twouters, last changed 2022-04-11 14:58 by admin.

Messages (9)
msg313064 - (view) Author: Thomas Wouters (twouters) * (Python committer) Date: 2018-02-28 18:50
This is a continuation, of sorts, of issue16421; adding most of that issue's audience to the noisy list.

When importing the same extension module under multiple names that share the same basename, Python 3 will call the extension module's init function multiple times. With extension modules that do not support re-initialisation, this causes them to trample all over their own state. In the case of numpy, this corrupts CPython internal data structures, like builtin types.

Simple reproducer:
% python3.6 -m venv numpy-3.6
% numpy-3.6/bin/python -m pip install numpy
% PYTHONPATH=./numpy-3.6/lib/python3.6/site-packages/numpy/core/ ./numpy-3.6/bin/python -c "import numpy.core.multiarray, multiarray; u'' < 1"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
Segmentation fault

(The corruption happens because PyInit_multiarray initialises subclasses of builtin types, which causes them to share some data (e.g. tp_as_number) with the base class: https://github.com/python/cpython/blob/master/Objects/typeobject.c#L5277. Calling it a second time then copies data from a different class into that shared data, corrupting the base class: https://github.com/python/cpython/blob/master/Objects/typeobject.c#L4950. The Py_TPFLAGS_READY flag is supposed to protect against this, but PyInit_multiarray resets the tp_flags value. I ran into this because we have code that vendors numpy and imports it in two different ways.)

The specific case of numpy is somewhat convoluted and exacerbated by dubious design choices in numpy, but it is not hard to show that calling an extension module's PyInit function twice (if the module doesn't support reinitialisation through PEP 3121) is bad: any C globals initialised in the PyInit function will be trampled on.

This was not a problem in Python 2 because the extension module cache worked based purely on filename. It was changed in response to issue16421, but the intent there appears to be to call *different* PyInit methods in the same module. However, because PyInit functions are based off of the *basename* of the module, not the full module name, a different module name does not mean a different init function name.

I think the right approach is to change the extension module cache to key on filename and init function name, although this is a little tricky: the init function name is calculated much later in the process. Alternatively, key it on filename and module basename, rather than full module name.
msg313137 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2018-03-02 16:09
PEP 489 ("Multi-phase extension module initialization") is relevant here, so I've nosied Petr.
msg313138 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2018-03-02 16:21
Well, PEP 489 basically punts this to module authors: generally, C globals are bad, but if you do have global state, please manage it, keeping in mind that multiple module objects can be created from the extension.

That's required to make everything work with subinterpreters.

See: https://www.python.org/dev/peps/pep-0489/#subinterpreters-and-interpreter-reloading

CCing Marcel, who's working on PEP 489-related stuff now.
msg313148 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2018-03-02 17:34
> change the extension module cache to key on filename and init function name

... or on the pointer to the PyInit function. If that's the same, we obviously have the same extension module. If it differs, even for the same module name, then other globals of the modules will probably also be distinct.
msg313233 - (view) Author: Thomas Wouters (twouters) * (Python committer) Date: 2018-03-05 03:35
Re: Petr: we can't expect extension module authors to retroactively fix released modules. We can't even expect everyone to fix this for future releases; moving away from globals (which may not be specific to the Python extension module) may be a lot of effort. Just look at how much work it's taking to move CPython itself to stop using globals in so many places. And while it may be necessary for sub-interpreters (which is only the case for global state that's Python objects), many people just don't care about sub-interpreters.

Re: Stefan: the init function pointer isn't known until much later in the current process, and calculated from the module name. There is not currently a way to import a module with a different init function pointer. I don't think this warrants that much of a rewrite.
msg324066 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2018-08-25 13:28
I think the best work-around for now is to implement a bit of PEP 489, including a module create function that always returns the same static module reference instead of creating a new one after the first call, and a module exec function that simply returns if the module has already been initialised. Keeping a global pointer to the module instance around should work. This is what the current Cython master branch does (also to make use of some of the nice features that PEP 489 brings).
msg324068 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2018-08-25 13:54
That's quite problematic, since then you're sharing a mutable object across interpreters.
The user can store any attribute on module objects, including e.g. Python functions that reference their original interpreter's global state, but become callable in other interpreters.
msg324069 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2018-08-25 14:25
Well, first of all, it's better than a crash. :)

Secondly, I'm sure NumPy doesn't currently support subinterpreters, just like most other extension modules. If I'm not mistaken, an interpreter switch can be detected through the interpreter state pointer [1] in the thread state, and extension modules that lack subinterpreter support can consider a change an error for them. since then something is trying to re-import the module into a different interpreter. That's not entirely safe since addresses can be reused, which I guess was the reason for adding an ID [2] in Py3.7, but that's only available in Py3.7+, not in Py3.5. So, the interpreter address is probably as good as it gets for Py<3.7.

[1] https://docs.python.org/3/c-api/init.html#c.PyThreadState
[2] https://docs.python.org/3/c-api/init.html#c.PyInterpreterState_GetID

Note: I'm not trying to keep anyone from implementing subinterpreter support here – just showing a way to keep things working and improving gradually as long as there is no full support for PEP 489, extension module reloading and subinterpreters, so that users don't have to go all the way in one step.
msg324077 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2018-08-25 16:36
FYI, I've updated Cython's module import checks to include an interpreter check. This (multi-file) test shows the new behaviour, which is to raise an ImportError on module creation when it detects a different interpreter than during the initial import:

https://github.com/cython/cython/blob/master/tests/run/reimport_from_subinterpreter.srctree

The checks are implemented here (and called a bit further down in the module create function):
https://github.com/cython/cython/blob/4ce754271ff4cfbd8df2b278e812154fb1b02319/Cython/Utility/ModuleSetupCode.c#L909-L932

I also added a test that should match the problem discussed here, which makes what I described appear as a viable solution (or work-around):

https://github.com/cython/cython/blob/master/tests/run/reimport_from_package.srctree
History
Date User Action Args
2022-04-11 14:58:58adminsetgithub: 77154
2018-08-25 16:36:18scodersetmessages: + msg324077
2018-08-25 14:25:19scodersetmessages: + msg324069
2018-08-25 13:54:13petr.viktorinsetmessages: + msg324068
2018-08-25 13:28:29scodersetmessages: + msg324066
2018-03-05 03:35:21twouterssetmessages: + msg313233
2018-03-02 17:34:20scodersetmessages: + msg313148
2018-03-02 17:30:37scodersetnosy: + scoder
components: + Extension Modules
2018-03-02 16:21:11petr.viktorinsetnosy: + Dormouse759
messages: + msg313138
2018-03-02 16:09:59eric.snowsetnosy: + petr.viktorin
messages: + msg313137
2018-02-28 23:59:42gregory.p.smithsetversions: + Python 3.6, Python 3.7, Python 3.8
2018-02-28 23:58:34gregory.p.smithsetnosy: + gregory.p.smith
2018-02-28 18:51:00twouterscreate