
Author twouters
Recipients Arfrever, amaury.forgeotdarc, asvetlov, brett.cannon, eric.snow, eudoxos, ncoghlan, pitrou, r.david.murray, twouters, vstinner
Date 2018-02-28.18:50:59
Message-id <1519843860.12.0.467229070634.issue32973@psf.upfronthosting.co.za>
Content
This is a continuation, of sorts, of issue16421; adding most of that issue's audience to the nosy list.

When importing the same extension module under multiple names that share the same basename, Python 3 will call the extension module's init function multiple times. With extension modules that do not support re-initialisation, this causes them to trample all over their own state. In the case of numpy, this corrupts CPython internal data structures, like builtin types.

Simple reproducer:
% python3.6 -m venv numpy-3.6
% numpy-3.6/bin/python -m pip install numpy
% PYTHONPATH=./numpy-3.6/lib/python3.6/site-packages/numpy/core/ ./numpy-3.6/bin/python -c "import numpy.core.multiarray, multiarray; u'' < 1"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
Segmentation fault

(The corruption happens because PyInit_multiarray initialises subclasses of builtin types, which causes them to share some data (e.g. tp_as_number) with the base class: https://github.com/python/cpython/blob/master/Objects/typeobject.c#L5277. Calling it a second time then copies data from a different class into that shared data, corrupting the base class: https://github.com/python/cpython/blob/master/Objects/typeobject.c#L4950. The Py_TPFLAGS_READY flag is supposed to protect against this, but PyInit_multiarray resets the tp_flags value. I ran into this because we have code that vendors numpy and imports it in two different ways.)

The specific case of numpy is somewhat convoluted and exacerbated by dubious design choices in numpy, but it is not hard to show that calling an extension module's PyInit function twice (if the module doesn't support reinitialisation through PEP 3121) is bad: any C globals initialised in the PyInit function will be trampled on.

This was not a problem in Python 2 because the extension module cache was keyed purely on filename. That was changed in response to issue16421, but the intent there appears to have been to allow calling *different* PyInit functions in the same shared library. However, because the PyInit function name is derived from the *basename* of the module, not the full module name, a different module name does not mean a different init function name.
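The basename point can be illustrated with a short sketch. `init_function_name` is a hypothetical helper, not a CPython API, and it is a simplification: it only covers plain ASCII module names and ignores the special handling of non-ASCII names.

```python
def init_function_name(fullname: str) -> str:
    """Approximate the init symbol looked up for an ASCII module name.

    Only the last dotted component of the module name is used.
    """
    basename = fullname.rpartition(".")[2]
    return "PyInit_" + basename


# Two different module names, one init function name.
print(init_function_name("numpy.core.multiarray"))  # PyInit_multiarray
print(init_function_name("multiarray"))             # PyInit_multiarray
```

Both names from the reproducer resolve to the same symbol in the same file, so a cache that treats them as distinct modules ends up calling that one function twice.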

I think the right approach is to change the extension module cache to key on filename and init function name, although this is a little tricky: the init function name is calculated much later in the process. Alternatively, key it on filename and module basename, rather than full module name.
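As a rough illustration of the second alternative, the following toy model compares a cache keyed on (filename, full module name) with one keyed on (filename, module basename). `ExtensionCache`, the key functions, and the path are all hypothetical; this is not CPython's actual cache, just a sketch of how the choice of key changes the number of init calls.

```python
class ExtensionCache:
    """Toy extension-module cache that counts simulated init calls."""

    def __init__(self, key_func):
        self._key_func = key_func
        self._modules = {}
        self.init_calls = 0

    def load(self, filename, fullname):
        key = self._key_func(filename, fullname)
        if key not in self._modules:
            # Cache miss: this is where the PyInit function would be called.
            self.init_calls += 1
            self._modules[key] = object()
        return self._modules[key]


def key_full_name(filename, fullname):
    # Keying on the full module name, as described for Python 3.
    return (filename, fullname)


def key_basename(filename, fullname):
    # Proposed alternative: key on filename and module basename.
    return (filename, fullname.rpartition(".")[2])


path = "/site-packages/numpy/core/multiarray.so"  # hypothetical path

current = ExtensionCache(key_full_name)
current.load(path, "numpy.core.multiarray")
current.load(path, "multiarray")

proposed = ExtensionCache(key_basename)
proposed.load(path, "numpy.core.multiarray")
proposed.load(path, "multiarray")

print(current.init_calls, proposed.init_calls)  # 2 1
```

Under the basename key, two names that would resolve to the same init function in the same file share one cache entry, while different basenames in the same file would still get separate entries.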