classification
Title: No protection: `import numpy` in two different threads can lead to race-condition
Type: behavior Stage:
Components: Interpreter Core Versions: Python 3.11
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: christian.heimes, jhgoebbert, seberg
Priority: normal Keywords:

Created on 2022-03-21 17:03 by jhgoebbert, last changed 2022-04-11 14:59 by admin.

Messages (4)
msg415687 - Author: Jens Henrik Goebbert (jhgoebbert) Date: 2022-03-21 17:03
While using [Xpra](https://github.com/Xpra-org/xpra) we came across a bug which might be a Python or a NumPy issue.
Perhaps some of you can help us understand some of the internals.

Calling `import numpy` at the same time in two different threads of a Python program can lead to a race condition.
This happens, for example, in Xpra when loading the nvjpeg encoder:
```
2022-03-20 12:54:59,298  cannot load enc_nvjpeg (nvjpeg encoder)
Traceback (most recent call last):
  File "<pythondir>/lib/python3.9/site-packages/xpra/codecs/loader.py", line 52, in codec_import_check
    ic =  __import__(class_module, {}, {}, classnames)
  File "xpra/codecs/nvjpeg/encoder.pyx", line 8, in init xpra.codecs.nvjpeg.encoder
  File "<pythondir>/lib/python3.9/site-packages/numpy/__init__.py", line 150, in <module>
    from . import core
  File "<pythondir>/lib/python3.9/site-packages/numpy/core/__init__.py", line 51, in <module>
    del os.environ[envkey]
  File "<pythondir>/lib/python3.9/os.py", line 695, in __delitem__
    raise KeyError(key) from None
KeyError: 'OPENBLAS_MAIN_FREE'
```

Here the environment variable `OPENBLAS_MAIN_FREE` is set in the `numpy` code:
https://github.com/numpy/numpy/blob/maintenance/1.21.x/numpy/core/__init__.py#L18
and shortly after that it is deleted:
https://github.com/numpy/numpy/blob/maintenance/1.21.x/numpy/core/__init__.py#L51
But this deletion fails, perhaps because the initialization runs at the same time in two threads :thinking:
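
To make the suspected failure mode concrete, here is a minimal sketch of the same set-then-delete pattern run from two threads. It does not import NumPy and is not a reproduction of the Xpra failure. The variable name `DEMO_MAIN_FREE` and the function names are made up for illustration, and the barrier is only there to force the unlucky interleaving (both threads set the variable before either deletes it).

```python
import os
import threading

ENVKEY = "DEMO_MAIN_FREE"  # hypothetical name, not the variable NumPy uses


def set_then_delete(barrier, errors):
    """Set an environment variable, then delete it, like a module init might."""
    os.environ[ENVKEY] = "1"
    # Assumption for the demo: force every thread to finish the "set" step
    # before any thread reaches the "delete" step.
    barrier.wait()
    try:
        del os.environ[ENVKEY]
    except KeyError as exc:
        # The thread that deletes second finds the key already gone.
        errors.append(exc)


def run_race(n_threads=2):
    """Run set_then_delete in n_threads threads and return the KeyErrors seen."""
    errors = []
    barrier = threading.Barrier(n_threads)
    threads = [
        threading.Thread(target=set_then_delete, args=(barrier, errors))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

Calling `run_race()` returns the list of `KeyError` instances raised by the threads that lost the race. With two threads, one delete succeeds and the other fails in the same way as the traceback above.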

Shouldn't Python protect us by design?

@seberg comments [HERE](https://github.com/numpy/numpy/issues/21223#issuecomment-1074008386):
```
So, my current hypothesis (and I have briefly checked the Python code) is that Python does not do manual locking. But it effectively locks due to this going into C and thus holding the GIL. But somewhere during the import of NumPy, NumPy probably releases the GIL briefly and that could allow the next thread to go into the import machinery.
[..]
NumPy may be doing some worse than typical stuff here, but right now it seems to me that Python should be protecting us.
```

Can anyone comment on this?
msg415690 - Author: Sebastian Berg (seberg) * Date: 2022-03-21 17:28
To add to this: it would seem to me that the side-effects of importing should be guaranteed to run only once?

However, IO or other operations could be part of the import side-effects and release the GIL.  So even a simple, pure-Python package could run into this same issue, and its authors probably won't even realize it.
(Assuming I understand how this is happening correctly.)

So it would seem to me that either:
* Python should lock on the thread level or maybe the `sys.modules` dictionary?
* The `threading` module could somehow ensure safety by hooking into
  the import machinery?
* Packages that may release the GIL or have side-effects that must
  only be run once have to lock (i.e. NumPy).
  (But it seems to me that many packages will not even be aware of
  possible issues.)
msg415718 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2022-03-21 22:43
Python used to have a global import lock. The global import lock prevented recursive imports from other threads while the current thread was importing. The import lock was a source of issues and deadlocks. Antoine replaced it with a fine-grained import lock in bpo-9260.
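
One way to check the per-module lock in the plain-Python case is the sketch below. It writes a throwaway module whose body records each execution in a file and then sleeps (which releases the GIL), and imports it from several threads at once. The module name `slow_demo_module`, the counter file, and the helper name are assumptions made for this example. It only exercises `importlib.import_module`, not Cython's import path.

```python
import importlib
import os
import sys
import tempfile
import threading


def count_module_body_runs(n_threads=2):
    """Import a slow throwaway module from several threads; count body runs."""
    name = "slow_demo_module"  # hypothetical module name
    with tempfile.TemporaryDirectory() as tmp:
        counter = os.path.join(tmp, "runs.txt")
        with open(os.path.join(tmp, name + ".py"), "w") as f:
            f.write(
                "import time\n"
                f"with open({counter!r}, 'a') as fh:\n"
                "    fh.write('x')  # one character per execution of the body\n"
                "time.sleep(0.2)  # blocking call, releases the GIL mid-import\n"
            )
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        barrier = threading.Barrier(n_threads)

        def worker():
            barrier.wait()  # start all imports at (roughly) the same time
            importlib.import_module(name)

        threads = [threading.Thread(target=worker) for _ in range(n_threads)]
        try:
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            with open(counter) as fh:
                return len(fh.read())
        finally:
            # Undo the changes to interpreter-wide state made for the demo.
            sys.path.remove(tmp)
            sys.modules.pop(name, None)
```

If the per-module lock works as described, the second thread waits until the first has finished executing the module, so the function should return 1 even though the module body gives up the GIL while sleeping.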
msg415722 - Author: Sebastian Berg (seberg) * Date: 2022-03-21 23:04
Thanks, so there should already be a lock in place (sorry, I missed that).  But somehow we seem to get around it?

Do you know what may cause the locking logic to fail in this case?  Recursive imports in NumPy itself?  Or Cython using the low-level C-API?

I.e. can you think of something to investigate that may help NumPy/Cython to make sure that locking is successful?


/* Cython's import code (slightly cleaned up for Python 3 only): */
static PyObject *__Pyx_Import(PyObject *name, PyObject *from_list, int level) {
    PyObject *empty_list = 0;
    PyObject *module = 0;
    PyObject *global_dict = 0;
    PyObject *empty_dict = 0;
    PyObject *list;
    if (from_list)
        list = from_list;
    else {
        empty_list = PyList_New(0);
        if (!empty_list)
            goto bad;
        list = empty_list;
    }
    global_dict = PyModule_GetDict(__pyx_m);
    if (!global_dict)
        goto bad;
    empty_dict = PyDict_New();
    if (!empty_dict)
        goto bad;
    {
        if (level == -1) {
            if ((1) && (strchr(__Pyx_MODULE_NAME, '.'))) {
                module = PyImport_ImportModuleLevelObject(
                    name, global_dict, empty_dict, list, 1);
                if (!module) {
                    if (!PyErr_ExceptionMatches(PyExc_ImportError))
                        goto bad;
                    PyErr_Clear();
                }
            }
            level = 0;
        }
        if (!module) {
            module = PyImport_ImportModuleLevelObject(
                name, global_dict, empty_dict, list, level);
        }
    }
bad:
    Py_XDECREF(empty_list);
    Py_XDECREF(empty_dict);
    return module;
}
History
Date                 User              Action  Args
2022-04-11 14:59:57  admin             set     github: 91238
2022-03-21 23:04:38  seberg            set     messages: + msg415722
2022-03-21 22:43:13  christian.heimes  set     nosy: + christian.heimes
                                               messages: + msg415718
2022-03-21 17:28:44  seberg            set     nosy: + seberg
                                               messages: + msg415690
2022-03-21 17:03:05  jhgoebbert        create