classification
Title: Python not reentrant
Type: behavior
Stage:
Components: Interpreter Core
Versions: Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: eric.smith, eric.snow, ncoghlan, skaller, vstinner
Priority: normal
Keywords:

Created on 2018-09-17 02:04 by skaller, last changed 2022-04-11 14:59 by admin.

Messages (4)
msg325508 - Author: john skaller (skaller) Date: 2018-09-17 02:04
Executive Summary: Python is currently not properly re-entrant. This comment applies to the C API and particularly to embedding. A fix is not possible in Python 3.x but should be scheduled for Python 4. On Linux all binary plugins are broken as well.

The fault is exhibited by the need to first call Py_Initialize(). This is clearly wrong because there is nowhere to put the initialised data. The correct sequence should be to first create an interpreter handle, and then initialise that. Other API calls exhibit the same fault, for example PyErr_Occurred().
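A minimal sketch of the "create a handle, then initialise it" sequence described above. The `interp` type and every `interp_*` function are hypothetical names invented for illustration; they are not part of the CPython C API. The point is only that the init flag and the error state live inside the handle, so two handles cannot interfere with each other.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical handle type (an assumption, NOT a CPython type). */
typedef struct interp {
    int initialised;      /* per-handle init flag, not a global */
    char last_error[64];  /* per-handle error state */
} interp;

/* Step 1: create a handle. calloc zeroes it, so nothing is initialised yet. */
interp *interp_new(void) {
    return calloc(1, sizeof(interp));
}

/* Step 2: initialise the state stored inside that handle. Returns 0 on success. */
int interp_init(interp *it) {
    if (it == NULL) return -1;
    it->initialised = 1;
    it->last_error[0] = '\0';
    return 0;
}

/* Record an error on this handle only. */
void interp_set_error(interp *it, const char *msg) {
    strncpy(it->last_error, msg, sizeof it->last_error - 1);
    it->last_error[sizeof it->last_error - 1] = '\0';
}

/* Handle-taking analogue of an "has an error occurred?" query. */
int interp_err_occurred(const interp *it) {
    return it->last_error[0] != '\0';
}

/* Release the handle and everything hanging off it. */
void interp_free(interp *it) {
    free(it);
}
```

With this shape, a caller that creates two handles and sets an error on one of them sees `interp_err_occurred` report an error for that handle only.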

Use of thread local storage is NOT enough.

A general embedding scenario is this: a thunk program is used to dynamically load a shared library and execute a function in it. That function may load other shared libraries. Note carefully that there is no global data; the libraries are pure code. [This is not an imagined scenario; my whole system works this way.]

The same library may be loaded several times. For example, A can load B and C, and both B and C can load D. Proper visibility control means A cannot see any symbols of D.

In this scenario, if D wishes to run a Python interpreter, it must call Py_Initialize(), and it will be called twice, since D is loaded twice, once from B and once from C. Indeed, if the top level spawns multiple threads, it can be called many more times than that.

Remember the libraries are pure code and fully reentrant. There is no way to record if a function has been called already.

In order for Python to be fully re-entrant there is a simple test: if the C code of the Python library contains ANY global variables at all then Python is wrong. Global variables INCLUDE thread local storage. ALL data and ALL functions must hang off a handle so that all functionality and behaviour is fully isolated to each handle.

Exceptions to the rule: poorly designed operating systems such as Unix have some non-reentrant features. The worst of these in Unix is signal handling: it is not possible to handle signals without a global variable to communicate between the signal handler and the application. The right way to do this would have been to use a polling service to detect the signal. In any case, systems like Python sometimes have to work with badly designed APIs, so these special cases form legitimate exceptions to the requirement that the API be re-entrant. My recommendation is to provide a cheat API which looks re-entrant but actually isn't, because of necessity it delegates to a hidden lower level which isn't. YMMV: how to handle bad underlying APIs should be open for discussion.
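One way the "cheat API" idea above might look, as a sketch only. The `sig_ctx` type and the `sig_ctx_*` functions are hypothetical names, not an existing API. The handler still communicates through one hidden global flag, as it must, while callers only see a handle-based polling interface.

```c
#include <signal.h>

/* The unavoidable global: the only channel between handler and program. */
static volatile sig_atomic_t pending_sigint = 0;

/* Signal handler: only sets the flag, which is async-signal-safe. */
static void on_sigint(int signo) {
    (void)signo;
    pending_sigint = 1;
}

/* Hypothetical per-handle context (an assumption for illustration). */
typedef struct sig_ctx {
    int seen;  /* how many signals this handle has consumed */
} sig_ctx;

/* Install the hidden handler. Returns 0 on success, -1 on failure. */
int sig_ctx_install(sig_ctx *ctx) {
    ctx->seen = 0;
    return signal(SIGINT, on_sigint) == SIG_ERR ? -1 : 0;
}

/* Poll: returns 1 and clears the flag if a signal arrived since the last
   poll, otherwise 0. Signals arriving close together are coalesced. */
int sig_ctx_poll(sig_ctx *ctx) {
    if (pending_sigint) {
        pending_sigint = 0;
        ctx->seen++;
        return 1;
    }
    return 0;
}
```

The facade looks re-entrant from the outside, but all handles share the one flag underneath, which is exactly the compromise the paragraph describes.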

Other consequences: on Linux at least, ALL plugin extensions are built incorrectly. The correct way to build a plugin requires explicitly linking against the Python library, so that symbols in the Python API can be found. These symbols must NOT be looked up in the application because this is, quite simply, not possible if the application does not include those symbols. In my scenario, the top-level application is three lines of C that does nothing other than load a library and run a fixed function in it. And that library has no idea that one of the libraries IT loads may call another library which happens to want to run some Python code. Indeed my system can *generate* Python modules, and compile and link them against the Python library, but it cannot load any existing plugins on Linux, because those plugins were incorrectly built and do not link to the Python library as they should. They expect to find symbols magically provided in the symbol table, but those symbols are not there.

On OSX, however, it works. That is because on OSX, a --framework is used to contain the Python library and all plugins HAVE to be linked against the framework. I expect the Windows builds to work too, for the same reason (but I'm not sure).

This issue is related to the lack of re-entrancy because the same principle is broken in both cases. If you need a service, you must ask for it, and when you get it, it is exclusively yours.
msg325524 - Author: Eric V. Smith (eric.smith) (Python committer) Date: 2018-09-17 11:48
I think this is along the lines of a recent discussion on the c-api mailing list about passing a context value (your handle) to every C call. See this message for some discussion: https://mail.python.org/mm3/archives/list/capi-sig@python.org/message/ZN6BJVUGIOGWKHY47PKPX5Z3SGCYUAX5/
msg325527 - Author: john skaller (skaller) Date: 2018-09-17 12:04
eric: yes, that's relevant. Very happy it is discussed.

Contrary to some indication in the post, passing a context handle around everywhere is NOT a burden at all. My system does exactly that.

I would note that APIs which already require, say, an interpreter handle don't need any modification.

Also, I would note that legacy APIs do not have to be broken: you just keep a single, legacy, global variable holding the default context, and deprecate any functions using it.
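The compatibility approach in the previous paragraph could be sketched roughly as follows. The `ctx` type and all function names here are hypothetical and chosen for illustration only. The new-style calls take an explicit context, and the legacy calls are thin wrappers over one global default context.

```c
#include <stddef.h>

/* Hypothetical context type (an assumption for illustration). */
typedef struct ctx {
    int error_code;  /* 0 means "no error" */
} ctx;

/* New-style API: every call takes the context explicitly. */
void ctx_set_error(ctx *c, int code) {
    c->error_code = code;
}

int ctx_error(const ctx *c) {
    return c->error_code;
}

/* The single legacy global: the default context. */
static ctx default_ctx = {0};

/* Legacy entry points, candidates for deprecation: each one simply
   forwards to the new-style call using the default context. */
void legacy_set_error(int code) {
    ctx_set_error(&default_ctx, code);
}

int legacy_error(void) {
    return ctx_error(&default_ctx);
}
```

Existing callers of the legacy functions keep working unchanged, while new code that creates its own `ctx` stays isolated from the default one.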
msg325549 - Author: Eric Snow (eric.snow) (Python committer) Date: 2018-09-17 15:54
Also see PEP 432 ("Restructuring the CPython startup sequence"), which is still being fine-tuned.
History
Date User Action Args
2022-04-11 14:59:06 admin set github: 78888
2018-09-17 15:54:30 eric.snow set nosy: + eric.snow, vstinner, ncoghlan; messages: + msg325549
2018-09-17 12:04:18 skaller set messages: + msg325527
2018-09-17 11:48:46 eric.smith set nosy: + eric.smith; messages: + msg325524
2018-09-17 02:04:35 skaller create