This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Single-phase initialized modules gets initialized multiple times in 3.10.0
Type: behavior Stage:
Components: C API Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: daniel-falk
Priority: normal Keywords:

Created on 2021-12-10 13:32 by daniel-falk, last changed 2022-04-11 14:59 by admin.

Messages (1)
msg408209 - (view) Author: Daniel (daniel-falk) Date: 2021-12-10 13:32
The documentation (https://docs.python.org/3/c-api/init.html#c.Py_NewInterpreter) states:


For modules using single-phase initialization, e.g. PyModule_Create(), the first time a particular extension is imported, it is initialized normally, and a (shallow) copy of its module’s dictionary is squirreled away. When the same extension is imported by another (sub-)interpreter, a new module is initialized and filled with the contents of this copy; the extension’s init function is not called. Objects in the module’s dictionary thus end up shared across (sub-)interpreters, which might cause unwanted behavior (see Bugs and caveats below).


This does however seem to have changed (sometime between 3.6.9 and 3.10.0). Consider the following code:


#include <Python.h>

/*
 * Create a module "my_spam" that uses single-phase initialization
 */
static PyModuleDef EmbModule = {
      PyModuleDef_HEAD_INIT, "my_spam", NULL, -1,
      NULL,
      NULL, NULL, NULL, NULL
};

/*
 * According to the docs this function is only called once when dealing with
 * subinterpreters, the next time a shallow copy of the initial state is
 * returned. This does however not seem to be the case in Python 3.10.0..
 */
static PyObject* PyInit_emb(void) {
  PyObject *module = PyModule_Create(&EmbModule);
  PyModule_AddObject(module, "test", PyDict_New());

  printf("Init my_spam module %p\n", module);
  return module;
}

/*
 * Main program
 */

int main(int argc, char *argv[]) {
  PyImport_AppendInittab("my_spam", &PyInit_emb);
  Py_Initialize();

  // Save the main state
  PyThreadState *mainstate = PyThreadState_Get();

  // Create two new interpreters
  PyThreadState *inter1 = Py_NewInterpreter();
  PyThreadState *inter2 = Py_NewInterpreter();

  // Import the my_spam module into the first subinterpreter
  // and change the global variable of it
  PyThreadState_Swap(inter1);
  PyRun_SimpleString("import sys; print(sys.version_info)");
  PyRun_SimpleString("import my_spam; print('my_spam.test: ', my_spam.test)");
  PyRun_SimpleString("my_spam.test[1]=1; print('my_spam.test: ', my_spam.test)");

  // Import the my_spam module into the second subinterpreter
  // and change the global variable of it
  PyThreadState_Swap(inter2);
  PyRun_SimpleString("import sys; print(sys.version_info)");
  PyRun_SimpleString("import my_spam; print('my_spam.test: ', my_spam.test)");
  PyRun_SimpleString("my_spam.test[2]=2; print('my_spam.test: ', my_spam.test)");

  // Close the subinterpreters
  Py_EndInterpreter(inter2);
  PyThreadState_Swap(inter1);
  Py_EndInterpreter(inter1);

  // Swap back to the main state and finalize python
  PyThreadState_Swap(mainstate);
  if (Py_FinalizeEx() < 0) {
      exit(120);
  }

  return 0;
}



Compiled with python 3.6.9 this does act according to the documentation:


$ gcc test_subinterpreters.c -I/home/daniel/.pyenv/versions/3.6.9/include/python3.6m -L/home/daniel/.pyenv/versions/3.6.9/lib -lpython3.6m && LD_LIBRARY_PATH=/home/daniel/.pyenv/versions/3.6.9/lib ./a.out
sys.version_info(major=3, minor=6, micro=9, releaselevel='final', serial=0)
Init my_spam module 0x7ff7a63d1ef8
my_spam.test:  {}
my_spam.test:  {1: 1}
sys.version_info(major=3, minor=6, micro=9, releaselevel='final', serial=0)
my_spam.test:  {1: 1}
my_spam.test:  {1: 1, 2: 2}


But compiled with 3.10.0 the module is reinitialized and thus objects in the module are not shared between the subinterpreters:


$ gcc test_subinterpreters.c -I/home/daniel/.pyenv/versions/3.10.0/include/python3.10 -L/home/daniel/.pyenv/versions/3.10.0/lib -lpython3.10 && LD_LIBRARY_PATH=/home/daniel/.pyenv/versions/3.10.0/lib ./a.out
sys.version_info(major=3, minor=10, micro=0, releaselevel='final', serial=0)
Init my_spam module 0x7f338a9a9530
my_spam.test:  {}
my_spam.test:  {1: 1}
sys.version_info(major=3, minor=10, micro=0, releaselevel='final', serial=0)
Init my_spam module 0x7f338a9a96c0
my_spam.test:  {}
my_spam.test:  {2: 2}


To me the new behavior seems nicer, but at the very least the documentation should be updated. It also seems like if this could break integrations, albeit it is an unlikely "feature" to rely on.
History
Date User Action Args
2022-04-11 14:59:53adminsetgithub: 90194
2021-12-10 13:32:42daniel-falkcreate