classification
Title: [subinterpreters] Meta issue: per-interpreter GIL
Type: enhancement Stage:
Components: Subinterpreters Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: aeros, alex-garel, corona10, eric.snow, erlendaasland, shihai1991, vstinner
Priority: normal Keywords: patch

Created on 2020-05-05 12:51 by vstinner, last changed 2020-12-31 10:11 by alex-garel.

Files
File name Uploaded Description
demo-pyperf.py vstinner, 2020-05-05 21:52
resolve_slotdups.patch vstinner, 2020-12-26 22:23
Messages (24)
msg368136 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-05 12:51
To be able to run multiple (sub)interpreters in parallel, the unique global interpreter lock aka "GIL" should be replaced with multiple GILs: one "GIL" per interpreter. The scope of such a per-interpreter GIL would be a single interpreter.

The current CPython code base is not fully ready to have one GIL per interpreter. TODO:

* Move signals pending and gil_drop_request from _PyRuntimeState.ceval to PyInterpreterState.ceval: https://github.com/ericsnowcurrently/multi-core-python/issues/34
* Add a lock to pymalloc, or disable pymalloc when subinterpreters are used: https://github.com/ericsnowcurrently/multi-core-python/issues/30
* Make free lists per interpreter: tuple, dict, frame, etc.
* Make Unicode interned strings per interpreter
* Make Unicode latin1 single character string singletons per interpreter
* None, True, False, ... singletons: make them per-interpreter (bpo-39511) or immortal (bpo-40255)
* etc.

Until we can ensure that no Python object is shared between two interpreters, we might make PyObject.ob_refcnt, PyGC_Head (_gc_next and _gc_prev) and _dictkeysobject.dk_refcnt atomic.
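
To illustrate why a plain (non-atomic) ob_refcnt is a problem once an object is shared, here is a toy Python model, not CPython code. It replays by hand one bad interleaving of two increments that are each split into a load and a store. The names SharedObject, racy_incref_pair and atomic_incref_pair are invented for this sketch.

```python
class SharedObject:
    """Toy stand-in for a PyObject shared by two interpreters."""

    def __init__(self):
        self.ob_refcnt = 1


def racy_incref_pair(obj):
    """Replay one bad interleaving of two non-atomic increments."""
    # Both interpreters load the counter before either one stores it.
    seen_by_a = obj.ob_refcnt
    seen_by_b = obj.ob_refcnt
    # Each one stores "what I saw + 1", so one increment is lost.
    obj.ob_refcnt = seen_by_a + 1
    obj.ob_refcnt = seen_by_b + 1
    return obj.ob_refcnt


def atomic_incref_pair(obj):
    """Model the same two increments as indivisible steps."""
    obj.ob_refcnt += 1  # interpreter A: load and store cannot be split
    obj.ob_refcnt += 1  # interpreter B
    return obj.ob_refcnt
```

In the racy case the counter ends at 2 instead of 3: the object could later be deallocated while one interpreter still holds a reference.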

C extension modules should be modified as well:

* Convert to PEP 489 multi-phase initialization
* Replace globals ("static" variables) with a module state, or design a new "per-interpreter" local storage similar to Thread Local Storage (TLS). There is already PyInterpreterState.dict which is convenient to use in "Python" code, but it's not convenient to use in "C" code (code currently using "static int ..." for example).
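
As a rough picture of what a "per-interpreter" local storage could look like, here is a small Python model, loosely inspired by threading.local. It is only an assumption about the shape of such an API: InterpreterLocal, its get() method and the integer interpreter ids are all hypothetical, and a real design would live in C next to PyInterpreterState.

```python
class InterpreterLocal:
    """Hypothetical per-interpreter storage, keyed by an interpreter id."""

    def __init__(self, factory):
        self._factory = factory  # builds the initial state of one interpreter
        self._slots = {}         # interpreter id -> private state

    def get(self, interp_id):
        """Return the state of interp_id, creating it on first access."""
        if interp_id not in self._slots:
            self._slots[interp_id] = self._factory()
        return self._slots[interp_id]


# Usage: what used to be a single "static int calls" becomes one counter
# per interpreter (ids 0 and 1 are made up for the example).
state = InterpreterLocal(lambda: {"calls": 0})
state.get(0)["calls"] += 1
```

Each interpreter only ever sees its own slot, so no lock is needed as long as the slot is never handed to another interpreter.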

I'm not sure how to handle C extensions which are bindings for a C library which has a state and so should not be used multiple times in parallel. Some C extensions use a "global lock" for that. The question is how to get 

Most of these tasks are already tracked in Eric Snow's "Multi Core Python" project:
https://github.com/ericsnowcurrently/multi-core-python/issues

This issue is related to PEP 554 "Multiple Interpreters in the Stdlib", but not required by this PEP.

This issue is a tracker for sub-issues related to the goal "have one GIL per interpreter".

--

Some changes have a negative impact on "single threaded" Python applications. Even if the overhead is low, one option to be able to move faster on this issue may be to add a new temporary configure option to have an opt-in build mode to better isolate subinterpreters. Examples:

* disable pymalloc
* atomic reference counters
* disable free lists

That would be a temporary solution to "unblock" the development on this list. For the long term, free lists should be made per-interpreter, pymalloc should support multiple interpreters, no Python object must be shared by two interpreters, etc.

--

One idea to detect if a Python object is shared by two interpreters *in debug mode* would be to store a reference to the interpreter which created it, and then check if the current interpreter is the same. If not, fail with a Python Fatal Error.
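
The debug-mode check described above can be modelled in a few lines of Python. This is only a sketch of the idea, not how CPython would implement it: TrackedObject, check_owner and InterpreterMismatchError are invented names, interpreter ids are plain integers, and a real implementation would store the owner in the object header and call Py_FatalError() instead of raising.

```python
class InterpreterMismatchError(RuntimeError):
    """Stands in for a Python Fatal Error in this toy model."""


class TrackedObject:
    """Object that remembers which interpreter created it."""

    def __init__(self, value, owner_id):
        self.value = value
        self.owner_id = owner_id  # id of the creating interpreter


def check_owner(obj, current_id):
    """Return obj if current_id owns it, otherwise fail loudly."""
    if obj.owner_id != current_id:
        raise InterpreterMismatchError(
            f"object created by interpreter {obj.owner_id} "
            f"used in interpreter {current_id}"
        )
    return obj
```

The check would run on every access in a debug build only, so release builds would not pay for the extra field or the comparison.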

--

During Python 3.9 development cycle, many states moved from the global _PyRuntimeState to per-interpreter PyInterpreterState:

* GC state (bpo-36854)
* warnings state (bpo-36737)
* small integer singletons (bpo-38858)
* parser state (bpo-36876)
* ceval pending calls and "eval breaker" (bpo-39984)
* etc.

Many corner cases related to daemon threads have also been fixed:

* https://vstinner.github.io/daemon-threads-python-finalization-python32.html
* https://vstinner.github.io/threading-shutdown-race-condition.html
* https://vstinner.github.io/gil-bugfixes-daemon-threads-python39.html

And more code is now shared for the initialization and finalization of the main interpreter and subinterpreters (ex: see bpo-38858).

Subinterpreters builtins and sys are now really isolated from the main interpreter (bpo-38858).

--

Obviously, there are likely tons of other issues which are not known at this stage. Again, this issue is a placeholder to track them all. It may be more efficient to create one sub-issue per sub-task, rather than discussing all tasks in the same place.
msg368138 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-05 12:57
> Move signals pending and gil_drop_request from _PyRuntimeState.ceval to PyInterpreterState.ceval: https://github.com/ericsnowcurrently/multi-core-python/issues/34

I created bpo-40513: "Move _PyRuntimeState.ceval to PyInterpreterState".
msg368142 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-05 13:14
> Some changes have a negative impact on "single threaded" Python application. Even if the overhead is low, one option to be able to move faster on this issue may be to add a new temporary configure option to have an opt-in build mode to better isolate subinterpreters. (...)

I created bpo-40514: "Add --experimental-isolated-subinterpreters build option".
msg368184 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-05 17:39
I created bpo-40522: "Subinterpreters: get the current Python interpreter state from Thread Local Storage (autoTSSkey)".
msg368195 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-05 18:32
Attached demo.py: benchmark to compare performance of sequential execution, threads and subinterpreters.
msg368203 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-05 20:13
(oops, there was a typo in my script: threads and subinterpreters were the same benchmark)
msg368206 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-05 21:10
Hum, demo.py is not reliable for threads: the standard deviation is quite large. I rewrote it using pyperf to compute the average and the standard deviation.
msg368210 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-05 21:53
I updated demo-pyperf.py to also benchmark multiprocessing.
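
For readers without the attachment, the kind of comparison described above can be sketched as follows. This is not demo-pyperf.py: it uses time.perf_counter() and the statistics module instead of pyperf, it only covers sequential execution and threads, and the work() function is a placeholder CPU-bound workload chosen for illustration.

```python
import statistics
import threading
import time


def work(n=50_000):
    """Placeholder CPU-bound workload: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_sequential(jobs):
    """Run the workload `jobs` times, one after the other."""
    for _ in range(jobs):
        work()


def run_threads(jobs):
    """Run the workload in `jobs` threads and wait for all of them."""
    threads = [threading.Thread(target=work) for _ in range(jobs)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


def bench(func, jobs=4, repeat=5):
    """Return (mean, standard deviation) of wall-clock seconds."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(jobs)
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings), statistics.stdev(timings)


if __name__ == "__main__":
    for name, func in [("sequential", run_sequential), ("threads", run_threads)]:
        mean, stdev = bench(func)
        print(f"{name}: {mean * 1e3:.1f} ms +- {stdev * 1e3:.1f} ms")
```

Reporting the standard deviation next to the mean is what makes an unreliable measurement visible; pyperf does the same with more care (warmups, multiple processes).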
msg368272 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-06 15:59
I created bpo-40533: "Subinterpreters: don't share Python objects between interpreters".
msg368310 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-06 23:07
See also bpo-39465: "Design a subinterpreter friendly alternative to _Py_IDENTIFIER". Currently, this C API is not compatible with subinterpreters.
msg368670 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-11 22:25
"Static" types are shared by all interpreters. We should convert them to heap allocated types using PyType_FromSpec(), see:

* bpo-40077: Convert static types to PyType_FromSpec()
* bpo-40601: [C API] Hide static types from the limited C API
msg368839 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-14 14:11
> Add a lock to pymalloc, or disable pymalloc when subinterpreters are used: (...)

By the way, tracemalloc is not compatible with subinterpreters.

test.support.run_in_subinterp() skips the test if tracemalloc is tracing.
msg368908 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-15 01:32
I marked bpo-36877 "[subinterpreters][meta] Move fields from _PyRuntimeState to PyInterpreterState" as a duplicate of this issue.
msg368914 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-05-15 01:55
I created a new "Subinterpreters" component in the bug tracker. It may help to better track all issues related to subinterpreters.
msg370608 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-02 12:46
Currently, the import lock is shared by all interpreters. It would also help for performance to make it per-interpreter to parallelize imports.
msg372253 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-24 13:55
Update of the EXPERIMENTAL_ISOLATED_SUBINTERPRETERS status.

I made many free lists and singletons per interpreter in bpo-40521.

TODO:

* _PyUnicode_FromId() and interned strings are still shared: typeobject.c requires a workaround for that.
* GC is disabled in subinterpreters since some objects are still shared
* Type method cache is shared.
* pymalloc is shared.
* The GIL is shared.

I'm investigating performance of my _PyUnicode_FromId() PR: https://github.com/python/cpython/pull/20058

This PR now uses "atomic functions" proposed in a second PR: https://github.com/python/cpython/pull/20766

The "atomic functions" avoid the need to declare a variable or a structure member as atomic, which would cause different issues if they are declared in Python public headers (which is the case for _Py_Identifier used by _PyUnicode_FromId()).
msg372773 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-07-01 17:30
> Update of the EXPERIMENTAL_ISOLATED_SUBINTERPRETERS status.

Also:

* _PyLong_Zero and _PyLong_One singletons are shared
* Py_None, Py_True and Py_False singletons are shared: bpo-39511 and PR 18301
* Static types like PyUnicode_Type and PyLong_Type are shared: see bpo-40077 and bpo-40601
* The dictionary of Unicode interned strings is shared: PR 20085
* context.c: _token_missing singleton is shared
* "struct _PyArg_Parser" generated by Argument Clinic is shared: see _PyArg_Fini()

Misc notes:

* init_interp_main(): if sys.warnoptions is not empty, "import warnings" is called to process these options, but not in subinterpreters: only in the main interpreter.
* _PyImport_FixupExtensionObject() contains code specific to the main interpreter. Maybe this function will no longer be needed once builtin extension modules are converted to the PEP 489 "multiphase initialization" API. I'm not sure.
msg380102 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-10-31 22:25
> * _PyLong_Zero and _PyLong_One singletons are shared

Removed by bpo-42161 (commit c310185c081110741fae914c06c7aaf673ad3d0d).
msg380108 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-10-31 23:45
FYI I'm also using https://pythondev.readthedocs.io/subinterpreters.html to track the progress on isolating subinterpreters.
msg380323 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-04 13:58
See also bpo-15751: "Make the PyGILState API compatible with subinterpreters".
msg383780 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-25 23:39
> Type method cache is shared.

I created bpo-42745: "[subinterpreters] Make the type attribute lookup cache per-interpreter".
msg383830 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-26 22:10
I played with ./configure --with-experimental-isolated-subinterpreters. I tried to run "pip list" in parallel in multiple interpreters.

I hit multiple issues:

* non-atomic reference count of Python objects shared by multiple interpreters, objects shared via static types for example.

* resolve_slotdups() uses a static variable

* pip requires _xxsubinterpreters.create(isolated=False): the vendored distro package runs the lsb_release command with subprocess.

* Race conditions in PyType_Ready() on static types:

  * Objects/typeobject.c:5494: PyType_Ready: Assertion "(type->tp_flags & (1UL << 13)) == 0" failed
  * Race condition in add_subclass()

* parser_init() doesn't support subinterpreters

* unicode_dealloc() fails to delete an interned string in the Unicode interned dictionary => https://bugs.python.org/issue40521#msg383829


To run "pip list", I used:

CODE = """
import runpy
import sys
import traceback
sys.argv = ["pip", "list"]
try:
    runpy.run_module("pip", run_name="__main__", alter_sys=True)
except SystemExit:
    pass
except Exception as exc:
    traceback.print_exc()
    print("BUG", exc)
    raise
"""
msg383831 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-26 22:23
> * resolve_slotdups() uses a static variable

Attached resolve_slotdups.patch works around the issue by removing the cache.
msg383875 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-27 23:21
FYI I wrote an article about this issue: "Isolate Python Subinterpreters"
https://vstinner.github.io/isolate-subinterpreters.html
History
Date User Action Args
2020-12-31 10:11:26 alex-garel set nosy: + alex-garel
2020-12-27 23:21:46 vstinner set messages: + msg383875
2020-12-26 22:23:54 vstinner set files: + resolve_slotdups.patch; keywords: + patch; messages: + msg383831
2020-12-26 22:10:15 vstinner set messages: + msg383830
2020-12-25 23:39:03 vstinner set messages: + msg383780
2020-11-04 13:58:01 vstinner set messages: + msg380323
2020-10-31 23:45:07 vstinner set messages: + msg380108
2020-10-31 23:10:46 erlendaasland set nosy: + erlendaasland
2020-10-31 22:25:03 vstinner set messages: + msg380102
2020-07-01 17:30:20 vstinner set messages: + msg372773
2020-06-24 13:55:09 vstinner set messages: + msg372253
2020-06-02 12:46:45 vstinner set messages: + msg370608; versions: + Python 3.10, - Python 3.9
2020-05-15 01:55:20 vstinner set messages: + msg368914
2020-05-15 01:32:00 vstinner set messages: + msg368908
2020-05-15 01:31:54 vstinner link issue36877 superseder
2020-05-15 00:35:28 vstinner set components: + Subinterpreters, - Interpreter Core; title: Meta issue: per-interpreter GIL -> [subinterpreters] Meta issue: per-interpreter GIL
2020-05-14 14:11:45 vstinner set messages: + msg368839
2020-05-11 22:25:02 vstinner set messages: + msg368670
2020-05-06 23:07:16 vstinner set messages: + msg368310
2020-05-06 15:59:32 vstinner set messages: + msg368272
2020-05-06 04:47:58 shihai1991 set nosy: + shihai1991
2020-05-05 21:53:11 vstinner set messages: + msg368210
2020-05-05 21:52:48 vstinner set files: - demo.py
2020-05-05 21:52:47 vstinner set files: - demo-pyperf.py
2020-05-05 21:52:39 vstinner set files: + demo-pyperf.py
2020-05-05 21:10:07 vstinner set files: + demo-pyperf.py; messages: + msg368206
2020-05-05 20:13:59 vstinner set files: + demo.py; messages: + msg368203
2020-05-05 20:13:30 vstinner set files: - demo.py
2020-05-05 18:32:48 vstinner set files: + demo.py; messages: + msg368195
2020-05-05 17:39:00 vstinner set messages: + msg368184
2020-05-05 17:10:57 aeros set nosy: + aeros
2020-05-05 15:23:08 corona10 set nosy: + corona10
2020-05-05 15:12:39 vstinner set nosy: + eric.snow
2020-05-05 13:14:21 vstinner set messages: + msg368142
2020-05-05 12:57:18 vstinner set messages: + msg368138
2020-05-05 12:51:15 vstinner create