Message 368136 - Python tracker


Author	vstinner
Recipients	vstinner
Date	2020-05-05.12:51:13
Message-id	<1588683075.13.0.0239787407564.issue40512@roundup.psfhosted.org>
In-reply-to

Content

To be able to run multiple (sub)interpreters in parallel, the unique global interpreter lock, aka "GIL", should be replaced with multiple GILs: one "GIL" per interpreter. The scope of such a per-interpreter GIL would be a single interpreter.

The current CPython code base is not fully ready to have one GIL per interpreter. TODO:

* Move signals pending and gil_drop_request from _PyRuntimeState.ceval to PyInterpreterState.ceval: https://github.com/ericsnowcurrently/multi-core-python/issues/34
* Add a lock to pymalloc, or disable pymalloc when subinterpreters are used: https://github.com/ericsnowcurrently/multi-core-python/issues/30
* Make free lists per-interpreter: tuple, dict, frame, etc.
* Make Unicode interned strings per interpreter
* Make Unicode latin1 single character string singletons per interpreter
* None, True, False, ... singletons: make them per-interpreter (bpo-39511) or immortal (bpo-40255)
* etc.
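The "per-interpreter free list" item can be sketched in plain C, without the CPython headers. This is only a model under stated assumptions: `interp_state_t` stands in for PyInterpreterState, `node_t` for a cached object type, and `FREELIST_MAX` is an arbitrary cap; none of these names exist in CPython.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define FREELIST_MAX 4  /* illustrative cap, not a CPython value */

/* Hypothetical stand-in for an object type that is cached when freed. */
typedef struct node {
    struct node *next;   /* link used only while the node sits in a free list */
    int payload;
} node_t;

/* Hypothetical stand-in for PyInterpreterState: each interpreter owns its
   own free list, so only the thread holding that interpreter's GIL ever
   touches it and no extra lock is needed. */
typedef struct {
    node_t *free_list;
    size_t free_count;
} interp_state_t;

/* Allocate a node, reusing one cached by this interpreter if possible. */
static node_t *node_alloc(interp_state_t *interp)
{
    assert(interp != NULL);
    node_t *n = interp->free_list;
    if (n != NULL) {
        interp->free_list = n->next;
        interp->free_count--;
    }
    else {
        n = malloc(sizeof(node_t));
        if (n == NULL) {
            return NULL;
        }
    }
    n->next = NULL;
    n->payload = 0;
    return n;
}

/* Put a node back into the given interpreter's free list, or free it
   when that list is already full. */
static void node_free(interp_state_t *interp, node_t *n)
{
    assert(interp != NULL && n != NULL);
    if (interp->free_count < FREELIST_MAX) {
        n->next = interp->free_list;
        interp->free_list = n;
        interp->free_count++;
    }
    else {
        free(n);
    }
}
```

The point of the design is that the list head lives in the interpreter structure rather than in a `static` variable, so two interpreters running in parallel never race on it.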

Until we can ensure that no Python object is shared between two interpreters, we might make PyObject.ob_refcnt, PyGC_Head (_gc_next and _gc_prev) and _dictkeysobject.dk_refcnt atomic.
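As a rough illustration of what an atomic reference counter could look like, here is a minimal sketch using C11 `<stdatomic.h>`. The `obj_t` type and the `obj_*` helpers are hypothetical; this is not how PyObject or Py_INCREF/Py_DECREF are defined, and the memory orderings shown are one reasonable choice rather than a tuned one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object header: the counter is a C11 atomic instead of a
   plain integer, so concurrent updates from two threads are not lost. */
typedef struct {
    atomic_long refcnt;
} obj_t;

static void obj_init(obj_t *op)
{
    atomic_init(&op->refcnt, 1);
}

static void obj_incref(obj_t *op)
{
    /* The caller already holds a reference, so no ordering with other
       memory operations is required here. */
    atomic_fetch_add_explicit(&op->refcnt, 1, memory_order_relaxed);
}

/* Drop one reference. Returns true when the caller released the last
   one and is therefore responsible for deallocating the object. */
static bool obj_decref(obj_t *op)
{
    /* acq_rel: publish our writes to the object and observe writes made
       by other threads before the object is torn down. */
    long old = atomic_fetch_sub_explicit(&op->refcnt, 1,
                                         memory_order_acq_rel);
    assert(old > 0);
    return old == 1;
}
```

Atomic read-modify-write operations are typically slower than plain increments, which is the kind of single-threaded overhead this message is concerned about.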

C extension modules should be modified as well:

* Convert to PEP 489 multi-phase initialization
* Replace globals ("static" variables) with a module state, or design a new "per-interpreter" local storage similar to Thread Local Storage (TLS). There is already PyInterpreterState.dict which is convenient to use in "Python" code, but it's not convenient to use in "C" code (code currently using "static int ..." for example).
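The second item, replacing a `static` variable with state passed explicitly, can be modelled in plain C. This sketch deliberately avoids the CPython headers: `module_state_t`, `module_state_new()` and `count_call()` are hypothetical names. In a real extension the state would presumably be the memory reserved through PyModuleDef.m_size and fetched with PyModule_GetState().

```c
#include <assert.h>
#include <stdlib.h>

/* Before: one counter shared by every interpreter that loads the
   extension, because a C static exists once per process. */
static int global_call_count = 0;

static int count_call_global(void)
{
    return ++global_call_count;
}

/* After: hypothetical per-module state, one instance per interpreter. */
typedef struct {
    int call_count;
} module_state_t;

/* Allocate a zero-initialized state (stand-in for what the module
   creation machinery would do for each interpreter). */
static module_state_t *module_state_new(void)
{
    return calloc(1, sizeof(module_state_t));
}

/* Same logic as count_call_global(), but the counter is reached
   through the state pointer instead of a global. */
static int count_call(module_state_t *state)
{
    assert(state != NULL);
    return ++state->call_count;
}
```

Two states created with `module_state_new()` count independently, whereas every caller of `count_call_global()` observes the same counter.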

I'm not sure how to handle C extensions which are bindings for a C library that has state and so should not be used multiple times in parallel. Some C extensions use a "global lock" for that. The question is how to get

Most of these tasks are already tracked in Eric Snow's "Multi Core Python" project:
https://github.com/ericsnowcurrently/multi-core-python/issues

This issue is related to PEP 554 "Multiple Interpreters in the Stdlib", but is not required by that PEP.

This issue is a tracker for sub-issues related to the goal "have one GIL per interpreter".

Some changes have a negative impact on "single threaded" Python applications. Even if the overhead is low, one option to be able to move faster on this issue may be to add a new temporary configure option to have an opt-in build mode to better isolate subinterpreters. Examples:

* disable pymalloc
* atomic reference counters
* disable free lists

That would be a temporary solution to "unblock" the development on this list. For the long term, free lists should be made per-interpreter, pymalloc should support multiple interpreters, no Python object must be shared by two interpreters, etc.

One idea to detect if a Python object is shared by two interpreters *in debug mode* would be to store a reference to the interpreter which created it, and then check if the current interpreter is the same. If not, fail with a Python Fatal Error.
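The debug-mode check described above might look roughly like the following sketch. All names (`interp_t`, `dbg_obj_t`, `dbg_obj_check()`) are hypothetical, the extra `owner` field is an assumption about where the reference would be stored, and `abort()` stands in for a Python Fatal Error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical interpreter handle; stands in for PyInterpreterState. */
typedef struct {
    long id;
} interp_t;

/* Hypothetical object header with an extra, debug-only owner field. */
typedef struct {
    const interp_t *owner;   /* interpreter that created the object */
    int value;
} dbg_obj_t;

/* Record the creating interpreter when the object is initialized. */
static void dbg_obj_init(dbg_obj_t *op, const interp_t *creator, int value)
{
    op->owner = creator;
    op->value = value;
}

/* True if the object was created by the interpreter now using it. */
static bool dbg_obj_is_owned_by(const dbg_obj_t *op, const interp_t *current)
{
    assert(op != NULL && current != NULL);
    return op->owner == current;
}

/* Abort the process if the object is used from another interpreter. */
static void dbg_obj_check(const dbg_obj_t *op, const interp_t *current)
{
    if (!dbg_obj_is_owned_by(op, current)) {
        fprintf(stderr,
                "Fatal error: object owned by interpreter %ld "
                "used from interpreter %ld\n",
                op->owner->id, current->id);
        abort();
    }
}
```

Since the owner field enlarges every object, it only makes sense in a debug build, as suggested above.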

During the Python 3.9 development cycle, many states moved from the global _PyRuntimeState to per-interpreter PyInterpreterState:

* GC state (bpo-36854)
* warnings state (bpo-36737)
* small integer singletons (bpo-38858)
* parser state (bpo-36876)
* ceval pending calls and "eval breaker" (bpo-39984)
* etc.

Many corner cases related to daemon threads have also been fixed:

* https://vstinner.github.io/daemon-threads-python-finalization-python32.html
* https://vstinner.github.io/threading-shutdown-race-condition.html
* https://vstinner.github.io/gil-bugfixes-daemon-threads-python39.html

And more code is now shared for the initialization and finalization of the main interpreter and subinterpreters (ex: see bpo-38858).

The builtins and sys modules of subinterpreters are now really isolated from the main interpreter (bpo-38858).

Obviously, there are likely tons of other issues which are not known at this stage. Again, this issue is a placeholder to track them all. It may be more efficient to create one sub-issue per sub-task, rather than discussing all tasks in the same place.

History
Date	User	Action	Args
2020-05-05 12:51:15	vstinner	set	recipients: + vstinner
2020-05-05 12:51:15	vstinner	set	messageid: <1588683075.13.0.0239787407564.issue40512@roundup.psfhosted.org>
2020-05-05 12:51:15	vstinner	link	issue40512 messages
2020-05-05 12:51:13	vstinner	create