
[subinterpreters] Meta issue: per-interpreter GIL #84692

Closed
vstinner opened this issue May 5, 2020 · 31 comments
Labels
3.11 only security fixes topic-subinterpreters type-feature A feature request or enhancement

Comments

@vstinner
Member

vstinner commented May 5, 2020

BPO 40512
Nosy @vstinner, @jparise, @encukou, @markshannon, @ericsnowcurrently, @ndjensen, @corona10, @shihai1991, @aeros, @erlend-aasland, @nw0
PRs
  • bpo-40512: Store pointer to interpreter state in a thread local variable #29228
Files
  • demo-pyperf.py
  • resolve_slotdups.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2020-05-05.12:51:15.115>
    labels = ['expert-subinterpreters', 'type-feature', '3.11']
    title = '[subinterpreters] Meta issue: per-interpreter GIL'
    updated_at = <Date 2022-03-06.03:51:12.908>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2022-03-06.03:51:12.908>
    actor = 'jon'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Subinterpreters']
    creation = <Date 2020-05-05.12:51:15.115>
    creator = 'vstinner'
    dependencies = []
    files = ['49128', '49700']
    hgrepos = []
    issue_num = 40512
    keywords = ['patch']
    message_count = 30.0
    messages = ['368136', '368138', '368142', '368184', '368195', '368203', '368206', '368210', '368272', '368310', '368670', '368839', '368908', '368914', '370608', '372253', '372773', '380102', '380108', '380323', '383780', '383830', '383831', '383875', '388746', '399847', '401117', '401127', '401130', '403727']
    nosy_count = 12.0
    nosy_names = ['vstinner', 'jon', 'petr.viktorin', 'Mark.Shannon', 'eric.snow', 'ndjensen', 'corona10', 'alex-garel', 'shihai1991', 'aeros', 'erlendaasland', 'nw0']
    pr_nums = ['29228']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue40512'
    versions = ['Python 3.11']

    @vstinner
    Member Author

    vstinner commented May 5, 2020

    To be able to run multiple (sub)interpreters in parallel, the unique global interpreter lock aka "GIL" should be replaced with multiple GILs: one "GIL" per interpreter. The scope of each per-interpreter GIL would be a single interpreter.

    The current CPython code base is not yet fully ready to have one GIL per interpreter. TODO:

    Until we can ensure that no Python object is shared between two interpreters, we might make PyObject.ob_refcnt, PyGC_Head (_gc_next and _gc_prev) and _dictkeysobject.dk_refcnt atomic.
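Why a plain reference count is a problem once two interpreters run in parallel on a shared object: `ob_refcnt += 1` is a read-modify-write, so two concurrent updates can lose one. The sketch below is only a Python analogy under stated assumptions: `RefCounted` and `hammer()` are hypothetical names, and a `threading.Lock` stands in for the atomic instructions that C code would use.

```python
import threading


class RefCounted:
    """Hypothetical object header with a reference count (analogy only)."""

    def __init__(self):
        self.refcnt = 1
        # Stand-in for an atomic integer: every update happens under the lock.
        self._lock = threading.Lock()

    def incref(self):
        with self._lock:
            self.refcnt += 1

    def decref(self):
        with self._lock:
            self.refcnt -= 1
            return self.refcnt == 0  # True when the object could be freed


def hammer(obj, n):
    # Each thread, playing one "interpreter", takes then drops n references.
    for _ in range(n):
        obj.incref()
    for _ in range(n):
        obj.decref()


obj = RefCounted()
threads = [threading.Thread(target=hammer, args=(obj, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(obj.refcnt)  # 1: only the initial reference is left
```

The cost the issue worries about is visible even here: every incref/decref pays for synchronization, which is why atomic counters are proposed only as an opt-in or temporary measure.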

    C extension modules should be modified as well:

    • Convert to PEP-489 multi-phase initialization
    • Replace globals ("static" variables) with a module state, or design a new "per-interpreter" local storage similar to Thread Local Storage (TLS). There is already PyInterpreterState.dict which is convenient to use in "Python" code, but it's not convenient to use in "C" code (code currently using "static int ..." for example).

    I'm not sure how to handle C extensions which are bindings for a C library that has a state and so should not be used multiple times in parallel. Some C extensions use a "global lock" for that. The question is how to get

    Most of these tasks are already tracked in Eric Snow's "Multi Core Python" project:
    https://github.com/ericsnowcurrently/multi-core-python/issues

    This issue is related to PEP-554 "Multiple Interpreters in the Stdlib", but not required by this PEP.

    This issue is a tracker for sub-issues related to the goal "have one GIL per interpreter".

    --

    Some changes have a negative impact on "single threaded" Python applications. Even if the overhead is low, one option to move faster on this issue may be to add a new temporary configure option providing an opt-in build mode that better isolates subinterpreters. Examples:

    • disable pymalloc
    • atomic reference counters
    • disable free lists

    That would be a temporary solution to "unblock" the development on this list. For the long term, free lists should be made per-interpreter, pymalloc should support multiple interpreters, no Python object must be shared by two interpreters, etc.

    --

    One idea to detect if a Python object is shared by two interpreters *in debug mode* would be to store a reference to the interpreter which created it, and then check if the current interpreter is the same. If not, fail with a Python Fatal Error.

    --

    During the Python 3.9 development cycle, many states moved from the global _PyRuntimeState to the per-interpreter PyInterpreterState:

    Many corner cases related to daemon threads have also been fixed:

    And more code is now shared between the initialization and finalization of the main interpreter and of subinterpreters (e.g. see bpo-38858).

    The builtins and sys modules of subinterpreters are now really isolated from the main interpreter (bpo-38858).

    --

    Obviously, there are likely tons of other issues which are not known at this stage. Again, this issue is a placeholder to track them all. It may be more efficient to create one sub-issue per sub-task, rather than discussing all tasks at the same place.

    @vstinner vstinner added interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.9 only security fixes type-feature A feature request or enhancement labels May 5, 2020
    @vstinner
    Member Author

    vstinner commented May 5, 2020

    Move signals pending and gil_drop_request from _PyRuntimeState.ceval to PyInterpreterState.ceval: ericsnowcurrently/multi-core-python#34

    I created bpo-40513: "Move _PyRuntimeState.ceval to PyInterpreterState".

    @vstinner
    Member Author

    vstinner commented May 5, 2020

    Some changes have a negative impact on "single threaded" Python application. Even if the overhead is low, one option to be able to move faster on this issue may be to add a new temporary configure option to have an opt-in build mode to better isolate subinterpreters. (...)

    I created bpo-40514: "Add --experimental-isolated-subinterpreters build option".
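On POSIX builds, macros from pyconfig.h are usually exposed through `sysconfig`, so a script may be able to detect this opt-in build mode. This sketch assumes the macro is named `EXPERIMENTAL_ISOLATED_SUBINTERPRETERS` (the spelling used later in this issue) and that it is parsed into the config variables; `isolated_subinterpreters_build()` is a hypothetical helper. `sysconfig.get_config_var()` returns None for unknown names, which the sketch treats as "not enabled".

```python
import sysconfig


def isolated_subinterpreters_build():
    # Assumption: --with-experimental-isolated-subinterpreters defines this
    # macro in pyconfig.h, which sysconfig exposes as a config variable.
    value = sysconfig.get_config_var("EXPERIMENTAL_ISOLATED_SUBINTERPRETERS")
    return bool(value)  # None (unknown) or 0 (undefined) both mean "disabled"


print(isolated_subinterpreters_build())
```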

    @vstinner
    Member Author

    vstinner commented May 5, 2020

    I created bpo-40522: "Subinterpreters: get the current Python interpreter state from Thread Local Storage (autoTSSkey)".
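The idea behind bpo-40522, reduced to a Python analogy: instead of reading the current interpreter from one process-wide variable, each thread reads it from thread-local storage (in C, the TSS key named in the issue title). `set_current_interp()` and `get_current_interp()` are hypothetical helpers, and integer IDs stand in for PyInterpreterState pointers.

```python
import threading

_tls = threading.local()  # one slot per OS thread, like a TSS key


def set_current_interp(interp_id):
    _tls.interp = interp_id


def get_current_interp():
    # Threads that never set a value are treated as running the main interpreter.
    return getattr(_tls, "interp", 0)


results = {}


def worker(interp_id):
    set_current_interp(interp_id)
    results[interp_id] = get_current_interp()


threads = [threading.Thread(target=worker, args=(i,)) for i in (1, 2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results.items()))  # [(1, 1), (2, 2), (3, 3)]
print(get_current_interp())     # 0: the main thread is unaffected
```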

    @vstinner
    Member Author

    vstinner commented May 5, 2020

    Attached demo.py: benchmark to compare performance of sequential execution, threads and subinterpreters.

    @vstinner
    Member Author

    vstinner commented May 5, 2020

    (oops, there was a typo in my script: the threads and subinterpreters variants ran the same benchmark)

    @vstinner
    Member Author

    vstinner commented May 5, 2020

    Hum, demo.py is not reliable for threads: the standard deviation is quite large. I rewrote it using pyperf to compute the average and the standard deviation.
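A rough standard-library stand-in for that kind of measurement (this is not demo-pyperf.py, whose contents are not shown here): `timeit.repeat()` runs each variant several times and `statistics` reports the mean and standard deviation. `work()`, `sequential()`, `with_threads()` and `bench()` are hypothetical names. On a build where all threads share one GIL, the threaded variant is not expected to beat the sequential one for this CPU-bound loop.

```python
import statistics
import threading
import timeit


def work(n=50_000):
    # CPU-bound pure-Python loop: it holds the GIL the whole time.
    total = 0
    for i in range(n):
        total += i * i
    return total


def sequential(jobs=4):
    for _ in range(jobs):
        work()


def with_threads(jobs=4):
    threads = [threading.Thread(target=work) for _ in range(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def bench(func, repeat=5):
    # One call per sample; stdev needs at least two samples.
    timings = timeit.repeat(func, number=1, repeat=repeat)
    return statistics.mean(timings), statistics.stdev(timings)


for name, func in [("sequential", sequential), ("threads", with_threads)]:
    mean, stdev = bench(func)
    print(f"{name}: {mean * 1e3:.1f} ms +- {stdev * 1e3:.1f} ms")
```

Reporting the standard deviation next to the mean is what makes a noisy threads result visible instead of hiding it in a single timing.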

    @vstinner
    Member Author

    vstinner commented May 5, 2020

    I updated demo-pyperf.py to also benchmark multiprocessing.

    @vstinner
    Member Author

    vstinner commented May 6, 2020

    I created bpo-40533: "Subinterpreters: don't share Python objects between interpreters".

    @vstinner
    Member Author

    vstinner commented May 6, 2020

    See also bpo-39465: "Design a subinterpreter friendly alternative to _Py_IDENTIFIER". Currently, this C API is not compatible with subinterpreters.

    @vstinner
    Member Author

    "Static" types are shared by all interpreters. We should convert them to heap allocated types using PyType_FromSpec(), see:

    • bpo-40077: Convert static types to PyType_FromSpec()
    • bpo-40601: [C API] Hide static types from the limited C API

    @vstinner
    Member Author

    Add a lock to pymalloc, or disable pymalloc when subinterpreters are used: (...)

    By the way, tracemalloc is not compatible with subinterpreters.

    test.support.run_in_subinterp() skips the test if tracemalloc is tracing.

    @vstinner vstinner added topic-subinterpreters and removed interpreter-core (Objects, Python, Grammar, and Parser dirs) labels May 15, 2020
    @vstinner vstinner changed the title Meta issue: per-interpreter GIL [subinterpreters] Meta issue: per-interpreter GIL May 15, 2020
    @vstinner
    Member Author

    I marked bpo-36877 "[subinterpreters][meta] Move fields from _PyRuntimeState to PyInterpreterState" as a duplicate of this issue.

    @vstinner
    Member Author

    I created a new "Subinterpreters" component in the bug tracker. It may help to better track all issues related to subinterpreters.

    @vstinner
    Member Author

    vstinner commented Jun 2, 2020

    Currently, the import lock is shared by all interpreters. Making it per-interpreter would also help performance, since imports could then run in parallel.
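A minimal sketch of why a per-interpreter import lock helps, assuming a hypothetical `InterpreterState` class that owns its own lock: a thread importing in one interpreter no longer blocks imports in another. `try_import_lock()` and `probe()` are also hypothetical; the real lock lives in C, and this only models the contention.

```python
import threading


class InterpreterState:
    """Hypothetical per-interpreter state that owns its import lock."""

    def __init__(self, interp_id):
        self.interp_id = interp_id
        # Re-entrant, since an import can trigger nested imports.
        self.import_lock = threading.RLock()


def try_import_lock(state, out):
    # Try to start an "import" in `state` from another thread without waiting.
    acquired = state.import_lock.acquire(blocking=False)
    out.append(acquired)
    if acquired:
        state.import_lock.release()


def probe(state):
    out = []
    t = threading.Thread(target=try_import_lock, args=(state, out))
    t.start()
    t.join()
    return out[0]


main = InterpreterState(0)
sub = InterpreterState(1)

with main.import_lock:  # the main interpreter is busy importing
    print(probe(sub))   # True: the subinterpreter has its own lock
    print(probe(main))  # False: a second import in main must wait
```

With a single shared lock, both probes would return False while the main interpreter is importing.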

    @vstinner vstinner added 3.10 only security fixes and removed 3.9 only security fixes labels Jun 2, 2020
    @vstinner vstinner removed the 3.9 only security fixes label Jun 2, 2020
    @vstinner
    Member Author

    Update of the EXPERIMENTAL_ISOLATED_SUBINTERPRETERS status.

    I made many free lists and singletons per interpreter in bpo-40521.
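A rough Python analogy of a per-interpreter free list (the real change in bpo-40521 is in C). `FreeList` and `InterpreterState` are hypothetical classes, and the cap of 16 cached objects is arbitrary.

```python
class FreeList:
    """Hypothetical free list: recycles released objects instead of freeing them."""

    def __init__(self, maxsize=16):  # arbitrary cap for the sketch
        self.items = []
        self.maxsize = maxsize

    def alloc(self):
        # Reuse a cached object when possible, otherwise create a new one.
        return self.items.pop() if self.items else []

    def release(self, obj):
        if len(self.items) < self.maxsize:
            obj.clear()
            self.items.append(obj)


class InterpreterState:
    def __init__(self):
        # One free list per interpreter: each interpreter only touches its own
        # list, so with one GIL per interpreter no extra lock is needed.
        self.list_freelist = FreeList()


a, b = InterpreterState(), InterpreterState()
x = a.list_freelist.alloc()
a.list_freelist.release(x)
print(a.list_freelist.alloc() is x)  # True: reused inside interpreter a
print(b.list_freelist.alloc() is x)  # False: interpreter b never sees it
```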

    TODO:

    • _PyUnicode_FromId() and interned strings are still shared: typeobject.c requires a workaround for that.
    • GC is disabled in subinterpreters since some objects are still shared
    • Type method cache is shared.
    • pymalloc is shared.
    • The GIL is shared.

    I'm investigating performance of my _PyUnicode_FromId() PR: #20058

    This PR now uses "atomic functions" proposed in a second PR: #20766

    The "atomic functions" avoid the need to declare a variable or a structure member as atomic, which would cause different issues if they were declared in Python public headers (which is the case for _Py_Identifier used by _PyUnicode_FromId()).

    @vstinner
    Member Author

    vstinner commented Jul 1, 2020

    Update of the EXPERIMENTAL_ISOLATED_SUBINTERPRETERS status.

    Also:

    • _PyLong_Zero and _PyLong_One singletons are shared
    • Py_None, Py_True and Py_False singletons are shared: bpo-39511 and PR 18301
    • Static types like PyUnicode_Type and PyLong_Type are shared: see bpo-40077 and bpo-40601
    • The dictionary of Unicode interned strings is shared: PR 20085
    • context.c: _token_missing singleton is shared
    • "struct _PyArg_Parser" generated by Argument Clinic is shared: see _PyArg_Fini()

    Misc notes:

    • init_interp_main(): if sys.warnoptions is not empty, "import warnings" is called to process these options, but only in the main interpreter, not in subinterpreters.
    • _PyImport_FixupExtensionObject() contains code specific to the main interpreter. Maybe this function will no longer be needed once built-in extension modules are converted to the PEP-489 "multiphase initialization" API. I'm not sure.

    @vstinner
    Member Author

    • _PyLong_Zero and _PyLong_One singletons are shared

    Removed by bpo-42161 (commit c310185).

    @vstinner
    Member Author

    FYI I'm also using https://pythondev.readthedocs.io/subinterpreters.html to track the progress on isolating subinterpreters.

    @vstinner
    Member Author

    vstinner commented Nov 4, 2020

    See also bpo-15751: "Make the PyGILState API compatible with subinterpreters".

    @vstinner
    Member Author

    Type method cache is shared.

    I created bpo-42745: "[subinterpreters] Make the type attribute lookup cache per-interpreter".
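A simplified Python sketch of a per-interpreter attribute lookup cache. CPython's real cache is a fixed-size table indexed by a hash of the type's version tag and the attribute name, with invalidation when a type changes; this hypothetical `TypeLookupCache` uses a plain dict and never invalidates, so it only illustrates that each interpreter would own a separate cache.

```python
class TypeLookupCache:
    """Hypothetical per-interpreter cache: (type, name) -> attribute."""

    def __init__(self):
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, tp, name):
        key = (tp, name)
        if key in self.entries:
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        # Slow path: walk the MRO, roughly what _PyType_Lookup() does on a miss.
        value = None
        for base in tp.__mro__:
            if name in base.__dict__:
                value = base.__dict__[name]
                break
        self.entries[key] = value
        return value


class Base:
    def greet(self):
        return "hi"


class Child(Base):
    pass


cache_main, cache_sub = TypeLookupCache(), TypeLookupCache()
cache_main.lookup(Child, "greet")
cache_main.lookup(Child, "greet")
cache_sub.lookup(Child, "greet")
print(cache_main.hits, cache_main.misses)  # 1 1
print(cache_sub.hits, cache_sub.misses)    # 0 1: a separate cache
```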

    @vstinner
    Member Author

    I played with ./configure --with-experimental-isolated-subinterpreters. I tried to run "pip list" in parallel in multiple interpreters.

    I hit multiple issues:

    • non-atomic reference counts of Python objects shared by multiple interpreters, for example objects shared via static types.

    • resolve_slotdups() uses a static variable

    • pip requires _xxsubinterpreters.create(isolated=False): the vendored distro package runs the lsb_release command with subprocess.

    • Race conditions in PyType_Ready() on static types:

      • Objects/typeobject.c:5494: PyType_Ready: Assertion "(type->tp_flags & (1UL << 13)) == 0" failed
      • Race condition in add_subclass()
    • parser_init() doesn't support subinterpreters

    • unicode_dealloc() fails to delete an interned string in the Unicode interned dictionary => https://bugs.python.org/issue40521#msg383829

    To run "pip list", I used:

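    CODE = """
    import runpy
    import sys
    import traceback

    # Run "pip list" as if it was invoked from the command line.
    sys.argv = ["pip", "list"]
    try:
        runpy.run_module("pip", run_name="__main__", alter_sys=True)
    except SystemExit:
        # pip exits through sys.exit(): ignore it so the interpreter keeps running.
        pass
    except Exception as exc:
        # Report any unexpected error, then propagate it.
        traceback.print_exc()
        print("BUG", exc)
        raise
    """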

    @vstinner
    Member Author

    • resolve_slotdups() uses a static variable

    Attached resolve_slotdups.patch works around the issue by removing the cache.

    @vstinner
    Member Author

    FYI I wrote an article about this issue: "Isolate Python Subinterpreters"
    https://vstinner.github.io/isolate-subinterpreters.html

    @vstinner
    Member Author

    See bpo-43313: "feature: support pymalloc for subinterpreters. each subinterpreter has pymalloc_state".

    @vstinner
    Member Author

    PyStructSequence_InitType2() is not compatible with subinterpreters: it uses static types. Moreover, it allocates tp_members memory which is not released when the type is destroyed. But I'm not sure that the type is ever destroyed, since this API is designed for static types.

    @shihai1991
    Member

    PyStructSequence_InitType2() is not compatible with subinterpreters: it uses static types. Moreover, it allocates tp_members memory which is not released when the type is destroyed. But I'm not sure that the type is ever destroyed, since this API is designed for static types.

    IMO, I suggest creating a new function, PyStructSequence_FromModuleAndDesc(module, desc, flags), which creates a heap type and doesn't allocate a memory block for tp_members, something like PyType_FromModuleAndSpec().

    I don't know whether there is any blocking issue for this conversion, but I can take a look.

    @petr ping: Petr, do you have a better idea about this question? :)

    @vstinner
    Member Author

    vstinner commented Sep 6, 2021

    Hai Shi:

    IMO, I suggest to create a new function, PyStructSequence_FromModuleAndDesc()

    Please create a new issue. If possible, I would prefer to have a sub-issue for that, to keep this issue as a tracking issue for all issues related to subinterpreters.

    @shihai1991
    Member

    bpo-45113: [subinterpreters][C API] Add a new function to create PyStructSequence from Heap.

    @shihai1991 shihai1991 added 3.11 only security fixes and removed 3.10 only security fixes labels Sep 6, 2021
    @encukou
    Member

    encukou commented Oct 12, 2021

    PyStructSequence_NewType exists, and is the same as the proposed PyStructSequence_FromModuleAndDesc except it doesn't take the module (which isn't necessary: PyStructSequence_Desc has no way to define functionality that would need the module state).

    @vstinner
    Member Author

    vstinner commented Nov 3, 2022

    Sadly, I don't have the bandwidth to work on this issue, so I'm closing it.

    @ericsnowcurrently is now working on https://peps.python.org/pep-0684/ which is a little bit different.

    @vstinner vstinner closed this as completed Nov 3, 2022
    Projects
    Status: Done
    Development

    No branches or pull requests

    3 participants