Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed. #77789

Closed
ericsnowcurrently opened this issue May 22, 2018 · 82 comments
Assignees
Labels
3.9 only security fixes topic-subinterpreters

Comments

@ericsnowcurrently
Member

BPO 33608
Nosy @nascheme, @ncoghlan, @vstinner, @phsilva, @pmp-p, @markshannon, @ericsnowcurrently, @serhiy-storchaka, @soltysh, @1st1, @koobs, @zooba, @phmc, @emilyemorehouse, @jdahlin
PRs
  • [WIP] bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). #9334
  • bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). #11617
  • bpo-33608: Use _Py_AddPendingCall() in _PyCrossInterpreterData_Release(). #12024
  • bpo-33608: Simplify DISPATCH by hoisting eval_breaker ahead of time. #12062
  • Revert: bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). (GH-11617) #12159
  • bpo-33608: Fix the Py_atomic_* macros. #12240
  • bpo-33608: Simplify DISPATCH by hoisting eval_breaker ahead of time. #12243
  • bpo-33608: Make sure locks in the runtime are properly re-created.  #12245
  • bpo-33608: Deal with pending calls relative to runtime shutdown. #12246
  • bpo-33608: Minor cleanup related to pending calls. #12247
  • bpo-33608: Fix PyEval_InitThreads() warning #12346
  • bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). #12360
  • bpo-33608: make arm macros match x86 macros #12665
  • bpo-33608: Revert "Factor out a private, per-interpreter _Py_AddPendingCall()." #12806
  • bpo-33608: Normalize atomic macros so that they all expect an atomic struct #12877
  • bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). #13714
  • Revert "bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). (gh-13714)" #13780
Files
  • managers.patch
  • take_gil.assert.patch
  • fini_crash.c
  • wrap_threadstate.diff

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/ericsnowcurrently'
    closed_at = <Date 2020-04-08.21:58:41.738>
    created_at = <Date 2018-05-22.19:34:41.672>
    labels = ['expert-subinterpreters', '3.9']
    title = 'Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed.'
    updated_at = <Date 2020-06-03.16:43:06.602>
    user = 'https://github.com/ericsnowcurrently'

    bugs.python.org fields:

    activity = <Date 2020-06-03.16:43:06.602>
    actor = 'vstinner'
    assignee = 'eric.snow'
    closed = True
    closed_date = <Date 2020-04-08.21:58:41.738>
    closer = 'vstinner'
    components = ['Subinterpreters']
    creation = <Date 2018-05-22.19:34:41.672>
    creator = 'eric.snow'
    dependencies = []
    files = ['48333', '48334', '48731', '48732']
    hgrepos = []
    issue_num = 33608
    keywords = ['patch']
    message_count = 82.0
    messages = ['317333', '317334', '317344', '317404', '325481', '325548', '336490', '336544', '336596', '336602', '336603', '336677', '336687', '336897', '336948', '336951', '336952', '336975', '337075', '337096', '337110', '337115', '337116', '337117', '337159', '337179', '337218', '337235', '337554', '337557', '337996', '338039', '340057', '340069', '340070', '340071', '340075', '340077', '340079', '340082', '340133', '340139', '340143', '340145', '340666', '340768', '342791', '342792', '344210', '344211', '344240', '344244', '344245', '344355', '344372', '344417', '344429', '344434', '344435', '344438', '344441', '344442', '344443', '344444', '344445', '344446', '357169', '357170', '357179', '358364', '358367', '358748', '364670', '365160', '365163', '365178', '366009', '366013', '366016', '366017', '366019', '366027']
    nosy_count = 16.0
    nosy_names = ['nascheme', 'ncoghlan', 'vstinner', 'phsilva', 'pmpp', 'Mark.Shannon', 'eric.snow', 'serhiy.storchaka', 'maciej.szulik', 'yselivanov', 'koobs', 'steve.dower', 'pconnell', 'emilyemorehouse', 'Johan Dahlin', 'shprotx']
    pr_nums = ['9334', '11617', '11617', '11617', '12024', '12062', '12159', '12240', '12243', '12245', '12246', '12247', '12346', '12360', '12665', '12806', '12877', '13714', '13780']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue33608'
    versions = ['Python 3.9']

    @ericsnowcurrently
    Member Author

    In order to keep subinterpreters properly isolated, objects
    from one interpreter should not be used in C-API calls in
    another interpreter. That is relatively straightforward
    except in one case: indicating that the other interpreter
    doesn't need the object to exist any more (similar to
    PyBuffer_Release() but more general). I consider the
    following solutions to be the most viable. All of them make
    use of refcounts to protect cross-interpreter usage (with an
    incref before sharing).

    1. manually switch interpreters (new private function)
      a. acquire the GIL
      b. if refcount > 1 then decref and release the GIL
      c. switch
      d. new thread (or re-use dedicated one)
      e. decref
      f. kill thread
      g. switch back
      h. release the GIL
    2. change pending call mechanism (see Py_AddPendingCall) to
      per-interpreter instead of global (add "interp" arg to
      signature of new private C-API function)
      a. queue a function that decrefs the object
    3. new cross-interpreter-specific private C-API function
      a. queue the object for decref (a la Py_AddPendingCall)
      in owning interpreter

    I favor #2, since it is more generally applicable. #3 would
    probably be built on #2 anyway. #1 is relatively inefficient.
    With #2, Py_AddPendingCall() would become a simple wrapper
    around the new private function.
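
    For illustration, here is a minimal sketch of how option #2 could look
    from the caller's side. The per-interpreter signature
    _Py_AddPendingCall(interp, func, arg) is the proposed private function,
    shown here as an assumption rather than an existing API:

    #include "Python.h"

    /* Runs later in the owning interpreter, while that interpreter holds
     * its GIL, so the DECREF is safe there. */
    static int
    _release_shared_object(void *arg)
    {
        PyObject *obj = (PyObject *)arg;
        Py_DECREF(obj);
        return 0;
    }

    /* Called from the borrowing interpreter when it no longer needs `obj`.
     * `interp` is the interpreter that owns `obj`.  The hypothetical
     * per-interpreter _Py_AddPendingCall() queues the decref over there
     * instead of touching the refcount from the wrong interpreter. */
    static int
    _schedule_cross_interp_decref(PyInterpreterState *interp, PyObject *obj)
    {
        return _Py_AddPendingCall(interp, _release_shared_object, obj);
    }

    With something like this in place, the public Py_AddPendingCall() would
    just call the private function with the current (or main) interpreter,
    matching the wrapper idea above.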

    @ericsnowcurrently
    Member Author

    As a lesser (IMHO) alternative, we could also modify Py_DECREF
    to respect a new "shared" marker of some sort (perhaps relative
    to bpo-33607), but that would probably still require one of the
    refcount-based solutions (and add a branch to Py_DECREF).

    @ericsnowcurrently ericsnowcurrently added the 3.8 only security fixes label May 22, 2018
    @vstinner
    Member

    "That is relatively straight-forward except in one case: indicating that the other interpreter doesn't need the object to exist any more (similar to PyBuffer_Release() but more general)"

    Why would an interpreter access an object from a different interpreter? Each interpreter should have its own object space, no?

    @vstinner vstinner changed the title Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed. [subinterpreters] Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed. May 22, 2018
    @ncoghlan
    Contributor

    Adding a low level callback based mechanism to ask another interpreter to do work seems like a good way to go to me.

    As an example of why that might be needed, consider the case of sharing a buffer exporting object with another subinterpreter: when the memoryview in the subinterpreter is destroyed, it needs to request that the buffer view be released in the source interpreter that actually owns the original object.
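
    A hedged sketch of that scenario, reusing the hypothetical per-interpreter
    _Py_AddPendingCall() from the first comment (the memoryview/exporter
    wiring and the view's allocation strategy are assumptions):

    #include "Python.h"

    /* Pending call executed in the source (exporting) interpreter: release
     * the buffer view there, under that interpreter's GIL. */
    static int
    _release_exported_view(void *arg)
    {
        Py_buffer *view = (Py_buffer *)arg;
        PyBuffer_Release(view);
        PyMem_RawFree(view);   /* assumes the view was heap-allocated when shared */
        return 0;
    }

    /* Called when the consuming interpreter's memoryview is torn down. */
    static int
    _request_buffer_release(PyInterpreterState *source_interp, Py_buffer *view)
    {
        return _Py_AddPendingCall(source_interp, _release_exported_view, view);
    }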

    @nascheme
    Member

    I would suggest that sharing of objects between interpreters should be stamped out. Could we have some #ifdef debug checking that would warn or assert so this doesn't happen? I know currently we share a lot of objects. However, in the long term, that does not seem like the correct design. Instead, each interpreter should have its own objects and any passing or sharing should be tightly controlled.

    @ericsnowcurrently
    Member Author

    @neil, we're definitely on the same page. In fact, in a world where subinterpreters do not share a GIL, we can't ever use an object in one interpreter that was created in another (due to thread safety on refcounts). The role of "tightly controlling" passing/sharing objects (for a very loose definition of "sharing") falls to the channels described in PEP-554. [1]

    However, there are several circumstances where interpreters may collaborate in ways that involve one holding a reference to (but not using) an object owned by the other. For instance, see PyBuffer_Release(). [2] This issue is about addressing that situation safely. It is definitely not about safely using objects from other interpreters.

    [1] The low-level implementation, including channels, already exists in Modules/_xxsubinterpretersmodule.c.
    [2] https://docs.python.org/3/c-api/buffer.html#c.PyBuffer_Release

    @ericsnowcurrently ericsnowcurrently self-assigned this Sep 17, 2018
    @ericsnowcurrently
    Member Author

    New changeset ef4ac96 by Eric Snow in branch 'master':
    bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). (GH-11617)
    ef4ac96

    @ericsnowcurrently
    Member Author

    #55826 introduced a slight performance regression which will need to be resolved. If I can't find the problem right away then I'll probably temporarily revert the commit.

    See https://mail.python.org/pipermail/python-dev/2019-February/156451.html.

    @ericsnowcurrently
    Member Author

    At least, it *might* be a performance regression. Here are the two commits I tried:

    after: ef4ac96 (PR GH-11617, from this issue)
    before: 463572c (the one right before)

    After building each (a normal build, not debug), I ran the micro-benchmark Raymond referred to ("./python Tools/scripts/var_access_benchmark.py") multiple times. There was enough variability between runs from the same commit that I'm uncertain at this point if there really is any performance regression. For the most part the results between the two commits are really close. Also, the results from the "after" commit fall within the variability I've seen between runs of the "before" commit.

    I'm going to try with the "performance" suite (https://github.com/python/performance) to see if it shows any regression.

    @ericsnowcurrently
    Member Author

    Here's what I did (more or less):

    $ python3 -m pip install --user perf
    $ python3 -m perf system tune
    $ git clone git@github.com:python/performance.git
    $ git clone git@github.com:python/cpython.git
    $ cd cpython
    $ git checkout ef4ac967e2f3a9a18330cc6abe14adb4bc3d0465
    $ ./configure
    $ make -j8
    $ ./python ../performance/pyperformance run --fast --output speed.after
    $ git checkout HEAD^
    $ make -j8
    $ ./python ../performance/pyperformance run --fast --output speed.before
    $ ./python ../performance/pyperformance compare speed.before speed.after -O table

    Here are my results:

    speed.before
    ============

    Performance version: 0.7.0
    Report on Linux-4.15.0-45-generic-x86_64-with-glibc2.26
    Number of logical CPUs: 4
    Start date: 2019-02-25 20:39:44.151699
    End date: 2019-02-25 20:48:04.817516

    speed.after
    ===========

    Performance version: 0.7.0
    Report on Linux-4.15.0-45-generic-x86_64-with-glibc2.26
    Number of logical CPUs: 4
    Start date: 2019-02-25 20:29:56.610702
    End date: 2019-02-25 20:38:11.667461

    +-------------------------+--------------+-------------+--------------+-----------------------+
    | Benchmark | speed.before | speed.after | Change | Significance |
    +=========================+==============+=============+==============+=======================+
    | 2to3 | 447 ms | 440 ms | 1.02x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | chaos | 155 ms | 156 ms | 1.01x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | crypto_pyaes | 154 ms | 152 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | deltablue | 10.7 ms | 10.5 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | django_template | 177 ms | 172 ms | 1.03x faster | Significant (t=3.66) |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | dulwich_log | 105 ms | 106 ms | 1.01x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | fannkuch | 572 ms | 573 ms | 1.00x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | float | 149 ms | 146 ms | 1.02x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | go | 351 ms | 349 ms | 1.00x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | hexiom | 14.6 ms | 14.4 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | html5lib | 126 ms | 122 ms | 1.03x faster | Significant (t=3.46) |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | json_dumps | 17.6 ms | 17.2 ms | 1.02x faster | Significant (t=2.65) |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | json_loads | 36.4 us | 36.1 us | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | logging_format | 15.2 us | 14.9 us | 1.02x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | logging_silent | 292 ns | 296 ns | 1.01x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | logging_simple | 13.7 us | 13.4 us | 1.02x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | mako | 22.9 ms | 22.5 ms | 1.02x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | meteor_contest | 134 ms | 134 ms | 1.00x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | nbody | 157 ms | 161 ms | 1.03x slower | Significant (t=-3.85) |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | nqueens | 134 ms | 132 ms | 1.02x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | pathlib | 30.1 ms | 31.0 ms | 1.03x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | pickle | 11.5 us | 11.6 us | 1.01x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | pickle_dict | 29.5 us | 30.5 us | 1.03x slower | Significant (t=-6.37) |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | pickle_list | 4.54 us | 4.58 us | 1.01x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | pickle_pure_python | 652 us | 651 us | 1.00x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | pidigits | 212 ms | 215 ms | 1.02x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | python_startup | 11.6 ms | 11.5 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | python_startup_no_site | 8.07 ms | 7.95 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | raytrace | 729 ms | 722 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | regex_compile | 249 ms | 247 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | regex_dna | 203 ms | 204 ms | 1.00x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | regex_effbot | 3.55 ms | 3.55 ms | 1.00x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | regex_v8 | 28.3 ms | 28.3 ms | 1.00x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | richards | 105 ms | 105 ms | 1.00x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | scimark_fft | 429 ms | 436 ms | 1.01x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | scimark_lu | 238 ms | 237 ms | 1.00x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | scimark_monte_carlo | 144 ms | 139 ms | 1.04x faster | Significant (t=3.61) |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | scimark_sor | 265 ms | 260 ms | 1.02x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | scimark_sparse_mat_mult | 5.41 ms | 5.25 ms | 1.03x faster | Significant (t=4.26) |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | spectral_norm | 174 ms | 173 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | sqlalchemy_declarative | 216 ms | 214 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | sqlalchemy_imperative | 41.6 ms | 41.2 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | sqlite_synth | 3.99 us | 3.91 us | 1.02x faster | Significant (t=2.49) |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | sympy_expand | 559 ms | 559 ms | 1.00x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | sympy_integrate | 25.2 ms | 25.3 ms | 1.00x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | sympy_str | 261 ms | 260 ms | 1.00x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | sympy_sum | 136 ms | 138 ms | 1.01x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | telco | 8.36 ms | 8.32 ms | 1.00x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | tornado_http | 273 ms | 271 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | unpack_sequence | 58.8 ns | 60.0 ns | 1.02x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | unpickle | 21.5 us | 21.5 us | 1.00x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | unpickle_list | 5.60 us | 5.55 us | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | unpickle_pure_python | 497 us | 481 us | 1.03x faster | Significant (t=5.04) |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | xml_etree_generate | 141 ms | 140 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | xml_etree_iterparse | 131 ms | 133 ms | 1.01x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | xml_etree_parse | 186 ms | 187 ms | 1.00x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | xml_etree_process | 115 ms | 113 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+

    @ericsnowcurrently
    Member Author

    ...so it doesn't appear that my PR introduces a performance regression.

    @vstinner
    Member

    ...so it doesn't appear that my PR introduces a performance regression.

    IMHO there is no performance regression at all. It is just noise in the results, which don't come with a standard deviation.

    @ericsnowcurrently
    Member Author

    FYI, I have a couple of small follow-up changes to land before I close this issue.

    @db3l
    Contributor

    db3l commented Mar 1, 2019

    I suspect changes for this issue may be creating test_io failures on my windows builders, most notably my x86 Windows 7 builder where test_io has been failing both attempts on almost all builds. It fails with a lock failure during interpreter shutdown, and commit ef4ac96 appears the most plausible commit out of the set introduced at the first failing build on Feb 24.

    See https://buildbot.python.org/all/#/builders/58/builds/1977 for the first failure. test_io has failed both attempts on all but 3 of the subsequent 16 tests of the 3.x branch.

    It might be worth noting that this builder is slow, so if there are timeouts involved or a race condition of any sort it might be triggering it more readily than other builders. I do, however, see the same failures - albeit less frequently and not usually on both tries - on the Win8 and Win10 builders.

    For what it's worth, one other oddity is that while test_multiprocessing* failures are relatively common on the Win7 builder during the first round of tests due to exceeding the timeout, they usually pass on the retry. Over this same time frame, though, they have begun failing (not due to timeout) even on the second attempt, which is unusual. It might be coincidental, but similar failures are also showing up sporadically on the Win8/Win10 builders, where such failures are not common at all.

    @ericsnowcurrently
    Member Author

    New changeset b05b711 by Eric Snow in branch 'master':
    bpo-33608: Use _Py_AddPendingCall() in _PyCrossInterpreterData_Release(). (gh-12024)
    b05b711

    @ericsnowcurrently
    Member Author

    New changeset bda918b by Eric Snow in branch 'master':
    bpo-33608: Simplify ceval's DISPATCH by hoisting eval_breaker ahead of time. (gh-12062)
    bda918b

    @ericsnowcurrently
    Member Author

    I've merged the PR for hoisting eval_breaker before the ceval loop starts. @Raymond, see if that addresses the performance regression you've been seeing.

    I'll close this issue after I've sorted out the buildbot issues David mentioned (on his Windows builders).

    @db3l
    Contributor

    db3l commented Mar 2, 2019

    If I can help with testing or builder access or anything just let me know. It appears that I can pretty reliably trigger the error through just the individual tests (test_daemon_threads_shutdown_std{out,err}_deadlock) in isolation, which is definitely less tedious than a full buildbot run to test a change.

    BTW, while not directly related since it was only just merged, and I won't pretend to have any real understanding of the changes here, I do have a question about PR 12024 ... it appears HEAD_UNLOCK is used twice in _PyInterpreterState_LookUpID. Shouldn't one of those be HEAD_LOCK?

    @vstinner
    Member

    vstinner commented Mar 4, 2019

    I suspect changes for this issue may be creating test_io failures on my windows builders, (...)

    I created bpo-36177 to track this regression.

    @vstinner
    Member

    vstinner commented Mar 4, 2019

    The commit ef4ac96 introduced a regression in test.test_multiprocessing_spawn.WithThreadsTestManagerRestart.test_rapid_restart: bpo-36114.

    Would it be possible to revert it until the bug is properly understood and fixed?

    @vstinner
    Member

    vstinner commented Mar 4, 2019

    The commit ef4ac96 introduced a regression in test.test_multiprocessing_spawn.WithThreadsTestManagerRestart.test_rapid_restart: bpo-36114.

    There is a very similar bug on Windows: bpo-36116.

    I'm now also quite confident that bpo-36177 is also a regression caused by this issue.

    @vstinner
    Member

    vstinner commented Mar 4, 2019

    New changeset 4d61e6e by Victor Stinner in branch 'master':
    Revert: bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). (GH-11617) (GH-12159)
    4d61e6e

    @vstinner
    Member

    vstinner commented Mar 4, 2019

    Hi Eric, I'm sorry but I had to revert your recent work. It introduced regressions in at least test_io and test_multiprocessing_spawn on Windows and FreeBSD. I simply don't have the bandwidth to investigate this regression yet, whereas we really need the CI to remain stable to catch other regressions (the master branch received multiple new features recently, and there are some other regressions along the way ;-)).

    test_multiprocessing_spawn is *very* hard to reproduce on FreeBSD, but I can reliably reproduce it. It just takes time. The issue is a crash producing a coredump. I consider that the bug is important enough to justify a revert.

    The revert is an opportunity to sit down and track the root cause rather than rushing to push a "temporary" quickfix. This bug seems to be very deep in the Python internals: thread state during Python shutdown. So it will take time to fully understand it and fix it (or redesign your recent changes, I don't know).

    Tell me if you need my help to reproduce the bug. The bug has been seen on FreeBSD but also Windows:

    -- The Night's Watch

    @vstinner
    Member

    vstinner commented Mar 4, 2019

    For curious people, Pablo Galindo spent a few hours to investigate https://bugs.python.org/issue36114 and I spent a few more hours to finish the analysis. The bug indirectly crashed my laptop :-)
    https://twitter.com/VictorStinner/status/1102528982036201478

    That's what I mean by "I don't have the bandwidth": we already spent hours to identify the regression ;-)

    @ericsnowcurrently
    Member Author

    That's okay, Victor. Thanks for jumping on this. I'll take a look when I get a chance.

    @vstinner
    Member

    vstinner commented Mar 5, 2019

    That's okay, Victor. Thanks for jumping on this. I'll take a look when I get a chance.

    From what I saw, your first commit was enough to reproduce the crash.

    If I recall correctly, Antoine Pitrou modified the GIL so that threads exit immediately when Py_Finalize() is called. I'm thinking of:

    void
    PyEval_RestoreThread(PyThreadState *tstate)
    {
        ...
        take_gil(tstate);
        /* _Py_Finalizing is protected by the GIL */
        if (_Py_IsFinalizing() && !_Py_CURRENTLY_FINALIZING(tstate)) {
            drop_gil(tstate);
            PyThread_exit_thread();
            Py_UNREACHABLE();
        }
        ...
        PyThreadState_Swap(tstate);
    }

    Problem: this code uses tstate, whereas the crash occurred because tstate pointed to freed memory:

    """
    Thread 1 got the crash

    (gdb) p *tstate
    $3 = {
    prev = 0xdbdbdbdbdbdbdbdb,
    next = 0xdbdbdbdbdbdbdbdb,
    interp = 0xdbdbdbdbdbdbdbdb,
    ...
    }

    ...

    Thread 1 (LWP 100696):
    #0 0x0000000000368210 in take_gil (tstate=0x8027e2050) at Python/ceval_gil.h:216
    #1 0x0000000000368a94 in PyEval_RestoreThread (tstate=0x8027e2050) at Python/ceval.c:281
    ...
    """

    https://bugs.python.org/issue36114#msg337090

    When this crash occurred, Py_Finalize() had already completed in the main thread!

    """
    void _Py_NO_RETURN
    Py_Exit(int sts)
    {
        if (Py_FinalizeEx() < 0) {  /* <==== DONE! */
            sts = 120;
        }
    exit(sts);    /* \<=============== Crash occurred here! \*/
    

    }
    """

    Py_Finalize() is supposed to wait for threads before deleting Python thread states:

    """
    int
    Py_FinalizeEx(void)
    {
        ...
    /* The interpreter is still entirely intact at this point, and the
     * exit funcs may be relying on that.  In particular, if some thread
     * or exit func is still waiting to do an import, the import machinery
     * expects Py_IsInitialized() to return true.  So don't say the
     * interpreter is uninitialized until after the exit funcs have run.
     * Note that Threading.py uses an exit func to do a join on all the
     * threads created thru it, so this also protects pending imports in
     * the threads created via Threading.
     */
    call_py_exitfuncs(interp);
    
    ...
    
    /* Remaining threads (e.g. daemon threads) will automatically exit
       after taking the GIL (in PyEval_RestoreThread()). */
    _PyRuntime.finalizing = tstate;
    _PyRuntime.initialized = 0;
    _PyRuntime.core_initialized = 0;
    ...
    
        /* Delete current thread. After this, many C API calls become crashy. */
        PyThreadState_Swap(NULL);
    
        PyInterpreterState_Delete(interp);
    ...
    

    }
    """

    The real problem for years has been *daemon threads*, which... BY DESIGN... remain alive after Py_Finalize() exits! But as I explained, they must exit as soon as they attempt to take the GIL.

    @ericsnowcurrently
    Member Author

    Thanks for taking a look Victor! That info is helpful. It will likely be a few days before I can sort this out.

    Once I have addressed the problem I'll re-merge. I plan on using the "buildbot-custom" branch to make sure the buildbots are happy with the change before making that PR.

    @ericsnowcurrently
    Member Author

    FWIW, the failures before were in test_io and test_multiprocessing_spawn, not test_asyncio.

    @ericsnowcurrently
    Member Author

    I had really hoped to get this in for 3.8, as I may need it. However, realistically I doubt it will take just a few minutes to resolve this. With the beta 1 targeted for today it makes sense to revert my change. :(

    (I may still try to convince our fearless RM to allow it in beta 2, but we'll see.)

    @vstinner
    Member

    vstinner commented Jun 3, 2019

    FWIW, the failures before were in test_io and test_multiprocessing_spawn,
    not test_asyncio.

    I am not aware of a test_asyncio issue related to this issue, only crashes
    in multiprocessing. I bet on a crash similar to the previous attempt's,
    during shutdown.

    @vstinner
    Member

    vstinner commented Jun 3, 2019

    New changeset e225beb by Victor Stinner in branch 'master':
    Revert "bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). (gh-13714)" (GH-13780)
    e225beb

    @vstinner
    Member

    vstinner commented Jun 3, 2019

    What annoys me the most in this issue is that I am unable to reproduce it. Even directly on the FreeBSD CURRENT buildbot, I failed to reproduce the crash even when I stressed the buildbot and the test.

    On the other hand, when the CI runs the test suite, crashes are likely!?

    These bugs are the worst.

    IMHO we must make multiprocessing more deterministic. Python 3.8 is better, but there are still many known issues.
    https://pythondev.readthedocs.io/unstable_tests.html

    The test_venv multiprocessing test misuses the multiprocessing API; it emits a ResourceWarning.

    @vstinner
    Member

    vstinner commented Jun 3, 2019

    My bet is that multiprocessing triggers a bug in daemon threads. Multiprocessing is using and abusing daemon threads. For example, as I wrote, test_venv misuses the multiprocessing API and implicitly lets Python release (or maybe not!) "resources", where "resources" can be processes and threads, likely including daemon threads.

    Eric: I would love to give you access to a VM where I can reproduce the bug but... I failed to reproduce the bug today :-(

    I would suggest writing a stress test for daemon threads (see the sketch at the end of this comment):

    • spawn daemon threads: 4x as many threads as CPUs, to stress the system
    • each thread would sleep a random number of milliseconds and then execute a few pure Python instructions
    • spawn these threads, wait a random number of milliseconds, and then "exit Python"

    The race condition is that daemon threads may or may not run during Python finalization, depending on the delay.

    Maybe you can make the crash more likely by adding a sleep of a few *seconds* before the final exit. For example, at the exit of Py_RunMain().

    Last time I saw the crash, it was a thread using a Python structure whose memory had already been freed (filled with a random pattern by the debug hooks on the memory allocators).

    Ah: either use a debug build of Python, use PYTHONMALLOC=debug, or pass the -X dev command line option, so that freed memory is filled with a specific pattern and the crash is triggered.

    ... Good luck.
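
    A minimal sketch of such a stress test, written against the embedding API
    (an illustration of the suggestion above, not a reproducer attached to
    this issue; the 4x-CPUs factor and the sleep intervals are arbitrary):

    #include "Python.h"

    static const char *daemon_script =
        "import threading, random, time, os\n"
        "def work():\n"
        "    time.sleep(random.random() / 100)\n"
        "    sum(range(1000))\n"
        "for _ in range(4 * (os.cpu_count() or 1)):\n"
        "    threading.Thread(target=work, daemon=True).start()\n";

    int
    main(void)
    {
        Py_Initialize();
        PyRun_SimpleString(daemon_script);
        /* Finalize while some daemon threads are still sleeping or running.
         * With a debug build or PYTHONMALLOC=debug, freed memory is poisoned,
         * so a use-after-free in PyEval_RestoreThread() shows up as a crash. */
        if (Py_FinalizeEx() < 0) {
            return 120;
        }
        return 0;
    }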

    @ericsnowcurrently
    Member Author

    Well, for this PR I actually disabled the execution of pending calls during runtime finalization. I'd hoped it would help, but it did not. :(

    Regardless, I'll need to look even more closely at what is different during runtime finalization with my PR.

    @ericsnowcurrently
    Member Author

    Also, thanks for the suggestions, Victor!

    @phmc
    Mannequin

    phmc mannequin commented Nov 21, 2019

    Based on Victor's info from https://bugs.python.org/issue36114#msg337090 I believe the crash is essentially what's reproduced in the attached program.

    From the root of a (built) cpython clone run:

    gcc -c -o fini_crash.o -IInclude -I. fini_crash.c && gcc -o fini_crash fini_crash.o libpython3.9.a -lcrypt -lpthread -ldl -lutil -lm && ./fini_crash

    The output should be:

    MAIN: allow other thread to execute
    OTHER: acquired GIL
    OTHER: released GIL
    MAIN: interpreter finalized
    OTHER: attempt to acquire GIL...crash!
    [1] 266749 segmentation fault (core dumped) ./fini_crash

    And running it through valgrind:

    $ valgrind --suppressions=Misc/valgrind-python.supp fini_crash
    ==266836== Memcheck, a memory error detector
    ==266836== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
    ==266836== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
    ==266836== Command: fini_crash                    
    ==266836==                                            
    MAIN: allow other thread to execute                       
    OTHER: acquired GIL                                
    OTHER: released GIL                                                                                                    
    MAIN: interpreter finalized
    OTHER: attempt to acquire GIL...crash!                                                                                 
    ==266836== Thread 2:                                                                                                   
    ==266836== Invalid read of size 8                                                                                      
    ==266836==    at 0x15607D: PyEval_RestoreThread (ceval.c:389)                                                                                                                                                                                  
    ==266836==    by 0x15479F: evil_main (in /home/phconnel/dev/cpython/fini_crash)
    ==266836==    by 0x48B94CE: start_thread (in /usr/lib/libpthread-2.30.so)
    ==266836==    by 0x4B232D2: clone (in /usr/lib/libc-2.30.so)
    ==266836==  Address 0x4d17270 is 16 bytes inside a block of size 264 free'd
    ==266836==    at 0x48399AB: free (vg_replace_malloc.c:540)
    ==266836==    by 0x1773FF: tstate_delete_common (pystate.c:829)
    ==266836==    by 0x1773FF: _PyThreadState_Delete (pystate.c:848)
    ==266836==    by 0x1773FF: zapthreads (pystate.c:311)
    ==266836==    by 0x1773FF: PyInterpreterState_Delete (pystate.c:321)
    ==266836==    by 0x174920: finalize_interp_delete (pylifecycle.c:1242)
    ==266836==    by 0x174920: Py_FinalizeEx.part.0 (pylifecycle.c:1400)
    ==266836==    by 0x15487B: main (in /home/phconnel/dev/cpython/fini_crash)
    ==266836==  Block was alloc'd at
    ==266836==    at 0x483877F: malloc (vg_replace_malloc.c:309)
    ==266836==    by 0x178D7C: new_threadstate (pystate.c:557)
    ==266836==    by 0x178D7C: PyThreadState_New (pystate.c:629)
    ==266836==    by 0x178D7C: PyGILState_Ensure (pystate.c:1288)
    ==266836==    by 0x154759: evil_main (in /home/phconnel/dev/cpython/fini_crash)
    ==266836==    by 0x48B94CE: start_thread (in /usr/lib/libpthread-2.30.so)
    ==266836==    by 0x4B232D2: clone (in /usr/lib/libc-2.30.so)
    ==266836== 
    ==266836== Invalid read of size 8
    ==266836==    at 0x156081: PyEval_RestoreThread (ceval.c:389)
    ==266836==    by 0x15479F: evil_main (in /home/phconnel/dev/cpython/fini_crash)
    ==266836==    by 0x48B94CE: start_thread (in /usr/lib/libpthread-2.30.so)
    ==266836==    by 0x4B232D2: clone (in /usr/lib/libc-2.30.so)
    ==266836==  Address 0x4c3a0f0 is 16 bytes inside a block of size 2,960 free'd
    ==266836==    at 0x48399AB: free (vg_replace_malloc.c:540)
    ==266836==    by 0x174920: finalize_interp_delete (pylifecycle.c:1242)
    ==266836==    by 0x174920: Py_FinalizeEx.part.0 (pylifecycle.c:1400)
    ==266836==    by 0x15487B: main (in /home/phconnel/dev/cpython/fini_crash)
    ==266836==  Block was alloc'd at
    ==266836==    at 0x483877F: malloc (vg_replace_malloc.c:309)
    ==266836==    by 0x177153: PyInterpreterState_New (pystate.c:205)
    ==266836==    by 0x1732BF: pycore_create_interpreter (pylifecycle.c:526)
    ==266836==    by 0x1732BF: pyinit_config.constprop.0 (pylifecycle.c:695)
    ==266836==    by 0x1766B7: pyinit_core (pylifecycle.c:879)
    ==266836==    by 0x1766B7: Py_InitializeFromConfig (pylifecycle.c:1055)
    ==266836==    by 0x1766B7: Py_InitializeEx (pylifecycle.c:1093)
    ==266836==    by 0x154801: main (in /home/phconnel/dev/cpython/fini_crash)
    ==266836==
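
    For readers without the attachment, the following is a rough
    reconstruction of the kind of program that produces a trace like the one
    above. It is not the attached fini_crash.c, and the names are invented:

    #include "Python.h"
    #include <pthread.h>
    #include <unistd.h>

    /* A non-main thread registers itself with the interpreter, releases the
     * GIL, and then tries to reacquire it after the main thread has already
     * run Py_FinalizeEx().  PyEval_RestoreThread() then reads a freed
     * PyThreadState, which is the invalid read valgrind reports above. */
    static void *
    other_thread(void *unused)
    {
        (void)unused;
        PyGILState_STATE gstate = PyGILState_Ensure();  /* OTHER: acquired GIL */
        Py_BEGIN_ALLOW_THREADS                          /* OTHER: released GIL */
        sleep(2);                     /* main thread finalizes meanwhile */
        Py_END_ALLOW_THREADS          /* OTHER: attempt to acquire GIL...crash! */
        PyGILState_Release(gstate);
        return NULL;
    }

    int
    main(void)
    {
        pthread_t t;

        Py_Initialize();
        pthread_create(&t, NULL, other_thread, NULL);

        Py_BEGIN_ALLOW_THREADS        /* MAIN: allow other thread to execute */
        sleep(1);
        Py_END_ALLOW_THREADS

        Py_FinalizeEx();              /* MAIN: interpreter finalized */
        sleep(3);                     /* give the other thread time to crash */
        return 0;
    }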

    @phmc
    Mannequin

    phmc mannequin commented Nov 21, 2019

    Just to summarise, I'm fairly sure this is exactly what Victor saw: a daemon thread attempts to reacquire the GIL via Py_END_ALLOW_THREADS after interpreter finalisation. Obviously the threadstate pointer held by the thread is then invalid...so we crash.

    So I see basically two options:

    1. Don't (always) free threadstate structures in Py_Finalize, and figure out a way to avoid leaking them (if Python is re-initialized in the same process).

    2. Ban this behaviour entirely, e.g. have Py_Finalize fail if there are live threads with threadstate objects.

    The discussion so far assumes that we should support this, i.e. #1. Any thoughts on that? (I'll have a think about whether this is actually doable!)

    @phmc
    Mannequin

    phmc mannequin commented Nov 21, 2019

    The attached patch (wrap_threadstate.diff) is enough to stop the crash. It's a slightly dirty proof-of-concept, but equally could be the basis for a solution.

    The main functional issue is that there's still a race on the Py_BLOCK_THREADS side: it's possible that the "is threadstate still valid" check can pass, but the interpreter is finalised while the daemon thread is trying to reacquire the GIL in PyEval_RestoreThread.

    (The Py_UNBLOCK_THREADS side is non-racy as the GIL is held while the ts and wrapper updates are done).
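
    For readers without the attachment, a rough illustration of the wrapper
    idea being described (this is not the attached wrap_threadstate.diff;
    the names and shapes are invented):

    #include "Python.h"
    #include "pythread.h"

    /* Daemon threads would hold this wrapper instead of a raw PyThreadState*;
     * finalization would set `tstate` to NULL when the thread state is
     * deleted. */
    typedef struct {
        PyThreadState *tstate;
    } _PyThreadStateWrapper;

    /* Py_UNBLOCK_THREADS side: the GIL is held, so updating the wrapper is
     * safe. */
    static void
    unblock_threads(_PyThreadStateWrapper *w)
    {
        w->tstate = PyEval_SaveThread();
    }

    /* Py_BLOCK_THREADS side: check validity before restoring.  As noted
     * above, this check still races with finalization freeing the thread
     * state. */
    static void
    block_threads(_PyThreadStateWrapper *w)
    {
        PyThreadState *ts = w->tstate;
        if (ts == NULL) {
            PyThread_exit_thread();   /* interpreter is gone; never return */
        }
        PyEval_RestoreThread(ts);
    }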

    @ericsnowcurrently
    Member Author

    Sorry for the delay, Phil. I'll try to take a look in the next couple of hours.

    @ericsnowcurrently
    Member Author

    I'm out of time and this deserves some careful discussion. I'll get to it next Friday (or sooner if possible). Sorry!

    @ericsnowcurrently
    Member Author

    Thanks for the detailed analysis, Phil. I think the results are pretty conclusive: daemon threads are the worst. :) But seriously, thanks.

    As you demonstrated, it isn't just Python "daemon" threads that cause the problem. It is essentially any external access of the C-API once runtime finalization has started. The docs [1] aren't super clear about it, but there are some fundamental assumptions we make about runtime finalization:

    • no use of the C-API while Py_FinalizeEx() is executing (except for a few helpers like Py_IsInitialized())
    • only a small portion of the C-API is available afterward (at least until Py_Initialize() is run)

    I guess the real question is what to do about this?

    Given that this is essentially a separate problem, let's move further discussion and effort on sorting out problematic threads over to bpo-36476, "Runtime finalization assumes all other threads have exited." @phil, would you mind attaching those same two files to that issue?

    [1] https://docs.python.org/3/c-api/init.html#c.Py_FinalizeEx

    @vstinner
    Member

    I partially reimplemented commit ef4ac96 in the following issues:

    • bpo-39984: Move some ceval fields from _PyRuntime.ceval to PyInterpreterState.ceval
    • bpo-40010: Inefficient signal handling in multithreaded applications
    • bpo-39877: Daemon thread is crashing in PyEval_RestoreThread() while the main thread is exiting the process
    • bpo-37127: Handling pending calls during runtime finalization may cause problems.

    I cannot give a list of commits, I made too many of them :-D

    The most important change is:

    commit 50e6e99
    Author: Victor Stinner <vstinner@python.org>
    Date: Thu Mar 19 02:41:21 2020 +0100

    bpo-39984: Move pending calls to PyInterpreterState (GH-19066)
    
    If Py_AddPendingCall() is called in a subinterpreter, the function is
    now scheduled to be called from the subinterpreter, rather than being
    called from the main interpreter.
    
    Each subinterpreter now has its own list of scheduled calls.
    
    * Move pending and eval_breaker fields from _PyRuntimeState.ceval
      to PyInterpreterState.ceval.
    * new_interpreter() now calls _PyEval_InitThreads() to create
      pending calls lock.
    * Fix Py_AddPendingCall() for subinterpreters. It now calls
      _PyThreadState_GET() which works in a subinterpreter if the
      caller holds the GIL, and only falls back on
      PyGILState_GetThisThreadState() if _PyThreadState_GET()
      returns NULL.
    

    My plan is now to fix pending calls in subinterpreters. Currently, they are only executed in the "main thread"... _PyRuntimeState.main_thread must be moved to PyInterpreterState, as it was done in commit ef4ac96.
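
    As a rough illustration of the scheduling behavior described in GH-19066
    (sketch only; as noted, pending calls were at this point still only
    *executed* in the "main thread", so whether and when the callback actually
    runs is a separate question):

    #include "Python.h"
    #include <stdio.h>

    static int
    report_interp(void *arg)
    {
        (void)arg;
        /* Runs with the GIL held in whichever interpreter executes it. */
        printf("pending call ran in interpreter %p\n",
               (void *)PyThreadState_Get()->interp);
        return 0;
    }

    int
    main(void)
    {
        Py_Initialize();
        PyThreadState *main_tstate = PyThreadState_Get();

        PyThreadState *sub = Py_NewInterpreter();
        /* Called while the subinterpreter is current: the call is queued on
         * this interpreter's own list, not on the main interpreter's. */
        Py_AddPendingCall(report_interp, NULL);
        PyRun_SimpleString("pass");   /* gives the eval loop a chance to run it */
        Py_EndInterpreter(sub);

        PyThreadState_Swap(main_tstate);
        Py_FinalizeEx();
        return 0;
    }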

    This issue shows that it's dangerous to change too many things at once. Python internals were still very fragile: bpo-39877 and bpo-37127 are good examples of that. I'm trying to move *very slowly*. Move pieces one by one.

    I added _Py_ThreadCanHandlePendingCalls() function in commit d831688 (bpo-40010): it should ease moving _PyRuntimeState.main_thread to PyInterpreterState. I also added _Py_ThreadCanHandleSignals() in a previous commit, so it's easier to change which thread is allowed or not to handle signals and pending calls.

    I also plan to revisit (remove) pending_calls.finalizing added by commit 842a2f0 (bpo-33608). I plan to work on that in bpo-37127.

    @markshannon
    Member

    Just to add my 2 cents.

    I think this is a bad idea and is likely to be unsafe.
    Having interpreters interfering with each other's objects is almost certain to lead to race conditions.

    IMO, objects should *never* be shared across interpreters (once interpreters are expected to be able to run in parallel).

    Any channel between two interpreters should consist of two objects, one per interpreter. The shared memory they use to communicate can be managed by the runtime. Likewise for inter-interpreter locks. The lock itself should be managed by the runtime, with each interpreter having its own lock object with a handle on the shared lock.
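
    A loose sketch of the shape Mark describes, with invented names: the
    shared state is plain runtime-managed memory (never a PyObject), and each
    interpreter only ever touches its own end object.

    #include "Python.h"
    #include "pythread.h"

    /* Runtime-managed shared state: no PyObject, no per-interpreter
     * refcounts. */
    typedef struct {
        PyThread_type_lock mutex;   /* shared lock owned by the runtime */
        /* ... a queue of raw bytes / non-object data ... */
    } _SharedChannelState;

    /* Per-interpreter end object: an ordinary PyObject living entirely in
     * one interpreter, holding only a handle on the shared state. */
    typedef struct {
        PyObject_HEAD
        _SharedChannelState *shared;
        PyInterpreterState *interp;   /* the interpreter that owns this end */
    } ChannelEndObject;

    /* Create an end object in the current interpreter for existing shared
     * state. */
    static ChannelEndObject *
    channel_end_new(PyTypeObject *type, _SharedChannelState *shared)
    {
        ChannelEndObject *end = PyObject_New(ChannelEndObject, type);
        if (end == NULL) {
            return NULL;
        }
        end->shared = shared;
        end->interp = PyInterpreterState_Get();
        return end;
    }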

    @ericsnowcurrently
    Member Author

    FYI, in bpo-39984 Victor moved pending calls to PyInterpreterState, which was part of my reverted change. However, there are a few other pieces of that change that need to be applied before this issue is resolved. I'm not sure when I'll get to it, but hopefully soon.

    @vstinner
    Member

    Having interpreters interfering with each other's objects is almost certain to lead to race conditions.

    To be honest, I don't understand the purpose of this issue. Some messages are talking about pending calls, some others are talking about channels, and a few are talking about "shared" objects.

    Can someone either clarify the exact intent of this issue, or close this one and create a new issue with a clear explanation of what should be done and why?

    @vstinner
    Member

    vstinner commented Apr 8, 2020

    New changeset cfc3c2f by Victor Stinner in branch 'master':
    bpo-37127: Remove _pending_calls.finishing (GH-19439)
    cfc3c2f

    @vstinner
    Member

    vstinner commented Apr 8, 2020

    I created bpo-40231: Fix pending calls in subinterpreters.

    @vstinner
    Member

    vstinner commented Apr 8, 2020

    Pavel Kostyuchenko:

    (...) The error seems to be caused not by those changes, but by lack of synchronization in the multiprocessing.managers.Server.

    Pavel: would you mind opening a separate issue suggesting that we add synchronization and/or avoid daemon threads in multiprocessing?

    The concurrent.futures module was recently modified to avoid daemon threads in bpo-39812.

    @vstinner
    Member

    vstinner commented Apr 8, 2020

    FYI, after merging that PR I realized that the COMPUTE_EVAL_BREAKER macro isn't quite right.

    I reworked this function in bpo-40010.

    @vstinner
    Member

    vstinner commented Apr 8, 2020

    This issue has a long history. A change has been applied and then reverted three times in a row. Pending calls are now per-interpreter.

    The issue title is "Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed." but I don't understand if pending calls are expected to be used to communicate between two interpreters. Why not use a UNIX pipe and exchange bytes through it? Py_AddPendingCall() is a weird concept. I would prefer not to abuse it.

    Moreover, it's unclear if this issue attempts to *share* the same object between two interpreters. I would prefer to avoid that in any possible way.

    I close this issue with a complex history.

    If someone wants to continue to work on this topic, please open an issue with a very clear description of what should be done and how it is supposed to be used.

    @vstinner vstinner added 3.9 only security fixes and removed 3.8 only security fixes labels Apr 8, 2020
    @vstinner vstinner closed this as completed Apr 8, 2020
    @ericsnowcurrently
    Member Author

    I close this issue with a complex history.

    If someone wants to continue to work on this topic, please open an issue with a very clear description of what should be done and how it is supposed to be used.

    Yeah, there is more to do. I'll create a new issue.
