Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling pending calls during runtime finalization may cause problems. #81308

Closed
ericsnowcurrently opened this issue Jun 1, 2019 · 12 comments
Closed
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@ericsnowcurrently
Copy link
Member

BPO 37127
Nosy @vstinner, @ericsnowcurrently
PRs
  • bpo-39984: trip_signal() uses PyGILState_GetThisThreadState() #19061
  • bpo-37127: Remove _pending_calls.finishing #19439
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ericsnowcurrently'
    closed_at = <Date 2020-04-08.21:19:44.166>
    created_at = <Date 2019-06-01.19:11:46.358>
    labels = ['interpreter-core', 'type-crash']
    title = 'Handling pending calls during runtime finalization may cause problems.'
    updated_at = <Date 2020-04-08.21:19:44.165>
    user = 'https://github.com/ericsnowcurrently'

    bugs.python.org fields:

    activity = <Date 2020-04-08.21:19:44.165>
    actor = 'vstinner'
    assignee = 'eric.snow'
    closed = True
    closed_date = <Date 2020-04-08.21:19:44.166>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2019-06-01.19:11:46.358>
    creator = 'eric.snow'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 37127
    keywords = ['patch']
    message_count = 12.0
    messages = ['344202', '344207', '364489', '364490', '364543', '364544', '364545', '364550', '365997', '366004', '366007', '366008']
    nosy_count = 2.0
    nosy_names = ['vstinner', 'eric.snow']
    pr_nums = ['19061', '19439']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue37127'
    versions = []

    @ericsnowcurrently
    Copy link
    Member Author

    In Python/lifecycle.c (Py_FinalizeEx) we call _Py_FinishPendingCalls(), right after we stop all non-daemon Python threads but before we've actually started finalizing the runtime state. That call looks for any remaining pending calls (for the main interpreter) and runs them. There's some evidence of a bug there.

    In bpo-33608 I moved the pending calls to per-interpreter state. We saw failures (sometimes sporadic) on a few buildbots (e.g. FreeBSD) during runtime finalization. However, nearly all of the buildbots were fine, so it may be a question of architecture or slow hardware. See bpo-33608 for details on the failures.

    There are a number of possibilities, but it's been tricky reproducing the problem in order to investigate. Here are some theories:

    • daemon threads (a known weak point in runtime finalization) block pending calls from happening until some time after portions of the runtime have already been cleaned up
    • there's a race that causes the pending calls machinery to get caught in some sort infinite loop (e.g. a pending call fails and gets re-queued)
    • a corner case in the pending calls logic that triggers only during finalization

    Here are some other points to consider:

    • do we have the same problem during subinterpreter finalization (Py_EndInterpreter() rather than runtime finalization)?
    • perhaps the problem extends beyond finalization, but the conditions are more likely there
    • the change for bpo-33608 could have introduced the bug rather that exposing an existing one

    @ericsnowcurrently ericsnowcurrently self-assigned this Jun 1, 2019
    @ericsnowcurrently ericsnowcurrently added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump labels Jun 1, 2019
    @ericsnowcurrently
    Copy link
    Member Author

    Also, someone did manage to investigate and identify a likely cause:

    https://bugs.python.org/issue33608#msg342791

    @vstinner
    Copy link
    Member

    In the master branch of Python, trip_signal() calls _PyEval_AddPendingCall(tstate) and tstate is get using _PyRuntimeState_GetThreadState(runtime).

    trip_signal() can be called while the GIL is not held: tstate is NULL in this case. For example, it's the case when calling signal.raise_signal().

    Problem: _PyEval_AddPendingCall() uses tstate if pending->finishing is non-zero.

        if (pending->finishing) {
            ...
            _PyErr_Fetch(tstate, &exc, &val, &tb);
            _PyErr_SetString(tstate, PyExc_SystemError,
                            "Py_AddPendingCall: cannot add pending calls "
                            "(Python shutting down)");
            ...
        }

    pending->finishing was addd in bpo-33608 by the change:

    commit 842a2f0
    Author: Eric Snow <ericsnowcurrently@gmail.com>
    Date: Fri Mar 15 15:47:51 2019 -0600

    bpo-33608: Deal with pending calls relative to runtime shutdown. (gh-12246)
    

    I found this issue while trying to make pending calls per interpreter in bpo-39984: move pending from _PyRuntimeState to PyInterpreterState.

    @vstinner
    Copy link
    Member

    In bpo-33608 I moved the pending calls to per-interpreter state. We saw failures (sometimes sporadic) on a few buildbots (e.g. FreeBSD) during runtime finalization. However, nearly all of the buildbots were fine, so it may be a question of architecture or slow hardware. See bpo-33608 for details on the failures.

    That was a crash in multiprocessing tests. PyEval_RestoreThread(tstate) was called with a freed PyThreadState: tstate->interp = 0xdbdbdbdbdbdbdbdb for example.

    I recently fixed PyEval_RestoreThread() in bpo-39877 to exit properly daemon theads without dereferencing tstate which points to freed memory.

    To reproduce the bug, it helped me to add a sleep at Python exit. Fixed value (ex: 1 second) or better: random sleep (between 1 ms and 1 sec).

    @vstinner
    Copy link
    Member

    pending->finishing was addd in bpo-33608 by the change: commit 842a2f0

    Since this change, Py_AddPendingCall() now requires the thread to have a Python thread state: it is used if pending->finishing is non-zero.

    The function documentation says:

    "This function doesn’t need a current thread state to run, and it doesn’t need the global interpreter lock."

    https://docs.python.org/dev/c-api/init.html#c.Py_AddPendingCall

    The current implementation seems to contradict the documentation.

    We may update the documentation to replace "doesn't need" with "requires" (a Python thread state).

    Or the implementation can use PyGILState_Ensure() and PyGILState_Release() to create a temporary Python thread state if the thread doesn't have one.

    Removing pending->finishing would only partially fix the issue. With PR 19061, tstate is required to access "ceval" structure.

    trip_signal() called by the C signal handler has a similar issue: it only requires non-NULL tstate if pending->finishing is non-zero.

    @vstinner
    Copy link
    Member

    trip_signal() has another problem. If pending->finishing is non-zero, _PyEval_AddPendingCall() uses the C API whereas the current thread may not hold the GIL. That's forbidden by the C API.

    The more I think about it, the more I think that pending->finishing is fragile and should be removed.

    I understood that pending->finishing was added to workaround crashes during Python finalization. I fixed multiple crashes related to daemon threads during Python finalization in bpo-39877. Maybe the bugs that I fixed were the ones which pending->finishing was supposed to work around.

    @vstinner
    Copy link
    Member

    My notes on Python finalization: https://pythondev.readthedocs.io/finalization.html

    @vstinner
    Copy link
    Member

    New changeset 8849e59 by Victor Stinner in branch 'master':
    bpo-39984: trip_signal() uses PyGILState_GetThisThreadState() (GH-19061)
    8849e59

    @vstinner
    Copy link
    Member

    vstinner commented Apr 8, 2020

    See also bpo-40082: trip_signal() gets NULL tstate on Windows on CTRL+C.

    @vstinner
    Copy link
    Member

    vstinner commented Apr 8, 2020

    New changeset cfc3c2f by Victor Stinner in branch 'master':
    bpo-37127: Remove _pending_calls.finishing (GH-19439)
    cfc3c2f

    @vstinner
    Copy link
    Member

    vstinner commented Apr 8, 2020

    I removed _pending_calls.finishing for multiple reasons:

    • _PyEval_AddPendingCall() used the C API whereas the caller may not hold the GIL.
    • bpo-40082: trip_signal() can be called from a thread which has no Python thread state. On Windows, CTRL+C calls trip_signal() in a new thread a each call.

    I rewrote trip_signal() to only use the PyInterpreterState ("interp") and avoid PyThreadState ("tstate") in PR 19441 to fix bpo-40082.

    trip_signal() should read and set atomtic variables: don't modify globals without a lock. _PyEval_AddPendingCall() is not fully async-signal safe yet :-/ Using a lock is unsafe in a signal handler.

    @vstinner
    Copy link
    Member

    vstinner commented Apr 8, 2020

    Also, someone did manage to investigate and identify a likely cause:
    https://bugs.python.org/issue33608#msg342791

    I made many changes in Python internals since bpo-33608 was reported. I am not aware of any recent issue with pending calls and so close the issue.

    If you get new crashes/hangs with pending calls during Python finalization, please open a new issue.

    @vstinner vstinner closed this as completed Apr 8, 2020
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    jamadden added a commit to python-greenlet/greenlet that referenced this issue Oct 28, 2022
    It can require the GIL, even though it isn't supposed to. This is
    fixed in 3.9 and was not an issue in 3.7.
    
    The symptom is "Fatal Python error: Python memory allocator called
    without holding the GIL"
    
    See python/cpython#81308
    
    Unfortunately, the only workaround I see requires deep magic and duplicating CPython internals.
    github-actions bot added a commit to MaRDI4NFDI/open-interfaces that referenced this issue Nov 7, 2022
    Bumps [greenlet](https://github.com/python-greenlet/greenlet) from
    1.1.3.post0 to 2.0.1.
    <details>
    <summary>Changelog</summary>
    <p><em>Sourced from <a
    href="https://github.com/python-greenlet/greenlet/blob/master/CHANGES.rst">greenlet's
    changelog</a>.</em></p>
    <blockquote>
    <h1>2.0.1 (2022-11-07)</h1>
    <ul>
    <li>Python 3.11: Fix a memory leak. See <code>issue 328
    &lt;https://github.com/python-greenlet/greenlet/issues/328&gt;</code>_
    and
    <code>gevent issue 1924
    &lt;https://github.com/gevent/gevent/issues/1924&gt;</code>_.</li>
    </ul>
    <h1>2.0.0.post0 (2022-11-03)</h1>
    <ul>
    <li>Add <code>Programming Language :: Python :: 3.11</code> to the PyPI
    classifier metadata.</li>
    </ul>
    <h1>2.0.0 (2022-10-31)</h1>
    <ul>
    <li>Nothing changed yet.</li>
    </ul>
    <h1>2.0.0rc5 (2022-10-31)</h1>
    <ul>
    <li>Linux: Fix another group of rare crashes that could occur when
    shutting down an
    interpeter running multiple threads. See <code>issue 325
    &lt;https://github.com/python-greenlet/greenlet/issues/325&gt;</code>_.</li>
    </ul>
    <h1>2.0.0rc4 (2022-10-30)</h1>
    <ul>
    <li>Linux: Fix a rare crash that could occur when shutting down an
    interpreter running multiple threads, when some of those threads are
    in greenlets making calls to functions that release the GIL.</li>
    </ul>
    <h1>2.0.0rc3 (2022-10-29)</h1>
    <ul>
    <li>Python 2: Fix a crash that could occur when raising an old-style
    instance object.</li>
    </ul>
    <h1>2.0.0rc2 (2022-10-28)</h1>
    <ul>
    <li>Workaround <code>a CPython 3.8 bug
    &lt;https://github.com/python/cpython/issues/81308&gt;</code>_ that
    could cause
    the interpreter to crash during an early phase of shutdown with the
    message &quot;Fatal Python error: Python memory allocator called
    without</li>
    </ul>
    <!-- raw HTML omitted -->
    </blockquote>
    <p>... (truncated)</p>
    </details>
    <details>
    <summary>Commits</summary>
    <ul>
    <li><a
    href="https://github.com/python-greenlet/greenlet/commit/69f6483c41776fb9095c3fd7b02d3e4ede289c78"><code>69f6483</code></a>
    Preparing release 2.0.1</li>
    <li><a
    href="https://github.com/python-greenlet/greenlet/commit/df2154d45b8cfa751de1fc9e6a95e0b09a53becf"><code>df2154d</code></a>
    Merge pull request <a
    href="https://github-redirect.dependabot.com/python-greenlet/greenlet/issues/329">#329</a>
    from python-greenlet/issue328</li>
    <li><a
    href="https://github.com/python-greenlet/greenlet/commit/aa6f2515b2fefced69be933952bd1f3319d0de58"><code>aa6f251</code></a>
    Python 3.11: Fix a memory leak switching to greenlets.</li>
    <li><a
    href="https://github.com/python-greenlet/greenlet/commit/7389976dc748677dd4fce535b91ef07e4387c1d6"><code>7389976</code></a>
    Back to development: 2.0.1</li>
    <li><a
    href="https://github.com/python-greenlet/greenlet/commit/0216c2ae008cb185292b136e34ba92ae3f40da6b"><code>0216c2a</code></a>
    Preparing release 2.0.0.post0</li>
    <li><a
    href="https://github.com/python-greenlet/greenlet/commit/7e1704ccbf2ade357416b63d95d645120df1f4dc"><code>7e1704c</code></a>
    .github/workflows/tests.yml: Move to final release of 3.11</li>
    <li><a
    href="https://github.com/python-greenlet/greenlet/commit/f18bff43df80c5190760dc2aedfa436a1fae7bed"><code>f18bff4</code></a>
    setup.py: Add Python 3.11 to the PyPI classifier metadata.</li>
    <li><a
    href="https://github.com/python-greenlet/greenlet/commit/fc50ac173d0011492c19c87eb619017fdab47fd9"><code>fc50ac1</code></a>
    Back to development: 2.0.1</li>
    <li><a
    href="https://github.com/python-greenlet/greenlet/commit/f9c29d90b1fd33e3ade1272f46e17dfe17d651e6"><code>f9c29d9</code></a>
    Preparing release 2.0.0</li>
    <li><a
    href="https://github.com/python-greenlet/greenlet/commit/4ba1df63dcd40c8056f21117a91bf56d0be8a6eb"><code>4ba1df6</code></a>
    Back to development: 2.0.0rc6</li>
    <li>Additional commits viewable in <a
    href="https://github.com/python-greenlet/greenlet/compare/1.1.3.post0...2.0.1">compare
    view</a></li>
    </ul>
    </details>
    <br />
    
    
    [![Dependabot compatibility
    score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=greenlet&package-manager=pip&previous-version=1.1.3.post0&new-version=2.0.1)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
    
    Dependabot will resolve any conflicts with this PR as long as you don't
    alter it yourself. You can also trigger a rebase manually by commenting
    `@dependabot rebase`.
    
    [//]: # (dependabot-automerge-start)
    [//]: # (dependabot-automerge-end)
    
    ---
    
    <details>
    <summary>Dependabot commands and options</summary>
    <br />
    
    You can trigger Dependabot actions by commenting on this PR:
    - `@dependabot rebase` will rebase this PR
    - `@dependabot recreate` will recreate this PR, overwriting any edits
    that have been made to it
    - `@dependabot merge` will merge this PR after your CI passes on it
    - `@dependabot squash and merge` will squash and merge this PR after
    your CI passes on it
    - `@dependabot cancel merge` will cancel a previously requested merge
    and block automerging
    - `@dependabot reopen` will reopen this PR if it is closed
    - `@dependabot close` will close this PR and stop Dependabot recreating
    it. You can achieve the same result by closing it manually
    - `@dependabot ignore this major version` will close this PR and stop
    Dependabot creating any more for this major version (unless you reopen
    the PR or upgrade to it yourself)
    - `@dependabot ignore this minor version` will close this PR and stop
    Dependabot creating any more for this minor version (unless you reopen
    the PR or upgrade to it yourself)
    - `@dependabot ignore this dependency` will close this PR and stop
    Dependabot creating any more for this dependency (unless you reopen the
    PR or upgrade to it yourself)
    
    
    </details>
    netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Nov 22, 2022
    2.0.1 (2022-11-07)
    ==================
    
    - Python 3.11: Fix a memory leak. See `issue 328
      <https://github.com/python-greenlet/greenlet/issues/328>`_ and
      `gevent issue 1924 <https://github.com/gevent/gevent/issues/1924>`_.
    
    
    2.0.0.post0 (2022-11-03)
    ========================
    
    - Add ``Programming Language :: Python :: 3.11`` to the PyPI
      classifier metadata.
    
    
    2.0.0 (2022-10-31)
    ==================
    
    - Nothing changed yet.
    
    
    2.0.0rc5 (2022-10-31)
    =====================
    
    - Linux: Fix another group of rare crashes that could occur when shutting down an
      interpeter running multiple threads. See `issue 325 <https://github.com/python-greenlet/greenlet/issues/325>`_.
    
    
    2.0.0rc4 (2022-10-30)
    =====================
    
    - Linux: Fix a rare crash that could occur when shutting down an
      interpreter running multiple threads, when some of those threads are
      in greenlets making calls to functions that release the GIL.
    
    
    2.0.0rc3 (2022-10-29)
    =====================
    
    - Python 2: Fix a crash that could occur when raising an old-style
      instance object.
    
    
    2.0.0rc2 (2022-10-28)
    =====================
    
    - Workaround `a CPython 3.8 bug
      <https://github.com/python/cpython/issues/81308>`_ that could cause
      the interpreter to crash during an early phase of shutdown with the
      message "Fatal Python error: Python memory allocator called without
      holding the GI." This only impacted CPython 3.8a3 through CPython
      3.9a5; the fix is only applied to CPython 3.8 releases (please don't
      use an early alpha release of CPython 3.9).
    
    
    2.0.0rc1 (2022-10-27)
    =====================
    
    - Deal gracefully with greenlet switches that occur while deferred
      deallocation of objects is happening using CPython's "trash can"
      mechanism. Previously, if a large nested container held items that
      switched greenlets during delayed deallocation, and that second
      greenlet also invoked the trash can, CPython's internal state could
      become corrupt. This was visible as an assertion error in debug
      builds. Now, the relevant internal state is saved and restored
      during greenlet switches. See also `gevent issue 1909
      <https://github.com/gevent/gevent/issues/1909>`_.
    - Rename the C API function ``PyGreenlet_GET_PARENT`` to
      ``PyGreenlet_GetParent`` for consistency. The old name remains
      available as a deprecated alias.
    
    
    
    2.0.0a2 (2022-03-24)
    ====================
    
    - Fix a crash on older versions of the Windows C runtime when an
      unhandled C++ exception was thrown inside a greenlet by another
      native extension. This is a bug in that extension, and the
      interpreter will still abort, but at least it does so deliberately.
      Thanks to Kirill Smelkov. See `PR 286
      <https://github.com/python-greenlet/greenlet/pull/286>`_.
    - Musllinux wheels for aarch64 are now built, tested, and uploaded to
      PyPI. Thanks to Alexander Piskun.
    - This version of greenlet is known to compile and pass tests on
      CPython 3.11.0a6. Earlier 3.11 releases will not work; later
      releases may or may not work. See `PR 294
      <https://github.com/python-greenlet/greenlet/pull/294>`_. Special
      thanks to Victor Stinner, Brandt Bucher and the CPython developers.
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants