classification
Title: Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed.
Type: Stage: patch review
Components: Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: eric.snow Nosy List: Johan Dahlin, db3l, emilyemorehouse, eric.snow, nascheme, ncoghlan, pmpp, serhiy.storchaka, vstinner, yselivanov
Priority: normal Keywords: patch

Created on 2018-05-22 19:34 by eric.snow, last changed 2019-03-15 22:33 by eric.snow.

Pull Requests
URL Status Linked Edit
PR 9334 closed eric.snow, 2018-09-15 18:25
PR 11617 merged eric.snow, 2019-01-18 20:43
PR 11617 merged eric.snow, 2019-01-18 20:43
PR 11617 merged eric.snow, 2019-01-18 20:43
PR 12024 merged eric.snow, 2019-02-24 23:56
PR 12062 merged eric.snow, 2019-02-27 01:19
PR 12159 merged vstinner, 2019-03-04 12:36
PR 12240 merged eric.snow, 2019-03-08 18:37
PR 12243 merged eric.snow, 2019-03-08 23:47
PR 12245 merged eric.snow, 2019-03-09 00:07
PR 12246 merged eric.snow, 2019-03-09 00:15
PR 12247 merged eric.snow, 2019-03-09 00:18
PR 12346 merged vstinner, 2019-03-15 14:06
PR 12360 open eric.snow, 2019-03-15 22:33
Messages (32)
msg317333 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2018-05-22 19:34
In order to keep subinterpreters properly isolated, objects
from one interpreter should not be used in C-API calls in
another interpreter.  That is relatively straight-forward
except in one case: indicating that the other interpreter
doesn't need the object to exist any more (similar to
PyBuffer_Release() but more general).  I consider the
following solutions to be the most viable.  Both make use
of recounts to protect cross-interpreter usage (with incref
before sharing).

1. manually switch interpreters (new private function)
  a. acquire the GIL
  b. if refcount > 1 then decref and release the GIL
  c. switch
  d. new thread (or re-use dedicated one)
  e. decref
  f. kill thread
  g. switch back
  h. release the GIL
2. change pending call mechanism (see Py_AddPendingCall) to
   per-interpreter instead of global (add "interp" arg to
   signature of new private C-API function)
  a. queue a function that decrefs the object
3. new cross-interpreter-specific private C-API function
  a. queue the object for decref (a la Py_AddPendingCall)
     in owning interpreter

I favor #2, since it is more generally applicable.  #3 would
probably be built on #2 anyway.  #1 is relatively inefficient.
With #2, Py_AddPendingCall() would become a simple wrapper
around the new private function.
msg317334 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2018-05-22 19:36
As a lesser (IMHO) alternative, we could also modify Py_DECREF
to respect a new "shared" marker of some sort (perhaps relative
to #33607), but that would probably still require one of the
refcount-based solutions (and add a branch to Py_DECREF).
msg317344 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-22 20:39
"That is relatively straight-forward except in one case: indicating that the other interpreter doesn't need the object to exist any more (similar to PyBuffer_Release() but more general)"

Why an interpreter would access an object from a different interpreter? Each interpreter should have its own object space, no?
msg317404 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-05-23 13:17
Adding a low level callback based mechanism to ask another interpreter to do work seems like a good way to go to me.

As an example of why that might be needed, consider the case of sharing a buffer exporting object with another subinterpreter: when the memoryview in the subinterpreter is destroyed, it needs to request that the buffer view be released in the source interpreter that actually owns the original object.
msg325481 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2018-09-16 11:54
I would suggest that sharing of objects between interpreters should be stamped out.  Could we have some #ifdef debug checking that would warn or assert so this doesn't happen?  I know currently we share a lot of objects.  However, in the long term, that does not seem like the correct design.  Instead, each interpreter should have its own objects and any passing or sharing should be tightly controlled.
msg325548 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2018-09-17 15:41
@Neil, we're definitely on the same page.  In fact, in a world where subinterpreters do not share a GIL, we can't ever use an object in one interpreter that was created in another (due to thread safety on refcounts).  The role of "tightly controlling" passing/sharing objects (for a very loose definition of "sharing") falls to the channels described in PEP 554. [1]

However, there are several circumstances where interpreters may collaborate that involves one holding a reference (but not using it) to an object owned by the other.  For instance, see PyBuffer_Release(). [2]  This issue is about addressing that situation safely.  It is definitely not about safely using objects from other interpreters.

[1] The low-level implementation, including channels, already exists in Modules/_xxsubinterpretersmodule.c.
[2] https://docs.python.org/3/c-api/buffer.html#c.PyBuffer_Release
msg336490 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-02-24 23:40
New changeset ef4ac967e2f3a9a18330cc6abe14adb4bc3d0465 by Eric Snow in branch 'master':
bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). (GH-11617)
https://github.com/python/cpython/commit/ef4ac967e2f3a9a18330cc6abe14adb4bc3d0465
msg336544 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-02-25 17:40
GH-11617 has introduces a slight performance regression which will need to be resolved.  If I can't find the problem right away then I'll probably temporarily revert the commit.

See https://mail.python.org/pipermail/python-dev/2019-February/156451.html.
msg336596 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-02-26 02:29
At least, it *might* be a performance regression.  Here are the two commits I tried:

after:  ef4ac967e2f3a9a18330cc6abe14adb4bc3d0465 (PR #11617, from this issue)
before:  463572c8beb59fd9d6850440af48a5c5f4c0c0c9 (the one right before)

After building each (a normal build, not debug), I ran the micro-benchmark Raymond referred to ("./python Tools/scripts/var_access_benchmark.py") multiple times.  There was enough variability between runs from the same commit that I'm uncertain at this point if there really is any performance regression.  For the most part the results between the two commits are really close.  Also, the results from the "after" commit fall within the variability I've seen between runs of the "before" commit.

I'm going to try with the "performance" suite (https://github.com/python/performance) to see if it shows any regression.
msg336602 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-02-26 04:10
Here's what I did (more or less):

$ python3 -m pip install --user perf
$ python3 -m perf system tune
$ git clone git@github.com:python/performance.git
$ git clone git@github.com:python/cpython.git
$ cd cpython
$ git checkout ef4ac967e2f3a9a18330cc6abe14adb4bc3d0465
$ ./configure
$ make -j8
$ ./python ../performance/pyperformance run --fast --output speed.after
$ git checkout HEAD^
$ make -j8
$ ./python ../performance/pyperformance run --fast --output speed.before
$ ./python ../performance/pyperformance compare speed.before speed.after -O table

Here are my results:

speed.before
============

Performance version: 0.7.0
Report on Linux-4.15.0-45-generic-x86_64-with-glibc2.26
Number of logical CPUs: 4
Start date: 2019-02-25 20:39:44.151699
End date: 2019-02-25 20:48:04.817516

speed.after
===========

Performance version: 0.7.0
Report on Linux-4.15.0-45-generic-x86_64-with-glibc2.26
Number of logical CPUs: 4
Start date: 2019-02-25 20:29:56.610702
End date: 2019-02-25 20:38:11.667461

+-------------------------+--------------+-------------+--------------+-----------------------+
| Benchmark               | speed.before | speed.after | Change       | Significance          |
+=========================+==============+=============+==============+=======================+
| 2to3                    | 447 ms       | 440 ms      | 1.02x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| chaos                   | 155 ms       | 156 ms      | 1.01x slower | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| crypto_pyaes            | 154 ms       | 152 ms      | 1.01x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| deltablue               | 10.7 ms      | 10.5 ms     | 1.01x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| django_template         | 177 ms       | 172 ms      | 1.03x faster | Significant (t=3.66)  |
+-------------------------+--------------+-------------+--------------+-----------------------+
| dulwich_log             | 105 ms       | 106 ms      | 1.01x slower | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| fannkuch                | 572 ms       | 573 ms      | 1.00x slower | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| float                   | 149 ms       | 146 ms      | 1.02x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| go                      | 351 ms       | 349 ms      | 1.00x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| hexiom                  | 14.6 ms      | 14.4 ms     | 1.01x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| html5lib                | 126 ms       | 122 ms      | 1.03x faster | Significant (t=3.46)  |
+-------------------------+--------------+-------------+--------------+-----------------------+
| json_dumps              | 17.6 ms      | 17.2 ms     | 1.02x faster | Significant (t=2.65)  |
+-------------------------+--------------+-------------+--------------+-----------------------+
| json_loads              | 36.4 us      | 36.1 us     | 1.01x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| logging_format          | 15.2 us      | 14.9 us     | 1.02x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| logging_silent          | 292 ns       | 296 ns      | 1.01x slower | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| logging_simple          | 13.7 us      | 13.4 us     | 1.02x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| mako                    | 22.9 ms      | 22.5 ms     | 1.02x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| meteor_contest          | 134 ms       | 134 ms      | 1.00x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| nbody                   | 157 ms       | 161 ms      | 1.03x slower | Significant (t=-3.85) |
+-------------------------+--------------+-------------+--------------+-----------------------+
| nqueens                 | 134 ms       | 132 ms      | 1.02x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| pathlib                 | 30.1 ms      | 31.0 ms     | 1.03x slower | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| pickle                  | 11.5 us      | 11.6 us     | 1.01x slower | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| pickle_dict             | 29.5 us      | 30.5 us     | 1.03x slower | Significant (t=-6.37) |
+-------------------------+--------------+-------------+--------------+-----------------------+
| pickle_list             | 4.54 us      | 4.58 us     | 1.01x slower | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| pickle_pure_python      | 652 us       | 651 us      | 1.00x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| pidigits                | 212 ms       | 215 ms      | 1.02x slower | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| python_startup          | 11.6 ms      | 11.5 ms     | 1.01x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| python_startup_no_site  | 8.07 ms      | 7.95 ms     | 1.01x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| raytrace                | 729 ms       | 722 ms      | 1.01x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| regex_compile           | 249 ms       | 247 ms      | 1.01x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| regex_dna               | 203 ms       | 204 ms      | 1.00x slower | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| regex_effbot            | 3.55 ms      | 3.55 ms     | 1.00x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| regex_v8                | 28.3 ms      | 28.3 ms     | 1.00x slower | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| richards                | 105 ms       | 105 ms      | 1.00x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| scimark_fft             | 429 ms       | 436 ms      | 1.01x slower | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| scimark_lu              | 238 ms       | 237 ms      | 1.00x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| scimark_monte_carlo     | 144 ms       | 139 ms      | 1.04x faster | Significant (t=3.61)  |
+-------------------------+--------------+-------------+--------------+-----------------------+
| scimark_sor             | 265 ms       | 260 ms      | 1.02x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| scimark_sparse_mat_mult | 5.41 ms      | 5.25 ms     | 1.03x faster | Significant (t=4.26)  |
+-------------------------+--------------+-------------+--------------+-----------------------+
| spectral_norm           | 174 ms       | 173 ms      | 1.01x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| sqlalchemy_declarative  | 216 ms       | 214 ms      | 1.01x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| sqlalchemy_imperative   | 41.6 ms      | 41.2 ms     | 1.01x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| sqlite_synth            | 3.99 us      | 3.91 us     | 1.02x faster | Significant (t=2.49)  |
+-------------------------+--------------+-------------+--------------+-----------------------+
| sympy_expand            | 559 ms       | 559 ms      | 1.00x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| sympy_integrate         | 25.2 ms      | 25.3 ms     | 1.00x slower | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| sympy_str               | 261 ms       | 260 ms      | 1.00x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| sympy_sum               | 136 ms       | 138 ms      | 1.01x slower | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| telco                   | 8.36 ms      | 8.32 ms     | 1.00x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| tornado_http            | 273 ms       | 271 ms      | 1.01x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| unpack_sequence         | 58.8 ns      | 60.0 ns     | 1.02x slower | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| unpickle                | 21.5 us      | 21.5 us     | 1.00x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| unpickle_list           | 5.60 us      | 5.55 us     | 1.01x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| unpickle_pure_python    | 497 us       | 481 us      | 1.03x faster | Significant (t=5.04)  |
+-------------------------+--------------+-------------+--------------+-----------------------+
| xml_etree_generate      | 141 ms       | 140 ms      | 1.01x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| xml_etree_iterparse     | 131 ms       | 133 ms      | 1.01x slower | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| xml_etree_parse         | 186 ms       | 187 ms      | 1.00x slower | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
| xml_etree_process       | 115 ms       | 113 ms      | 1.01x faster | Not significant       |
+-------------------------+--------------+-------------+--------------+-----------------------+
msg336603 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-02-26 04:11
...so it doesn't appear that my PR introduces a performance regression.
msg336677 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-02-26 14:37
> ...so it doesn't appear that my PR introduces a performance regression.

IMHO there is no performance regression at all. Just noice in the result which doesn't come with std dev.
msg336687 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-02-26 15:57
FYI, I have a couple of small follow-up changes to land before I close this issue.
msg336897 - (view) Author: David Bolen (db3l) Date: 2019-03-01 07:20
I suspect changes for this issue may be creating test_io failures on my windows builders, most notably my x86 Windows 7 builder where test_io has been failing both attempts on almost all builds.  It fails with a lock failure during interpreter shutdown, and commit ef4ac967 appears the most plausible commit out of the set introduced at the first failing build on Feb 24.

See https://buildbot.python.org/all/#/builders/58/builds/1977 for the first failure.  test_io has failed both attempts on all but 3 of the subsequent 16 tests of the 3.x branch.

It might be worth noting that this builder is slow, so if there are timeouts involved or a race condition of any sort it might be triggering it more readily than other builders.  I do, however, see the same failures - albeit less frequently and not usually on both tries - on the Win8 and Win10 builders.

For what it's worth one other oddity is that while having test_multiprocessing* failures are relatively common on the Win7 builder during the first round of tests due to exceeding the timeout, they usually pass on the retry, but over this same time frame have begun failing - not due to timeout - even on the second attempt, which is unusual.  It might be coincidental but similar failures are also showing up sporadically on the Win8/Win10 builders where such failures are not common at all.
msg336948 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-03-01 19:35
New changeset b05b711a2cef6c6c381e01069dedac372e0b9fb2 by Eric Snow in branch 'master':
bpo-33608: Use _Py_AddPendingCall() in _PyCrossInterpreterData_Release(). (gh-12024)
https://github.com/python/cpython/commit/b05b711a2cef6c6c381e01069dedac372e0b9fb2
msg336951 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-03-01 20:15
New changeset bda918bf65a88560ec453aaba0758a9c0d49b449 by Eric Snow in branch 'master':
bpo-33608: Simplify ceval's DISPATCH by hoisting eval_breaker ahead of time. (gh-12062)
https://github.com/python/cpython/commit/bda918bf65a88560ec453aaba0758a9c0d49b449
msg336952 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-03-01 20:22
I've merged the PR for hoisting eval_breaker before the ceval loop starts.  @Raymond, see if that addresses the performance regression you've been seeing.

I'll close this issue after I've sorted out the buildbot issues David mentioned (on his Windows builders).
msg336975 - (view) Author: David Bolen (db3l) Date: 2019-03-02 00:27
If I can help with testing or builder access or anything just let me know.  It appears that I can pretty reliably trigger the error through just the individual tests (test_daemon_threads_shutdown_std{out,err}_deadlock) in isolation, which is definitely less tedious than a full buildbot run to test a change.

BTW, while not directly related since it was only just merged, and I won't pretend to have any real understanding of the changes here, I do have a question about PR 12024 ... it appears HEAD_UNLOCK is used twice in _PyInterpreterState_LookUpID.  Shouldn't one of those be HEAD_LOCK?
msg337075 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-04 08:22
> I suspect changes for this issue may be creating test_io failures on my windows builders, (...)

I created bpo-36177 to track this regression.
msg337096 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-04 12:16
The commit ef4ac967e2f3a9a18330cc6abe14adb4bc3d0465 introduced a regression in test.test_multiprocessing_spawn.WithThreadsTestManagerRestart.test_rapid_restart: bpo-36114.

Would it be possible to revert it until the bug is properly understood and fixed?
msg337110 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-04 13:12
> The commit ef4ac967e2f3a9a18330cc6abe14adb4bc3d0465 introduced a regression in test.test_multiprocessing_spawn.WithThreadsTestManagerRestart.test_rapid_restart: bpo-36114.

There is a very similar bug on Windows: bpo-36116.

I'm now also quite confident that bpo-36177 is also a regression caused by this issue.
msg337115 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-04 13:21
New changeset 4d61e6e3b802399be62a521d6fa785698cb670b5 by Victor Stinner in branch 'master':
Revert: bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). (GH-11617) (GH-12159)
https://github.com/python/cpython/commit/4d61e6e3b802399be62a521d6fa785698cb670b5
msg337116 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-04 13:21
Hi Eric, I'm sorry but I had to revert your recent work. It introduced regressions in at least in test_io and test_multiprocessing_spawn on Windows and FreeBSD. I simply don't have the bandwidth to investigate this regression yet, whereas we really need the CI to remain stable to catch other regressions (the master branch received multiple new features recently, and there are some other regressions by that way ;-)).

test_multiprocessing_spawn is *very* hard to reproduce on FreeBSD, but I can reliably reproduce it. It just takes time. The issue is a crash producing a coredump. I consider that the bug is important enough to justify a revert.

The revert is an opportunity to seat down and track the root cause rather than urging to push a "temporary" quickfix. This bug seems to be very deep in the Python internals: thread state during Python shutdown. So it will take time to fully understand it and fix it (or redesign your recent changes, I don't know).

Tell me if you need my help to reproduce the bug. The bug has been seen on FreeBSD but also Windows:

* test_multiprocessing_spawn started to produce coredumps on FreeBSD: https://bugs.python.org/issue36114
* test_multiprocessing_spawn started to fail randomly on Windows: https://bugs.python.org/issue36116
* test_io started to fail randomly on Windows: https://bugs.python.org/issue36177

-- The Night's Watch
msg337117 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-04 13:25
For curious people, Pablo Galindo spent a few hours to investigate https://bugs.python.org/issue36114 and I spent a few more hours to finish the analysis. The bug indirectly crashed my laptop :-)
https://twitter.com/VictorStinner/status/1102528982036201478

That's what I mean by "I don't have the bandwidth": we already spent hours to identify the regression ;-)
msg337159 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-03-04 23:58
That's okay, Victor.  Thanks for jumping on this.  I'll take a look when I get a chance.
msg337179 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-05 11:23
> That's okay, Victor.  Thanks for jumping on this.  I'll take a look when I get a chance.

From what I saw, your first commit was enough to reproduce the crash.

If I recall correctly, Antoine Pitrou modified the GIL so threads exit immediately when Py_Finalize() is called. I'm thinking at:

void
PyEval_RestoreThread(PyThreadState *tstate)
{
    ...
    take_gil(tstate);
    /* _Py_Finalizing is protected by the GIL */
    if (_Py_IsFinalizing() && !_Py_CURRENTLY_FINALIZING(tstate)) {
        drop_gil(tstate);
        PyThread_exit_thread();
        Py_UNREACHABLE();
    }
    ...
    PyThreadState_Swap(tstate);
}

Problem: this code uses tstate, whereas the crash occurred because tstate pointed to freed memory:

"""
Thread 1 got the crash

(gdb) p *tstate
$3 = {
  prev = 0xdbdbdbdbdbdbdbdb,
  next = 0xdbdbdbdbdbdbdbdb,
  interp = 0xdbdbdbdbdbdbdbdb,
  ...
}

...

Thread 1 (LWP 100696):
#0  0x0000000000368210 in take_gil (tstate=0x8027e2050) at Python/ceval_gil.h:216
#1  0x0000000000368a94 in PyEval_RestoreThread (tstate=0x8027e2050) at Python/ceval.c:281
...
"""

https://bugs.python.org/issue36114#msg337090

When this crash occurred, Py_Finalize() already completed in the main thread!

"""
void _Py_NO_RETURN
Py_Exit(int sts)
{
    if (Py_FinalizeEx() < 0) {  /* <==== DONE! */
        sts = 120;
    }

    exit(sts);    /* <=============== Crash occurred here! */
}
"""

Py_Finalize() is supposed to wait for threads before deleting Python thread states:

"""
int
Py_FinalizeEx(void)
{
    ...

    /* The interpreter is still entirely intact at this point, and the
     * exit funcs may be relying on that.  In particular, if some thread
     * or exit func is still waiting to do an import, the import machinery
     * expects Py_IsInitialized() to return true.  So don't say the
     * interpreter is uninitialized until after the exit funcs have run.
     * Note that Threading.py uses an exit func to do a join on all the
     * threads created thru it, so this also protects pending imports in
     * the threads created via Threading.
     */
    call_py_exitfuncs(interp);

    ...

    /* Remaining threads (e.g. daemon threads) will automatically exit
       after taking the GIL (in PyEval_RestoreThread()). */
    _PyRuntime.finalizing = tstate;
    _PyRuntime.initialized = 0;
    _PyRuntime.core_initialized = 0;
    ...

    /* Delete current thread. After this, many C API calls become crashy. */
    PyThreadState_Swap(NULL);

    PyInterpreterState_Delete(interp);

    ...
}
"""

The real problem for years are *deamon threads* which... BY DESIGN... remain alive after Py_Finalize() exit! But as I explained, they must exit as soon at they attempt to get GIL.
msg337218 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-03-05 16:08
Thanks for taking a look Victor!  That info is helpful.  It will likely be a few days before I can sort this out.

Once I have addressed the problem I'll re-merge.  I plan on using the "buildbot-custom" branch to make sure the buildbots are happy with the change before making that PR.
msg337235 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-05 18:03
The bug is hard to reproduce even manually. I can test a PR for you once
it's ready.
msg337554 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-03-09 05:47
New changeset 5be45a6105d656c551adeee7770afdc3b806fbb5 by Eric Snow in branch 'master':
bpo-33608: Minor cleanup related to pending calls. (gh-12247)
https://github.com/python/cpython/commit/5be45a6105d656c551adeee7770afdc3b806fbb5
msg337557 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-03-09 06:44
New changeset 8479a3426eb7d1840473f7788e639954363ed37e by Eric Snow in branch 'master':
bpo-33608: Make sure locks in the runtime are properly re-created.  (gh-12245)
https://github.com/python/cpython/commit/8479a3426eb7d1840473f7788e639954363ed37e
msg337996 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-15 15:04
New changeset e3f4070aee6f2d489416fdcafd51d6b04d661919 by Victor Stinner in branch 'master':
bpo-33608: Fix PyEval_InitThreads() warning (GH-12346)
https://github.com/python/cpython/commit/e3f4070aee6f2d489416fdcafd51d6b04d661919
msg338039 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-03-15 21:47
New changeset 842a2f07f2f08a935ef470bfdaeef40f87490cfc by Eric Snow in branch 'master':
bpo-33608: Deal with pending calls relative to runtime shutdown. (gh-12246)
https://github.com/python/cpython/commit/842a2f07f2f08a935ef470bfdaeef40f87490cfc
History
Date User Action Args
2019-03-15 22:33:04eric.snowsetpull_requests: + pull_request12327
2019-03-15 21:47:56eric.snowsetmessages: + msg338039
2019-03-15 15:04:23vstinnersetmessages: + msg337996
2019-03-15 14:06:08vstinnersetpull_requests: + pull_request12312
2019-03-09 06:44:37eric.snowsetmessages: + msg337557
2019-03-09 05:47:17eric.snowsetmessages: + msg337554
2019-03-09 00:18:53eric.snowsetpull_requests: + pull_request12236
2019-03-09 00:15:59eric.snowsetpull_requests: + pull_request12235
2019-03-09 00:07:40eric.snowsetpull_requests: + pull_request12234
2019-03-08 23:47:57eric.snowsetpull_requests: + pull_request12231
2019-03-08 18:37:51eric.snowsetstage: patch review
pull_requests: + pull_request12228
2019-03-05 18:03:07vstinnersetmessages: + msg337235
title: [subinterpreters] Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed. -> Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed.
2019-03-05 16:08:16eric.snowsetmessages: + msg337218
stage: patch review -> (no value)
2019-03-05 11:23:11vstinnersetmessages: + msg337179
2019-03-04 23:58:01eric.snowsetmessages: + msg337159
2019-03-04 13:25:42vstinnersetmessages: + msg337117
2019-03-04 13:21:47vstinnersetmessages: + msg337116
2019-03-04 13:21:31vstinnersetmessages: + msg337115
2019-03-04 13:12:33vstinnersetmessages: + msg337110
2019-03-04 12:36:47vstinnersetpull_requests: + pull_request12157
2019-03-04 12:16:57vstinnersetmessages: + msg337096
2019-03-04 08:22:10vstinnersetmessages: + msg337075
2019-03-02 00:27:46db3lsetmessages: + msg336975
2019-03-01 20:22:41eric.snowsetmessages: + msg336952
2019-03-01 20:15:48eric.snowsetmessages: + msg336951
2019-03-01 19:35:15eric.snowsetmessages: + msg336948
2019-03-01 07:20:43db3lsetnosy: + db3l
messages: + msg336897
2019-02-27 01:19:48eric.snowsetpull_requests: + pull_request12086
2019-02-26 15:57:24eric.snowsetmessages: + msg336687
2019-02-26 14:37:46vstinnersetmessages: + msg336677
2019-02-26 04:11:45eric.snowsetmessages: + msg336603
2019-02-26 04:10:24eric.snowsetmessages: + msg336602
2019-02-26 02:29:06eric.snowsetmessages: + msg336596
2019-02-25 17:40:35eric.snowsetmessages: + msg336544
2019-02-24 23:56:13eric.snowsetpull_requests: + pull_request12057
2019-02-24 23:40:49eric.snowsetmessages: + msg336490
2019-01-18 20:43:41eric.snowsetpull_requests: + pull_request11362
2019-01-18 20:43:26eric.snowsetpull_requests: + pull_request11361
2019-01-18 20:43:08eric.snowsetpull_requests: + pull_request11360
2018-11-10 18:26:45Johan Dahlinsetnosy: + Johan Dahlin
2018-09-17 15:41:58eric.snowsetassignee: eric.snow
messages: + msg325548
2018-09-16 11:54:57naschemesetnosy: + nascheme
messages: + msg325481
2018-09-15 18:25:04eric.snowsetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request8758
2018-06-22 22:48:34eric.snowsetnosy: + emilyemorehouse
2018-05-23 13:17:57ncoghlansetmessages: + msg317404
2018-05-22 20:57:59vstinnersettitle: Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed. -> [subinterpreters] Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed.
2018-05-22 20:42:16pmppsetnosy: + pmpp
2018-05-22 20:39:54vstinnersetmessages: + msg317344
2018-05-22 19:36:39eric.snowsetmessages: + msg317334
versions: + Python 3.8, - Python 3.4
2018-05-22 19:34:41eric.snowcreate