Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed. #77789

Closed
ericsnowcurrently opened this issue May 22, 2018 · 82 comments
Assignees
Labels
3.9 only security fixes topic-subinterpreters

Comments

@ericsnowcurrently
Member

BPO 33608
Nosy @nascheme, @ncoghlan, @vstinner, @phsilva, @pmp-p, @markshannon, @ericsnowcurrently, @serhiy-storchaka, @soltysh, @1st1, @koobs, @zooba, @phmc, @emilyemorehouse, @jdahlin
PRs
  • [WIP] bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). #9334
  • bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). #11617
  • bpo-33608: Use _Py_AddPendingCall() in _PyCrossInterpreterData_Release(). #12024
  • bpo-33608: Simplify DISPATCH by hoisting eval_breaker ahead of time. #12062
  • Revert: bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). (GH-11617) #12159
  • bpo-33608: Fix the Py_atomic_* macros. #12240
  • bpo-33608: Simplify DISPATCH by hoisting eval_breaker ahead of time. #12243
  • bpo-33608: Make sure locks in the runtime are properly re-created.  #12245
  • bpo-33608: Deal with pending calls relative to runtime shutdown. #12246
  • bpo-33608: Minor cleanup related to pending calls. #12247
  • bpo-33608: Fix PyEval_InitThreads() warning #12346
  • bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). #12360
  • bpo-33608: make arm macros match x86 macros #12665
  • bpo-33608: Revert "Factor out a private, per-interpreter _Py_AddPendingCall()." #12806
  • bpo-33608: Normalize atomic macros so that they all expect an atomic struct #12877
  • bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). #13714
  • Revert "bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). (gh-13714)" #13780
Files
  • managers.patch
  • take_gil.assert.patch
  • fini_crash.c
  • wrap_threadstate.diff

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/ericsnowcurrently'
    closed_at = <Date 2020-04-08.21:58:41.738>
    created_at = <Date 2018-05-22.19:34:41.672>
    labels = ['expert-subinterpreters', '3.9']
    title = 'Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed.'
    updated_at = <Date 2020-06-03.16:43:06.602>
    user = 'https://github.com/ericsnowcurrently'

    bugs.python.org fields:

    activity = <Date 2020-06-03.16:43:06.602>
    actor = 'vstinner'
    assignee = 'eric.snow'
    closed = True
    closed_date = <Date 2020-04-08.21:58:41.738>
    closer = 'vstinner'
    components = ['Subinterpreters']
    creation = <Date 2018-05-22.19:34:41.672>
    creator = 'eric.snow'
    dependencies = []
    files = ['48333', '48334', '48731', '48732']
    hgrepos = []
    issue_num = 33608
    keywords = ['patch']
    message_count = 82.0
    messages = ['317333', '317334', '317344', '317404', '325481', '325548', '336490', '336544', '336596', '336602', '336603', '336677', '336687', '336897', '336948', '336951', '336952', '336975', '337075', '337096', '337110', '337115', '337116', '337117', '337159', '337179', '337218', '337235', '337554', '337557', '337996', '338039', '340057', '340069', '340070', '340071', '340075', '340077', '340079', '340082', '340133', '340139', '340143', '340145', '340666', '340768', '342791', '342792', '344210', '344211', '344240', '344244', '344245', '344355', '344372', '344417', '344429', '344434', '344435', '344438', '344441', '344442', '344443', '344444', '344445', '344446', '357169', '357170', '357179', '358364', '358367', '358748', '364670', '365160', '365163', '365178', '366009', '366013', '366016', '366017', '366019', '366027']
    nosy_count = 16.0
    nosy_names = ['nascheme', 'ncoghlan', 'vstinner', 'phsilva', 'pmpp', 'Mark.Shannon', 'eric.snow', 'serhiy.storchaka', 'maciej.szulik', 'yselivanov', 'koobs', 'steve.dower', 'pconnell', 'emilyemorehouse', 'Johan Dahlin', 'shprotx']
    pr_nums = ['9334', '11617', '11617', '11617', '12024', '12062', '12159', '12240', '12243', '12245', '12246', '12247', '12346', '12360', '12665', '12806', '12877', '13714', '13780']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue33608'
    versions = ['Python 3.9']

    @ericsnowcurrently
    Member Author

    In order to keep subinterpreters properly isolated, objects
    from one interpreter should not be used in C-API calls in
    another interpreter. That is relatively straightforward
    except in one case: indicating that the other interpreter
    doesn't need the object to exist any more (similar to
    PyBuffer_Release() but more general). I consider the
    following solutions to be the most viable. All of them make
    use of refcounts to protect cross-interpreter usage (with an
    incref before sharing).

    1. manually switch interpreters (new private function)
      a. acquire the GIL
      b. if refcount > 1 then decref and release the GIL
      c. switch
      d. new thread (or re-use dedicated one)
      e. decref
      f. kill thread
      g. switch back
      h. release the GIL
    2. change pending call mechanism (see Py_AddPendingCall) to
      per-interpreter instead of global (add "interp" arg to
      signature of new private C-API function)
      a. queue a function that decrefs the object
    3. new cross-interpreter-specific private C-API function
      a. queue the object for decref (a la Py_AddPendingCall)
      in owning interpreter

    I favor #2, since it is more generally applicable. #3 would
    probably be built on #2 anyway. #1 is relatively inefficient.
    With #2, Py_AddPendingCall() would become a simple wrapper
    around the new private function.
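
    For illustration, here is a minimal sketch of how option #2 could look
    from the caller's side. The per-interpreter signature
    _Py_AddPendingCall(interp, func, arg) is the proposed private function,
    shown here as an assumption rather than an existing API:

    #include "Python.h"

    /* Runs later in the owning interpreter, while that interpreter holds
     * its GIL, so the DECREF is safe there. */
    static int
    _release_shared_object(void *arg)
    {
        PyObject *obj = (PyObject *)arg;
        Py_DECREF(obj);
        return 0;
    }

    /* Called from the borrowing interpreter when it no longer needs `obj`.
     * `interp` is the interpreter that owns `obj`.  The hypothetical
     * per-interpreter _Py_AddPendingCall() queues the decref over there
     * instead of touching the refcount from the wrong interpreter. */
    static int
    _schedule_cross_interp_decref(PyInterpreterState *interp, PyObject *obj)
    {
        return _Py_AddPendingCall(interp, _release_shared_object, obj);
    }

    With something like this in place, the public Py_AddPendingCall() would
    just call the private function with the current (or main) interpreter,
    matching the wrapper idea above.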

    @ericsnowcurrently
    Member Author

    As a lesser (IMHO) alternative, we could also modify Py_DECREF
    to respect a new "shared" marker of some sort (perhaps relative
    to bpo-33607), but that would probably still require one of the
    refcount-based solutions (and add a branch to Py_DECREF).

    @ericsnowcurrently ericsnowcurrently added the 3.8 only security fixes label May 22, 2018
    @vstinner
    Member

    "That is relatively straight-forward except in one case: indicating that the other interpreter doesn't need the object to exist any more (similar to PyBuffer_Release() but more general)"

    Why would an interpreter access an object from a different interpreter? Each interpreter should have its own object space, no?

    @vstinner vstinner changed the title Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed. [subinterpreters] Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed. May 22, 2018
    @ncoghlan
    Contributor

    Adding a low level callback based mechanism to ask another interpreter to do work seems like a good way to go to me.

    As an example of why that might be needed, consider the case of sharing a buffer exporting object with another subinterpreter: when the memoryview in the subinterpreter is destroyed, it needs to request that the buffer view be released in the source interpreter that actually owns the original object.
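
    A hedged sketch of that scenario, reusing the hypothetical per-interpreter
    _Py_AddPendingCall() from the first comment (the memoryview/exporter
    wiring and the view's allocation strategy are assumptions):

    #include "Python.h"

    /* Pending call executed in the source (exporting) interpreter: release
     * the buffer view there, under that interpreter's GIL. */
    static int
    _release_exported_view(void *arg)
    {
        Py_buffer *view = (Py_buffer *)arg;
        PyBuffer_Release(view);
        PyMem_RawFree(view);   /* assumes the view was heap-allocated when shared */
        return 0;
    }

    /* Called when the consuming interpreter's memoryview is torn down. */
    static int
    _request_buffer_release(PyInterpreterState *source_interp, Py_buffer *view)
    {
        return _Py_AddPendingCall(source_interp, _release_exported_view, view);
    }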

    @nascheme
    Member

    I would suggest that sharing of objects between interpreters should be stamped out. Could we have some #ifdef debug checking that would warn or assert so this doesn't happen? I know currently we share a lot of objects. However, in the long term, that does not seem like the correct design. Instead, each interpreter should have its own objects and any passing or sharing should be tightly controlled.

    @ericsnowcurrently
    Member Author

    @neil, we're definitely on the same page. In fact, in a world where subinterpreters do not share a GIL, we can't ever use an object in one interpreter that was created in another (due to thread safety on refcounts). The role of "tightly controlling" passing/sharing objects (for a very loose definition of "sharing") falls to the channels described in PEP-554. [1]

    However, there are several circumstances where interpreters may collaborate in ways that involve one holding a reference to (but not using) an object owned by the other. For instance, see PyBuffer_Release(). [2] This issue is about addressing that situation safely. It is definitely not about safely using objects from other interpreters.

    [1] The low-level implementation, including channels, already exists in Modules/_xxsubinterpretersmodule.c.
    [2] https://docs.python.org/3/c-api/buffer.html#c.PyBuffer_Release

    @ericsnowcurrently ericsnowcurrently self-assigned this Sep 17, 2018
    @ericsnowcurrently
    Member Author

    New changeset ef4ac96 by Eric Snow in branch 'master':
    bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). (GH-11617)
    ef4ac96

    @ericsnowcurrently
    Member Author

    #55826 introduced a slight performance regression which will need to be resolved. If I can't find the problem right away then I'll probably temporarily revert the commit.

    See https://mail.python.org/pipermail/python-dev/2019-February/156451.html.

    @ericsnowcurrently
    Member Author

    At least, it *might* be a performance regression. Here are the two commits I tried:

    after: ef4ac96 (PR GH-11617, from this issue)
    before: 463572c (the one right before)

    After building each (a normal build, not debug), I ran the micro-benchmark Raymond referred to ("./python Tools/scripts/var_access_benchmark.py") multiple times. There was enough variability between runs from the same commit that I'm uncertain at this point if there really is any performance regression. For the most part the results between the two commits are really close. Also, the results from the "after" commit fall within the variability I've seen between runs of the "before" commit.

    I'm going to try with the "performance" suite (https://github.com/python/performance) to see if it shows any regression.

    @ericsnowcurrently
    Member Author

    Here's what I did (more or less):

    $ python3 -m pip install --user perf
    $ python3 -m perf system tune
    $ git clone git@github.com:python/performance.git
    $ git clone git@github.com:python/cpython.git
    $ cd cpython
    $ git checkout ef4ac967e2f3a9a18330cc6abe14adb4bc3d0465
    $ ./configure
    $ make -j8
    $ ./python ../performance/pyperformance run --fast --output speed.after
    $ git checkout HEAD^
    $ make -j8
    $ ./python ../performance/pyperformance run --fast --output speed.before
    $ ./python ../performance/pyperformance compare speed.before speed.after -O table

    Here are my results:

    speed.before
    ============

    Performance version: 0.7.0
    Report on Linux-4.15.0-45-generic-x86_64-with-glibc2.26
    Number of logical CPUs: 4
    Start date: 2019-02-25 20:39:44.151699
    End date: 2019-02-25 20:48:04.817516

    speed.after
    ===========

    Performance version: 0.7.0
    Report on Linux-4.15.0-45-generic-x86_64-with-glibc2.26
    Number of logical CPUs: 4
    Start date: 2019-02-25 20:29:56.610702
    End date: 2019-02-25 20:38:11.667461

    +-------------------------+--------------+-------------+--------------+-----------------------+
    | Benchmark | speed.before | speed.after | Change | Significance |
    +=========================+==============+=============+==============+=======================+
    | 2to3 | 447 ms | 440 ms | 1.02x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | chaos | 155 ms | 156 ms | 1.01x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | crypto_pyaes | 154 ms | 152 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | deltablue | 10.7 ms | 10.5 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | django_template | 177 ms | 172 ms | 1.03x faster | Significant (t=3.66) |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | dulwich_log | 105 ms | 106 ms | 1.01x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | fannkuch | 572 ms | 573 ms | 1.00x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | float | 149 ms | 146 ms | 1.02x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | go | 351 ms | 349 ms | 1.00x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | hexiom | 14.6 ms | 14.4 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | html5lib | 126 ms | 122 ms | 1.03x faster | Significant (t=3.46) |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | json_dumps | 17.6 ms | 17.2 ms | 1.02x faster | Significant (t=2.65) |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | json_loads | 36.4 us | 36.1 us | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | logging_format | 15.2 us | 14.9 us | 1.02x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | logging_silent | 292 ns | 296 ns | 1.01x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | logging_simple | 13.7 us | 13.4 us | 1.02x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | mako | 22.9 ms | 22.5 ms | 1.02x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | meteor_contest | 134 ms | 134 ms | 1.00x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | nbody | 157 ms | 161 ms | 1.03x slower | Significant (t=-3.85) |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | nqueens | 134 ms | 132 ms | 1.02x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | pathlib | 30.1 ms | 31.0 ms | 1.03x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | pickle | 11.5 us | 11.6 us | 1.01x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | pickle_dict | 29.5 us | 30.5 us | 1.03x slower | Significant (t=-6.37) |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | pickle_list | 4.54 us | 4.58 us | 1.01x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | pickle_pure_python | 652 us | 651 us | 1.00x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | pidigits | 212 ms | 215 ms | 1.02x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | python_startup | 11.6 ms | 11.5 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | python_startup_no_site | 8.07 ms | 7.95 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | raytrace | 729 ms | 722 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | regex_compile | 249 ms | 247 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | regex_dna | 203 ms | 204 ms | 1.00x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | regex_effbot | 3.55 ms | 3.55 ms | 1.00x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | regex_v8 | 28.3 ms | 28.3 ms | 1.00x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | richards | 105 ms | 105 ms | 1.00x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | scimark_fft | 429 ms | 436 ms | 1.01x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | scimark_lu | 238 ms | 237 ms | 1.00x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | scimark_monte_carlo | 144 ms | 139 ms | 1.04x faster | Significant (t=3.61) |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | scimark_sor | 265 ms | 260 ms | 1.02x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | scimark_sparse_mat_mult | 5.41 ms | 5.25 ms | 1.03x faster | Significant (t=4.26) |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | spectral_norm | 174 ms | 173 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | sqlalchemy_declarative | 216 ms | 214 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | sqlalchemy_imperative | 41.6 ms | 41.2 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | sqlite_synth | 3.99 us | 3.91 us | 1.02x faster | Significant (t=2.49) |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | sympy_expand | 559 ms | 559 ms | 1.00x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | sympy_integrate | 25.2 ms | 25.3 ms | 1.00x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | sympy_str | 261 ms | 260 ms | 1.00x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | sympy_sum | 136 ms | 138 ms | 1.01x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | telco | 8.36 ms | 8.32 ms | 1.00x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | tornado_http | 273 ms | 271 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | unpack_sequence | 58.8 ns | 60.0 ns | 1.02x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | unpickle | 21.5 us | 21.5 us | 1.00x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | unpickle_list | 5.60 us | 5.55 us | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | unpickle_pure_python | 497 us | 481 us | 1.03x faster | Significant (t=5.04) |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | xml_etree_generate | 141 ms | 140 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | xml_etree_iterparse | 131 ms | 133 ms | 1.01x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | xml_etree_parse | 186 ms | 187 ms | 1.00x slower | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+
    | xml_etree_process | 115 ms | 113 ms | 1.01x faster | Not significant |
    +-------------------------+--------------+-------------+--------------+-----------------------+

    @ericsnowcurrently
    Member Author

    ...so it doesn't appear that my PR introduces a performance regression.

    @vstinner
    Member

    ...so it doesn't appear that my PR introduces a performance regression.

    IMHO there is no performance regression at all. It is just noise in the results, which don't come with a standard deviation.

    @ericsnowcurrently
    Member Author

    FYI, I have a couple of small follow-up changes to land before I close this issue.

    @db3l
    Contributor

    db3l commented Mar 1, 2019

    I suspect changes for this issue may be creating test_io failures on my windows builders, most notably my x86 Windows 7 builder where test_io has been failing both attempts on almost all builds. It fails with a lock failure during interpreter shutdown, and commit ef4ac96 appears the most plausible commit out of the set introduced at the first failing build on Feb 24.

    See https://buildbot.python.org/all/#/builders/58/builds/1977 for the first failure. test_io has failed both attempts on all but 3 of the subsequent 16 tests of the 3.x branch.

    It might be worth noting that this builder is slow, so if there are timeouts involved or a race condition of any sort it might be triggering it more readily than other builders. I do, however, see the same failures - albeit less frequently and not usually on both tries - on the Win8 and Win10 builders.

    For what it's worth, one other oddity is that while test_multiprocessing* failures are relatively common on the Win7 builder during the first round of tests due to exceeding the timeout, they usually pass on the retry. Over this same time frame, though, they have begun failing (not due to timeout) even on the second attempt, which is unusual. It might be coincidental, but similar failures are also showing up sporadically on the Win8/Win10 builders, where such failures are not common at all.

    @ericsnowcurrently
    Member Author

    New changeset b05b711 by Eric Snow in branch 'master':
    bpo-33608: Use _Py_AddPendingCall() in _PyCrossInterpreterData_Release(). (gh-12024)
    b05b711

    @ericsnowcurrently
    Member Author

    New changeset bda918b by Eric Snow in branch 'master':
    bpo-33608: Simplify ceval's DISPATCH by hoisting eval_breaker ahead of time. (gh-12062)
    bda918b

    @ericsnowcurrently
    Member Author

    I've merged the PR for hoisting eval_breaker before the ceval loop starts. @Raymond, see if that addresses the performance regression you've been seeing.

    I'll close this issue after I've sorted out the buildbot issues David mentioned (on his Windows builders).

    @db3l
    Contributor

    db3l commented Mar 2, 2019

    If I can help with testing or builder access or anything just let me know. It appears that I can pretty reliably trigger the error through just the individual tests (test_daemon_threads_shutdown_std{out,err}_deadlock) in isolation, which is definitely less tedious than a full buildbot run to test a change.

    BTW, while not directly related since it was only just merged, and I won't pretend to have any real understanding of the changes here, I do have a question about PR 12024 ... it appears HEAD_UNLOCK is used twice in _PyInterpreterState_LookUpID. Shouldn't one of those be HEAD_LOCK?

    @vstinner
    Member

    vstinner commented Mar 4, 2019

    I suspect changes for this issue may be creating test_io failures on my windows builders, (...)

    I created bpo-36177 to track this regression.

    @vstinner
    Member

    vstinner commented Mar 4, 2019

    The commit ef4ac96 introduced a regression in test.test_multiprocessing_spawn.WithThreadsTestManagerRestart.test_rapid_restart: bpo-36114.

    Would it be possible to revert it until the bug is properly understood and fixed?

    @vstinner
    Member

    vstinner commented Mar 4, 2019

    The commit ef4ac96 introduced a regression in test.test_multiprocessing_spawn.WithThreadsTestManagerRestart.test_rapid_restart: bpo-36114.

    There is a very similar bug on Windows: bpo-36116.

    I'm now also quite confident that bpo-36177 is also a regression caused by this issue.

    @vstinner
    Member

    vstinner commented Mar 4, 2019

    New changeset 4d61e6e by Victor Stinner in branch 'master':
    Revert: bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). (GH-11617) (GH-12159)
    4d61e6e

    @vstinner
    Member

    vstinner commented Mar 4, 2019

    Hi Eric, I'm sorry but I had to revert your recent work. It introduced regressions in at least test_io and test_multiprocessing_spawn on Windows and FreeBSD. I simply don't have the bandwidth to investigate this regression yet, whereas we really need the CI to remain stable to catch other regressions (the master branch received multiple new features recently, and there are some other regressions along the way ;-)).

    test_multiprocessing_spawn is *very* hard to reproduce on FreeBSD, but I can reliably reproduce it. It just takes time. The issue is a crash producing a coredump. I consider that the bug is important enough to justify a revert.

    The revert is an opportunity to sit down and track the root cause rather than rushing to push a "temporary" quickfix. This bug seems to be very deep in the Python internals: thread state during Python shutdown. So it will take time to fully understand it and fix it (or redesign your recent changes, I don't know).

    Tell me if you need my help to reproduce the bug. The bug has been seen on FreeBSD but also Windows:

    -- The Night's Watch

    @vstinner
    Member

    vstinner commented Mar 4, 2019

    For curious people, Pablo Galindo spent a few hours to investigate https://bugs.python.org/issue36114 and I spent a few more hours to finish the analysis. The bug indirectly crashed my laptop :-)
    https://twitter.com/VictorStinner/status/1102528982036201478

    That's what I mean by "I don't have the bandwidth": we already spent hours to identify the regression ;-)

    @ericsnowcurrently
    Member Author

    That's okay, Victor. Thanks for jumping on this. I'll take a look when I get a chance.

    @vstinner
    Member

    vstinner commented Mar 5, 2019

    That's okay, Victor. Thanks for jumping on this. I'll take a look when I get a chance.

    From what I saw, your first commit was enough to reproduce the crash.

    If I recall correctly, Antoine Pitrou modified the GIL so that threads exit immediately when Py_Finalize() is called. I'm thinking of:

    void
    PyEval_RestoreThread(PyThreadState *tstate)
    {
        ...
        take_gil(tstate);
        /* _Py_Finalizing is protected by the GIL */
        if (_Py_IsFinalizing() && !_Py_CURRENTLY_FINALIZING(tstate)) {
            drop_gil(tstate);
            PyThread_exit_thread();
            Py_UNREACHABLE();
        }
        ...
        PyThreadState_Swap(tstate);
    }

    Problem: this code uses tstate, whereas the crash occurred because tstate pointed to freed memory:

    """
    Thread 1 got the crash

    (gdb) p *tstate
    $3 = {
    prev = 0xdbdbdbdbdbdbdbdb,
    next = 0xdbdbdbdbdbdbdbdb,
    interp = 0xdbdbdbdbdbdbdbdb,
    ...
    }

    ...

    Thread 1 (LWP 100696):
    #0 0x0000000000368210 in take_gil (tstate=0x8027e2050) at Python/ceval_gil.h:216
    #1 0x0000000000368a94 in PyEval_RestoreThread (tstate=0x8027e2050) at Python/ceval.c:281
    ...
    """

    https://bugs.python.org/issue36114#msg337090

    When this crash occurred, Py_Finalize() had already completed in the main thread!

    """
    void _Py_NO_RETURN
    Py_Exit(int sts)
    {
        if (Py_FinalizeEx() < 0) {  /* <==== DONE! */
            sts = 120;
        }
    exit(sts);    /* \<=============== Crash occurred here! \*/
    

    }
    """

    Py_Finalize() is supposed to wait for threads before deleting Python thread states:

    """
    int
    Py_FinalizeEx(void)
    {
        ...
    /* The interpreter is still entirely intact at this point, and the
     * exit funcs may be relying on that.  In particular, if some thread
     * or exit func is still waiting to do an import, the import machinery
     * expects Py_IsInitialized() to return true.  So don't say the
     * interpreter is uninitialized until after the exit funcs have run.
     * Note that Threading.py uses an exit func to do a join on all the
     * threads created thru it, so this also protects pending imports in
     * the threads created via Threading.
     */
    call_py_exitfuncs(interp);
    
    ...
    
    /* Remaining threads (e.g. daemon threads) will automatically exit
       after taking the GIL (in PyEval_RestoreThread()). */
    _PyRuntime.finalizing = tstate;
    _PyRuntime.initialized = 0;
    _PyRuntime.core_initialized = 0;
    ...
    
        /* Delete current thread. After this, many C API calls become crashy. */
        PyThreadState_Swap(NULL);
    
        PyInterpreterState_Delete(interp);
    ...
    

    }
    """

    The real problem for years has been *daemon threads*, which... BY DESIGN... remain alive after Py_Finalize() exits! But as I explained, they must exit as soon as they attempt to take the GIL.

    @ericsnowcurrently
    Member Author

    Thanks for taking a look Victor! That info is helpful. It will likely be a few days before I can sort this out.

    Once I have addressed the problem I'll re-merge. I plan on using the "buildbot-custom" branch to make sure the buildbots are happy with the change before making that PR.

    @ericsnowcurrently
    Member Author

    FWIW, the failures before were in test_io and test_multiprocessing_spawn, not test_asyncio.

    @ericsnowcurrently
    Member Author

    I had really hoped to get this in for 3.8, as I may need it. However, realistically I doubt it will take just a few minutes to resolve this. With the beta 1 targeted for today it makes sense to revert my change. :(

    (I may still try to convince our fearless RM to allow it in beta 2, but we'll see.)

    @vstinner
    Member

    vstinner commented Jun 3, 2019

    FWIW, the failures before were in test_io and test_multiprocessing_spawn,
    not test_asyncio.

    I am not aware of a test_asyncio issue related to this issue, only crashes
    in multiprocessing. I bet on a crash similar to the previous attempt's,
    during shutdown.

    @vstinner
    Member

    vstinner commented Jun 3, 2019

    New changeset e225beb by Victor Stinner in branch 'master':
    Revert "bpo-33608: Factor out a private, per-interpreter _Py_AddPendingCall(). (gh-13714)" (GH-13780)
    e225beb

    @vstinner
    Member

    vstinner commented Jun 3, 2019

    What annoys me the most in this issue is that I am unable to reproduce it. Even directly on the FreeBSD CURRENT buildbot, I failed to reproduce the crash even when I stressed the buildbot and the test.

    On the other hand, when the CI runs the test suite, crashes are likely!?

    These bugs are the worst.

    IMHO we must make multiprocessing more deterministic. Python 3.8 is better, but there are still many known issues.
    https://pythondev.readthedocs.io/unstable_tests.html

    The test_venv multiprocessing test misuses the multiprocessing API; it emits a ResourceWarning.

    @vstinner
    Member

    vstinner commented Jun 3, 2019

    My bet is that multiprocessing triggers a bug in daemon threads. Multiprocessing is using and abusing daemon threads. For example, as I wrote, test_venv misuses the multiprocessing API and implicitly lets Python release (or maybe not!) "resources", where "resources" can be processes and threads, likely including daemon threads.

    Eric: I would love to give you access to a VM where I can reproduce the bug but... I failed to reproduce the bug today :-(

    I would suggest writing a stress test for daemon threads (see the sketch at the end of this comment):

    • spawn daemon threads: 4x as many threads as CPUs, to stress the system
    • each thread would sleep a random number of milliseconds and then execute a few pure Python instructions
    • spawn these threads, wait a random number of milliseconds, and then "exit Python"

    The race condition is that daemon threads may or may not run during Python finalization, depending on the delay.

    Maybe you can make the crash more likely by adding a sleep of a few *seconds* before the final exit. For example, at the exit of Py_RunMain().

    Last time I saw the crash, it was a thread using a Python structure whose memory had already been freed (filled with a random pattern by the debug hooks on the memory allocators).

    Ah: either use a debug build of Python, use PYTHONMALLOC=debug, or pass the -X dev command line option, so that freed memory is filled with a specific pattern and the crash is triggered.

    ... Good luck.
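
    A minimal sketch of such a stress test, written against the embedding API
    (an illustration of the suggestion above, not a reproducer attached to
    this issue; the 4x-CPUs factor and the sleep intervals are arbitrary):

    #include "Python.h"

    static const char *daemon_script =
        "import threading, random, time, os\n"
        "def work():\n"
        "    time.sleep(random.random() / 100)\n"
        "    sum(range(1000))\n"
        "for _ in range(4 * (os.cpu_count() or 1)):\n"
        "    threading.Thread(target=work, daemon=True).start()\n";

    int
    main(void)
    {
        Py_Initialize();
        PyRun_SimpleString(daemon_script);
        /* Finalize while some daemon threads are still sleeping or running.
         * With a debug build or PYTHONMALLOC=debug, freed memory is poisoned,
         * so a use-after-free in PyEval_RestoreThread() shows up as a crash. */
        if (Py_FinalizeEx() < 0) {
            return 120;
        }
        return 0;
    }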

    @ericsnowcurrently
    Member Author

    Well, for this PR I actually disabled the execution of pending calls during runtime finalization. I'd hoped it would help, but it did not. :(

    Regardless, I'll need to look even more closely at what is different during runtime finalization with my PR.

    @ericsnowcurrently
    Member Author

    Also, thanks for the suggestions, Victor!

    @phmc
    Mannequin

    phmc mannequin commented Nov 21, 2019

    Based on Victor's info from https://bugs.python.org/issue36114#msg337090 I believe the crash is essentially what's reproduced in the attached program.

    From the root of a (built) cpython clone run:

    gcc -c -o fini_crash.o -IInclude -I. fini_crash.c && gcc -o fini_crash fini_crash.o libpython3.9.a -lcrypt -lpthread -ldl -lutil -lm && ./fini_crash

    The output should be:

    MAIN: allow other thread to execute
    OTHER: acquired GIL
    OTHER: released GIL
    MAIN: interpreter finalized
    OTHER: attempt to acquire GIL...crash!
    [1] 266749 segmentation fault (core dumped) ./fini_crash

    And running it through valgrind:

    $ valgrind --suppressions=Misc/valgrind-python.supp fini_crash
    ==266836== Memcheck, a memory error detector
    ==266836== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
    ==266836== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
    ==266836== Command: fini_crash                    
    ==266836==                                            
    MAIN: allow other thread to execute                       
    OTHER: acquired GIL                                
    OTHER: released GIL                                                                                                    
    MAIN: interpreter finalized
    OTHER: attempt to acquire GIL...crash!                                                                                 
    ==266836== Thread 2:                                                                                                   
    ==266836== Invalid read of size 8                                                                                      
    ==266836==    at 0x15607D: PyEval_RestoreThread (ceval.c:389)                                                                                                                                                                                  
    ==266836==    by 0x15479F: evil_main (in /home/phconnel/dev/cpython/fini_crash)
    ==266836==    by 0x48B94CE: start_thread (in /usr/lib/libpthread-2.30.so)
    ==266836==    by 0x4B232D2: clone (in /usr/lib/libc-2.30.so)
    ==266836==  Address 0x4d17270 is 16 bytes inside a block of size 264 free'd
    ==266836==    at 0x48399AB: free (vg_replace_malloc.c:540)
    ==266836==    by 0x1773FF: tstate_delete_common (pystate.c:829)
    ==266836==    by 0x1773FF: _PyThreadState_Delete (pystate.c:848)
    ==266836==    by 0x1773FF: zapthreads (pystate.c:311)
    ==266836==    by 0x1773FF: PyInterpreterState_Delete (pystate.c:321)
    ==266836==    by 0x174920: finalize_interp_delete (pylifecycle.c:1242)
    ==266836==    by 0x174920: Py_FinalizeEx.part.0 (pylifecycle.c:1400)
    ==266836==    by 0x15487B: main (in /home/phconnel/dev/cpython/fini_crash)
    ==266836==  Block was alloc'd at
    ==266836==    at 0x483877F: malloc (vg_replace_malloc.c:309)
    ==266836==    by 0x178D7C: new_threadstate (pystate.c:557)
    ==266836==    by 0x178D7C: PyThreadState_New (pystate.c:629)
    ==266836==    by 0x178D7C: PyGILState_Ensure (pystate.c:1288)
    ==266836==    by 0x154759: evil_main (in /home/phconnel/dev/cpython/fini_crash)
    ==266836==    by 0x48B94CE: start_thread (in /usr/lib/libpthread-2.30.so)
    ==266836==    by 0x4B232D2: clone (in /usr/lib/libc-2.30.so)
    ==266836== 
    ==266836== Invalid read of size 8
    ==266836==    at 0x156081: PyEval_RestoreThread (ceval.c:389)
    ==266836==    by 0x15479F: evil_main (in /home/phconnel/dev/cpython/fini_crash)
    ==266836==    by 0x48B94CE: start_thread (in /usr/lib/libpthread-2.30.so)
    ==266836==    by 0x4B232D2: clone (in /usr/lib/libc-2.30.so)
    ==266836==  Address 0x4c3a0f0 is 16 bytes inside a block of size 2,960 free'd
    ==266836==    at 0x48399AB: free (vg_replace_malloc.c:540)
    ==266836==    by 0x174920: finalize_interp_delete (pylifecycle.c:1242)
    ==266836==    by 0x174920: Py_FinalizeEx.part.0 (pylifecycle.c:1400)
    ==266836==    by 0x15487B: main (in /home/phconnel/dev/cpython/fini_crash)
    ==266836==  Block was alloc'd at
    ==266836==    at 0x483877F: malloc (vg_replace_malloc.c:309)
    ==266836==    by 0x177153: PyInterpreterState_New (pystate.c:205)
    ==266836==    by 0x1732BF: pycore_create_interpreter (pylifecycle.c:526)
    ==266836==    by 0x1732BF: pyinit_config.constprop.0 (pylifecycle.c:695)
    ==266836==    by 0x1766B7: pyinit_core (pylifecycle.c:879)
    ==266836==    by 0x1766B7: Py_InitializeFromConfig (pylifecycle.c:1055)
    ==266836==    by 0x1766B7: Py_InitializeEx (pylifecycle.c:1093)
    ==266836==    by 0x154801: main (in /home/phconnel/dev/cpython/fini_crash)
    ==266836==
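
    For readers without the attachment, the following is a rough
    reconstruction of the kind of program that produces a trace like the one
    above. It is not the attached fini_crash.c, and the names are invented:

    #include "Python.h"
    #include <pthread.h>
    #include <unistd.h>

    /* A non-main thread registers itself with the interpreter, releases the
     * GIL, and then tries to reacquire it after the main thread has already
     * run Py_FinalizeEx().  PyEval_RestoreThread() then reads a freed
     * PyThreadState, which is the invalid read valgrind reports above. */
    static void *
    other_thread(void *unused)
    {
        (void)unused;
        PyGILState_STATE gstate = PyGILState_Ensure();  /* OTHER: acquired GIL */
        Py_BEGIN_ALLOW_THREADS                          /* OTHER: released GIL */
        sleep(2);                     /* main thread finalizes meanwhile */
        Py_END_ALLOW_THREADS          /* OTHER: attempt to acquire GIL...crash! */
        PyGILState_Release(gstate);
        return NULL;
    }

    int
    main(void)
    {
        pthread_t t;

        Py_Initialize();
        pthread_create(&t, NULL, other_thread, NULL);

        Py_BEGIN_ALLOW_THREADS        /* MAIN: allow other thread to execute */
        sleep(1);
        Py_END_ALLOW_THREADS

        Py_FinalizeEx();              /* MAIN: interpreter finalized */
        sleep(3);                     /* give the other thread time to crash */
        return 0;
    }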

    @phmc
    Mannequin

    phmc mannequin commented Nov 21, 2019

    Just to summarise, I'm fairly sure this is exactly what Victor saw: a daemon thread attempts to reacquire the GIL via Py_END_ALLOW_THREADS after interpreter finalisation. Obviously the threadstate pointer held by the thread is then invalid...so we crash.

    So I see basically two options:

    1. Don't (always) free threadstate structures in Py_Finalize, and figure out a way to avoid leaking them (if Python is re-initialized in the same process).

    2. Ban this behaviour entirely, e.g. have Py_Finalize fail if there are live threads with threadstate objects.

    The discussion so far assumes that we should support this, i.e. #1. Any thoughts on that? (I'll have a think about whether this is actually doable!)

    @phmc
    Mannequin

    phmc mannequin commented Nov 21, 2019

    The attached patch (wrap_threadstate.diff) is enough to stop the crash. It's a slightly dirty proof-of-concept, but equally could be the basis for a solution.

    The main functional issue is that there's still a race on the Py_BLOCK_THREADS side: it's possible that the "is threadstate still valid" check can pass, but the interpreter is finalised while the daemon thread is trying to reacquire the GIL in PyEval_RestoreThread.

    (The Py_UNBLOCK_THREADS side is non-racy as the GIL is held while the ts and wrapper updates are done).
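
    For readers without the attachment, a rough illustration of the wrapper
    idea being described (this is not the attached wrap_threadstate.diff;
    the names and shapes are invented):

    #include "Python.h"
    #include "pythread.h"

    /* Daemon threads would hold this wrapper instead of a raw PyThreadState*;
     * finalization would set `tstate` to NULL when the thread state is
     * deleted. */
    typedef struct {
        PyThreadState *tstate;
    } _PyThreadStateWrapper;

    /* Py_UNBLOCK_THREADS side: the GIL is held, so updating the wrapper is
     * safe. */
    static void
    unblock_threads(_PyThreadStateWrapper *w)
    {
        w->tstate = PyEval_SaveThread();
    }

    /* Py_BLOCK_THREADS side: check validity before restoring.  As noted
     * above, this check still races with finalization freeing the thread
     * state. */
    static void
    block_threads(_PyThreadStateWrapper *w)
    {
        PyThreadState *ts = w->tstate;
        if (ts == NULL) {
            PyThread_exit_thread();   /* interpreter is gone; never return */
        }
        PyEval_RestoreThread(ts);
    }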

    @ericsnowcurrently
    Member Author

    Sorry for the delay, Phil. I'll try to take a look in the next couple of hours.

    @ericsnowcurrently
    Member Author

    I'm out of time and this deserves some careful discussion. I'll get to it next Friday (or sooner if possible). Sorry!

    @ericsnowcurrently
    Member Author

    Thanks for the detailed analysis, Phil. I think the results are pretty conclusive: daemon threads are the worst. :) But seriously, thanks.

    As you demonstrated, it isn't just Python "daemon" threads that cause the problem. It is essentially any external access of the C-API once runtime finalization has started. The docs [1] aren't super clear about it, but there are some fundamental assumptions we make about runtime finalization:

    • no use of the C-API while Py_FinalizeEx() is executing (except for a few helpers like Py_IsInitialized())
    • only a small portion of the C-API is available afterward (at least until Py_Initialize() is run)

    I guess the real question is what to do about this?

    Given that this is essentially a separate problem, let's move further discussion and effort on sorting out problematic threads over to bpo-36476, "Runtime finalization assumes all other threads have exited." @phil, would you mind attaching those same two files to that issue?

    [1] https://docs.python.org/3/c-api/init.html#c.Py_FinalizeEx

    @vstinner
    Member

    I partially reimplemented commit ef4ac96 in the following issues:

    • bpo-39984: Move some ceval fields from _PyRuntime.ceval to PyInterpreterState.ceval
    • bpo-40010: Inefficient signal handling in multithreaded applications
    • bpo-39877: Daemon thread is crashing in PyEval_RestoreThread() while the main thread is exiting the process
    • bpo-37127: Handling pending calls during runtime finalization may cause problems.

    I cannot give a list of commits, I made too many of them :-D

    The most important change is:

    commit 50e6e99
    Author: Victor Stinner <vstinner@python.org>
    Date: Thu Mar 19 02:41:21 2020 +0100

    bpo-39984: Move pending calls to PyInterpreterState (GH-19066)
    
    If Py_AddPendingCall() is called in a subinterpreter, the function is
    now scheduled to be called from the subinterpreter, rather than being
    called from the main interpreter.
    
    Each subinterpreter now has its own list of scheduled calls.
    
    * Move pending and eval_breaker fields from _PyRuntimeState.ceval
      to PyInterpreterState.ceval.
    * new_interpreter() now calls _PyEval_InitThreads() to create
      pending calls lock.
    * Fix Py_AddPendingCall() for subinterpreters. It now calls
      _PyThreadState_GET() which works in a subinterpreter if the
      caller holds the GIL, and only falls back on
      PyGILState_GetThisThreadState() if _PyThreadState_GET()
      returns NULL.
    

    My plan is now to fix pending calls in subinterpreters. Currently, they are only executed in the "main thread"... _PyRuntimeState.main_thread must be moved to PyInterpreterState, as it was done in commit ef4ac96.
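
    As a rough illustration of the scheduling behavior described in GH-19066
    (sketch only; as noted, pending calls were at this point still only
    *executed* in the "main thread", so whether and when the callback actually
    runs is a separate question):

    #include "Python.h"
    #include <stdio.h>

    static int
    report_interp(void *arg)
    {
        (void)arg;
        /* Runs with the GIL held in whichever interpreter executes it. */
        printf("pending call ran in interpreter %p\n",
               (void *)PyThreadState_Get()->interp);
        return 0;
    }

    int
    main(void)
    {
        Py_Initialize();
        PyThreadState *main_tstate = PyThreadState_Get();

        PyThreadState *sub = Py_NewInterpreter();
        /* Called while the subinterpreter is current: the call is queued on
         * this interpreter's own list, not on the main interpreter's. */
        Py_AddPendingCall(report_interp, NULL);
        PyRun_SimpleString("pass");   /* gives the eval loop a chance to run it */
        Py_EndInterpreter(sub);

        PyThreadState_Swap(main_tstate);
        Py_FinalizeEx();
        return 0;
    }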

    This issue shows that it's dangerous to change too many things at once. Python internals were still very fragile: bpo-39877 and bpo-37127 are good examples of that. I'm trying to move *very slowly*. Move pieces one by one.

    I added _Py_ThreadCanHandlePendingCalls() function in commit d831688 (bpo-40010): it should ease moving _PyRuntimeState.main_thread to PyInterpreterState. I also added _Py_ThreadCanHandleSignals() in a previous commit, so it's easier to change which thread is allowed or not to handle signals and pending calls.

    I also plan to revisit (remove) pending_calls.finalizing added by commit 842a2f0 (bpo-33608). I plan to work on that in bpo-37127.

    @markshannon
    Member

    Just to add my 2 cents.

    I think this is a bad idea and is likely to be unsafe.
    Having interpreters interfering with each other's objects is almost certain to lead to race conditions.

    IMO, objects should *never* be shared across interpreters (once interpreters are expected to be able to run in parallel).

    Any channel between two interpreters should consist of two objects, one per interpreter. The shared memory they use to communicate can be managed by the runtime. Likewise for inter-interpreter locks. The lock itself should be managed by the runtime, with each interpreter having its own lock object with a handle on the shared lock.
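
    A loose sketch of the shape Mark describes, with invented names: the
    shared state is plain runtime-managed memory (never a PyObject), and each
    interpreter only ever touches its own end object.

    #include "Python.h"
    #include "pythread.h"

    /* Runtime-managed shared state: no PyObject, no per-interpreter
     * refcounts. */
    typedef struct {
        PyThread_type_lock mutex;   /* shared lock owned by the runtime */
        /* ... a queue of raw bytes / non-object data ... */
    } _SharedChannelState;

    /* Per-interpreter end object: an ordinary PyObject living entirely in
     * one interpreter, holding only a handle on the shared state. */
    typedef struct {
        PyObject_HEAD
        _SharedChannelState *shared;
        PyInterpreterState *interp;   /* the interpreter that owns this end */
    } ChannelEndObject;

    /* Create an end object in the current interpreter for existing shared
     * state. */
    static ChannelEndObject *
    channel_end_new(PyTypeObject *type, _SharedChannelState *shared)
    {
        ChannelEndObject *end = PyObject_New(ChannelEndObject, type);
        if (end == NULL) {
            return NULL;
        }
        end->shared = shared;
        end->interp = PyInterpreterState_Get();
        return end;
    }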

    @ericsnowcurrently
    Member Author

    FYI, in bpo-39984 Victor moved pending calls to PyInterpreterState, which was part of my reverted change. However, there are a few other pieces of that change that need to be applied before this issue is resolved. I'm not sure when I'll get to it, but hopefully soon.

    @vstinner
    Member

    Having interpreters interfering with each other's objects is almost certain to lead to race conditions.

    To be honest, I don't understand the purpose of this issue. Some messages are talking about pending calls, some others are talking about channels, and a few are talking about "shared" objects.

    Can someone either clarify the exact intent of this issue, or close this one and create a new issue with a clear explanation of what should be done and why?

    @vstinner
    Member

    vstinner commented Apr 8, 2020

    New changeset cfc3c2f by Victor Stinner in branch 'master':
    bpo-37127: Remove _pending_calls.finishing (GH-19439)
    cfc3c2f

    @vstinner
    Member

    vstinner commented Apr 8, 2020

    I created bpo-40231: Fix pending calls in subinterpreters.

    @vstinner
    Member

    vstinner commented Apr 8, 2020

    Pavel Kostyuchenko:

    (...) The error seems to be caused not by those changes, but by lack of synchronization in the multiprocessing.managers.Server.

    Pavel: would you mind opening a separate issue suggesting that we add synchronization and/or avoid daemon threads in multiprocessing?

    The concurrent.futures module was recently modified to avoid daemon threads in bpo-39812.

    @vstinner
    Member

    vstinner commented Apr 8, 2020

    FYI, after merging that PR I realized that the COMPUTE_EVAL_BREAKER macro isn't quite right.

    I reworked this function in bpo-40010.

    @vstinner
    Member

    vstinner commented Apr 8, 2020

    This issue has a long history. A change has been applied and then reverted three times in a row. Pending calls are now per-interpreter.

    The issue title is "Add a cross-interpreter-safe mechanism to indicate that an object may be destroyed." but I don't understand if pending calls are expected to be used to communicate between two interpreters. Why not use a UNIX pipe and exchange bytes through it? Py_AddPendingCall() is a weird concept. I would prefer not to abuse it.

    Moreover, it's unclear if this issue attempts to *share* the same object between two interpreters. I would prefer to avoid that in any possible way.

    I close this issue with a complex history.

    If someone wants to continue to work on this topic, please open an issue with a very clear description of what should be done and how it is supposed to be used.

    @vstinner vstinner added 3.9 only security fixes and removed 3.8 only security fixes labels Apr 8, 2020
    @vstinner vstinner closed this as completed Apr 8, 2020
    @ericsnowcurrently
    Member Author

    I close this issue with a complex history.

    If someone wants to continue to work on this topic, please open an issue with a very clear description of what should be done and how it is supposed to be used.

    Yeah, there is more to do. I'll create a new issue.
