Message363512
Sometimes, test_multiprocessing_spawn does crash in PyEval_RestoreThread() on FreeBSD with a coredump. This issue should be the root cause of bpo-39088: "test_concurrent_futures crashed with python.core core dump on AMD64 FreeBSD Shared 3.x", where the second comment is a test_multiprocessing_spawn failure with "... After: ['python.core'] ..."
# Thread 1
(gdb) frame
#0 0x00000000003b518c in PyEval_RestoreThread (tstate=0x801f23790) at Python/ceval.c:387
387 _PyRuntimeState *runtime = tstate->interp->runtime;
(gdb) p tstate->interp
$3 = (PyInterpreterState *) 0xdddddddddddddddd
(gdb) info threads
Id Target Id Frame
* 1 LWP 100839 0x00000000003b518c in PyEval_RestoreThread (tstate=0x801f23790) at Python/ceval.c:387
2 LWP 100230 0x00000008006fbcfc in _fini () from /lib/libm.so.5
3 LWP 100192 _accept4 () at _accept4.S:3
# Thread 2
(gdb) thread 2
[Switching to thread 2 (LWP 100230)]
#0 0x00000008006fbcfc in _fini () from /lib/libm.so.5
(gdb) where
(...)
#4 0x0000000800859431 in exit (status=0) at /usr/src/lib/libc/stdlib/exit.c:74
#5 0x000000000048f3d8 in Py_Exit (sts=0) at Python/pylifecycle.c:2349
(...)
The problem is that Python already freed the memory of all PyThreadState structures, whereas PyEval_RestoreThread(tstate) dereferences tstate to get the _PyRuntimeState structure:
void
PyEval_RestoreThread(PyThreadState *tstate)
{
assert(tstate != NULL);
_PyRuntimeState *runtime = tstate->interp->runtime; // <==== HERE ===
struct _ceval_runtime_state *ceval = &runtime->ceval;
assert(gil_created(&ceval->gil));
int err = errno;
take_gil(ceval, tstate);
exit_thread_if_finalizing(tstate);
errno = err;
_PyThreadState_Swap(&runtime->gilstate, tstate);
}
I modified PyEval_RestoreThread(tstate) to get runtime from tstate: commit 01b1cc12e7c6a3d6a3d27ba7c731687d57aae92a. Extract of the change:
diff --git a/Python/ceval.c b/Python/ceval.c
index 9f4b43615e..ee13fd1ad7 100644
--- a/Python/ceval.c
+++ b/Python/ceval.c
@@ -384,7 +386,7 @@ PyEval_SaveThread(void)
void
PyEval_RestoreThread(PyThreadState *tstate)
{
- _PyRuntimeState *runtime = &_PyRuntime;
+ _PyRuntimeState *runtime = tstate->interp->runtime;
struct _ceval_runtime_state *ceval = &runtime->ceval;
if (tstate == NULL) {
@@ -394,7 +396,7 @@ PyEval_RestoreThread(PyThreadState *tstate)
int err = errno;
take_gil(ceval, tstate);
- exit_thread_if_finalizing(runtime, tstate);
+ exit_thread_if_finalizing(tstate);
errno = err;
_PyThreadState_Swap(&runtime->gilstate, tstate);
By the way, exit_thread_if_finalizing(tstate) also get runtime from state.
Before 01b1cc12e7c6a3d6a3d27ba7c731687d57aae92a, it was possible to call PyEval_RestoreThread() from a daemon thread even if tstate was a dangling pointer, since tstate wasn't dereferenced: _PyRuntime variable was accessed directly.
--
One simple fix is to access directly _PyRuntime in PyEval_RestoreThread() with a comment explaining why runtime is not get from tstate.
I'm concerned by the fact that only FreeBSD buildbot spotted the crash. test_multiprocessing_spawn seems to silently ignore crashes. The bug was only spotted because Python produced a coredump in the current directory. My Fedora 31 doesn't write coredump files in the current files, and so the issue is silently ignored even when using --fail-env-changed.
IMHO the most reliable solution is to drop support for daemon threads: they are dangerous by design. But that would be an incompatible change. Maybe we should at least deprecate daemon threads. Python 3.9 now denies spawning a daemon thread in a Python subinterpreter: bpo-37266. |
|
Date |
User |
Action |
Args |
2020-03-06 14:22:28 | vstinner | set | recipients:
+ vstinner, ncoghlan, eric.snow, pablogsal, nanjekyejoannah |
2020-03-06 14:22:28 | vstinner | set | messageid: <1583504548.68.0.765659096989.issue39877@roundup.psfhosted.org> |
2020-03-06 14:22:28 | vstinner | link | issue39877 messages |
2020-03-06 14:22:28 | vstinner | create | |
|