This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author vstinner
Recipients eric.snow, nanjekyejoannah, ncoghlan, pablogsal, vstinner
Date 2020-03-06.14:22:28
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1583504548.68.0.765659096989.issue39877@roundup.psfhosted.org>
In-reply-to
Content
Sometimes, test_multiprocessing_spawn does crash in PyEval_RestoreThread() on FreeBSD with a coredump. This issue should be the root cause of bpo-39088: "test_concurrent_futures crashed with python.core core dump on AMD64 FreeBSD Shared 3.x", where the second comment is a test_multiprocessing_spawn failure with "...  After:  ['python.core'] ..."

# Thread 1

(gdb) frame
#0  0x00000000003b518c in PyEval_RestoreThread (tstate=0x801f23790) at Python/ceval.c:387
387         _PyRuntimeState *runtime = tstate->interp->runtime;
(gdb) p tstate->interp
$3 = (PyInterpreterState *) 0xdddddddddddddddd

(gdb) info threads
  Id   Target Id         Frame
* 1    LWP 100839        0x00000000003b518c in PyEval_RestoreThread (tstate=0x801f23790) at Python/ceval.c:387
  2    LWP 100230        0x00000008006fbcfc in _fini () from /lib/libm.so.5
  3    LWP 100192        _accept4 () at _accept4.S:3

# Thread 2

(gdb) thread 2
[Switching to thread 2 (LWP 100230)]
#0  0x00000008006fbcfc in _fini () from /lib/libm.so.5

(gdb) where
(...)
#4  0x0000000800859431 in exit (status=0) at /usr/src/lib/libc/stdlib/exit.c:74
#5  0x000000000048f3d8 in Py_Exit (sts=0) at Python/pylifecycle.c:2349
(...)

The problem is that Python already freed the memory of all PyThreadState structures, whereas PyEval_RestoreThread(tstate) dereferences tstate to get the _PyRuntimeState structure:

void
PyEval_RestoreThread(PyThreadState *tstate)
{
    assert(tstate != NULL);

    _PyRuntimeState *runtime = tstate->interp->runtime;  // <==== HERE ===
    struct _ceval_runtime_state *ceval = &runtime->ceval;
    assert(gil_created(&ceval->gil));

    int err = errno;
    take_gil(ceval, tstate);
    exit_thread_if_finalizing(tstate);
    errno = err;

    _PyThreadState_Swap(&runtime->gilstate, tstate);
}


I modified PyEval_RestoreThread(tstate) to get runtime from tstate: commit 01b1cc12e7c6a3d6a3d27ba7c731687d57aae92a. Extract of the change:

diff --git a/Python/ceval.c b/Python/ceval.c
index 9f4b43615e..ee13fd1ad7 100644
--- a/Python/ceval.c
+++ b/Python/ceval.c
@@ -384,7 +386,7 @@ PyEval_SaveThread(void)
 void
 PyEval_RestoreThread(PyThreadState *tstate)
 {
-    _PyRuntimeState *runtime = &_PyRuntime;
+    _PyRuntimeState *runtime = tstate->interp->runtime;
     struct _ceval_runtime_state *ceval = &runtime->ceval;
 
     if (tstate == NULL) {
@@ -394,7 +396,7 @@ PyEval_RestoreThread(PyThreadState *tstate)
 
     int err = errno;
     take_gil(ceval, tstate);
-    exit_thread_if_finalizing(runtime, tstate);
+    exit_thread_if_finalizing(tstate);
     errno = err;
 
     _PyThreadState_Swap(&runtime->gilstate, tstate);


By the way, exit_thread_if_finalizing(tstate) also get runtime from state.


Before 01b1cc12e7c6a3d6a3d27ba7c731687d57aae92a, it was possible to call PyEval_RestoreThread() from a daemon thread even if tstate was a dangling pointer, since tstate wasn't dereferenced: _PyRuntime variable was accessed directly.

--

One simple fix is to access directly _PyRuntime in PyEval_RestoreThread() with a comment explaining why runtime is not get from tstate.

I'm concerned by the fact that only FreeBSD buildbot spotted the crash. test_multiprocessing_spawn seems to silently ignore crashes. The bug was only spotted because Python produced a coredump in the current directory. My Fedora 31 doesn't write coredump files in the current files, and so the issue is silently ignored even when using --fail-env-changed.

IMHO the most reliable solution is to drop support for daemon threads: they are dangerous by design. But that would be an incompatible change. Maybe we should at least deprecate daemon threads. Python 3.9 now denies spawning a daemon thread in a Python subinterpreter: bpo-37266.
History
Date User Action Args
2020-03-06 14:22:28vstinnersetrecipients: + vstinner, ncoghlan, eric.snow, pablogsal, nanjekyejoannah
2020-03-06 14:22:28vstinnersetmessageid: <1583504548.68.0.765659096989.issue39877@roundup.psfhosted.org>
2020-03-06 14:22:28vstinnerlinkissue39877 messages
2020-03-06 14:22:28vstinnercreate