classification
Title: test_multiprocessing_spawn segfaults on AMD64 FreeBSD CURRENT Shared 3.x
Type: behavior Stage: resolved
Components: Interpreter Core, Tests Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: koobs, lukasz.langa, pablogsal, pitrou, vstinner
Priority: high Keywords: buildbot, patch

Created on 2019-06-02 22:54 by pablogsal, last changed 2019-06-04 12:46 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
stress.py vstinner, 2019-06-04 00:41
sleep_at_exit.patch vstinner, 2019-06-04 00:42
Messages (18)
msg344332 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-06-02 22:54
test_import (test.test_multiprocessing_spawn._TestImportStar) ... ok
----------------------------------------------------------------------
Ran 352 tests in 322.667s
OK (skipped=34)
Warning -- files was modified by test_multiprocessing_spawn
  Before: []
  After:  ['python.core']
0:11:39 load avg: 4.96 [202/423/1] test_winconsoleio skipped
test_winconsoleio skipped -- test only relevant on win32

https://buildbot.python.org/all/#/builders/168/builds/1124/steps/5/logs/stdio
msg344354 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-03 01:23
New coredump:
https://buildbot.python.org/all/#builders/168/builds/1129
msg344356 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-03 01:25
Maybe it's related to bpo-33608.
msg344385 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2019-06-03 06:52
Would be nice if someone could post a gdb backtrace of the core dump.
msg344410 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-06-03 10:43
I am unable to reproduce this locally, will post here a backtrace if I manage to do so.
msg344425 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-03 14:59
https://bugs.python.org/issue33608#msg340075:

> My procedure to reproduce the crash on FreeBSD:
> https://bugs.python.org/issue36114#msg337092

I ran this test for 20 min on the FreeBSD CURRENT buildbot: I failed to reproduce the bug.
msg344428 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-06-03 15:26
Adding the release manager to consider this
msg344449 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2019-06-03 17:04
FTR Victor reverted GH-13714 that triggers this.
msg344453 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-06-03 17:26
> FTR Victor reverted GH-13714 that triggers this.

Given the nature of the bugs, I would recommend to watch the buildbots
msg344495 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-03 23:46
> FTR Victor reverted GH-13714 that triggers this.

Sadly, this revert wasn't enough. New fresh coredump on "AMD64 FreeBSD CURRENT Shared 3.x":
https://buildbot.python.org/all/#builders/168/builds/1145

Warning -- Dangling processes: {<SpawnProcess
name='SharedMemoryManager-313' pid=12404 parent=11647 started>}
Warning -- Dangling processes: {<SpawnProcess
name='SharedMemoryManager-313' pid=12404 parent=11647 started>}
Warning -- multiprocessing.process._dangling was modified by
test_multiprocessing_spawn
  Before: set()
  After:  {<weakref at 0x80273a890; to 'SpawnProcess' at 0x8029f6b40>}
Warning -- files was modified by test_multiprocessing_spawn
  Before: []
  After:  ['python.core']
msg344499 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-06-03 23:57
CURRENT-amd64% lldb ./python -c python.core
(lldb) target create "./python" --core "python.core"
Core file '/home/pablo/cpython/python.core' (x86_64) was loaded.
(lldb) bt
* thread #1, name = 'python', stop reason = signal SIGBUS
  * frame #0: 0x000000000035cfad python`take_gil(tstate=0x0000000802b34010) at ceval_gil.h:216:13
    frame #1: 0x000000000035d499 python`PyEval_RestoreThread(tstate=0x0000000802b34010) at ceval.c:281:5
    frame #2: 0x000000000040c045 python
    frame #3: 0x000000000047cc51 python
    frame #4: 0x000000000034f59c python`_PyMethodDef_RawFastCallKeywords(method=0x0000000802b34010, self=0x00000008011c1578, args=0x0000000000000092, nargs=2, kwnames=<unavailable>) at call.c:653:18
    frame #5: 0x000000000034e549 python`_PyCFunction_FastCallKeywords(func=0x00000008011c8280, args=<unavailable>, nargs=<unavailable>, kwnames=<unavailable>) at call.c:732:14
    frame #6: 0x000000000036d850 python`call_function(pp_stack=0x00007fffdfbfbcc8, oparg=<unavailable>, kwnames=<unavailable>) at ceval.c:4673:9
    frame #7: 0x0000000000368d09 python`_PyEval_EvalFrameDefault(f=<unavailable>, throwflag=<unavailable>) at ceval.c:3294:19
    frame #8: 0x000000000036e59c python`_PyEval_EvalCodeWithName [inlined] PyEval_EvalFrameEx(f=<unavailable>, throwflag=0) at ceval.c:624:12
    frame #9: 0x000000000036e586 python`_PyEval_EvalCodeWithName(_co=<unavailable>, globals=<unavailable>, locals=<unavailable>, args=<unavailable>, argcount=2, kwnames=0x0000000000000000, kwargs=0x000000080266e5f8, kwcount=0, kwstep=1, defs=0x0000000802533558, defcount=1, kwdefs=0x0000000000000000, closure=0x0000000000000000, name=0x00000008017355f0, qualname=0x0000000802524d60) at ceval.c:4035
    frame #10: 0x000000000034e438 python`_PyFunction_FastCallKeywords(func=<unavailable>, stack=<unavailable>, nargs=<unavailable>, kwnames=<unavailable>) at call.c:435:12
    frame #11: 0x000000000036d9c7 python`call_function(pp_stack=0x00007fffdfbfc020, oparg=<unavailable>, kwnames=<unavailable>) at ceval.c:4721:17
    frame #12: 0x0000000000368bae python`_PyEval_EvalFrameDefault(f=<unavailable>, throwflag=<unavailable>) at ceval.c:3280:23
    frame #13: 0x000000000034ebaa python`function_code_fastcall(co=<unavailable>, args=<unavailable>, nargs=2, globals=<unavailable>) at call.c:285:14
    frame #14: 0x000000000036d9c7 python`call_function(pp_stack=0x00007fffdfbfc230, oparg=<unavailable>, kwnames=<unavailable>) at ceval.c:4721:17
    frame #15: 0x0000000000368bae python`_PyEval_EvalFrameDefault(f=<unavailable>, throwflag=<unavailable>) at ceval.c:3280:23
    frame #16: 0x000000000036e59c python`_PyEval_EvalCodeWithName [inlined] PyEval_EvalFrameEx(f=<unavailable>, throwflag=0) at ceval.c:624:12
    frame #17: 0x000000000036e586 python`_PyEval_EvalCodeWithName(_co=<unavailable>, globals=<unavailable>, locals=<unavailable>, args=<unavailable>, argcount=2, kwnames=0x0000000000000000, kwargs=0x000000080271eb58, kwcount=0, kwstep=1, defs=0x000000080251dbd8, defcount=2, kwdefs=0x0000000000000000, closure=0x0000000000000000, name=0x000000080187fd60, qualname=0x0000000802522340) at ceval.c:4035
    frame #18: 0x000000000034e438 python`_PyFunction_FastCallKeywords(func=<unavailable>, stack=<unavailable>, nargs=<unavailable>, kwnames=<unavailable>) at call.c:435:12
    frame #19: 0x000000000036d9c7 python`call_function(pp_stack=0x00007fffdfbfc520, oparg=<unavailable>, kwnames=<unavailable>) at ceval.c:4721:17
    frame #20: 0x0000000000368bae python`_PyEval_EvalFrameDefault(f=<unavailable>, throwflag=<unavailable>) at ceval.c:3280:23
    frame #21: 0x000000000034ebaa python`function_code_fastcall(co=<unavailable>, args=<unavailable>, nargs=2, globals=<unavailable>) at call.c:285:14
    frame #22: 0x000000000036d9c7 python`call_function(pp_stack=0x00007fffdfbfc730, oparg=<unavailable>, kwnames=<unavailable>) at ceval.c:4721:17
    frame #23: 0x0000000000368bae python`_PyEval_EvalFrameDefault(f=<unavailable>, throwflag=<unavailable>) at ceval.c:3280:23
    frame #24: 0x000000000034ebaa python`function_code_fastcall(co=<unavailable>, args=<unavailable>, nargs=2, globals=<unavailable>) at call.c:285:14
    frame #25: 0x000000000034f934 python`_PyObject_Call_Prepend(callable=0x00000008025b4100, obj=0x0000000801878b30, args=<unavailable>, kwargs=0x000000080267e980) at call.c:906:14
    frame #26: 0x000000000034e75f python`PyObject_Call(callable=0x00000008021008d8, args=0x00000008017a7cb0, kwargs=0x000000080267e980) at call.c:247:18
    frame #27: 0x00000000003690a9 python`_PyEval_EvalFrameDefault [inlined] do_call_core(func=<unavailable>, callargs=<unavailable>, kwdict=0x000000080267e980) at ceval.c:4775:12
    frame #28: 0x000000000036907e python`_PyEval_EvalFrameDefault(f=<unavailable>, throwflag=<unavailable>) at ceval.c:3353
    frame #29: 0x000000000034ebaa python`function_code_fastcall(co=<unavailable>, args=<unavailable>, nargs=1, globals=<unavailable>) at call.c:285:14
    frame #30: 0x000000000036d9c7 python`call_function(pp_stack=0x00007fffdfbfcba0, oparg=<unavailable>, kwnames=<unavailable>) at ceval.c:4721:17
    frame #31: 0x0000000000368bae python`_PyEval_EvalFrameDefault(f=<unavailable>, throwflag=<unavailable>) at ceval.c:3280:23
    frame #32: 0x000000000034ebaa python`function_code_fastcall(co=<unavailable>, args=<unavailable>, nargs=1, globals=<unavailable>) at call.c:285:14
    frame #33: 0x000000000036d9c7 python`call_function(pp_stack=0x00007fffdfbfcdb0, oparg=<unavailable>, kwnames=<unavailable>) at ceval.c:4721:17
    frame #34: 0x0000000000368bae python`_PyEval_EvalFrameDefault(f=<unavailable>, throwflag=<unavailable>) at ceval.c:3280:23
    frame #35: 0x000000000034ebaa python`function_code_fastcall(co=<unavailable>, args=<unavailable>, nargs=1, globals=<unavailable>) at call.c:285:14
    frame #36: 0x000000000034f934 python`_PyObject_Call_Prepend(callable=0x000000080152ff70, obj=0x0000000802681a10, args=<unavailable>, kwargs=0x0000000000000000) at call.c:906:14
    frame #37: 0x000000000034e75f python`PyObject_Call(callable=0x0000000802aa4a10, args=0x000000080106c050, kwargs=0x0000000000000000) at call.c:247:18
    frame #38: 0x00000000004b35e8 python
    frame #39: 0x00000000004049b9 python
    frame #40: 0x000000080059774b libthr.so.3`thread_start(curthread=0x0000000802b34010) at thr_create.c:291:16
(lldb) debug2: client_check_window_change: changed
debug2: channel 0: request window-change confirm 0
msg344502 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-04 00:27
I looked at the coredump with Pablo. In short, the main thread is calling Py_Exit() to exit the process and so released memory, and a daemon thread does crash on calling PyEval_RestoreThread() because tstate memory was freed.

The question is now if this bug is a regression compared to Python 3.7 or not. I'm trying to reproduce it on Linux by adding "sleep(1)" before exit, but my attempts are unsuccessful so far.
msg344505 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-04 00:30
I marked bpo-37143 "multiprocessing crashed with EXCEPTION_ACCESS_VIOLATION on Python on x86 Windows7 3.x" as a duplicate of this issue. Even if bpo-37143 looks very different (crash in a different file, on a different operating system), I'm now quite sure that it has the same root cause.
msg344508 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-06-04 00:40
Note that the core dump that we are talking about is something that we produced afterwards when trying to reproduce the issue. The core that is produced as part of the tests could be different. I am trying to get access to the test files.
msg344509 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-04 00:42
If you apply attached sleep_at_exit.patch and run attach stress.py, you should quickly get a crash:

$ git apply sleep_at_exit.patch
$ make && ./python stress.py 
Segmentation fault (core dumped)

That's a simplified example of the multiprocessing crash.
msg344510 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-04 00:56
I used git bisect and I found:

commit 396e0a8d9dc65453cb9d53500d0a620602656cfe (refs/bisect/bad)
Author: Eric Snow <ericsnowcurrently@gmail.com>
Date:   Fri May 31 21:16:47 2019 -0600

    bpo-36818: Add PyInterpreterState.runtime field. (gh-13129)
    
stress.py starts to crash at this change.
msg344511 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-04 01:01
> commit 396e0a8d9dc65453cb9d53500d0a620602656cfe (refs/bisect/bad)

Extract of this change:
 
 static inline void
-exit_thread_if_finalizing(_PyRuntimeState *runtime, PyThreadState *tstate)
+exit_thread_if_finalizing(PyThreadState *tstate)
 {
+    _PyRuntimeState *runtime = tstate->interp->runtime;
     /* _Py_Finalizing is protected by the GIL */
     if (runtime->finalizing != NULL && !_Py_CURRENTLY_FINALIZING(runtime, tstate)) {
         drop_gil(&runtime->ceval, tstate);
@@ -236,7 +237,7 @@ PyEval_AcquireLock(void)
         Py_FatalError("PyEval_AcquireLock: current thread state is NULL");
     }
     take_gil(ceval, tstate);
-    exit_thread_if_finalizing(runtime, tstate);
+    exit_thread_if_finalizing(tstate);
 }
 

This change is fine for regular Python threads. But for daemon threads, tstate is likely already corrupted, so it's no longer possible to get interp from interp, and so also not possible to get 'runtime'.
msg344559 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-04 12:46
The initial issue has been fixed by a revert.

Let's continue the discussion on bpo-36818 to maybe reapply commit 396e0a8d9dc65453cb9d53500d0a620602656cfe.
History
Date User Action Args
2019-06-04 12:46:17vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg344559

stage: resolved
2019-06-04 01:01:52vstinnersetmessages: + msg344511
2019-06-04 00:56:00vstinnersetmessages: + msg344510
2019-06-04 00:42:44vstinnersetfiles: + sleep_at_exit.patch
keywords: + patch
messages: + msg344509
2019-06-04 00:41:14vstinnersetfiles: + stress.py
2019-06-04 00:40:08pablogsalsetmessages: + msg344508
2019-06-04 00:30:53vstinnersetmessages: + msg344505
2019-06-04 00:27:15vstinnersetmessages: + msg344502
2019-06-03 23:57:24pablogsalsetmessages: + msg344499
2019-06-03 23:46:21vstinnersetmessages: + msg344495
2019-06-03 17:26:54pablogsalsetmessages: + msg344453
2019-06-03 17:04:35lukasz.langasetmessages: + msg344449
2019-06-03 15:26:02pablogsalsetnosy: + lukasz.langa
messages: + msg344428
2019-06-03 14:59:27vstinnersetmessages: + msg344425
2019-06-03 10:43:28pablogsalsetmessages: + msg344410
2019-06-03 06:52:40pitrousetmessages: + msg344385
2019-06-03 01:25:18vstinnersetmessages: + msg344356
2019-06-03 01:23:03vstinnersetnosy: + vstinner
messages: + msg344354
2019-06-02 23:00:23pablogsalsetnosy: + pitrou
2019-06-02 23:00:08pablogsalsetnosy: + koobs
2019-06-02 22:54:17pablogsalcreate