classification
Title: test_threading: ThreadJoinOnShutdown.test_reinit_tls_after_fork() crash with Python 3.8 on AIX
Type: Stage: resolved
Components: Tests Versions: Python 3.9, Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: vstinner
Priority: normal Keywords: patch

Created on 2020-03-25 22:34 by vstinner, last changed 2020-03-31 20:57 by vstinner.

Pull Requests
URL Status Linked Edit
PR 19200 closed vstinner, 2020-03-27 19:24
Messages (7)
msg365025 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-25 22:34
https://buildbot.python.org/all/#/builders/73/builds/208
---
Warning -- files was modified by test_threading
  Before: []
  After:  ['core']

1 test altered the execution environment:
    test_threading
---

The bug can be reproduced with:

$ ./configure --with-pydebug CC=gcc CFLAGS=-O0
$ make
(...
$ ./python -m test test_threading --fail-env-changed -m test.test_threading.ThreadJoinOnShutdown.test_reinit_tls_after_fork  -F -j 20
0:00:00 Run tests in parallel using 20 child processes
0:00:01 [  1] test_threading passed
0:00:01 [  2] test_threading passed
(...)
0:00:03 [ 17] test_threading passed
0:00:03 [ 18/1] test_threading failed (env changed)
Warning -- files was modified by test_threading
  Before: []
  After:  ['core']
Kill <TestWorkerProcess #1 running test=test_threading pid=14156142 time=1.6 sec>
(...)
Tests result: ENV CHANGED
msg365027 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-25 22:54
The master branch is also affected.
msg365028 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-25 23:02
Example of crash in master in a child process after os.fork().

The crash occurred in logging._releaseLock() called by logging._after_at_fork_child_reinit_locks().

Old issues about logging + fork

* bpo-36533 (closed): logging regression with threading + fork are mixed in 3.7.1rc2 (deadlock potential)
* bpo-37429 (open): Python hangs on fork when a logger is in use in a background thread
* bpo-6721 (open): Locks in the standard library should be sanitized on fork

--

(gdb) where
#0  0xd0563e00 in _internal_error () from /usr/lib/libpthreads.a(shr_xpg5.o)
#1  0xd0573270 in _event_notify_locked@AF34_24 () from /usr/lib/libpthreads.a(shr_xpg5.o)
#2  0xd057247c in _event_notify () from /usr/lib/libpthreads.a(shr_xpg5.o)
#3  0xd0582b2c in _cond_broadcast () from /usr/lib/libpthreads.a(shr_xpg5.o)
#4  0xd0584b0c in pthread_cond_signal@AF29_12 () from /usr/lib/libpthreads.a(shr_xpg5.o)
#5  0xd058320c in pthread_cond_signal () from /usr/lib/libpthreads.a(shr_xpg5.o)
#6  0x100a7568 in PyThread_release_lock (lock=0x200848f0) at Python/thread_pthread.h:687
#7  0x10249d38 in rlock_release (self=0x30295758, _unused_ignored=0xffffffff) at ./Modules/_threadmodule.c:357
#8  0x102109dc in method_vectorcall_NOARGS (func=0x0, args=0x302cdd80, nargsf=4294967295, kwnames=0x0) at Objects/descrobject.c:20
#9  0x10109d84 in call_function (tstate=0x200d5ca0, pp_stack=0x20e82aec, oparg=24, kwnames=0x0) at ./Include/cpython/abstract.h:118
#10 0x101164f4 in _PyEval_EvalFrameDefault (tstate=0x200d5ca0, f=0x302cdc48, throwflag=537746592) at Python/ceval.c:2586
#11 0x1011bd10 in function_code_fastcall (tstate=0x3027c972, co=0x3027d820, args=0x3052cef0, nargs=808246660, globals=0x302cdc48) at ./Include/internal/pycore_ceval.h:40
#12 0x1011c9a0 in _PyFunction_Vectorcall (func=0x302bd030, stack=0x3027c968, nargsf=0, kwnames=0x0) at Objects/call.c:366
#13 0x10109d84 in call_function (tstate=0x200d5ca0, pp_stack=0x20e82d08, oparg=56, kwnames=0x0) at ./Include/cpython/abstract.h:118
#14 0x1011662c in _PyEval_EvalFrameDefault (tstate=0x200d5ca0, f=0x3052cdb0, throwflag=537746592) at Python/ceval.c:2600
#15 0x1011bd10 in function_code_fastcall (tstate=0x3027d9ec, co=0x3027da00, args=0x0, nargs=810733296, globals=0x3052cdb0) at ./Include/internal/pycore_ceval.h:40
#16 0x1011c9a0 in _PyFunction_Vectorcall (func=0x302bd120, stack=0x3027d998, nargsf=0, kwnames=0x0) at Objects/call.c:366
#17 0x1015c848 in run_at_forkers (lst=0x302bd120, reverse=-1) at ./Include/cpython/abstract.h:118
#18 0x101682e8 in PyOS_AfterFork_Child () at ./Modules/posixmodule.c:479
#19 0x10168384 in os_fork (module=0x0, _unused_ignored=0xffffffff) at ./Modules/posixmodule.c:6264
#20 0x101e7644 in cfunction_vectorcall_NOARGS (func=0x38, args=0xffffffff, nargsf=806606368, kwnames=0x1) at Objects/methodobject.c:20
#21 0x10109d84 in call_function (tstate=0x200d5ca0, pp_stack=0x20e8304c, oparg=24, kwnames=0x0) at ./Include/cpython/abstract.h:118
#22 0x101165a8 in _PyEval_EvalFrameDefault (tstate=0x200d5ca0, f=0x306b39d0, throwflag=537746592) at Python/ceval.c:2569
#23 0x10119f80 in _PyEval_EvalCode (tstate=0x306b3b04, _co=0x306b65c8, globals=0x306a47b8, locals=0x306b3b10, args=0x306b3b08, argcount=-1, kwnames=0x30280030, kwargs=0xffffffff, kwcount=805326908, kwstep=0, defs=0x0, defcount=0, 
    kwdefs=0x0, closure=0x0, name=0x20e831f0, qualname=0x28000282) at ./Include/internal/pycore_ceval.h:40
#24 0x1011cb84 in _PyFunction_Vectorcall (func=0x0, stack=0x306a47b8, nargsf=0, kwnames=0x8) at Objects/call.c:395
#25 0x1011c270 in PyVectorcall_Call (callable=0x30625440, tuple=0x30005030, kwargs=0x306b5180) at Objects/call.c:230
#26 0x1011c510 in _PyObject_Call (tstate=0x30005030, callable=0x2003a67c <_ccCGOhKO.rw_+21820>, args=0x306b5180, kwargs=0x30625440) at Objects/call.c:265
#27 0x1011c71c in PyObject_Call (callable=0x0, args=0xffffffff, kwargs=0xffffffff) at Objects/call.c:292
#28 0x10116ad4 in _PyEval_EvalFrameDefault (tstate=0x200d5ca0, f=0x306d5760, throwflag=0) at Python/ceval.c:4198
#29 0x1011bd10 in function_code_fastcall (tstate=0x3029a7d2, co=0x302c07a8, args=0x3066edac, nargs=812472476, globals=0x306d5760) at ./Include/internal/pycore_ceval.h:40
#30 0x1011c9a0 in _PyFunction_Vectorcall (func=0x302c93a0, stack=0x3029a7b8, nargsf=1, kwnames=0x0) at Objects/call.c:366
#31 0x10109d84 in call_function (tstate=0x200d5ca0, pp_stack=0x20e8357c, oparg=56, kwnames=0x0) at ./Include/cpython/abstract.h:118
#32 0x101164f4 in _PyEval_EvalFrameDefault (tstate=0x200d5ca0, f=0x3066ec70, throwflag=537746592) at Python/ceval.c:2586
#33 0x1011bd10 in function_code_fastcall (tstate=0x302ac200, co=0x302c0a00, args=0x306cb16c, nargs=812051888, globals=0x3066ec70) at ./Include/internal/pycore_ceval.h:40
#34 0x1011c9a0 in _PyFunction_Vectorcall (func=0x302c9530, stack=0x302ac180, nargsf=1, kwnames=0x0) at Objects/call.c:366
#35 0x10109d84 in call_function (tstate=0x200d5ca0, pp_stack=0x20e8379c, oparg=56, kwnames=0x0) at ./Include/cpython/abstract.h:118
#36 0x101164f4 in _PyEval_EvalFrameDefault (tstate=0x200d5ca0, f=0x306cb030, throwflag=537746592) at Python/ceval.c:2586
#37 0x1011bd10 in function_code_fastcall (tstate=0x302a4930, co=0x302c0820, args=0x20e83928, nargs=812429680, globals=0x306cb030) at ./Include/internal/pycore_ceval.h:40
#38 0x1011c9a0 in _PyFunction_Vectorcall (func=0x302c93f0, stack=0x302a4928, nargsf=1, kwnames=0x0) at Objects/call.c:366
#39 0x10220394 in method_vectorcall (method=0x302c93f0, args=0x3000503c, nargsf=56, kwnames=0x0) at ./Include/cpython/abstract.h:118
#40 0x1011c270 in PyVectorcall_Call (callable=0x301e3ed0, tuple=0x30005030, kwargs=0x0) at Objects/call.c:230
#41 0x1011c510 in _PyObject_Call (tstate=0x30005030, callable=0x2003878c <_ccCGOhKO.rw_+13900>, args=0x0, kwargs=0x301e3ed0) at Objects/call.c:265
#42 0x1011c71c in PyObject_Call (callable=0x0, args=0xffffffff, kwargs=0xffffffff) at Objects/call.c:292
#43 0x1024a158 in t_bootstrap (boot_raw=0x301df820) at ./Modules/_threadmodule.c:1011
#44 0x100a683c in pythread_wrapper (arg=0xffffffff) at Python/thread_pthread.h:235
#45 0xd055df68 in _pthread_body () from /usr/lib/libpthreads.a(shr_xpg5.o)
#46 0x00000000 in ?? ()

(gdb) frame 10
#10 0x101164f4 in _PyEval_EvalFrameDefault (tstate=0x200d5ca0, f=0x302cdc48, throwflag=537746592) at Python/ceval.c:2586
2586                PyObject *v = GETLOCAL(oparg);
(gdb) p (char*)( ((PyASCIIObject*)f->f_code->co_name) + 1)
$16 = 0x3027ad28 "_releaseLock"
(gdb) p (char*)( ((PyASCIIObject*)f->f_code->co_filename) + 1)
$17 = 0x3026d4c0 "/home/vstinner/python/master/Lib/logging/__init__.py"

(gdb) frame 14
#14 0x1011662c in _PyEval_EvalFrameDefault (tstate=0x200d5ca0, f=0x3052cdb0, throwflag=537746592) at Python/ceval.c:2600
2600                PyObject *cell = freevars[oparg];
(gdb)  p (char*)( ((PyASCIIObject*)f->f_code->co_name) + 1)
$18 = 0x30278af0 "_after_at_fork_child_reinit_locks"
(gdb) p (char*)( ((PyASCIIObject*)f->f_code->co_filename) + 1)
$19 = 0x3026d4c0 "/home/vstinner/python/master/Lib/logging/__init__.py"


(gdb) info threads
  Id   Target Id         Frame 
  2    Thread 1745 (tid 70975785, running) aix-thread: ptrace (52, 70975785) returned -1 (errno = 3 The process does not exist.)
msg365031 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-25 23:23
I worked around the logging crash. Then I got a second crash in _PyThreadState_DeleteExcept() called in the child process after a fork.

PyThreadState_Clear() calls tstate->on_delete() which is release_sentinel() of Modules/_threadmodule.c.

Problem: lock objects are left in an inconsistent state after a fork: bpo-6721. This lock object should be replaced with a new lock object, or release_sentinel() should not be called.

(gdb) where
#0  0xd0563e00 in _internal_error () from /usr/lib/libpthreads.a(shr_xpg5.o)
#1  0xd0573270 in _event_notify_locked@AF34_24 () from /usr/lib/libpthreads.a(shr_xpg5.o)
#2  0xd057247c in _event_notify () from /usr/lib/libpthreads.a(shr_xpg5.o)
#3  0xd0582b2c in _cond_broadcast () from /usr/lib/libpthreads.a(shr_xpg5.o)
#4  0xd0584b0c in pthread_cond_signal@AF29_12 () from /usr/lib/libpthreads.a(shr_xpg5.o)
#5  0xd058320c in pthread_cond_signal () from /usr/lib/libpthreads.a(shr_xpg5.o)
#6  0x100a755c in PyThread_release_lock (lock=0x200bd4d0) at Python/thread_pthread.h:687
#7  0x1024add0 in release_sentinel (wr_raw=0x306c0bd0) at ./Modules/_threadmodule.c:1213
#8  0x100a406c in PyThreadState_Clear (tstate=0x200c0970) at Python/pystate.c:810
#9  0x100a4b90 in _PyThreadState_DeleteExcept (runtime=0x20000200 <_PyRuntime>, tstate=0x20068870) at Python/pystate.c:924
#10 0x1010ad8c in _PyEval_ReInitThreads (runtime=0x20000200 <_PyRuntime>) at Python/ceval.c:434
#11 0x101682a8 in PyOS_AfterFork_Child () at ./Modules/posixmodule.c:473
#12 0x10168384 in os_fork (module=0x0, _unused_ignored=0xffffffff) at ./Modules/posixmodule.c:6264
#13 0x101e7644 in cfunction_vectorcall_NOARGS (func=0x3008b4b0, args=0xffffffff, nargsf=0, kwnames=0x20067800) at Objects/methodobject.c:20
#14 0x10109d84 in call_function (tstate=0x200c9b00, pp_stack=0x2046f04c, oparg=24, kwnames=0x0) at ./Include/cpython/abstract.h:118
#15 0x101165a8 in _PyEval_EvalFrameDefault (tstate=0x200c9b00, f=0x306ce2f0, throwflag=537697024) at Python/ceval.c:2569
#16 0x10119f80 in _PyEval_EvalCode (tstate=0x306ce424, _co=0x3066ca00, globals=0x3066d0d8, locals=0x306ce430, args=0x306ce428, argcount=-1, kwnames=0x30670030, kwargs=0xffffffff, kwcount=805326908, kwstep=0, defs=0x0, defcount=0, 
    kwdefs=0x0, closure=0x0, name=0x2046f1f0, qualname=0x28000282) at ./Include/internal/pycore_ceval.h:40
#17 0x1011cb84 in _PyFunction_Vectorcall (func=0x0, stack=0x3066d0d8, nargsf=0, kwnames=0x8) at Objects/call.c:395
#18 0x1011c270 in PyVectorcall_Call (callable=0x3065f710, tuple=0x30005030, kwargs=0x306b6068) at Objects/call.c:230
#19 0x1011c510 in _PyObject_Call (tstate=0x30005030, callable=0x2003a64c <_cc37zUOE.rw_+21820>, args=0x306b6068, kwargs=0x3065f710) at Objects/call.c:265
#20 0x1011c71c in PyObject_Call (callable=0x0, args=0xffffffff, kwargs=0xffffffff) at Objects/call.c:292
#21 0x10116ad4 in _PyEval_EvalFrameDefault (tstate=0x200c9b00, f=0x306cba40, throwflag=0) at Python/ceval.c:4198
#22 0x1011bd10 in function_code_fastcall (tstate=0x3029d6b2, co=0x302bc0a0, args=0x30681914, nargs=812432252, globals=0x306cba40) at ./Include/internal/pycore_ceval.h:40
#23 0x1011c9a0 in _PyFunction_Vectorcall (func=0x302c7210, stack=0x3029d698, nargsf=1, kwnames=0x0) at Objects/call.c:366
#24 0x10109d84 in call_function (tstate=0x200c9b00, pp_stack=0x2046f57c, oparg=56, kwnames=0x0) at ./Include/cpython/abstract.h:118
#25 0x101164f4 in _PyEval_EvalFrameDefault (tstate=0x200c9b00, f=0x306817d8, throwflag=537697024) at Python/ceval.c:2586
#26 0x1011bd10 in function_code_fastcall (tstate=0x302aa200, co=0x302bc2f8, args=0x306cc5a4, nargs=812128536, globals=0x306817d8) at ./Include/internal/pycore_ceval.h:40
#27 0x1011c9a0 in _PyFunction_Vectorcall (func=0x302c73a0, stack=0x302aa180, nargsf=1, kwnames=0x0) at Objects/call.c:366
#28 0x10109d84 in call_function (tstate=0x200c9b00, pp_stack=0x2046f79c, oparg=56, kwnames=0x0) at ./Include/cpython/abstract.h:118
#29 0x101164f4 in _PyEval_EvalFrameDefault (tstate=0x200c9b00, f=0x306cc468, throwflag=537697024) at Python/ceval.c:2586
#30 0x1011bd10 in function_code_fastcall (tstate=0x302a4828, co=0x302bc118, args=0x2046f928, nargs=812434856, globals=0x306cc468) at ./Include/internal/pycore_ceval.h:40
#31 0x1011c9a0 in _PyFunction_Vectorcall (func=0x302c7260, stack=0x302a4820, nargsf=1, kwnames=0x0) at Objects/call.c:366
#32 0x10220394 in method_vectorcall (method=0x302c7260, args=0x3000503c, nargsf=56, kwnames=0x0) at ./Include/cpython/abstract.h:118
#33 0x1011c270 in PyVectorcall_Call (callable=0x306c0ea0, tuple=0x30005030, kwargs=0x0) at Objects/call.c:230
#34 0x1011c510 in _PyObject_Call (tstate=0x30005030, callable=0x2003875c <_cc37zUOE.rw_+13900>, args=0x0, kwargs=0x306c0ea0) at Objects/call.c:265
#35 0x1011c71c in PyObject_Call (callable=0x0, args=0xffffffff, kwargs=0xffffffff) at Objects/call.c:292
#36 0x1024a158 in t_bootstrap (boot_raw=0x306bf7a8) at ./Modules/_threadmodule.c:1011
#37 0x100a6830 in pythread_wrapper (arg=0xffffffff) at Python/thread_pthread.h:235
#38 0xd055df68 in _pthread_body () from /usr/lib/libpthreads.a(shr_xpg5.o)
#39 0x00000000 in ?? ()
msg365174 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-27 17:17
I created 3 follow-up issues:

* bpo-40089: Add _at_fork_reinit() method to locks
* bpo-40091: Crash in logging._after_at_fork_child_reinit_locks()
* bpo-40092: Crash in _PyThreadState_DeleteExcept() at fork in the process child
msg365414 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-31 19:52
Fixed by https://github.com/python/cpython/commit/a9f9687a7ce25e7c0c89f88f52db323104668ae0
msg365424 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-03-31 20:57
> Fixed by https://github.com/python/cpython/commit/a9f9687a7ce25e7c0c89f88f52db323104668ae0

Ooops, sorry. Only PR 19200 is fixed by this commit.

But this issue is made of 3 bugs. I prefer to leave it open until the 3 bugs are fixed.
History
Date User Action Args
2020-03-31 20:57:55vstinnersetstatus: closed -> open
resolution: fixed ->
messages: + msg365424
2020-03-31 19:52:48vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg365414

stage: patch review -> resolved
2020-03-27 19:24:52vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request18562
2020-03-27 17:17:50vstinnersetmessages: + msg365174
2020-03-25 23:23:17vstinnersetmessages: + msg365031
2020-03-25 23:02:19vstinnersetmessages: + msg365028
2020-03-25 22:54:48vstinnersetmessages: + msg365027
versions: + Python 3.9
2020-03-25 22:34:11vstinnercreate