Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_threading: ThreadJoinOnShutdown.test_reinit_tls_after_fork() crash with Python 3.8 on AIX #84249

Closed
vstinner opened this issue Mar 25, 2020 · 8 comments
Labels
3.8 only security fixes 3.9 only security fixes tests Tests in the Lib/test dir

Comments

@vstinner
Copy link
Member

BPO 40068
Nosy @vstinner, @erlend-aasland
PRs
  • [WIP] bpo-40068: test_reinit_tls_after_fork() detects crashes #19200
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2020-03-25.22:34:11.214>
    labels = ['3.8', 'tests', '3.9']
    title = 'test_threading: ThreadJoinOnShutdown.test_reinit_tls_after_fork() crash with Python 3.8 on AIX'
    updated_at = <Date 2021-05-23.07:47:55.604>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2021-05-23.07:47:55.604>
    actor = 'erlendaasland'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Tests']
    creation = <Date 2020-03-25.22:34:11.214>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 40068
    keywords = ['patch']
    message_count = 7.0
    messages = ['365025', '365027', '365028', '365031', '365174', '365414', '365424']
    nosy_count = 3.0
    nosy_names = ['vstinner', 'David.Edelsohn', 'erlendaasland']
    pr_nums = ['19200']
    priority = 'normal'
    resolution = None
    stage = 'resolved'
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue40068'
    versions = ['Python 3.8', 'Python 3.9']

    @vstinner
    Copy link
    Member Author

    https://buildbot.python.org/all/#/builders/73/builds/208
    ---
    Warning -- files was modified by test_threading
    Before: []
    After: ['core']

    1 test altered the execution environment:
    test_threading
    ---

    The bug can be reproduced with:

    $ ./configure --with-pydebug CC=gcc CFLAGS=-O0
    $ make
    (...
    $ ./python -m test test_threading --fail-env-changed -m test.test_threading.ThreadJoinOnShutdown.test_reinit_tls_after_fork  -F -j 20
    0:00:00 Run tests in parallel using 20 child processes
    0:00:01 [  1] test_threading passed
    0:00:01 [  2] test_threading passed
    (...)
    0:00:03 [ 17] test_threading passed
    0:00:03 [ 18/1] test_threading failed (env changed)
    Warning -- files was modified by test_threading
      Before: []
      After:  ['core']
    Kill <TestWorkerProcess #1 running test=test_threading pid=14156142 time=1.6 sec>
    (...)
    Tests result: ENV CHANGED

    @vstinner vstinner added 3.8 only security fixes tests Tests in the Lib/test dir labels Mar 25, 2020
    @vstinner
    Copy link
    Member Author

    The master branch is also affected.

    @vstinner vstinner added the 3.9 only security fixes label Mar 25, 2020
    @vstinner
    Copy link
    Member Author

    Example of crash in master in a child process after os.fork().

    The crash occurred in logging._releaseLock() called by logging._after_at_fork_child_reinit_locks().

    Old issues about logging + fork

    • bpo-36533 (closed): logging regression with threading + fork are mixed in 3.7.1rc2 (deadlock potential)
    • bpo-37429 (open): Python hangs on fork when a logger is in use in a background thread
    • bpo-6721 (open): Locks in the standard library should be sanitized on fork

    --

    (gdb) where
    #0 0xd0563e00 in _internal_error () from /usr/lib/libpthreads.a(shr_xpg5.o)
    #1 0xd0573270 in _event_notify_locked@AF34_24 () from /usr/lib/libpthreads.a(shr_xpg5.o)
    #2 0xd057247c in _event_notify () from /usr/lib/libpthreads.a(shr_xpg5.o)
    #3 0xd0582b2c in _cond_broadcast () from /usr/lib/libpthreads.a(shr_xpg5.o)
    #4 0xd0584b0c in pthread_cond_signal@AF29_12 () from /usr/lib/libpthreads.a(shr_xpg5.o)
    #5 0xd058320c in pthread_cond_signal () from /usr/lib/libpthreads.a(shr_xpg5.o)
    #6 0x100a7568 in PyThread_release_lock (lock=0x200848f0) at Python/thread_pthread.h:687
    #7 0x10249d38 in rlock_release (self=0x30295758, _unused_ignored=0xffffffff) at ./Modules/_threadmodule.c:357
    #8 0x102109dc in method_vectorcall_NOARGS (func=0x0, args=0x302cdd80, nargsf=4294967295, kwnames=0x0) at Objects/descrobject.c:20
    #9 0x10109d84 in call_function (tstate=0x200d5ca0, pp_stack=0x20e82aec, oparg=24, kwnames=0x0) at ./Include/cpython/abstract.h:118
    #10 0x101164f4 in _PyEval_EvalFrameDefault (tstate=0x200d5ca0, f=0x302cdc48, throwflag=537746592) at Python/ceval.c:2586
    #11 0x1011bd10 in function_code_fastcall (tstate=0x3027c972, co=0x3027d820, args=0x3052cef0, nargs=808246660, globals=0x302cdc48) at ./Include/internal/pycore_ceval.h:40
    #12 0x1011c9a0 in _PyFunction_Vectorcall (func=0x302bd030, stack=0x3027c968, nargsf=0, kwnames=0x0) at Objects/call.c:366
    #13 0x10109d84 in call_function (tstate=0x200d5ca0, pp_stack=0x20e82d08, oparg=56, kwnames=0x0) at ./Include/cpython/abstract.h:118
    #14 0x1011662c in _PyEval_EvalFrameDefault (tstate=0x200d5ca0, f=0x3052cdb0, throwflag=537746592) at Python/ceval.c:2600
    #15 0x1011bd10 in function_code_fastcall (tstate=0x3027d9ec, co=0x3027da00, args=0x0, nargs=810733296, globals=0x3052cdb0) at ./Include/internal/pycore_ceval.h:40
    #16 0x1011c9a0 in _PyFunction_Vectorcall (func=0x302bd120, stack=0x3027d998, nargsf=0, kwnames=0x0) at Objects/call.c:366
    #17 0x1015c848 in run_at_forkers (lst=0x302bd120, reverse=-1) at ./Include/cpython/abstract.h:118
    #18 0x101682e8 in PyOS_AfterFork_Child () at ./Modules/posixmodule.c:479
    #19 0x10168384 in os_fork (module=0x0, _unused_ignored=0xffffffff) at ./Modules/posixmodule.c:6264
    #20 0x101e7644 in cfunction_vectorcall_NOARGS (func=0x38, args=0xffffffff, nargsf=806606368, kwnames=0x1) at Objects/methodobject.c:20
    #21 0x10109d84 in call_function (tstate=0x200d5ca0, pp_stack=0x20e8304c, oparg=24, kwnames=0x0) at ./Include/cpython/abstract.h:118
    #22 0x101165a8 in _PyEval_EvalFrameDefault (tstate=0x200d5ca0, f=0x306b39d0, throwflag=537746592) at Python/ceval.c:2569
    #23 0x10119f80 in _PyEval_EvalCode (tstate=0x306b3b04, _co=0x306b65c8, globals=0x306a47b8, locals=0x306b3b10, args=0x306b3b08, argcount=-1, kwnames=0x30280030, kwargs=0xffffffff, kwcount=805326908, kwstep=0, defs=0x0, defcount=0,
    kwdefs=0x0, closure=0x0, name=0x20e831f0, qualname=0x28000282) at ./Include/internal/pycore_ceval.h:40
    #24 0x1011cb84 in _PyFunction_Vectorcall (func=0x0, stack=0x306a47b8, nargsf=0, kwnames=0x8) at Objects/call.c:395
    #25 0x1011c270 in PyVectorcall_Call (callable=0x30625440, tuple=0x30005030, kwargs=0x306b5180) at Objects/call.c:230
    #26 0x1011c510 in _PyObject_Call (tstate=0x30005030, callable=0x2003a67c <ccCGOhKO.rw+21820>, args=0x306b5180, kwargs=0x30625440) at Objects/call.c:265
    #27 0x1011c71c in PyObject_Call (callable=0x0, args=0xffffffff, kwargs=0xffffffff) at Objects/call.c:292
    #28 0x10116ad4 in _PyEval_EvalFrameDefault (tstate=0x200d5ca0, f=0x306d5760, throwflag=0) at Python/ceval.c:4198
    #29 0x1011bd10 in function_code_fastcall (tstate=0x3029a7d2, co=0x302c07a8, args=0x3066edac, nargs=812472476, globals=0x306d5760) at ./Include/internal/pycore_ceval.h:40
    #30 0x1011c9a0 in _PyFunction_Vectorcall (func=0x302c93a0, stack=0x3029a7b8, nargsf=1, kwnames=0x0) at Objects/call.c:366
    #31 0x10109d84 in call_function (tstate=0x200d5ca0, pp_stack=0x20e8357c, oparg=56, kwnames=0x0) at ./Include/cpython/abstract.h:118
    #32 0x101164f4 in _PyEval_EvalFrameDefault (tstate=0x200d5ca0, f=0x3066ec70, throwflag=537746592) at Python/ceval.c:2586
    #33 0x1011bd10 in function_code_fastcall (tstate=0x302ac200, co=0x302c0a00, args=0x306cb16c, nargs=812051888, globals=0x3066ec70) at ./Include/internal/pycore_ceval.h:40
    #34 0x1011c9a0 in _PyFunction_Vectorcall (func=0x302c9530, stack=0x302ac180, nargsf=1, kwnames=0x0) at Objects/call.c:366
    #35 0x10109d84 in call_function (tstate=0x200d5ca0, pp_stack=0x20e8379c, oparg=56, kwnames=0x0) at ./Include/cpython/abstract.h:118
    #36 0x101164f4 in _PyEval_EvalFrameDefault (tstate=0x200d5ca0, f=0x306cb030, throwflag=537746592) at Python/ceval.c:2586
    #37 0x1011bd10 in function_code_fastcall (tstate=0x302a4930, co=0x302c0820, args=0x20e83928, nargs=812429680, globals=0x306cb030) at ./Include/internal/pycore_ceval.h:40
    #38 0x1011c9a0 in _PyFunction_Vectorcall (func=0x302c93f0, stack=0x302a4928, nargsf=1, kwnames=0x0) at Objects/call.c:366
    #39 0x10220394 in method_vectorcall (method=0x302c93f0, args=0x3000503c, nargsf=56, kwnames=0x0) at ./Include/cpython/abstract.h:118
    #40 0x1011c270 in PyVectorcall_Call (callable=0x301e3ed0, tuple=0x30005030, kwargs=0x0) at Objects/call.c:230
    #41 0x1011c510 in _PyObject_Call (tstate=0x30005030, callable=0x2003878c <ccCGOhKO.rw+13900>, args=0x0, kwargs=0x301e3ed0) at Objects/call.c:265
    #42 0x1011c71c in PyObject_Call (callable=0x0, args=0xffffffff, kwargs=0xffffffff) at Objects/call.c:292
    #43 0x1024a158 in t_bootstrap (boot_raw=0x301df820) at ./Modules/_threadmodule.c:1011
    #44 0x100a683c in pythread_wrapper (arg=0xffffffff) at Python/thread_pthread.h:235
    #45 0xd055df68 in _pthread_body () from /usr/lib/libpthreads.a(shr_xpg5.o)
    #46 0x00000000 in ?? ()

    (gdb) frame 10
    #10 0x101164f4 in _PyEval_EvalFrameDefault (tstate=0x200d5ca0, f=0x302cdc48, throwflag=537746592) at Python/ceval.c:2586
    2586 PyObject *v = GETLOCAL(oparg);
    (gdb) p (char*)( ((PyASCIIObject*)f->f_code->co_name) + 1)
    $16 = 0x3027ad28 "_releaseLock"
    (gdb) p (char*)( ((PyASCIIObject*)f->f_code->co_filename) + 1)
    $17 = 0x3026d4c0 "/home/vstinner/python/master/Lib/logging/init.py"

    (gdb) frame 14
    #14 0x1011662c in _PyEval_EvalFrameDefault (tstate=0x200d5ca0, f=0x3052cdb0, throwflag=537746592) at Python/ceval.c:2600
    2600 PyObject *cell = freevars[oparg];
    (gdb) p (char*)( ((PyASCIIObject*)f->f_code->co_name) + 1)
    $18 = 0x30278af0 "_after_at_fork_child_reinit_locks"
    (gdb) p (char*)( ((PyASCIIObject*)f->f_code->co_filename) + 1)
    $19 = 0x3026d4c0 "/home/vstinner/python/master/Lib/logging/init.py"

    (gdb) info threads
    Id Target Id Frame
    2 Thread 1745 (tid 70975785, running) aix-thread: ptrace (52, 70975785) returned -1 (errno = 3 The process does not exist.)

    @vstinner
    Copy link
    Member Author

    I worked around the logging crash. Then I got a second crash in _PyThreadState_DeleteExcept() called in the child process after a fork.

    PyThreadState_Clear() calls tstate->on_delete() which is release_sentinel() of Modules/_threadmodule.c.

    Problem: lock objects are left in an inconsistent state after a fork: bpo-6721. This lock object should be replaced with a new lock object, or release_sentinel() should not be called.

    (gdb) where
    #0 0xd0563e00 in _internal_error () from /usr/lib/libpthreads.a(shr_xpg5.o)
    #1 0xd0573270 in _event_notify_locked@AF34_24 () from /usr/lib/libpthreads.a(shr_xpg5.o)
    #2 0xd057247c in _event_notify () from /usr/lib/libpthreads.a(shr_xpg5.o)
    #3 0xd0582b2c in _cond_broadcast () from /usr/lib/libpthreads.a(shr_xpg5.o)
    #4 0xd0584b0c in pthread_cond_signal@AF29_12 () from /usr/lib/libpthreads.a(shr_xpg5.o)
    #5 0xd058320c in pthread_cond_signal () from /usr/lib/libpthreads.a(shr_xpg5.o)
    #6 0x100a755c in PyThread_release_lock (lock=0x200bd4d0) at Python/thread_pthread.h:687
    #7 0x1024add0 in release_sentinel (wr_raw=0x306c0bd0) at ./Modules/_threadmodule.c:1213
    #8 0x100a406c in PyThreadState_Clear (tstate=0x200c0970) at Python/pystate.c:810
    #9 0x100a4b90 in _PyThreadState_DeleteExcept (runtime=0x20000200 <_PyRuntime>, tstate=0x20068870) at Python/pystate.c:924
    #10 0x1010ad8c in _PyEval_ReInitThreads (runtime=0x20000200 <_PyRuntime>) at Python/ceval.c:434
    #11 0x101682a8 in PyOS_AfterFork_Child () at ./Modules/posixmodule.c:473
    #12 0x10168384 in os_fork (module=0x0, _unused_ignored=0xffffffff) at ./Modules/posixmodule.c:6264
    #13 0x101e7644 in cfunction_vectorcall_NOARGS (func=0x3008b4b0, args=0xffffffff, nargsf=0, kwnames=0x20067800) at Objects/methodobject.c:20
    #14 0x10109d84 in call_function (tstate=0x200c9b00, pp_stack=0x2046f04c, oparg=24, kwnames=0x0) at ./Include/cpython/abstract.h:118
    #15 0x101165a8 in _PyEval_EvalFrameDefault (tstate=0x200c9b00, f=0x306ce2f0, throwflag=537697024) at Python/ceval.c:2569
    #16 0x10119f80 in _PyEval_EvalCode (tstate=0x306ce424, _co=0x3066ca00, globals=0x3066d0d8, locals=0x306ce430, args=0x306ce428, argcount=-1, kwnames=0x30670030, kwargs=0xffffffff, kwcount=805326908, kwstep=0, defs=0x0, defcount=0,
    kwdefs=0x0, closure=0x0, name=0x2046f1f0, qualname=0x28000282) at ./Include/internal/pycore_ceval.h:40
    #17 0x1011cb84 in _PyFunction_Vectorcall (func=0x0, stack=0x3066d0d8, nargsf=0, kwnames=0x8) at Objects/call.c:395
    #18 0x1011c270 in PyVectorcall_Call (callable=0x3065f710, tuple=0x30005030, kwargs=0x306b6068) at Objects/call.c:230
    #19 0x1011c510 in _PyObject_Call (tstate=0x30005030, callable=0x2003a64c <cc37zUOE.rw+21820>, args=0x306b6068, kwargs=0x3065f710) at Objects/call.c:265
    #20 0x1011c71c in PyObject_Call (callable=0x0, args=0xffffffff, kwargs=0xffffffff) at Objects/call.c:292
    #21 0x10116ad4 in _PyEval_EvalFrameDefault (tstate=0x200c9b00, f=0x306cba40, throwflag=0) at Python/ceval.c:4198
    #22 0x1011bd10 in function_code_fastcall (tstate=0x3029d6b2, co=0x302bc0a0, args=0x30681914, nargs=812432252, globals=0x306cba40) at ./Include/internal/pycore_ceval.h:40
    #23 0x1011c9a0 in _PyFunction_Vectorcall (func=0x302c7210, stack=0x3029d698, nargsf=1, kwnames=0x0) at Objects/call.c:366
    #24 0x10109d84 in call_function (tstate=0x200c9b00, pp_stack=0x2046f57c, oparg=56, kwnames=0x0) at ./Include/cpython/abstract.h:118
    #25 0x101164f4 in _PyEval_EvalFrameDefault (tstate=0x200c9b00, f=0x306817d8, throwflag=537697024) at Python/ceval.c:2586
    #26 0x1011bd10 in function_code_fastcall (tstate=0x302aa200, co=0x302bc2f8, args=0x306cc5a4, nargs=812128536, globals=0x306817d8) at ./Include/internal/pycore_ceval.h:40
    #27 0x1011c9a0 in _PyFunction_Vectorcall (func=0x302c73a0, stack=0x302aa180, nargsf=1, kwnames=0x0) at Objects/call.c:366
    #28 0x10109d84 in call_function (tstate=0x200c9b00, pp_stack=0x2046f79c, oparg=56, kwnames=0x0) at ./Include/cpython/abstract.h:118
    #29 0x101164f4 in _PyEval_EvalFrameDefault (tstate=0x200c9b00, f=0x306cc468, throwflag=537697024) at Python/ceval.c:2586
    #30 0x1011bd10 in function_code_fastcall (tstate=0x302a4828, co=0x302bc118, args=0x2046f928, nargs=812434856, globals=0x306cc468) at ./Include/internal/pycore_ceval.h:40
    #31 0x1011c9a0 in _PyFunction_Vectorcall (func=0x302c7260, stack=0x302a4820, nargsf=1, kwnames=0x0) at Objects/call.c:366
    #32 0x10220394 in method_vectorcall (method=0x302c7260, args=0x3000503c, nargsf=56, kwnames=0x0) at ./Include/cpython/abstract.h:118
    #33 0x1011c270 in PyVectorcall_Call (callable=0x306c0ea0, tuple=0x30005030, kwargs=0x0) at Objects/call.c:230
    #34 0x1011c510 in _PyObject_Call (tstate=0x30005030, callable=0x2003875c <cc37zUOE.rw+13900>, args=0x0, kwargs=0x306c0ea0) at Objects/call.c:265
    #35 0x1011c71c in PyObject_Call (callable=0x0, args=0xffffffff, kwargs=0xffffffff) at Objects/call.c:292
    #36 0x1024a158 in t_bootstrap (boot_raw=0x306bf7a8) at ./Modules/_threadmodule.c:1011
    #37 0x100a6830 in pythread_wrapper (arg=0xffffffff) at Python/thread_pthread.h:235
    #38 0xd055df68 in _pthread_body () from /usr/lib/libpthreads.a(shr_xpg5.o)
    #39 0x00000000 in ?? ()

    @vstinner
    Copy link
    Member Author

    I created 3 follow-up issues:

    • bpo-40089: Add _at_fork_reinit() method to locks
    • bpo-40091: Crash in logging._after_at_fork_child_reinit_locks()
    • bpo-40092: Crash in _PyThreadState_DeleteExcept() at fork in the process child

    @vstinner
    Copy link
    Member Author

    Fixed by a9f9687

    @vstinner
    Copy link
    Member Author

    Fixed by a9f9687

    Ooops, sorry. Only PR 19200 is fixed by this commit.

    But this issue is made of 3 bugs. I prefer to leave it open until the 3 bugs are fixed.

    @vstinner vstinner reopened this Mar 31, 2020
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @vstinner
    Copy link
    Member Author

    vstinner commented Nov 3, 2022

    I don't have access to AIX. Nobody is working on this issue. AIX is not supported by PEP 11. I just close this old issue.

    @vstinner vstinner closed this as completed Nov 3, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes 3.9 only security fixes tests Tests in the Lib/test dir
    Projects
    Development

    No branches or pull requests

    1 participant