test_multiprocessing_spawn segfaults on AMD64 FreeBSD CURRENT Shared 3.x #81316

Closed
pablogsal opened this issue Jun 2, 2019 · 18 comments
Labels
3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error

Comments

@pablogsal
Member

BPO 37135
Nosy @pitrou, @vstinner, @ambv, @koobs, @pablogsal
Files
  • stress.py
  • sleep_at_exit.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-06-04.12:46:17.390>
    created_at = <Date 2019-06-02.22:54:17.945>
    labels = ['interpreter-core', '3.8', 'type-bug', 'tests']
    title = 'test_multiprocessing_spawn segfaults on AMD64 FreeBSD CURRENT Shared 3.x'
    updated_at = <Date 2019-06-04.12:46:17.389>
    user = 'https://github.com/pablogsal'

    bugs.python.org fields:

    activity = <Date 2019-06-04.12:46:17.389>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-06-04.12:46:17.390>
    closer = 'vstinner'
    components = ['Interpreter Core', 'Tests']
    creation = <Date 2019-06-02.22:54:17.945>
    creator = 'pablogsal'
    dependencies = []
    files = ['48387', '48388']
    hgrepos = []
    issue_num = 37135
    keywords = ['patch', 'buildbot']
    message_count = 18.0
    messages = ['344332', '344354', '344356', '344385', '344410', '344425', '344428', '344449', '344453', '344495', '344499', '344502', '344505', '344508', '344509', '344510', '344511', '344559']
    nosy_count = 5.0
    nosy_names = ['pitrou', 'vstinner', 'lukasz.langa', 'koobs', 'pablogsal']
    pr_nums = []
    priority = 'high'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue37135'
    versions = ['Python 3.8']

    @pablogsal
    Member Author

    test_import (test.test_multiprocessing_spawn._TestImportStar) ... ok
    ----------------------------------------------------------------------
    Ran 352 tests in 322.667s
    OK (skipped=34)
    Warning -- files was modified by test_multiprocessing_spawn
    Before: []
    After: ['python.core']
    0:11:39 load avg: 4.96 [202/423/1] test_winconsoleio skipped
    test_winconsoleio skipped -- test only relevant on win32

    https://buildbot.python.org/all/#/builders/168/builds/1124/steps/5/logs/stdio

    @pablogsal pablogsal added 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error labels Jun 2, 2019
    @vstinner
    Member

    vstinner commented Jun 3, 2019

    Maybe it's related to bpo-33608.

    @pitrou
    Member

    pitrou commented Jun 3, 2019

    Would be nice if someone could post a gdb backtrace of the core dump.

    @pablogsal
    Member Author

    I am unable to reproduce this locally; I will post a backtrace here if I manage to.

    @vstinner
    Member

    vstinner commented Jun 3, 2019

    https://bugs.python.org/issue33608#msg340075:

    My procedure to reproduce the crash on FreeBSD:
    https://bugs.python.org/issue36114#msg337092

    I ran this test for 20 min on the FreeBSD CURRENT buildbot: I failed to reproduce the bug.

    @pablogsal
    Member Author

    Adding the release manager so that he can consider this.

    @ambv
    Contributor

    ambv commented Jun 3, 2019

    FTR Victor reverted #57923 that triggers this.

    @pablogsal
    Member Author

    > FTR Victor reverted #57923 that triggers this.

    Given the nature of the bugs, I would recommend watching the buildbots.

    @vstinner
    Member

    vstinner commented Jun 3, 2019

    > FTR Victor reverted #57923 that triggers this.

    Sadly, this revert wasn't enough. There is a fresh coredump on "AMD64 FreeBSD CURRENT Shared 3.x":
    https://buildbot.python.org/all/#builders/168/builds/1145

    Warning -- Dangling processes: {<SpawnProcess
    name='SharedMemoryManager-313' pid=12404 parent=11647 started>}
    Warning -- Dangling processes: {<SpawnProcess
    name='SharedMemoryManager-313' pid=12404 parent=11647 started>}
    Warning -- multiprocessing.process._dangling was modified by
    test_multiprocessing_spawn
    Before: set()
    After: {<weakref at 0x80273a890; to 'SpawnProcess' at 0x8029f6b40>}
    Warning -- files was modified by test_multiprocessing_spawn
    Before: []
    After: ['python.core']

    @pablogsal
    Member Author

    CURRENT-amd64% lldb ./python -c python.core
    (lldb) target create "./python" --core "python.core"
    Core file '/home/pablo/cpython/python.core' (x86_64) was loaded.
    (lldb) bt

    * thread #1, name = 'python', stop reason = signal SIGBUS
      * frame #0: 0x000000000035cfad python`take_gil(tstate=0x0000000802b34010) at ceval_gil.h:216:13
        frame #1: 0x000000000035d499 python`PyEval_RestoreThread(tstate=0x0000000802b34010) at ceval.c:281:5
        frame #2: 0x000000000040c045 python
        frame #3: 0x000000000047cc51 python
        frame #4: 0x000000000034f59c python`_PyMethodDef_RawFastCallKeywords(method=0x0000000802b34010, self=0x00000008011c1578, args=0x0000000000000092, nargs=2, kwnames=<unavailable>) at call.c:653:18
        frame #5: 0x000000000034e549 python`_PyCFunction_FastCallKeywords(func=0x00000008011c8280, args=<unavailable>, nargs=<unavailable>, kwnames=<unavailable>) at call.c:732:14
        frame #6: 0x000000000036d850 python`call_function(pp_stack=0x00007fffdfbfbcc8, oparg=<unavailable>, kwnames=<unavailable>) at ceval.c:4673:9
        frame #7: 0x0000000000368d09 python`_PyEval_EvalFrameDefault(f=<unavailable>, throwflag=<unavailable>) at ceval.c:3294:19
        frame #8: 0x000000000036e59c python`_PyEval_EvalCodeWithName [inlined] PyEval_EvalFrameEx(f=<unavailable>, throwflag=0) at ceval.c:624:12
        frame #9: 0x000000000036e586 python`_PyEval_EvalCodeWithName(_co=<unavailable>, globals=<unavailable>, locals=<unavailable>, args=<unavailable>, argcount=2, kwnames=0x0000000000000000, kwargs=0x000000080266e5f8, kwcount=0, kwstep=1, defs=0x0000000802533558, defcount=1, kwdefs=0x0000000000000000, closure=0x0000000000000000, name=0x00000008017355f0, qualname=0x0000000802524d60) at ceval.c:4035
        frame #10: 0x000000000034e438 python`_PyFunction_FastCallKeywords(func=<unavailable>, stack=<unavailable>, nargs=<unavailable>, kwnames=<unavailable>) at call.c:435:12
        frame #11: 0x000000000036d9c7 python`call_function(pp_stack=0x00007fffdfbfc020, oparg=<unavailable>, kwnames=<unavailable>) at ceval.c:4721:17
        frame #12: 0x0000000000368bae python`_PyEval_EvalFrameDefault(f=<unavailable>, throwflag=<unavailable>) at ceval.c:3280:23
        frame #13: 0x000000000034ebaa python`function_code_fastcall(co=<unavailable>, args=<unavailable>, nargs=2, globals=<unavailable>) at call.c:285:14
        frame #14: 0x000000000036d9c7 python`call_function(pp_stack=0x00007fffdfbfc230, oparg=<unavailable>, kwnames=<unavailable>) at ceval.c:4721:17
        frame #15: 0x0000000000368bae python`_PyEval_EvalFrameDefault(f=<unavailable>, throwflag=<unavailable>) at ceval.c:3280:23
        frame #16: 0x000000000036e59c python`_PyEval_EvalCodeWithName [inlined] PyEval_EvalFrameEx(f=<unavailable>, throwflag=0) at ceval.c:624:12
        frame #17: 0x000000000036e586 python`_PyEval_EvalCodeWithName(_co=<unavailable>, globals=<unavailable>, locals=<unavailable>, args=<unavailable>, argcount=2, kwnames=0x0000000000000000, kwargs=0x000000080271eb58, kwcount=0, kwstep=1, defs=0x000000080251dbd8, defcount=2, kwdefs=0x0000000000000000, closure=0x0000000000000000, name=0x000000080187fd60, qualname=0x0000000802522340) at ceval.c:4035
        frame #18: 0x000000000034e438 python`_PyFunction_FastCallKeywords(func=<unavailable>, stack=<unavailable>, nargs=<unavailable>, kwnames=<unavailable>) at call.c:435:12
        frame #19: 0x000000000036d9c7 python`call_function(pp_stack=0x00007fffdfbfc520, oparg=<unavailable>, kwnames=<unavailable>) at ceval.c:4721:17
        frame #20: 0x0000000000368bae python`_PyEval_EvalFrameDefault(f=<unavailable>, throwflag=<unavailable>) at ceval.c:3280:23
        frame #21: 0x000000000034ebaa python`function_code_fastcall(co=<unavailable>, args=<unavailable>, nargs=2, globals=<unavailable>) at call.c:285:14
        frame #22: 0x000000000036d9c7 python`call_function(pp_stack=0x00007fffdfbfc730, oparg=<unavailable>, kwnames=<unavailable>) at ceval.c:4721:17
        frame #23: 0x0000000000368bae python`_PyEval_EvalFrameDefault(f=<unavailable>, throwflag=<unavailable>) at ceval.c:3280:23
        frame #24: 0x000000000034ebaa python`function_code_fastcall(co=<unavailable>, args=<unavailable>, nargs=2, globals=<unavailable>) at call.c:285:14
        frame #25: 0x000000000034f934 python`_PyObject_Call_Prepend(callable=0x00000008025b4100, obj=0x0000000801878b30, args=<unavailable>, kwargs=0x000000080267e980) at call.c:906:14
        frame #26: 0x000000000034e75f python`PyObject_Call(callable=0x00000008021008d8, args=0x00000008017a7cb0, kwargs=0x000000080267e980) at call.c:247:18
        frame #27: 0x00000000003690a9 python`_PyEval_EvalFrameDefault [inlined] do_call_core(func=<unavailable>, callargs=<unavailable>, kwdict=0x000000080267e980) at ceval.c:4775:12
        frame #28: 0x000000000036907e python`_PyEval_EvalFrameDefault(f=<unavailable>, throwflag=<unavailable>) at ceval.c:3353
        frame #29: 0x000000000034ebaa python`function_code_fastcall(co=<unavailable>, args=<unavailable>, nargs=1, globals=<unavailable>) at call.c:285:14
        frame #30: 0x000000000036d9c7 python`call_function(pp_stack=0x00007fffdfbfcba0, oparg=<unavailable>, kwnames=<unavailable>) at ceval.c:4721:17
        frame #31: 0x0000000000368bae python`_PyEval_EvalFrameDefault(f=<unavailable>, throwflag=<unavailable>) at ceval.c:3280:23
        frame #32: 0x000000000034ebaa python`function_code_fastcall(co=<unavailable>, args=<unavailable>, nargs=1, globals=<unavailable>) at call.c:285:14
        frame #33: 0x000000000036d9c7 python`call_function(pp_stack=0x00007fffdfbfcdb0, oparg=<unavailable>, kwnames=<unavailable>) at ceval.c:4721:17
        frame #34: 0x0000000000368bae python`_PyEval_EvalFrameDefault(f=<unavailable>, throwflag=<unavailable>) at ceval.c:3280:23
        frame #35: 0x000000000034ebaa python`function_code_fastcall(co=<unavailable>, args=<unavailable>, nargs=1, globals=<unavailable>) at call.c:285:14
        frame #36: 0x000000000034f934 python`_PyObject_Call_Prepend(callable=0x000000080152ff70, obj=0x0000000802681a10, args=<unavailable>, kwargs=0x0000000000000000) at call.c:906:14
        frame #37: 0x000000000034e75f python`PyObject_Call(callable=0x0000000802aa4a10, args=0x000000080106c050, kwargs=0x0000000000000000) at call.c:247:18
        frame #38: 0x00000000004b35e8 python
        frame #39: 0x00000000004049b9 python
        frame #40: 0x000000080059774b libthr.so.3`thread_start(curthread=0x0000000802b34010) at thr_create.c:291:16

    @vstinner
    Member

    vstinner commented Jun 4, 2019

    I looked at the coredump with Pablo. In short, the main thread is calling Py_Exit() to exit the process and has therefore released memory, and a daemon thread crashes when calling PyEval_RestoreThread() because the tstate memory was freed.

    The question now is whether this bug is a regression compared to Python 3.7. I'm trying to reproduce it on Linux by adding "sleep(1)" before exit, but my attempts have been unsuccessful so far.
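The scenario described above can be approximated from pure Python. The sketch below is not the stress.py attached to this issue (its contents are not shown in this thread). It is a hypothetical stand-in which assumes that a handful of daemon threads repeatedly releasing and reacquiring the GIL while the main thread exits is enough to exercise the same code path. The names `CHILD`, `spin` and `run_once` are made up for illustration.

```python
import subprocess
import sys
import textwrap

# Hypothetical child script: NOT the stress.py attached to this issue.
CHILD = textwrap.dedent("""
    import threading
    import time

    def spin():
        # time.sleep() releases the GIL and has to reacquire it afterwards,
        # which is the point where PyEval_RestoreThread() runs in C.
        while True:
            time.sleep(0)

    for _ in range(8):
        threading.Thread(target=spin, daemon=True).start()

    time.sleep(0.1)  # give the daemon threads time to start spinning
    # Falling off the end of the main thread starts interpreter finalization
    # while the daemon threads are still running.
""")


def run_once(timeout=60):
    """Run the child script once and return its exit status."""
    proc = subprocess.run([sys.executable, "-c", CHILD], timeout=timeout)
    return proc.returncode


if __name__ == "__main__":
    print("child exit status:", run_once())
```

On an unaffected interpreter the child should exit with status 0. On POSIX, a negative status means the child was killed by a signal, which is how a crash like the one in this issue would show up. Since the bug is a race, a single clean run proves little, so calling `run_once()` in a loop is the more realistic use.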

    @vstinner
    Member

    vstinner commented Jun 4, 2019

    I marked bpo-37143 "multiprocessing crashed with EXCEPTION_ACCESS_VIOLATION on Python on x86 Windows7 3.x" as a duplicate of this issue. Even if bpo-37143 looks very different (crash in a different file, on a different operating system), I'm now quite sure that it has the same root cause.

    @pablogsal
    Member Author

    Note that the core dump that we are talking about is something that we produced afterwards when trying to reproduce the issue. The core that is produced as part of the tests could be different. I am trying to get access to the test files.

    @vstinner
    Member

    vstinner commented Jun 4, 2019

    If you apply the attached sleep_at_exit.patch and run the attached stress.py, you should quickly get a crash:

    $ git apply sleep_at_exit.patch
    $ make && ./python stress.py 
    Segmentation fault (core dumped)

    That's a simplified example of the multiprocessing crash.

    @vstinner
    Member

    vstinner commented Jun 4, 2019

    I used git bisect and I found:

    commit 396e0a8 (refs/bisect/bad)
    Author: Eric Snow <ericsnowcurrently@gmail.com>
    Date: Fri May 31 21:16:47 2019 -0600

    bpo-36818: Add PyInterpreterState.runtime field. (gh-13129)
    

    stress.py starts to crash at this change.
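A bisection like this can be automated with `git bisect run`, which classifies each commit from the exit status of a command: 0 means good, 125 means skip, and any other status from 1 to 127 means bad. The sketch below is an assumption about what such a predicate could look like, not the procedure actually used here. The `make` and `./python stress.py` commands come from the earlier comment; `classify`, `main` and the `run` parameter are hypothetical names, and applying and reverting sleep_at_exit.patch at each step is left out.

```python
import subprocess

GOOD, BAD, SKIP = 0, 1, 125  # exit statuses understood by `git bisect run`


def classify(build_rc, run_rc):
    """Map the build and test exit statuses to a `git bisect run` verdict."""
    if build_rc != 0:
        return SKIP  # the commit does not build, so it cannot be judged
    if run_rc != 0:
        return BAD   # a crash appears as a non-zero (negative on POSIX) status
    return GOOD


def main(run=subprocess.run):
    """Build the checked-out commit, run the reproducer, return the verdict.

    `run` is injectable so the logic can be exercised without a real build.
    """
    build = run(["make"])
    if build.returncode != 0:
        return classify(build.returncode, 0)
    test = run(["./python", "stress.py"])
    return classify(0, test.returncode)

# A real predicate script would end with: sys.exit(main())
```

Saved as a script that ends with `sys.exit(main())`, this could be passed to `git bisect run` after marking one known-good and one known-bad commit. Because the crash is timing-dependent, a single clean run can mislabel a bad commit as good, so running the reproducer several times per commit would be a sensible refinement.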

    @vstinner
    Member

    vstinner commented Jun 4, 2019

    commit 396e0a8 (refs/bisect/bad)

    Extract of this change:
     
     static inline void
    -exit_thread_if_finalizing(_PyRuntimeState *runtime, PyThreadState *tstate)
    +exit_thread_if_finalizing(PyThreadState *tstate)
     {
    +    _PyRuntimeState *runtime = tstate->interp->runtime;
         /* _Py_Finalizing is protected by the GIL */
         if (runtime->finalizing != NULL && !_Py_CURRENTLY_FINALIZING(runtime, tstate)) {
             drop_gil(&runtime->ceval, tstate);
    @@ -236,7 +237,7 @@ PyEval_AcquireLock(void)
             Py_FatalError("PyEval_AcquireLock: current thread state is NULL");
         }
         take_gil(ceval, tstate);
    -    exit_thread_if_finalizing(runtime, tstate);
    +    exit_thread_if_finalizing(tstate);
     }
     

    This change is fine for regular Python threads. But for daemon threads, tstate is likely already corrupted, so it's no longer possible to get interp from tstate, and so also not possible to get 'runtime'.
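The difference between the two signatures can be illustrated with a toy model. This is only an analogy written in Python, not CPython code: the classes and the `free_tstate`, `must_exit_old` and `must_exit_new` helpers are hypothetical, and deleting an attribute stands in for memory that has been freed (in C the read would return garbage or fault rather than raise). It models only the finalizing check, not `drop_gil()`.

```python
class Runtime:
    def __init__(self):
        self.finalizing = None  # thread state currently running finalization


class InterpreterState:
    def __init__(self, runtime):
        self.runtime = runtime


class ThreadState:
    def __init__(self, interp):
        self.interp = interp


def free_tstate(tstate):
    # Stand-in for freeing the structure: its fields can no longer be read.
    del tstate.interp


def must_exit_old(runtime, tstate):
    # Pre-change shape: runtime is passed in, and tstate is only compared
    # by identity, never read through.
    return runtime.finalizing is not None and runtime.finalizing is not tstate


def must_exit_new(tstate):
    # Post-change shape: runtime is reached through tstate, so a freed
    # tstate is read before the check can even start.
    runtime = tstate.interp.runtime
    return runtime.finalizing is not None and runtime.finalizing is not tstate


if __name__ == "__main__":
    runtime = Runtime()
    interp = InterpreterState(runtime)
    main_ts, daemon_ts = ThreadState(interp), ThreadState(interp)

    runtime.finalizing = main_ts  # the main thread starts finalizing
    free_tstate(daemon_ts)        # ...and the daemon thread's state is released

    print(must_exit_old(runtime, daemon_ts))
    try:
        must_exit_new(daemon_ts)
    except AttributeError as exc:
        print("new shape fails:", exc)
```

In this model the old shape still tells the daemon thread to exit, because it never reads through the freed state, while the new shape fails on its very first line.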

    @vstinner
    Member

    vstinner commented Jun 4, 2019

    The initial issue has been fixed by a revert.

    Let's continue the discussion on bpo-36818 to maybe reapply commit 396e0a8.

    @vstinner vstinner closed this as completed Jun 4, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022