Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyOS_InterruptOccurred() now requires to hold the GIL: PyOS_Readline() crash #85003

Closed
JelleZijlstra opened this issue May 30, 2020 · 31 comments
Closed
Labels
3.8 only security fixes 3.9 only security fixes 3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@JelleZijlstra
Copy link
Member

BPO 40826
Nosy @db3l, @vstinner, @benjaminp, @serhiy-storchaka, @eryksun, @JelleZijlstra, @corona10, @remilapeyre
PRs
  • bpo-40826: Fix segfaults issue when close file descriptor 0 on REPL #20549
  • bpo-40826: Add _Py_EnsureTstateNotNULL() macro #20571
  • bpo-40826: PyOS_InterruptOccurred() requires GIL #20578
  • bpo-40826: Fix GIL usage in PyOS_Readline() #20579
  • bpo-40826: Add _PyOS_InterruptOccurred(tstate) function #20599
  • [3.9] bpo-40826: Fix GIL usage in PyOS_Readline() #20613
  • [3.8] bpo-40826: Fix GIL usage in PyOS_Readline() (GH-20613) #20616
  • [3.9] bpo-40826: PyOS_InterruptOccurred() requires GIL (GH-20578) #20618
  • bpo-40826: Fix test_repl.test_close_stdin() on Windows #20779
  • [3.9] bpo-40826: Fix test_repl.test_close_stdin() on Windows (GH-20779) #20785
  • [3.8] bpo-40826: Fix test_repl.test_close_stdin() on Windows (GH-20779) (GH-20785) #20787
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-06-10.17:46:11.537>
    created_at = <Date 2020-05-30.14:19:46.754>
    labels = ['interpreter-core', '3.10', '3.8', '3.9', 'type-crash']
    title = 'PyOS_InterruptOccurred() now requires to hold the GIL: PyOS_Readline() crash'
    updated_at = <Date 2020-06-10.20:13:01.555>
    user = 'https://github.com/JelleZijlstra'

    bugs.python.org fields:

    activity = <Date 2020-06-10.20:13:01.555>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-06-10.17:46:11.537>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2020-05-30.14:19:46.754>
    creator = 'JelleZijlstra'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 40826
    keywords = ['patch', '3.9regression']
    message_count = 31.0
    messages = ['370386', '370387', '370388', '370392', '370549', '370550', '370552', '370553', '370554', '370555', '370556', '370569', '370575', '370576', '370661', '370676', '370679', '370682', '370683', '370692', '370694', '370696', '370716', '370718', '371169', '371223', '371224', '371226', '371227', '371233', '371239']
    nosy_count = 9.0
    nosy_names = ['db3l', 'vstinner', 'benjamin.peterson', 'stutzbach', 'serhiy.storchaka', 'eryksun', 'JelleZijlstra', 'corona10', 'remi.lapeyre']
    pr_nums = ['20549', '20571', '20578', '20579', '20599', '20613', '20616', '20618', '20779', '20785', '20787']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue40826'
    versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

    @JelleZijlstra
    Copy link
    Member Author

    $ gdb ./python 
    ...
    (gdb) r
    ...
    Python 3.9.0b1 (tags/v3.9.0b1:97fe9cfd9f8, May 30 2020, 05:26:48) 
    ...
    >>> import io
    >>> io.FileIO(0).name
    0
    >>> 
    Program received signal SIGSEGV, Segmentation fault.
    _PyInterpreterState_GET () at ./Include/internal/pycore_pystate.h:100
    100         return tstate->interp;
    (gdb) bt
    #0  _PyInterpreterState_GET () at ./Include/internal/pycore_pystate.h:100
    #1  PyOS_InterruptOccurred () at ./Modules/signalmodule.c:1785
    #2  0x0000000000673905 in my_fgets (buf=buf@entry=0xa40780 "8LJ\367\377\177", len=len@entry=100, fp=fp@entry=0x7ffff74a48e0 <_IO_2_1_stdin_>) at Parser/myreadline.c:87
    #3  0x000000000067397b in PyOS_StdioReadline (sys_stdin=sys_stdin@entry=0x7ffff74a48e0 <_IO_2_1_stdin_>, sys_stdout=sys_stdout@entry=0x7ffff74a5620 <_IO_2_1_stdout_>, 
        prompt=prompt@entry=0x7ffff6f8d8e0 ">>> ") at Parser/myreadline.c:269
    #4  0x0000000000673b4f in PyOS_Readline (sys_stdin=0x7ffff74a48e0 <_IO_2_1_stdin_>, sys_stdout=0x7ffff74a5620 <_IO_2_1_stdout_>, prompt=0x7ffff6f8d8e0 ">>> ") at Parser/myreadline.c:355
    #5  0x00000000005d90d9 in tok_nextc (tok=0xa3fd30) at Parser/tokenizer.c:856
    #6  0x00000000005dad02 in tok_get (tok=tok@entry=0xa3fd30, p_start=p_start@entry=0x7fffffffd508, p_end=p_end@entry=0x7fffffffd510) at Parser/tokenizer.c:1194
    #7  0x00000000005dcb69 in PyTokenizer_Get (tok=0xa3fd30, p_start=p_start@entry=0x7fffffffd508, p_end=p_end@entry=0x7fffffffd510) at Parser/tokenizer.c:1842
    #8  0x0000000000653c73 in _PyPegen_fill_token (p=0x7ffff6f8f540) at Parser/pegen/pegen.c:551
    #9  0x0000000000670355 in statement_newline_rule (p=0x7ffff6f8f540) at Parser/pegen/parse.c:1143
    #10 interactive_rule (p=0x7ffff6f8f540) at Parser/pegen/parse.c:725
    #11 _PyPegen_parse (p=p@entry=0x7ffff6f8f540) at Parser/pegen/parse.c:19388
    #12 0x0000000000654b32 in _PyPegen_run_parser (p=0x7ffff6f8f540) at Parser/pegen/pegen.c:1037
    #13 0x0000000000654e4c in _PyPegen_run_parser_from_file_pointer (fp=fp@entry=0x70f74a48e0, start_rule=start_rule@entry=80, filename_ob=filename_ob@entry=0x7ffff6f84eb0, 
        enc=enc@entry=0x7ffff704ec60 "utf-8", ps1=ps1@entry=0x7ffff6f8d8e0 ">>> ", ps2=ps2@entry=0x90000000d0 <error: Cannot access memory at address 0x90000000d0>, flags=0x7fffffffd7b8, 
        errcode=0x7fffffffd6e4, arena=0x7ffff6ff6b30) at Parser/pegen/pegen.c:1097
    #14 0x00000000005d6bea in PyPegen_ASTFromFileObject (fp=0x70f74a48e0, fp@entry=0x7ffff74a48e0 <_IO_2_1_stdin_>, filename_ob=filename_ob@entry=0x7ffff6f84eb0, mode=80, mode@entry=256, 
        enc=enc@entry=0x7ffff704ec60 "utf-8", ps1=ps1@entry=0x7ffff6f8d8e0 ">>> ", ps2=0x90000000d0 <error: Cannot access memory at address 0x90000000d0>, ps2@entry=0x7ffff6f8dbe0 "... ", 
        flags=<optimized out>, errcode=<optimized out>, arena=<optimized out>) at Parser/pegen/peg_api.c:52
    #15 0x00000000005460d9 in PyRun_InteractiveOneObjectEx (fp=fp@entry=0x7ffff74a48e0 <_IO_2_1_stdin_>, filename=filename@entry=0x7ffff6f84eb0, flags=flags@entry=0x7fffffffd7b8) at Python/pythonrun.c:243
    #16 0x000000000054631e in PyRun_InteractiveLoopFlags (fp=fp@entry=0x7ffff74a48e0 <_IO_2_1_stdin_>, filename_str=filename_str@entry=0x673e32 "<stdin>", flags=flags@entry=0x7fffffffd7b8)
        at Python/pythonrun.c:122
    #17 0x0000000000546d4c in PyRun_AnyFileExFlags (fp=0x7ffff74a48e0 <_IO_2_1_stdin_>, filename=0x673e32 "<stdin>", closeit=0, flags=0x7fffffffd7b8) at Python/pythonrun.c:81
    #18 0x0000000000429fb7 in pymain_run_stdin (cf=0x7fffffffd7b8, config=0x9bd800) at Modules/main.c:467
    #19 pymain_run_python (exitcode=0x7fffffffd7b0) at Modules/main.c:556
    #20 Py_RunMain () at Modules/main.c:632
    #21 0x000000000042a2d6 in pymain_main (args=0x7fffffffd8a0) at Modules/main.c:662
    #22 Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:686
    #23 0x00007ffff7100830 in __libc_start_main (main=0x41ef30 <main>, argc=1, argv=0x7fffffffd9f8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd9e8)
        at ../csu/libc-start.c:291
    #24 0x0000000000429089 in _start ()
    (gdb) 

    Same happens with Python 3.9.0b1+ (heads/3.9:588efc29c5d, May 30 2020, 14:16:10) (current HEAD of the 3.9 branch).

    In previous versions of Python this would exit the interpreter but not segfault.

    @JelleZijlstra JelleZijlstra added 3.10 only security fixes 3.9 only security fixes labels May 30, 2020
    @tirkarthi tirkarthi added type-crash A hard crash of the interpreter, possibly with a core dump labels May 30, 2020
    @corona10
    Copy link
    Member

    Thanks for report, I can reproduce it

    @serhiy-storchaka
    Copy link
    Member

    Simpler reproducer:

    import os
    os.close(0)

    @serhiy-storchaka serhiy-storchaka added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label May 30, 2020
    @serhiy-storchaka serhiy-storchaka changed the title Segfaults on io.FileIO(0).name in 3.9 Segfaults when close file descriptor 0 May 30, 2020
    @serhiy-storchaka serhiy-storchaka added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label May 30, 2020
    @serhiy-storchaka serhiy-storchaka changed the title Segfaults on io.FileIO(0).name in 3.9 Segfaults when close file descriptor 0 May 30, 2020
    @corona10
    Copy link
    Member

    FYI this change fix this issue.

    --- a/Modules/signalmodule.c
    +++ b/Modules/signalmodule.c
    @@ -1782,7 +1782,11 @@ PyOS_FiniInterrupts(void)
     int
     PyOS_InterruptOccurred(void)
     {
    -    PyInterpreterState *interp = _PyInterpreterState_GET();
    +    PyThreadState *tstate = _PyThreadState_GET();
    +    if (!tstate) {
    +        return 0;
    +    }
    +    PyInterpreterState *interp = tstate->interp;
         if (!_Py_ThreadCanHandleSignals(interp)) {
             return 0;
         }

    @vstinner
    Copy link
    Member

    vstinner commented Jun 1, 2020

    The following change modified PyOS_InterruptOccurred() to require the hold the GIL:

    commit d831688
    Author: Victor Stinner <vstinner@python.org>
    Date: Fri Mar 20 14:50:35 2020 +0100

    bpo-40010: Optimize pending calls in multithreaded applications (GH-19091)
    
    If a thread different than the main thread schedules a pending call
    (Py_AddPendingCall()), the bytecode evaluation loop is no longer
    interrupted at each bytecode instruction to check for pending calls
    which cannot be executed. Only the main thread can execute pending
    calls.
    
    Previously, the bytecode evaluation loop was interrupted at each
    instruction until the main thread executes pending calls.
    
    * Add _Py_ThreadCanHandlePendingCalls() function.
    * SIGNAL_PENDING_CALLS() now only sets eval_breaker to 1 if the
      current thread can execute pending calls. Only the main thread can
      execute pending calls.
    

    --

    PyOS_InterruptOccurred() is part of the limited C API, but it's not even documented :-( So it's not easy to say if it's a backward incompatible change. At least, as the author of the change, I can say that the change was deliberate.

    bpo-40010 rationale is that only the main thread of the main interpreter must call Python signal handlers. So when another thread or another interpreter (running the main thread) asks "should the Python signal handlers be executed", the answer is always "no", even if yes, Python got a signal (which requires to execute at least one Python signal handler).

    The change is that PyOS_InterruptOccurred() now requires to hold the GIL to get current Python thread state: _PyThreadState_GET() returns NULL when the GIL is released.

    Modifying PyOS_InterruptOccurred() to always return 0 if the GIL is released (if _PyThreadState_GET() returns NULL) is wrong: if the function is called by the main thread of the main interpreter, it must return 1.

    --

    By the way, I rewrote the C signal handler in Python 3.9 to not require to get the current Python thread state. In Python 3.8, it was modified and started to require the current Python thread if writing into the wakeup file descriptor failed to schedule a "pending call".

    commit b54a99d
    Author: Victor Stinner <vstinner@python.org>
    Date: Wed Apr 8 23:35:05 2020 +0200

    bpo-40082: trip_signal() uses the main interpreter (GH-19441)
    
    Fix the signal handler: it now always uses the main interpreter,
    rather than trying to get the current Python thread state.
    
    The following function now accepts an interpreter, instead of a
    Python thread state:
    
    * _PyEval_SignalReceived()
    * _Py_ThreadCanHandleSignals()
    * _PyEval_AddPendingCall()
    * COMPUTE_EVAL_BREAKER()
    * SET_GIL_DROP_REQUEST(), RESET_GIL_DROP_REQUEST()
    * SIGNAL_PENDING_CALLS(), UNSIGNAL_PENDING_CALLS()
    * SIGNAL_PENDING_SIGNALS(), UNSIGNAL_PENDING_SIGNALS()
    * SIGNAL_ASYNC_EXC(), UNSIGNAL_ASYNC_EXC()
    
    Py_AddPendingCall() now uses the main interpreter if it fails to the
    current Python thread state.
    
    Convert _PyThreadState_GET() and PyInterpreterState_GET_UNSAFE()
    macros to static inline functions.
    

    @vstinner vstinner changed the title Segfaults when close file descriptor 0 PyOS_InterruptOccurred() now requires to hold the GIL: PyOS_Readline() crash Jun 1, 2020
    @vstinner vstinner changed the title Segfaults when close file descriptor 0 PyOS_InterruptOccurred() now requires to hold the GIL: PyOS_Readline() crash Jun 1, 2020
    @vstinner
    Copy link
    Member

    vstinner commented Jun 1, 2020

    PyOS_InterruptOccurred() now requires to hold the GIL

    This change impacted gdb:

    gdbpy_check_quit_flag() called PyOS_InterruptOccurred() without holding the GIL. It worked in Python 3.8, but started to crash in Python 3.9 beta1.

    gdb had another issue, it started by always releasing the GIL and always called the Python C API with the GIL released (if I understood correctly).

    gdb was fixed by calling PyGILState_Ensure()/PyGILState_Release() when calling the Python C API, especially PyOS_InterruptOccurred().

    Maybe the minimum would be to *document* that the GIL must be held to call PyOS_InterruptOccurred(), and that it's not a new requirements.

    @vstinner
    Copy link
    Member

    vstinner commented Jun 1, 2020

    New changeset 3026cad by Victor Stinner in branch 'master':
    bpo-40826: Add _Py_EnsureTstateNotNULL() macro (GH-20571)
    3026cad

    @vstinner
    Copy link
    Member

    vstinner commented Jun 1, 2020

    I can reproduce PyOS_InterruptOccurred() crash in Python 3.8 if I remove readline.cpython-38d-x86_64-linux-gnu.so and I disable EINTR error checking in my_fgets():

    diff --git a/Parser/myreadline.c b/Parser/myreadline.c
    index 43e5583b8b..2712dedacd 100644
    --- a/Parser/myreadline.c
    +++ b/Parser/myreadline.c
    @@ -73,7 +73,7 @@ my_fgets(char *buf, int len, FILE *fp)
                 clearerr(fp);
                 return -1; /* EOF */
             }
    -#ifdef EINTR
    +#if 0
             if (err == EINTR) {
                 int s;
                 PyEval_RestoreThread(_PyOS_ReadlineTState);
    vstinner@apu$ ./python
    Python 3.8.3+ (heads/3.8-dirty:00a240bf7f, Jun  1 2020, 17:00:22) 
    [GCC 10.1.1 20200507 (Red Hat 10.1.1-1)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 
    ^C
    Erreur de segmentation (core dumped)

    I cannot reproduce the issue in Python 3.7.

    ---

    Python 3.7:
    ---

    int
    PyOS_InterruptOccurred(void)
    {
        if (_Py_atomic_load_relaxed(&Handlers[SIGINT].tripped)) {
            if (PyThread_get_thread_ident() != main_thread)
                return 0;
            _Py_atomic_store_relaxed(&Handlers[SIGINT].tripped, 0);
            return 1;
        }
        return 0;
    }

    Python 3.8:
    ---

    static int
    is_main(_PyRuntimeState *runtime)
    {
        unsigned long thread = PyThread_get_thread_ident();
        PyInterpreterState *interp = _PyRuntimeState_GetThreadState(runtime)->interp;
        return (thread == runtime->main_thread
                && interp == runtime->interpreters.main);
    }
    
    int
    PyOS_InterruptOccurred(void)
    {
        if (_Py_atomic_load_relaxed(&Handlers[SIGINT].tripped)) {
            _PyRuntimeState *runtime = &_PyRuntime;
            if (!is_main(runtime)) {
                return 0;
            }
            _Py_atomic_store_relaxed(&Handlers[SIGINT].tripped, 0);
            return 1;
        }
        return 0;
    }

    is_main() function was added in Python 3.8 by:

    commit 64d6cc8
    Author: Eric Snow <ericsnowcurrently@gmail.com>
    Date: Sat Feb 23 15:40:43 2019 -0700

    bpo-35724: Explicitly require the main interpreter for signal-handling. (GH-11530)
    
    Ensure that the main interpreter is active (in the main thread) for signal-handling operations. This is increasingly relevant as people use subinterpreters more.
    
    https://bugs.python.org/issue35724
    

    @vstinner
    Copy link
    Member

    vstinner commented Jun 1, 2020

    It seems like this issue was introduced by this change:

    commit 7281898
    Author: Victor Stinner <vstinner@python.org>
    Date: Thu Mar 26 22:28:11 2020 +0100

    bpo-38644: Pass tstate explicitly in signalmodule.c (GH-19184)
    
    PyOS_InterruptOccurred() now checks _Py_ThreadCanHandleSignals()
    before checking if SIGINT is tripped.
    

    Include/internal/pycore_pyerrors.h | 2 +
    Modules/signalmodule.c | 152 ++++++++++++++++++++++---------------
    Python/ceval.c | 4 +-
    3 files changed, 93 insertions(+), 65 deletions(-)

    Before:
    ---

    static inline int
    _Py_IsMainInterpreter(PyThreadState* tstate)
    {   
        /* Use directly _PyRuntime rather than tstate->interp->runtime, since
           this function is used in performance critical code path (ceval) */
        return (tstate->interp == _PyRuntime.interpreters.main);
    }   
    
    static inline int                          
    _Py_ThreadCanHandleSignals(PyThreadState *tstate)
    {        
        return (_Py_IsMainThread() && _Py_IsMainInterpreter(tstate));
    }   
                        
    static int                                 
    thread_can_handle_signals(void)
    {        
        PyThreadState *tstate = _PyThreadState_GET();              
        return _Py_ThreadCanHandleSignals(tstate);
    }    
    
    int
    PyOS_InterruptOccurred(void)
    {
        if (_Py_atomic_load_relaxed(&Handlers[SIGINT].tripped)) {
            if (!thread_can_handle_signals()) {
                return 0;
            }
            _Py_atomic_store_relaxed(&Handlers[SIGINT].tripped, 0);
            return 1;
        }
        return 0;
    }
    ---
    
    After:
    ---                
    static inline int 
    _Py_IsMainInterpreter(PyThreadState* tstate)
    {                               
        /* Use directly _PyRuntime rather than tstate->interp->runtime, since 
           this function is used in performance critical code path (ceval) */ 
        return (tstate->interp == _PyRuntime.interpreters.main);
    }
    
    static inline int                                                       
    _Py_ThreadCanHandleSignals(PyThreadState *tstate)                          
    {                                                                              
        return (_Py_IsMainThread() && _Py_IsMainInterpreter(tstate));
    }
    
    int
    PyOS_InterruptOccurred(void)
    {
        PyThreadState *tstate = _PyThreadState_GET();
        if (!_Py_ThreadCanHandleSignals(tstate)) {
            return 0;
        }
    
        if (!_Py_atomic_load_relaxed(&Handlers[SIGINT].tripped)) {
            return 0;
        }
    
        _Py_atomic_store_relaxed(&Handlers[SIGINT].tripped, 0);
        return 1;
    }

    The difference is that tstate->interp is now checked *before* checking if SIGINT signal is tripped.

    @vstinner
    Copy link
    Member

    vstinner commented Jun 1, 2020

    So PyOS_InterruptOccurred() must be called with the GIL held since 3.8, it's just that the nobody noticed the bug before.

    If SIGGINT is tripped and the GIL is released, PyOS_InterruptOccurred() does also crash in Python 3.8.

    --

    One way to see the bug in Python 3.8 without having to trip SIGINT.

    Apply this patch:

    diff --git a/Modules/signalmodule.c b/Modules/signalmodule.c
    index 0c9a2671fe..b850af3163 100644
    --- a/Modules/signalmodule.c
    +++ b/Modules/signalmodule.c
    @@ -1729,11 +1729,12 @@ PyOS_FiniInterrupts(void)
     int
     PyOS_InterruptOccurred(void)
     {
    +    _PyRuntimeState *runtime = &_PyRuntime;
    +    if (!is_main(runtime)) {
    +        return 0;
    +    }
    +
         if (_Py_atomic_load_relaxed(&Handlers[SIGINT].tripped)) {
    -        _PyRuntimeState *runtime = &_PyRuntime;
    -        if (!is_main(runtime)) {
    -            return 0;
    -        }
             _Py_atomic_store_relaxed(&Handlers[SIGINT].tripped, 0);
             return 1;
         }
    $ make
    $ find -name "*readline*so" -delete
    $ ./python
    Python 3.8.3+ (heads/3.8-dirty:00a240bf7f, Jun  1 2020, 17:24:09) 
    [GCC 10.1.1 20200507 (Red Hat 10.1.1-1)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os; os.close(0)
    >>>
    Erreur de segmentation (core dumped)

    Reproduce the bug in Python 3.8 without modifying the code, using gdb to trigger events SIGINT at the right place:

    $ gdb ./python
    (gdb) b my_fgets
    Breakpoint 1 at 0x68b941: file Parser/myreadline.c, line 39.

    (gdb) run
    Python 3.8.3+ (heads/3.8:00a240bf7f, Jun 1 2020, 17:27:24)
    [GCC 10.1.1 20200507 (Red Hat 10.1.1-1)] on linux
    Type "help", "copyright", "credits" or "license" for more information.

    >>

    Breakpoint 1, my_fgets (...)
    (gdb) p (void)close(0)

    (gdb) delete 1

    (gdb) signal SIGINT
    Continuing with signal SIGINT.

    Program received signal SIGSEGV, Segmentation fault.
    is_main (runtime=0x7ff6c0 <_PyRuntime>) at ./Modules/signalmodule.c:193
    193 PyInterpreterState *interp = _PyRuntimeState_GetThreadState(runtime)->interp;

    @vstinner
    Copy link
    Member

    vstinner commented Jun 1, 2020

    I add "Python 3.8" version: it's also affected.

    @vstinner vstinner added 3.8 only security fixes labels Jun 1, 2020
    @vstinner
    Copy link
    Member

    vstinner commented Jun 1, 2020

    New changeset cbe1296 by Victor Stinner in branch 'master':
    bpo-40826: PyOS_InterruptOccurred() requires GIL (GH-20578)
    cbe1296

    @vstinner
    Copy link
    Member

    vstinner commented Jun 1, 2020

    See also bpo-40839: Disallow calling PyDict_GetItem() with the GIL released.

    @vstinner
    Copy link
    Member

    vstinner commented Jun 1, 2020

    New changeset c353764 by Victor Stinner in branch 'master':
    bpo-40826: Fix GIL usage in PyOS_Readline() (GH-20579)
    c353764

    @vstinner
    Copy link
    Member

    vstinner commented Jun 3, 2020

    New changeset fa7ab6a by Victor Stinner in branch 'master':
    bpo-40826: Add _PyOS_InterruptOccurred(tstate) function (GH-20599)
    fa7ab6a

    @vstinner
    Copy link
    Member

    vstinner commented Jun 3, 2020

    New changeset 5d2396c by Victor Stinner in branch '3.9':
    [3.9] bpo-40826: Fix GIL usage in PyOS_Readline() (GH-20613)
    5d2396c

    @vstinner
    Copy link
    Member

    vstinner commented Jun 3, 2020

    New changeset 6f7346b by Victor Stinner in branch '3.8':
    [3.9] bpo-40826: Fix GIL usage in PyOS_Readline() (GH-20613) (GH-20616)
    6f7346b

    @vstinner
    Copy link
    Member

    vstinner commented Jun 3, 2020

    New changeset 6d62dc1 by Victor Stinner in branch '3.9':
    [3.9] bpo-40826: PyOS_InterruptOccurred() requires GIL (GH-20578) (GH-20618)
    6d62dc1

    @vstinner
    Copy link
    Member

    vstinner commented Jun 3, 2020

    Ok, I fixed bugs in the REPL on Linux and Windows in 3.8, 3.9 and master branches.

    I also modified PyOS_InterruptOccurred() to fail with a fatal error and a nice error message, rather than crashing with a core dump, if it's called with the GIL released. I changed in 3.9 and master branches.

    Thanks Jelle Zijlstra for the bug report! I'm now closing the issue.

    @db3l
    Copy link
    Contributor

    db3l commented Jun 4, 2020

    I haven't had a chance to look too deeply, but the final set of commits (starting with fa7ab6a) appear to be the common thread with all branches now failing with timeout exceptions in test_repl on the Windows 10 buildbot.

    The first instance was in the 3.x branch (https://buildbot.python.org/all/#/builders/129/builds/1191) but they all seem similar.

    -- David

    @db3l
    Copy link
    Contributor

    db3l commented Jun 4, 2020

    It appears the recent commit is causing a CRT exception dialog in test_close_stdin (test_repl.TestInteractiveInterpreter). The dialog can't get answered, which is what leads to the eventual timeout.

    The assertion is "_osfile(fh) & FOPEN" from osfinfo.cpp on line 258. It actually appears twice (if the first is ignored). If both are ignored the test passes.

    I was somewhat surprised to see the dialog, as I believe CRT exception dialogs are disabled during test runs (others are present in test logs). Perhaps the child interpreter used by test_repl doesn't inherit that behavior. If so, this was probably always a risk, there just weren't any assertions occurring before this in the child interpreter.

    @eryksun
    Copy link
    Contributor

    eryksun commented Jun 4, 2020

    Perhaps the child interpreter used by test_repl doesn't inherit
    that behavior.

    The OS error mode is inherited, unless the child is created with CREATE_DEFAULT_ERROR_MODE flag. But the CRT error mode isn't inherited. (For the spawn* family, in principle the CRT could pass its error modes in reserved STARTUPINFO fields, like it does for file descriptors, but it doesn't do that, and subprocess wouldn't be privy to that protocol anyway.)

    For example:

        >>> import sys, subprocess
        >>> from msvcrt import *
        >>> SEM_FAILCRITICALERRORS, CRTDBG_MODE_FILE, CRTDBG_MODE_WNDW
        (1, 1, 4)
    
        >>> SetErrorMode(SEM_FAILCRITICALERRORS)
        0
        >>> CrtSetReportMode(CRT_ERROR, CRTDBG_MODE_FILE)
        4

    The return values are the previous values, which are respectively 0 (default) for the OS and CRTDBG_MODE_WNDW (message box) for the CRT. Now check the initial values in a child process:

    >>> subprocess.call(f'{sys.executable} -q')
    
        >>> from msvcrt import *
        >>> SetErrorMode(SEM_FAILCRITICALERRORS)
        1
        >>> CrtSetReportMode(CRT_ERROR, CRTDBG_MODE_FILE)
        4

    The OS error mode of the parent was inherited, but the parent's CRT error mode was not.

    libregrtest has a convenient function to suppress error reporting in the child. For example:

    >>> import sys, subprocess
    >>> subprocess.call(f'{sys.executable} -q')
    
        >>> import os
        >>> from test.libregrtest import setup
        >>> setup.suppress_msvcrt_asserts(0)
        >>> os.close(0)
        >>>
        0

    @vstinner
    Copy link
    Member

    vstinner commented Jun 4, 2020

    I am confused. Code which uses the closed file descriptor 0 uses "_Py_BEGIN_SUPPRESS_IPH":

        _Py_BEGIN_SUPPRESS_IPH
        hStdIn = (HANDLE)_get_osfhandle(fileno(sys_stdin));
        hStdErr = (HANDLE)_get_osfhandle(fileno(stderr));
        _Py_END_SUPPRESS_IPH
    

    and

    _Py_BEGIN_SUPPRESS_IPH
    handle = (HANDLE)_get_osfhandle(fileno(fp));
    _Py_END_SUPPRESS_IPH
    

    Macros defined as:

    extern _invalid_parameter_handler _Py_silent_invalid_parameter_handler;
    #define _Py_BEGIN_SUPPRESS_IPH { _invalid_parameter_handler _Py_old_handler = \
        _set_thread_local_invalid_parameter_handler(_Py_silent_invalid_parameter_handler);
    #define _Py_END_SUPPRESS_IPH _set_thread_local_invalid_parameter_handler(_Py_old_handler); }
    with PC/invalid_parameter_handler.c:

    static void __cdecl _silent_invalid_parameter_handler(
    wchar_t const* expression,
    wchar_t const* function,
    wchar_t const* file,
    unsigned int line,
    uintptr_t pReserved) { }

    _invalid_parameter_handler _Py_silent_invalid_parameter_handler = _silent_invalid_parameter_handler;

    The purpose of _Py_BEGIN_SUPPRESS_IPH is to suppress such popup, no?

    @vstinner vstinner reopened this Jun 4, 2020
    @vstinner vstinner reopened this Jun 4, 2020
    @eryksun
    Copy link
    Contributor

    eryksun commented Jun 4, 2020

    The purpose of _Py_BEGIN_SUPPRESS_IPH is to suppress such popup, no?

    That macro suppresses the CRT's invalid parameter handler, which by default terminates abnormally with a fastfail (status code 0xC0000409), even in a release build.

    In a debug build, we still have message boxes from failed asserts that call _CrtDbgReportW. For example, the following stack trace shows _CrtDbgReportW called from _get_osfhandle:

    Call Site
    win32u!NtUserWaitMessage
    user32!DialogBox2
    user32!InternalDialogBox
    user32!SoftModalMessageBox
    user32!MessageBoxWorker
    user32!MessageBoxTimeoutW
    user32!MessageBoxW
    ucrtbased!__acrt_MessageBoxW
    ucrtbased!__crt_char_traits<wchar_t>::message_box<HWND__ * __ptr64,
                        wchar_t const * __ptr64 const & __ptr64,
                        wchar_t const * __ptr64 const & __ptr64,
                        unsigned int const & __ptr64>
    ucrtbased!common_show_message_box<wchar_t>
    ucrtbased!__acrt_show_wide_message_box
    ucrtbased!common_message_window<wchar_t>
    ucrtbased!__acrt_MessageWindowW
    ucrtbased!_VCrtDbgReportW
    ucrtbased!_CrtDbgReportW
    ucrtbased!_get_osfhandle
    python310_d!PyOS_StdioReadline
    

    _get_osfhandle validates that the fd is open via

    _VALIDATE_CLEAR_OSSERR_RETURN(_osfile(fh) & FOPEN, EBADF, -1);
    

    which uses the following macro:

        #define _VALIDATE_CLEAR_OSSERR_RETURN(expr, errorcode, retexpr)                \
            {                                                                          \
                int _Expr_val=!!(expr);                                                \
                _ASSERT_EXPR((_Expr_val), _CRT_WIDE(#expr));                           \
                if (!(_Expr_val))                                                      \
                {                                                                      \
                    _doserrno = 0L;                                                    \
                    errno = errorcode;                                                 \
                    _INVALID_PARAMETER(_CRT_WIDE(#expr));                              \
                    return retexpr;                                                    \
                }                                                                      \
            }

    In a debug build, _ASSERT_EXPR expands as:

        #define _ASSERT_EXPR(expr, msg) \
            (void)(                                                                                     \
                (!!(expr)) ||                                                                           \
                (1 != _CrtDbgReportW(_CRT_ASSERT, _CRT_WIDE(__FILE__), __LINE__, NULL, L"%ls", msg)) || \
                (_CrtDbgBreak(), 0)                                                                     \
            )

    which is where _CrtDbgReportW gets called. By default it reports the assertion failure with a message box. The suppression of this in libregrtest either ignores this report or sends it to stderr:

        for m in [msvcrt.CRT_WARN, msvcrt.CRT_ERROR, msvcrt.CRT_ASSERT]:
            if verbose:
                msvcrt.CrtSetReportMode(m, msvcrt.CRTDBG_MODE_FILE)
                msvcrt.CrtSetReportFile(m, msvcrt.CRTDBG_FILE_STDERR)
            else:
                msvcrt.CrtSetReportMode(m, 0)

    msvcrt.CrtSetReportMode and msvcrt.CrtSetReportFile only exist in a debug build.

    @db3l
    Copy link
    Contributor

    db3l commented Jun 10, 2020

    So a script I use on the buildbot intended to prevent hanging on assertions was failing to work in this particular case, which I've corrected. That changes the failure mode for new Win10 buildbot runs.

    The test in question (test_close_stdin) still fails, but does so immediately (return code not 0, as the child process aborts) rather than timing out.

    If the assertion is expected, and something to be ignored, the test itself still needs to handle that.

    @vstinner
    Copy link
    Member

    New changeset f6e58ae by Victor Stinner in branch 'master':
    bpo-40826: Fix test_repl.test_close_stdin() on Windows (GH-20779)
    f6e58ae

    @vstinner
    Copy link
    Member

    Thanks for the bug report David Bolen and thanks for the analysis Eryk Sun. The bug should be be fixed in the master branch, and backports to 3.8 and 3.9 are coming.

    @vstinner
    Copy link
    Member

    New changeset 4a4f660 by Victor Stinner in branch '3.9':
    bpo-40826: Fix test_repl.test_close_stdin() on Windows (GH-20779) (GH-20785)
    4a4f660

    @vstinner
    Copy link
    Member

    New changeset c7a6c7b by Victor Stinner in branch '3.8':
    bpo-40826: Fix test_repl.test_close_stdin() on Windows (GH-20779) (GH-20785) (GH-20787)
    c7a6c7b

    @eryksun
    Copy link
    Contributor

    eryksun commented Jun 10, 2020

    Why doesn't SuppressCrashReport suppress the hard error dialog (i.e. SEM_FAILCRITICALERRORS)? This dialog gets created by the NtRaiseHardError system call. For example:

    >>> NtRaiseHardError = ctypes.windll.ntdll.NtRaiseHardError
    >>> response = (ctypes.c_ulong * 1)()
    

    With the default OS error mode, the following raises a hard error dialog for STATUS_UNSUCCESSFUL (0xC0000001), with abort/retry/ignore options:

        >>> NtRaiseHardError(0xC000_0001, 0, 0, None, 0, response)
        0
        >>> response[0]
        2

    The response value is limited to abort (2), retry (7), ignore (4) -- or simply returned (0) when the process hard error mode is disabled:

        >>> msvcrt.SetErrorMode(msvcrt.SEM_FAILCRITICALERRORS)
        0
        >>> NtRaiseHardError(0xC000_0001, 0, 0, None, 0, response)
        0
        >>> response[0]
        0

    NtRaiseHardError also checks for an error-mode override flag (0x1000_0000) in the status code, which is used in cases such as WinAPI FatalAppExitW. For example, the following will raise the dialog regardless of the hard error mode:

        >>> NtRaiseHardError(0xC000_0001 | 0x1000_0000, 0, 0, None, 0, response)
        0
        >>> response[0]
        2

    @vstinner
    Copy link
    Member

    Why doesn't SuppressCrashReport suppress the hard error dialog (i.e. SEM_FAILCRITICALERRORS)? This dialog gets created by the NtRaiseHardError system call.

    Would you mind to open a separated issue?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes 3.9 only security fixes 3.10 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants