classification
Title: PyOS_InterruptOccurred() now requires to hold the GIL: PyOS_Readline() crash
Type: crash Stage: resolved
Components: Interpreter Core Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Jelle Zijlstra, benjamin.peterson, corona10, db3l, eryksun, remi.lapeyre, serhiy.storchaka, stutzbach, vstinner
Priority: normal Keywords: 3.9regression, patch

Created on 2020-05-30 14:19 by Jelle Zijlstra, last changed 2020-06-10 20:13 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 20549 closed corona10, 2020-05-31 04:58
PR 20571 merged vstinner, 2020-06-01 13:29
PR 20578 merged vstinner, 2020-06-01 18:13
PR 20579 merged vstinner, 2020-06-01 18:32
PR 20599 merged vstinner, 2020-06-02 14:02
PR 20613 merged vstinner, 2020-06-03 15:08
PR 20616 merged vstinner, 2020-06-03 15:55
PR 20618 merged vstinner, 2020-06-03 16:36
PR 20779 merged vstinner, 2020-06-10 14:25
PR 20785 merged vstinner, 2020-06-10 16:58
PR 20787 merged vstinner, 2020-06-10 17:27
Messages (31)
msg370386 - (view) Author: Jelle Zijlstra (Jelle Zijlstra) * (Python triager) Date: 2020-05-30 14:19
$ gdb ./python 
...
(gdb) r
...
Python 3.9.0b1 (tags/v3.9.0b1:97fe9cfd9f8, May 30 2020, 05:26:48) 
...
>>> import io
>>> io.FileIO(0).name
0
>>> 
Program received signal SIGSEGV, Segmentation fault.
_PyInterpreterState_GET () at ./Include/internal/pycore_pystate.h:100
100         return tstate->interp;
(gdb) bt
#0  _PyInterpreterState_GET () at ./Include/internal/pycore_pystate.h:100
#1  PyOS_InterruptOccurred () at ./Modules/signalmodule.c:1785
#2  0x0000000000673905 in my_fgets (buf=buf@entry=0xa40780 "8LJ\367\377\177", len=len@entry=100, fp=fp@entry=0x7ffff74a48e0 <_IO_2_1_stdin_>) at Parser/myreadline.c:87
#3  0x000000000067397b in PyOS_StdioReadline (sys_stdin=sys_stdin@entry=0x7ffff74a48e0 <_IO_2_1_stdin_>, sys_stdout=sys_stdout@entry=0x7ffff74a5620 <_IO_2_1_stdout_>, 
    prompt=prompt@entry=0x7ffff6f8d8e0 ">>> ") at Parser/myreadline.c:269
#4  0x0000000000673b4f in PyOS_Readline (sys_stdin=0x7ffff74a48e0 <_IO_2_1_stdin_>, sys_stdout=0x7ffff74a5620 <_IO_2_1_stdout_>, prompt=0x7ffff6f8d8e0 ">>> ") at Parser/myreadline.c:355
#5  0x00000000005d90d9 in tok_nextc (tok=0xa3fd30) at Parser/tokenizer.c:856
#6  0x00000000005dad02 in tok_get (tok=tok@entry=0xa3fd30, p_start=p_start@entry=0x7fffffffd508, p_end=p_end@entry=0x7fffffffd510) at Parser/tokenizer.c:1194
#7  0x00000000005dcb69 in PyTokenizer_Get (tok=0xa3fd30, p_start=p_start@entry=0x7fffffffd508, p_end=p_end@entry=0x7fffffffd510) at Parser/tokenizer.c:1842
#8  0x0000000000653c73 in _PyPegen_fill_token (p=0x7ffff6f8f540) at Parser/pegen/pegen.c:551
#9  0x0000000000670355 in statement_newline_rule (p=0x7ffff6f8f540) at Parser/pegen/parse.c:1143
#10 interactive_rule (p=0x7ffff6f8f540) at Parser/pegen/parse.c:725
#11 _PyPegen_parse (p=p@entry=0x7ffff6f8f540) at Parser/pegen/parse.c:19388
#12 0x0000000000654b32 in _PyPegen_run_parser (p=0x7ffff6f8f540) at Parser/pegen/pegen.c:1037
#13 0x0000000000654e4c in _PyPegen_run_parser_from_file_pointer (fp=fp@entry=0x70f74a48e0, start_rule=start_rule@entry=80, filename_ob=filename_ob@entry=0x7ffff6f84eb0, 
    enc=enc@entry=0x7ffff704ec60 "utf-8", ps1=ps1@entry=0x7ffff6f8d8e0 ">>> ", ps2=ps2@entry=0x90000000d0 <error: Cannot access memory at address 0x90000000d0>, flags=0x7fffffffd7b8, 
    errcode=0x7fffffffd6e4, arena=0x7ffff6ff6b30) at Parser/pegen/pegen.c:1097
#14 0x00000000005d6bea in PyPegen_ASTFromFileObject (fp=0x70f74a48e0, fp@entry=0x7ffff74a48e0 <_IO_2_1_stdin_>, filename_ob=filename_ob@entry=0x7ffff6f84eb0, mode=80, mode@entry=256, 
    enc=enc@entry=0x7ffff704ec60 "utf-8", ps1=ps1@entry=0x7ffff6f8d8e0 ">>> ", ps2=0x90000000d0 <error: Cannot access memory at address 0x90000000d0>, ps2@entry=0x7ffff6f8dbe0 "... ", 
    flags=<optimized out>, errcode=<optimized out>, arena=<optimized out>) at Parser/pegen/peg_api.c:52
#15 0x00000000005460d9 in PyRun_InteractiveOneObjectEx (fp=fp@entry=0x7ffff74a48e0 <_IO_2_1_stdin_>, filename=filename@entry=0x7ffff6f84eb0, flags=flags@entry=0x7fffffffd7b8) at Python/pythonrun.c:243
#16 0x000000000054631e in PyRun_InteractiveLoopFlags (fp=fp@entry=0x7ffff74a48e0 <_IO_2_1_stdin_>, filename_str=filename_str@entry=0x673e32 "<stdin>", flags=flags@entry=0x7fffffffd7b8)
    at Python/pythonrun.c:122
#17 0x0000000000546d4c in PyRun_AnyFileExFlags (fp=0x7ffff74a48e0 <_IO_2_1_stdin_>, filename=0x673e32 "<stdin>", closeit=0, flags=0x7fffffffd7b8) at Python/pythonrun.c:81
#18 0x0000000000429fb7 in pymain_run_stdin (cf=0x7fffffffd7b8, config=0x9bd800) at Modules/main.c:467
#19 pymain_run_python (exitcode=0x7fffffffd7b0) at Modules/main.c:556
#20 Py_RunMain () at Modules/main.c:632
#21 0x000000000042a2d6 in pymain_main (args=0x7fffffffd8a0) at Modules/main.c:662
#22 Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:686
#23 0x00007ffff7100830 in __libc_start_main (main=0x41ef30 <main>, argc=1, argv=0x7fffffffd9f8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd9e8)
    at ../csu/libc-start.c:291
#24 0x0000000000429089 in _start ()
(gdb) 


Same happens with Python 3.9.0b1+ (heads/3.9:588efc29c5d, May 30 2020, 14:16:10) (current HEAD of the 3.9 branch).

In previous versions of Python this would exit the interpreter but not segfault.
msg370387 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2020-05-30 15:01
Thanks for report, I can reproduce it
msg370388 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-05-30 16:11
Simpler reproducer:

import os
os.close(0)
msg370392 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2020-05-30 17:16
FYI this change fix this issue.

--- a/Modules/signalmodule.c
+++ b/Modules/signalmodule.c
@@ -1782,7 +1782,11 @@ PyOS_FiniInterrupts(void)
 int
 PyOS_InterruptOccurred(void)
 {
-    PyInterpreterState *interp = _PyInterpreterState_GET();
+    PyThreadState *tstate = _PyThreadState_GET();
+    if (!tstate) {
+        return 0;
+    }
+    PyInterpreterState *interp = tstate->interp;
     if (!_Py_ThreadCanHandleSignals(interp)) {
         return 0;
     }
msg370549 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-01 12:53
The following change modified PyOS_InterruptOccurred() to require the hold the GIL:

commit d83168854e19d0381fa57db25fca6c622917624f
Author: Victor Stinner <vstinner@python.org>
Date:   Fri Mar 20 14:50:35 2020 +0100

    bpo-40010: Optimize pending calls in multithreaded applications (GH-19091)
    
    If a thread different than the main thread schedules a pending call
    (Py_AddPendingCall()), the bytecode evaluation loop is no longer
    interrupted at each bytecode instruction to check for pending calls
    which cannot be executed. Only the main thread can execute pending
    calls.
    
    Previously, the bytecode evaluation loop was interrupted at each
    instruction until the main thread executes pending calls.
    
    * Add _Py_ThreadCanHandlePendingCalls() function.
    * SIGNAL_PENDING_CALLS() now only sets eval_breaker to 1 if the
      current thread can execute pending calls. Only the main thread can
      execute pending calls.

--

PyOS_InterruptOccurred() is part of the limited C API, but it's not even documented :-( So it's not easy to say if it's a backward incompatible change. At least, as the author of the change, I can say that the change was deliberate.

bpo-40010 rationale is that only the main thread of the main interpreter must call Python signal handlers. So when another thread or another interpreter (running the main thread) asks "should the Python signal handlers be executed", the answer is always "no", even if yes, Python got a signal (which requires to execute at least one Python signal handler).

The change is that PyOS_InterruptOccurred() now requires to hold the GIL to get current Python thread state: _PyThreadState_GET() returns NULL when the GIL is released.

Modifying PyOS_InterruptOccurred() to always return 0 if the GIL is released (if _PyThreadState_GET() returns NULL) is wrong: if the function is called by the main thread of the main interpreter, it must return 1.

--

By the way, I rewrote the C signal handler in Python 3.9 to not require to get the current Python thread state. In Python 3.8, it was modified and started to require the current Python thread if writing into the wakeup file descriptor failed to schedule a "pending call".

commit b54a99d6432de93de85be2b42a63774f8b4581a0
Author: Victor Stinner <vstinner@python.org>
Date:   Wed Apr 8 23:35:05 2020 +0200

    bpo-40082: trip_signal() uses the main interpreter (GH-19441)
    
    Fix the signal handler: it now always uses the main interpreter,
    rather than trying to get the current Python thread state.
    
    The following function now accepts an interpreter, instead of a
    Python thread state:
    
    * _PyEval_SignalReceived()
    * _Py_ThreadCanHandleSignals()
    * _PyEval_AddPendingCall()
    * COMPUTE_EVAL_BREAKER()
    * SET_GIL_DROP_REQUEST(), RESET_GIL_DROP_REQUEST()
    * SIGNAL_PENDING_CALLS(), UNSIGNAL_PENDING_CALLS()
    * SIGNAL_PENDING_SIGNALS(), UNSIGNAL_PENDING_SIGNALS()
    * SIGNAL_ASYNC_EXC(), UNSIGNAL_ASYNC_EXC()
    
    Py_AddPendingCall() now uses the main interpreter if it fails to the
    current Python thread state.
    
    Convert _PyThreadState_GET() and PyInterpreterState_GET_UNSAFE()
    macros to static inline functions.
msg370550 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-01 13:00
> PyOS_InterruptOccurred() now requires to hold the GIL

This change impacted gdb:

* https://sourceware.org/pipermail/gdb-patches/2020-May/169110.html
* https://bugzilla.redhat.com/show_bug.cgi?id=1829702

gdbpy_check_quit_flag() called PyOS_InterruptOccurred() without holding the GIL. It worked in Python 3.8, but started to crash in Python 3.9 beta1.

gdb had another issue, it started by always releasing the GIL and always called the Python C API with the GIL released (if I understood correctly).

gdb was fixed by calling PyGILState_Ensure()/PyGILState_Release() when calling the Python C API, especially PyOS_InterruptOccurred().

Maybe the minimum would be to *document* that the GIL must be held to call PyOS_InterruptOccurred(), and that it's not a new requirements.
msg370552 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-01 14:02
New changeset 3026cad59b87751a9215111776cac8e819458fce by Victor Stinner in branch 'master':
bpo-40826: Add _Py_EnsureTstateNotNULL() macro (GH-20571)
https://github.com/python/cpython/commit/3026cad59b87751a9215111776cac8e819458fce
msg370553 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-01 15:03
I can reproduce PyOS_InterruptOccurred() crash in Python 3.8 if I remove readline.cpython-38d-x86_64-linux-gnu.so and I disable EINTR error checking in my_fgets():

diff --git a/Parser/myreadline.c b/Parser/myreadline.c
index 43e5583b8b..2712dedacd 100644
--- a/Parser/myreadline.c
+++ b/Parser/myreadline.c
@@ -73,7 +73,7 @@ my_fgets(char *buf, int len, FILE *fp)
             clearerr(fp);
             return -1; /* EOF */
         }
-#ifdef EINTR
+#if 0
         if (err == EINTR) {
             int s;
             PyEval_RestoreThread(_PyOS_ReadlineTState);


vstinner@apu$ ./python
Python 3.8.3+ (heads/3.8-dirty:00a240bf7f, Jun  1 2020, 17:00:22) 
[GCC 10.1.1 20200507 (Red Hat 10.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
^C
Erreur de segmentation (core dumped)


I cannot reproduce the issue in Python 3.7.

---

Python 3.7:
---
int
PyOS_InterruptOccurred(void)
{
    if (_Py_atomic_load_relaxed(&Handlers[SIGINT].tripped)) {
        if (PyThread_get_thread_ident() != main_thread)
            return 0;
        _Py_atomic_store_relaxed(&Handlers[SIGINT].tripped, 0);
        return 1;
    }
    return 0;
}
---

Python 3.8:
---
static int
is_main(_PyRuntimeState *runtime)
{
    unsigned long thread = PyThread_get_thread_ident();
    PyInterpreterState *interp = _PyRuntimeState_GetThreadState(runtime)->interp;
    return (thread == runtime->main_thread
            && interp == runtime->interpreters.main);
}

int
PyOS_InterruptOccurred(void)
{
    if (_Py_atomic_load_relaxed(&Handlers[SIGINT].tripped)) {
        _PyRuntimeState *runtime = &_PyRuntime;
        if (!is_main(runtime)) {
            return 0;
        }
        _Py_atomic_store_relaxed(&Handlers[SIGINT].tripped, 0);
        return 1;
    }
    return 0;
}
---

is_main() function was added in Python 3.8 by:

commit 64d6cc826dacebc2493b1bb5e8cb97828eb76f81
Author: Eric Snow <ericsnowcurrently@gmail.com>
Date:   Sat Feb 23 15:40:43 2019 -0700

    bpo-35724: Explicitly require the main interpreter for signal-handling. (GH-11530)
    
    Ensure that the main interpreter is active (in the main thread) for signal-handling operations. This is increasingly relevant as people use subinterpreters more.
    
    https://bugs.python.org/issue35724
msg370554 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-01 15:23
It seems like this issue was introduced by this change:

commit 728189884e0e128c4ffc57b785b04584d57a90c0
Author: Victor Stinner <vstinner@python.org>
Date:   Thu Mar 26 22:28:11 2020 +0100

    bpo-38644: Pass tstate explicitly in signalmodule.c (GH-19184)
    
    PyOS_InterruptOccurred() now checks _Py_ThreadCanHandleSignals()
    before checking if SIGINT is tripped.

 Include/internal/pycore_pyerrors.h |   2 +
 Modules/signalmodule.c             | 152 ++++++++++++++++++++++---------------
 Python/ceval.c                     |   4 +-
 3 files changed, 93 insertions(+), 65 deletions(-)

Before:
---
static inline int
_Py_IsMainInterpreter(PyThreadState* tstate)
{   
    /* Use directly _PyRuntime rather than tstate->interp->runtime, since
       this function is used in performance critical code path (ceval) */
    return (tstate->interp == _PyRuntime.interpreters.main);
}   

static inline int                          
_Py_ThreadCanHandleSignals(PyThreadState *tstate)
{        
    return (_Py_IsMainThread() && _Py_IsMainInterpreter(tstate));
}   
                    
static int                                 
thread_can_handle_signals(void)
{        
    PyThreadState *tstate = _PyThreadState_GET();              
    return _Py_ThreadCanHandleSignals(tstate);
}    

int
PyOS_InterruptOccurred(void)
{
    if (_Py_atomic_load_relaxed(&Handlers[SIGINT].tripped)) {
        if (!thread_can_handle_signals()) {
            return 0;
        }
        _Py_atomic_store_relaxed(&Handlers[SIGINT].tripped, 0);
        return 1;
    }
    return 0;
}
---

After:
---                
static inline int 
_Py_IsMainInterpreter(PyThreadState* tstate)
{                               
    /* Use directly _PyRuntime rather than tstate->interp->runtime, since 
       this function is used in performance critical code path (ceval) */ 
    return (tstate->interp == _PyRuntime.interpreters.main);
}

static inline int                                                       
_Py_ThreadCanHandleSignals(PyThreadState *tstate)                          
{                                                                              
    return (_Py_IsMainThread() && _Py_IsMainInterpreter(tstate));
}

int
PyOS_InterruptOccurred(void)
{
    PyThreadState *tstate = _PyThreadState_GET();
    if (!_Py_ThreadCanHandleSignals(tstate)) {
        return 0;
    }

    if (!_Py_atomic_load_relaxed(&Handlers[SIGINT].tripped)) {
        return 0;
    }

    _Py_atomic_store_relaxed(&Handlers[SIGINT].tripped, 0);
    return 1;
}
---

The difference is that tstate->interp is now checked *before* checking if SIGINT signal is tripped.
msg370555 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-01 15:36
So PyOS_InterruptOccurred() must be called with the GIL held since 3.8, it's just that the nobody noticed the bug before.

If SIGGINT is tripped and the GIL is released, PyOS_InterruptOccurred() does also crash in Python 3.8.

--

One way to see the bug in Python 3.8 without having to trip SIGINT.

Apply this patch:

diff --git a/Modules/signalmodule.c b/Modules/signalmodule.c
index 0c9a2671fe..b850af3163 100644
--- a/Modules/signalmodule.c
+++ b/Modules/signalmodule.c
@@ -1729,11 +1729,12 @@ PyOS_FiniInterrupts(void)
 int
 PyOS_InterruptOccurred(void)
 {
+    _PyRuntimeState *runtime = &_PyRuntime;
+    if (!is_main(runtime)) {
+        return 0;
+    }
+
     if (_Py_atomic_load_relaxed(&Handlers[SIGINT].tripped)) {
-        _PyRuntimeState *runtime = &_PyRuntime;
-        if (!is_main(runtime)) {
-            return 0;
-        }
         _Py_atomic_store_relaxed(&Handlers[SIGINT].tripped, 0);
         return 1;
     }

$ make
$ find -name "*readline*so" -delete
$ ./python
Python 3.8.3+ (heads/3.8-dirty:00a240bf7f, Jun  1 2020, 17:24:09) 
[GCC 10.1.1 20200507 (Red Hat 10.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os; os.close(0)
>>>
Erreur de segmentation (core dumped)

---

Reproduce the bug in Python 3.8 without modifying the code, using gdb to trigger events SIGINT at the right place:

$ gdb ./python
(gdb) b my_fgets
Breakpoint 1 at 0x68b941: file Parser/myreadline.c, line 39.

(gdb) run
Python 3.8.3+ (heads/3.8:00a240bf7f, Jun  1 2020, 17:27:24) 
[GCC 10.1.1 20200507 (Red Hat 10.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

Breakpoint 1, my_fgets (...)
(gdb) p (void)close(0)

(gdb) delete 1

(gdb) signal SIGINT
Continuing with signal SIGINT.

Program received signal SIGSEGV, Segmentation fault.
is_main (runtime=0x7ff6c0 <_PyRuntime>) at ./Modules/signalmodule.c:193
193	    PyInterpreterState *interp = _PyRuntimeState_GetThreadState(runtime)->interp;
msg370556 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-01 15:38
I add "Python 3.8" version: it's also affected.
msg370569 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-01 18:34
New changeset cbe129692293251e7fbcea9ff0d822824d90c140 by Victor Stinner in branch 'master':
bpo-40826: PyOS_InterruptOccurred() requires GIL (GH-20578)
https://github.com/python/cpython/commit/cbe129692293251e7fbcea9ff0d822824d90c140
msg370575 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-01 18:57
See also bpo-40839: Disallow calling PyDict_GetItem() with the GIL released.
msg370576 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-01 18:59
New changeset c353764fd564e401cf47a5d9efab18c72c60014e by Victor Stinner in branch 'master':
bpo-40826: Fix GIL usage in PyOS_Readline() (GH-20579)
https://github.com/python/cpython/commit/c353764fd564e401cf47a5d9efab18c72c60014e
msg370661 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-03 12:40
New changeset fa7ab6aa0f9a4f695e5525db5a113cd21fa93787 by Victor Stinner in branch 'master':
bpo-40826: Add _PyOS_InterruptOccurred(tstate) function (GH-20599)
https://github.com/python/cpython/commit/fa7ab6aa0f9a4f695e5525db5a113cd21fa93787
msg370676 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-03 15:49
New changeset 5d2396c8cf68fba0a949c6ce474a505e3aba9c1f by Victor Stinner in branch '3.9':
[3.9] bpo-40826: Fix GIL usage in PyOS_Readline() (GH-20613)
https://github.com/python/cpython/commit/5d2396c8cf68fba0a949c6ce474a505e3aba9c1f
msg370679 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-03 16:28
New changeset 6f7346bb3983cd7a6aa97eeeafffb3cecd5292b8 by Victor Stinner in branch '3.8':
[3.9] bpo-40826: Fix GIL usage in PyOS_Readline() (GH-20613) (GH-20616)
https://github.com/python/cpython/commit/6f7346bb3983cd7a6aa97eeeafffb3cecd5292b8
msg370682 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-03 18:17
New changeset 6d62dc1ea4e191b8486e80a85ca0694215375424 by Victor Stinner in branch '3.9':
[3.9] bpo-40826: PyOS_InterruptOccurred() requires GIL (GH-20578) (GH-20618)
https://github.com/python/cpython/commit/6d62dc1ea4e191b8486e80a85ca0694215375424
msg370683 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-03 18:18
Ok, I fixed bugs in the REPL on Linux and Windows in 3.8, 3.9 and master branches.

I also modified PyOS_InterruptOccurred() to fail with a fatal error and a nice error message, rather than crashing with a core dump, if it's called with the GIL released. I changed in 3.9 and master branches.

Thanks Jelle Zijlstra for the bug report! I'm now closing the issue.
msg370692 - (view) Author: David Bolen (db3l) Date: 2020-06-04 01:34
I haven't had a chance to look too deeply, but the final set of commits (starting with fa7ab6aa) appear to be the common thread with all branches now failing with timeout exceptions in test_repl on the Windows 10 buildbot.

The first instance was in the 3.x branch (https://buildbot.python.org/all/#/builders/129/builds/1191) but they all seem similar.

-- David
msg370694 - (view) Author: David Bolen (db3l) Date: 2020-06-04 02:53
It appears the recent commit is causing a CRT exception dialog in test_close_stdin (test_repl.TestInteractiveInterpreter).  The dialog can't get answered, which is what leads to the eventual timeout.

The assertion is "_osfile(fh) & FOPEN" from osfinfo.cpp on line 258.  It actually appears twice (if the first is ignored).  If both are ignored the test passes.

I was somewhat surprised to see the dialog, as I believe CRT exception dialogs are disabled during test runs (others are present in test logs).  Perhaps the child interpreter used by test_repl doesn't inherit that behavior.  If so, this was probably always a risk, there just weren't any assertions occurring before this in the child interpreter.
msg370696 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-06-04 07:08
> Perhaps the child interpreter used by test_repl doesn't inherit 
> that behavior.

The OS error mode is inherited, unless the child is created with CREATE_DEFAULT_ERROR_MODE flag. But the CRT error mode isn't inherited. (For the spawn* family, in principle the CRT could pass its error modes in reserved STARTUPINFO fields, like it does for file descriptors, but it doesn't do that, and subprocess wouldn't be privy to that protocol anyway.) 

For example:

    >>> import sys, subprocess
    >>> from msvcrt import *
    >>> SEM_FAILCRITICALERRORS, CRTDBG_MODE_FILE, CRTDBG_MODE_WNDW
    (1, 1, 4)

    >>> SetErrorMode(SEM_FAILCRITICALERRORS)
    0
    >>> CrtSetReportMode(CRT_ERROR, CRTDBG_MODE_FILE)
    4

The return values are the previous values, which are respectively 0 (default) for the OS and CRTDBG_MODE_WNDW (message box) for the CRT. Now check the initial values in a child process:

    >>> subprocess.call(f'{sys.executable} -q')

    >>> from msvcrt import *
    >>> SetErrorMode(SEM_FAILCRITICALERRORS)
    1
    >>> CrtSetReportMode(CRT_ERROR, CRTDBG_MODE_FILE)
    4

The OS error mode of the parent was inherited, but the parent's CRT error mode was not.

libregrtest has a convenient function to suppress error reporting in the child. For example:

    >>> import sys, subprocess
    >>> subprocess.call(f'{sys.executable} -q')

    >>> import os
    >>> from test.libregrtest import setup
    >>> setup.suppress_msvcrt_asserts(0)
    >>> os.close(0)
    >>>
    0
msg370716 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-04 16:31
I am confused. Code which uses the closed file descriptor 0 uses "_Py_BEGIN_SUPPRESS_IPH":

        _Py_BEGIN_SUPPRESS_IPH
        hStdIn = (HANDLE)_get_osfhandle(fileno(sys_stdin));
        hStdErr = (HANDLE)_get_osfhandle(fileno(stderr));
        _Py_END_SUPPRESS_IPH

and

    _Py_BEGIN_SUPPRESS_IPH
    handle = (HANDLE)_get_osfhandle(fileno(fp));
    _Py_END_SUPPRESS_IPH

Macros defined as:

extern _invalid_parameter_handler _Py_silent_invalid_parameter_handler;
#define _Py_BEGIN_SUPPRESS_IPH { _invalid_parameter_handler _Py_old_handler = \
    _set_thread_local_invalid_parameter_handler(_Py_silent_invalid_parameter_handler);
#define _Py_END_SUPPRESS_IPH _set_thread_local_invalid_parameter_handler(_Py_old_handler); }

with PC/invalid_parameter_handler.c:

static void __cdecl _silent_invalid_parameter_handler(
    wchar_t const* expression,
    wchar_t const* function,
    wchar_t const* file,
    unsigned int line,
    uintptr_t pReserved) { }

_invalid_parameter_handler _Py_silent_invalid_parameter_handler = _silent_invalid_parameter_handler;


The purpose of _Py_BEGIN_SUPPRESS_IPH is to suppress such popup, no?
msg370718 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-06-04 17:59
> The purpose of _Py_BEGIN_SUPPRESS_IPH is to suppress such popup, no?

That macro suppresses the CRT's invalid parameter handler, which by default  terminates abnormally with a fastfail (status code 0xC0000409), even in a release build. 

In a debug build, we still have message boxes from failed asserts that call _CrtDbgReportW. For example, the following stack trace shows _CrtDbgReportW called from _get_osfhandle:

    Call Site
    win32u!NtUserWaitMessage
    user32!DialogBox2
    user32!InternalDialogBox
    user32!SoftModalMessageBox
    user32!MessageBoxWorker
    user32!MessageBoxTimeoutW
    user32!MessageBoxW
    ucrtbased!__acrt_MessageBoxW
    ucrtbased!__crt_char_traits<wchar_t>::message_box<HWND__ * __ptr64,
                        wchar_t const * __ptr64 const & __ptr64,
                        wchar_t const * __ptr64 const & __ptr64,
                        unsigned int const & __ptr64>
    ucrtbased!common_show_message_box<wchar_t>
    ucrtbased!__acrt_show_wide_message_box
    ucrtbased!common_message_window<wchar_t>
    ucrtbased!__acrt_MessageWindowW
    ucrtbased!_VCrtDbgReportW
    ucrtbased!_CrtDbgReportW
    ucrtbased!_get_osfhandle
    python310_d!PyOS_StdioReadline

_get_osfhandle validates that the fd is open via

    _VALIDATE_CLEAR_OSSERR_RETURN(_osfile(fh) & FOPEN, EBADF, -1);

which uses the following macro:

    #define _VALIDATE_CLEAR_OSSERR_RETURN(expr, errorcode, retexpr)                \
        {                                                                          \
            int _Expr_val=!!(expr);                                                \
            _ASSERT_EXPR((_Expr_val), _CRT_WIDE(#expr));                           \
            if (!(_Expr_val))                                                      \
            {                                                                      \
                _doserrno = 0L;                                                    \
                errno = errorcode;                                                 \
                _INVALID_PARAMETER(_CRT_WIDE(#expr));                              \
                return retexpr;                                                    \
            }                                                                      \
        }

In a debug build, _ASSERT_EXPR expands as:

    #define _ASSERT_EXPR(expr, msg) \
        (void)(                                                                                     \
            (!!(expr)) ||                                                                           \
            (1 != _CrtDbgReportW(_CRT_ASSERT, _CRT_WIDE(__FILE__), __LINE__, NULL, L"%ls", msg)) || \
            (_CrtDbgBreak(), 0)                                                                     \
        )

which is where _CrtDbgReportW gets called. By default it reports the assertion failure with a message box. The suppression of this in libregrtest either ignores this report or sends it to stderr:

    for m in [msvcrt.CRT_WARN, msvcrt.CRT_ERROR, msvcrt.CRT_ASSERT]:
        if verbose:
            msvcrt.CrtSetReportMode(m, msvcrt.CRTDBG_MODE_FILE)
            msvcrt.CrtSetReportFile(m, msvcrt.CRTDBG_FILE_STDERR)
        else:
            msvcrt.CrtSetReportMode(m, 0)

msvcrt.CrtSetReportMode and msvcrt.CrtSetReportFile only exist in a debug build.
msg371169 - (view) Author: David Bolen (db3l) Date: 2020-06-10 07:21
So a script I use on the buildbot intended to prevent hanging on assertions was failing to work in this particular case, which I've corrected.  That changes the failure mode for new Win10 buildbot runs.

The test in question (test_close_stdin) still fails, but does so immediately (return code not 0, as the child process aborts) rather than timing out.  

If the assertion is expected, and something to be ignored, the test itself still needs to handle that.
msg371223 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-10 16:49
New changeset f6e58aefde2e57e4cb11ea7743955da53a3f1e80 by Victor Stinner in branch 'master':
bpo-40826: Fix test_repl.test_close_stdin() on Windows (GH-20779)
https://github.com/python/cpython/commit/f6e58aefde2e57e4cb11ea7743955da53a3f1e80
msg371224 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-10 17:00
Thanks for the bug report David Bolen and thanks for the analysis Eryk Sun. The bug should be be fixed in the master branch, and backports to 3.8 and 3.9 are coming.
msg371226 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-10 17:25
New changeset 4a4f660cfde8b683634c53e6214a6baa51de43b1 by Victor Stinner in branch '3.9':
bpo-40826: Fix test_repl.test_close_stdin() on Windows (GH-20779) (GH-20785)
https://github.com/python/cpython/commit/4a4f660cfde8b683634c53e6214a6baa51de43b1
msg371227 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-10 17:45
New changeset c7a6c7b5279f766e65b7cc9dc5bebb73acee6672 by Victor Stinner in branch '3.8':
bpo-40826: Fix test_repl.test_close_stdin() on Windows (GH-20779) (GH-20785) (GH-20787)
https://github.com/python/cpython/commit/c7a6c7b5279f766e65b7cc9dc5bebb73acee6672
msg371233 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-06-10 18:41
Why doesn't SuppressCrashReport suppress the hard error dialog (i.e. SEM_FAILCRITICALERRORS)? This dialog gets created by the NtRaiseHardError system call. For example:

    >>> NtRaiseHardError = ctypes.windll.ntdll.NtRaiseHardError
    >>> response = (ctypes.c_ulong * 1)()

With the default OS error mode, the following raises a hard error dialog for STATUS_UNSUCCESSFUL (0xC0000001), with abort/retry/ignore options:

    >>> NtRaiseHardError(0xC000_0001, 0, 0, None, 0, response)
    0
    >>> response[0]
    2

The response value is limited to abort (2), retry (7), ignore (4) -- or simply returned (0) when the process hard error mode is disabled: 

    >>> msvcrt.SetErrorMode(msvcrt.SEM_FAILCRITICALERRORS)
    0
    >>> NtRaiseHardError(0xC000_0001, 0, 0, None, 0, response)
    0
    >>> response[0]
    0

NtRaiseHardError also checks for an error-mode override flag (0x1000_0000) in the status code, which is used in cases such as WinAPI FatalAppExitW. For example, the following will raise the dialog regardless of the hard error mode:

    >>> NtRaiseHardError(0xC000_0001 | 0x1000_0000, 0, 0, None, 0, response)
    0
    >>> response[0]
    2
msg371239 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-06-10 20:13
> Why doesn't SuppressCrashReport suppress the hard error dialog (i.e. SEM_FAILCRITICALERRORS)? This dialog gets created by the NtRaiseHardError system call.

Would you mind to open a separated issue?
History
Date User Action Args
2020-06-10 20:13:01vstinnersetmessages: + msg371239
2020-06-10 18:41:22eryksunsetmessages: + msg371233
2020-06-10 17:46:11vstinnersetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2020-06-10 17:45:53vstinnersetmessages: + msg371227
2020-06-10 17:27:35vstinnersetpull_requests: + pull_request19982
2020-06-10 17:25:04vstinnersetmessages: + msg371226
2020-06-10 17:00:22vstinnersetmessages: + msg371224
2020-06-10 16:58:19vstinnersetpull_requests: + pull_request19981
2020-06-10 16:49:29vstinnersetmessages: + msg371223
2020-06-10 14:25:28vstinnersetstage: resolved -> patch review
pull_requests: + pull_request19975
2020-06-10 07:21:24db3lsetmessages: + msg371169
2020-06-04 17:59:45eryksunsetmessages: + msg370718
2020-06-04 16:31:56vstinnersetstatus: closed -> open
resolution: fixed -> (no value)
messages: + msg370716
2020-06-04 07:08:46eryksunsetnosy: + eryksun
messages: + msg370696
2020-06-04 02:53:35db3lsetmessages: + msg370694
2020-06-04 01:34:32db3lsetnosy: + db3l
messages: + msg370692
2020-06-03 18:18:16vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg370683

stage: patch review -> resolved
2020-06-03 18:17:13vstinnersetmessages: + msg370682
2020-06-03 16:36:53vstinnersetpull_requests: + pull_request19844
2020-06-03 16:28:26vstinnersetmessages: + msg370679
2020-06-03 15:55:20vstinnersetpull_requests: + pull_request19842
2020-06-03 15:49:29vstinnersetmessages: + msg370676
2020-06-03 15:08:28vstinnersetpull_requests: + pull_request19840
2020-06-03 12:40:03vstinnersetmessages: + msg370661
2020-06-02 14:02:25vstinnersetpull_requests: + pull_request19827
2020-06-01 18:59:43vstinnersetmessages: + msg370576
2020-06-01 18:57:09vstinnersetmessages: + msg370575
2020-06-01 18:34:23vstinnersetmessages: + msg370569
2020-06-01 18:32:13vstinnersetpull_requests: + pull_request19818
2020-06-01 18:13:34vstinnersetpull_requests: + pull_request19817
2020-06-01 15:38:55vstinnersetmessages: + msg370556
versions: + Python 3.8
2020-06-01 15:36:27vstinnersetmessages: + msg370555
2020-06-01 15:23:52vstinnersetmessages: + msg370554
2020-06-01 15:03:02vstinnersetmessages: + msg370553
2020-06-01 14:02:47vstinnersetmessages: + msg370552
2020-06-01 13:29:12vstinnersetpull_requests: + pull_request19811
2020-06-01 13:00:20vstinnersetmessages: + msg370550
2020-06-01 12:55:31vstinnersettitle: Segfaults when close file descriptor 0 -> PyOS_InterruptOccurred() now requires to hold the GIL: PyOS_Readline() crash
2020-06-01 12:53:37vstinnersetmessages: + msg370549
2020-05-31 04:58:39corona10setkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request19793
2020-05-30 17:16:57corona10setmessages: + msg370392
2020-05-30 16:11:19serhiy.storchakasettitle: Segfaults on io.FileIO(0).name in 3.9 -> Segfaults when close file descriptor 0
nosy: + serhiy.storchaka

messages: + msg370388

components: + Interpreter Core
2020-05-30 15:01:45corona10setmessages: + msg370387
stage: needs patch
2020-05-30 15:01:08corona10setnosy: + corona10
2020-05-30 14:37:15remi.lapeyresetnosy: + vstinner, remi.lapeyre
2020-05-30 14:32:01xtreaksettype: crash
2020-05-30 14:19:46Jelle Zijlstracreate