classification
Title: Infinite loop uninterruptable on Windows in 3.10
Type: behavior Stage: resolved
Components: Windows Versions: Python 3.10, Python 3.9
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Mark.Shannon, eryksun, gvanrossum, miss-islington, paul.moore, serhiy.storchaka, steve.dower, tim.golden, vstinner, zach.ware
Priority: Keywords: patch

Created on 2020-11-09 17:25 by gvanrossum, last changed 2020-11-14 00:44 by gvanrossum. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 23257 merged vstinner, 2020-11-13 13:17
PR 23259 merged miss-islington, 2020-11-13 13:49
Messages (16)
msg380597 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-11-09 17:25
This code cannot be interrupted with ^C on Windows (e.g. in the REPL)

while True:
    pass

This seems to be a regression, it works in earlier versions.
msg380600 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-11-09 17:38
It also cannot be interrupted in 3.9. But 3.8 and earlier work correctly.
msg380606 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-11-09 18:35
See bpo-40010. COMPUTE_EVAL_BREAKER() in Python/ceval.c -- or _Py_ThreadCanHandleSignals in Include/internal/pycore_pystate.h -- needs to take into account that SIGNAL_PENDING_SIGNALS() gets called on a completely new thread in Windows.
msg380608 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-11-09 18:51
So you're saying this war broken by https://github.com/python/cpython/pull/19087 ?
msg380609 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-11-09 18:57
Yes, if I force the return value of _Py_ThreadCanHandleSignals to 1, the loop is broken by a KeyboardInterrupt.
msg380853 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-11-12 21:41
Can you think of a fix? (Presumably restore some code that was deleted from 3.9?)
msg380864 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-11-13 00:50
This looks like the smallest change I can make to fix it:

diff --git a/Include/internal/pycore_pystate.h b/Include/internal/pycore_pystate.h
index 0cd5550cfd..9ff740b87b 100644
--- a/Include/internal/pycore_pystate.h
+++ b/Include/internal/pycore_pystate.h
@@ -34,7 +34,11 @@ _Py_IsMainInterpreter(PyThreadState* tstate)
 static inline int
 _Py_ThreadCanHandleSignals(PyInterpreterState *interp)
 {
+#ifndef MS_WINDOWS
     return (_Py_IsMainThread() && interp == _PyRuntime.interpreters.main);
+#else
+    return (interp == _PyRuntime.interpreters.main);
+#endif
 }

I'll need Victor(?) to confirm whether checking for the main interpreter is sufficient. We're calling this from a new thread that AFAICT never holds the GIL, and we pass the main interpreter in explicitly (from trip_signal) so this check always succeeds (here, but not necessarily in other places where it is called).
msg380869 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-11-13 03:45
Note that this issue only applies to a process that has a single Python thread. With multiple Python threads, the signal handler gets called when the main thread acquires the GIL. 

As far as Windows console-driven SIGINT and SIGBREAK are concerned, C signal_handler() is called on a new thread that the interpreter has never seen before and will never see again. But there's also signal.raise_signal(signal.SIGINT) to consider, executing on a Python thread. Another path to trip_signal() is via _thread.interrupt_main(), i.e. PyErr_SetInterrupt().

_Py_ThreadCanHandleSignals() checks whether the current thread is the main thread in the main interpreter. It gets called to guarantee this condition in multiple places, such as handle_signals() in Python/ceval.c. So the _Py_IsMainThread() call is required. In Windows, maybe the main-thread check could be bypassed in SIGNAL_PENDING_SIGNALS() for a non-Python thread (i.e. no tstate), by directly setting ceval2->eval_breaker instead of calling COMPUTE_EVAL_BREAKER() in this case.
msg380878 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-13 12:38
This issue reminds me bpo-40082 where I wrote: "The problem on Windows is that each CTRL+c is executed in a different thread." I fixed the issue with:

New changeset b54a99d6432de93de85be2b42a63774f8b4581a0 by Victor Stinner in branch 'master':
bpo-40082: trip_signal() uses the main interpreter (GH-19441)
https://github.com/python/cpython/commit/b54a99d6432de93de85be2b42a63774f8b4581a0

"Fix the signal handler: it now always uses the main interpreter,
rather than trying to get the current Python thread state."

Here we have to be careful. If _Py_ThreadCanHandleSignals() always return true, we may reintroduce bpo-40010 issue: ceval bytecode evaluation loop may be interrupted at every single instruction and call eval_frame_handle_pending() which does nothing.

COMPUTE_EVAL_BREAKER() decides if the loop must be interrupted.

Rather than modifying _Py_ThreadCanHandleSignals(), I would prefer to modify SIGNAL_PENDING_CALLS(). For example, rather than using COMPUTE_EVAL_BREAKER() complex logic to decide if the current Python thread must check if there is a pending signal, always interrupt and let the thread decide if it has something to do.

The second problem is to reset eval_breaker to 0. If there is no pending signal and no pending call, eval_frame_handle_pending() leaves eval_breaker unchanged. eval_frame_handle_pending() should also be updated to reset eval_breaker in this case.
msg380884 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-13 13:44
New changeset d96a7a83133250377219227b5cfab4dbdddc5d3a by Victor Stinner in branch 'master':
bpo-42296: On Windows, fix CTRL+C regression (GH-23257)
https://github.com/python/cpython/commit/d96a7a83133250377219227b5cfab4dbdddc5d3a
msg380886 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-11-13 14:08
> always interrupt and let the thread decide if it has something to do.

SIGNAL_PENDING_CALLS() is called on a Python thread via signal.raise_signal() or _thread.interrupt_main() / PyErr_SetInterrupt(). If you'd rather keep the COMPUTE_EVAL_BREAKER() call in that case, the console control-event case can be distinguished via PyGILState_GetThisThreadState(). It returns NULL if there's no thread state, i.e. WINAPI TlsGetValue returns NULL.
msg380887 - (view) Author: miss-islington (miss-islington) Date: 2020-11-13 14:11
New changeset e5729aef6ff67ae7ed05dffc0855477823826191 by Miss Islington (bot) in branch '3.9':
bpo-42296: On Windows, fix CTRL+C regression (GH-23257)
https://github.com/python/cpython/commit/e5729aef6ff67ae7ed05dffc0855477823826191
msg380892 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-13 14:30
> SIGNAL_PENDING_CALLS() is called on a Python thread via signal.raise_signal() or _thread.interrupt_main() / PyErr_SetInterrupt(). If you'd rather keep the COMPUTE_EVAL_BREAKER() call in that case, the console control-event case can be distinguished via PyGILState_GetThisThreadState(). It returns NULL if there's no thread state, i.e. WINAPI TlsGetValue returns NULL.

That sounds like a micro-optimization which is not worth it. The code is already quite complicated. I don't think that it's a big deal to call eval_frame_handle_pending() *once* when a signal is received whereas the "current" Python thread cannot handle it. This function is quite simple: when there is no nothing to do, it only reads 3 atomic variable and one tstate attribute. It's cheap.
msg380893 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-11-13 14:35
> This code cannot be interrupted with ^C on Windows (e.g. in the REPL)

I tested manually: it's now fixed in 3.9 and master branches.

Thanks for the bug report Guido.


Signal handling is hard. Threads + signals is worse! :-) I modified _PyEval_SignalReceived() to always set eval_breaker to 1. See comments of my commit for the long rationale ;-)


If someone knows how to write an automated test for that on Windows, please create a PR and open a new issue ;-)
msg380927 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-11-13 19:16
> That sounds like a micro-optimization which is not worth it.

In the back of my mind I was also thinking to generalize the behavior at runtime whenever a signal is tripped by a non-Python thread (e.g. a thread created by an extension module or ctypes), instead of special-casing Windows. 

To examine this, I created a C library in Linux that defines a test() function that creates two threads via pthread_create(). The first thread sleeps for 10 seconds, and the second thread sleeps for 5 seconds and then calls pthread_kill() to send a SIGINT to the first thread. In 3.8, calling the test() function via ctypes followed by executing an infinite loop will interrupt the loop with a KeyboardInterrupt as soon as the second thread sends SIGINT. But in 3.10, the loop never gets interrupted because the C signal handler isn't called on the main thread, so eval_breaker never gets set.
msg380947 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-11-14 00:44
Thanks for the quick fix. It works!
History
Date User Action Args
2020-11-14 00:44:58gvanrossumsetmessages: + msg380947
2020-11-13 19:22:23eryksunsetpriority: release blocker ->
status: open -> closed
stage: resolved
2020-11-13 19:16:12eryksunsetstatus: closed -> open
priority: release blocker
messages: + msg380927

keywords: + patch, - 3.9regression
resolution: fixed ->
stage: resolved -> (no value)
2020-11-13 14:36:05vstinnersetkeywords: + 3.9regression, - patch
2020-11-13 14:35:57vstinnersetstatus: open -> closed
priority: release blocker -> (no value)
messages: + msg380893

resolution: fixed
stage: resolved
2020-11-13 14:30:47vstinnersetmessages: + msg380892
2020-11-13 14:11:41miss-islingtonsetmessages: + msg380887
2020-11-13 14:08:17eryksunsetmessages: + msg380886
stage: patch review -> (no value)
2020-11-13 13:49:52miss-islingtonsetnosy: + miss-islington
pull_requests: + pull_request22156
2020-11-13 13:44:46vstinnersetmessages: + msg380884
2020-11-13 13:17:42vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request22154
2020-11-13 12:38:41vstinnersetmessages: + msg380878
2020-11-13 03:45:30eryksunsetmessages: + msg380869
2020-11-13 00:50:10steve.dowersetmessages: + msg380864
2020-11-12 21:41:25gvanrossumsetmessages: + msg380853
2020-11-09 18:57:32eryksunsetmessages: + msg380609
2020-11-09 18:51:42gvanrossumsetmessages: + msg380608
2020-11-09 18:35:21eryksunsetmessages: + msg380606
2020-11-09 18:18:15eryksunsetnosy: + vstinner
2020-11-09 17:38:07eryksunsetversions: + Python 3.9
nosy: + eryksun

messages: + msg380600

type: behavior
2020-11-09 17:28:06gvanrossumsetpriority: normal -> release blocker
2020-11-09 17:27:08gvanrossumsetnosy: + serhiy.storchaka
2020-11-09 17:26:06gvanrossumsetnosy: + paul.moore, tim.golden, zach.ware
components: + Windows
2020-11-09 17:25:49gvanrossumcreate