Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite loop uninterruptable on Windows in 3.10 #86462

Closed
gvanrossum opened this issue Nov 9, 2020 · 16 comments
Closed

Infinite loop uninterruptable on Windows in 3.10 #86462

gvanrossum opened this issue Nov 9, 2020 · 16 comments
Labels
3.9 only security fixes 3.10 only security fixes OS-windows type-bug An unexpected behavior, bug, or error

Comments

@gvanrossum
Copy link
Member

BPO 42296
Nosy @gvanrossum, @pfmoore, @vstinner, @tjguk, @markshannon, @zware, @serhiy-storchaka, @eryksun, @zooba, @miss-islington
PRs
  • bpo-42296: On Windows, fix CTRL+C regression #23257
  • [3.9] bpo-42296: On Windows, fix CTRL+C regression (GH-23257) #23259
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-11-13.19:22:23.856>
    created_at = <Date 2020-11-09.17:25:49.138>
    labels = ['3.10', 'type-bug', '3.9', 'OS-windows']
    title = 'Infinite loop uninterruptable on Windows in 3.10'
    updated_at = <Date 2020-11-14.00:44:58.378>
    user = 'https://github.com/gvanrossum'

    bugs.python.org fields:

    activity = <Date 2020-11-14.00:44:58.378>
    actor = 'gvanrossum'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-11-13.19:22:23.856>
    closer = 'eryksun'
    components = ['Windows']
    creation = <Date 2020-11-09.17:25:49.138>
    creator = 'gvanrossum'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 42296
    keywords = ['patch']
    message_count = 16.0
    messages = ['380597', '380600', '380606', '380608', '380609', '380853', '380864', '380869', '380878', '380884', '380886', '380887', '380892', '380893', '380927', '380947']
    nosy_count = 10.0
    nosy_names = ['gvanrossum', 'paul.moore', 'vstinner', 'tim.golden', 'Mark.Shannon', 'zach.ware', 'serhiy.storchaka', 'eryksun', 'steve.dower', 'miss-islington']
    pr_nums = ['23257', '23259']
    priority = None
    resolution = None
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue42296'
    versions = ['Python 3.9', 'Python 3.10']

    @gvanrossum
    Copy link
    Member Author

    This code cannot be interrupted with ^C on Windows (e.g. in the REPL)

    while True:
        pass

    This seems to be a regression, it works in earlier versions.

    @eryksun
    Copy link
    Contributor

    eryksun commented Nov 9, 2020

    It also cannot be interrupted in 3.9. But 3.8 and earlier work correctly.

    @eryksun eryksun added 3.9 only security fixes type-bug An unexpected behavior, bug, or error labels Nov 9, 2020
    @eryksun
    Copy link
    Contributor

    eryksun commented Nov 9, 2020

    See bpo-40010. COMPUTE_EVAL_BREAKER() in Python/ceval.c -- or _Py_ThreadCanHandleSignals in Include/internal/pycore_pystate.h -- needs to take into account that SIGNAL_PENDING_SIGNALS() gets called on a completely new thread in Windows.

    @gvanrossum
    Copy link
    Member Author

    So you're saying this war broken by #19087 ?

    @eryksun
    Copy link
    Contributor

    eryksun commented Nov 9, 2020

    Yes, if I force the return value of _Py_ThreadCanHandleSignals to 1, the loop is broken by a KeyboardInterrupt.

    @gvanrossum
    Copy link
    Member Author

    Can you think of a fix? (Presumably restore some code that was deleted from 3.9?)

    @zooba
    Copy link
    Member

    zooba commented Nov 13, 2020

    This looks like the smallest change I can make to fix it:

    diff --git a/Include/internal/pycore_pystate.h b/Include/internal/pycore_pystate.h
    index 0cd5550cfd..9ff740b87b 100644
    --- a/Include/internal/pycore_pystate.h
    +++ b/Include/internal/pycore_pystate.h
    @@ -34,7 +34,11 @@ _Py_IsMainInterpreter(PyThreadState* tstate)
     static inline int
     _Py_ThreadCanHandleSignals(PyInterpreterState *interp)
     {
    +#ifndef MS_WINDOWS
         return (_Py_IsMainThread() && interp == _PyRuntime.interpreters.main);
    +#else
    +    return (interp == _PyRuntime.interpreters.main);
    +#endif
     }

    I'll need Victor(?) to confirm whether checking for the main interpreter is sufficient. We're calling this from a new thread that AFAICT never holds the GIL, and we pass the main interpreter in explicitly (from trip_signal) so this check always succeeds (here, but not necessarily in other places where it is called).

    @eryksun
    Copy link
    Contributor

    eryksun commented Nov 13, 2020

    Note that this issue only applies to a process that has a single Python thread. With multiple Python threads, the signal handler gets called when the main thread acquires the GIL.

    As far as Windows console-driven SIGINT and SIGBREAK are concerned, C signal_handler() is called on a new thread that the interpreter has never seen before and will never see again. But there's also signal.raise_signal(signal.SIGINT) to consider, executing on a Python thread. Another path to trip_signal() is via _thread.interrupt_main(), i.e. PyErr_SetInterrupt().

    _Py_ThreadCanHandleSignals() checks whether the current thread is the main thread in the main interpreter. It gets called to guarantee this condition in multiple places, such as handle_signals() in Python/ceval.c. So the _Py_IsMainThread() call is required. In Windows, maybe the main-thread check could be bypassed in SIGNAL_PENDING_SIGNALS() for a non-Python thread (i.e. no tstate), by directly setting ceval2->eval_breaker instead of calling COMPUTE_EVAL_BREAKER() in this case.

    @vstinner
    Copy link
    Member

    This issue reminds me bpo-40082 where I wrote: "The problem on Windows is that each CTRL+c is executed in a different thread." I fixed the issue with:

    New changeset b54a99d by Victor Stinner in branch 'master':
    bpo-40082: trip_signal() uses the main interpreter (GH-19441)
    b54a99d

    "Fix the signal handler: it now always uses the main interpreter,
    rather than trying to get the current Python thread state."

    Here we have to be careful. If _Py_ThreadCanHandleSignals() always return true, we may reintroduce bpo-40010 issue: ceval bytecode evaluation loop may be interrupted at every single instruction and call eval_frame_handle_pending() which does nothing.

    COMPUTE_EVAL_BREAKER() decides if the loop must be interrupted.

    Rather than modifying _Py_ThreadCanHandleSignals(), I would prefer to modify SIGNAL_PENDING_CALLS(). For example, rather than using COMPUTE_EVAL_BREAKER() complex logic to decide if the current Python thread must check if there is a pending signal, always interrupt and let the thread decide if it has something to do.

    The second problem is to reset eval_breaker to 0. If there is no pending signal and no pending call, eval_frame_handle_pending() leaves eval_breaker unchanged. eval_frame_handle_pending() should also be updated to reset eval_breaker in this case.

    @vstinner
    Copy link
    Member

    New changeset d96a7a8 by Victor Stinner in branch 'master':
    bpo-42296: On Windows, fix CTRL+C regression (GH-23257)
    d96a7a8

    @eryksun
    Copy link
    Contributor

    eryksun commented Nov 13, 2020

    always interrupt and let the thread decide if it has something to do.

    SIGNAL_PENDING_CALLS() is called on a Python thread via signal.raise_signal() or _thread.interrupt_main() / PyErr_SetInterrupt(). If you'd rather keep the COMPUTE_EVAL_BREAKER() call in that case, the console control-event case can be distinguished via PyGILState_GetThisThreadState(). It returns NULL if there's no thread state, i.e. WINAPI TlsGetValue returns NULL.

    @miss-islington
    Copy link
    Contributor

    New changeset e5729ae by Miss Islington (bot) in branch '3.9':
    bpo-42296: On Windows, fix CTRL+C regression (GH-23257)
    e5729ae

    @vstinner
    Copy link
    Member

    SIGNAL_PENDING_CALLS() is called on a Python thread via signal.raise_signal() or _thread.interrupt_main() / PyErr_SetInterrupt(). If you'd rather keep the COMPUTE_EVAL_BREAKER() call in that case, the console control-event case can be distinguished via PyGILState_GetThisThreadState(). It returns NULL if there's no thread state, i.e. WINAPI TlsGetValue returns NULL.

    That sounds like a micro-optimization which is not worth it. The code is already quite complicated. I don't think that it's a big deal to call eval_frame_handle_pending() *once* when a signal is received whereas the "current" Python thread cannot handle it. This function is quite simple: when there is no nothing to do, it only reads 3 atomic variable and one tstate attribute. It's cheap.

    @vstinner
    Copy link
    Member

    This code cannot be interrupted with ^C on Windows (e.g. in the REPL)

    I tested manually: it's now fixed in 3.9 and master branches.

    Thanks for the bug report Guido.

    Signal handling is hard. Threads + signals is worse! :-) I modified _PyEval_SignalReceived() to always set eval_breaker to 1. See comments of my commit for the long rationale ;-)

    If someone knows how to write an automated test for that on Windows, please create a PR and open a new issue ;-)

    @eryksun
    Copy link
    Contributor

    eryksun commented Nov 13, 2020

    That sounds like a micro-optimization which is not worth it.

    In the back of my mind I was also thinking to generalize the behavior at runtime whenever a signal is tripped by a non-Python thread (e.g. a thread created by an extension module or ctypes), instead of special-casing Windows.

    To examine this, I created a C library in Linux that defines a test() function that creates two threads via pthread_create(). The first thread sleeps for 10 seconds, and the second thread sleeps for 5 seconds and then calls pthread_kill() to send a SIGINT to the first thread. In 3.8, calling the test() function via ctypes followed by executing an infinite loop will interrupt the loop with a KeyboardInterrupt as soon as the second thread sends SIGINT. But in 3.10, the loop never gets interrupted because the C signal handler isn't called on the main thread, so eval_breaker never gets set.

    @gvanrossum
    Copy link
    Member Author

    Thanks for the quick fix. It works!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes 3.10 only security fixes OS-windows type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants