Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle KeyboardInterrupt in asyncio #83803

Closed
developerdong mannequin opened this issue Feb 13, 2020 · 15 comments
Closed

Handle KeyboardInterrupt in asyncio #83803

developerdong mannequin opened this issue Feb 13, 2020 · 15 comments
Assignees
Labels
3.11 only security fixes topic-asyncio type-bug An unexpected behavior, bug, or error

Comments

@developerdong
Copy link
Mannequin

developerdong mannequin commented Feb 13, 2020

BPO 39622
Nosy @asvetlov, @1st1, @kmaork, @developerdong
PRs
  • bpo-39622: Use a custom signal handler in asyncio.run() #18628
  • bpo-39622: Interrupt the main asyncio task on Ctrl+C #32105
  • Files
  • SleepTest.py
  • asyncio_bug_demo.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/asvetlov'
    closed_at = <Date 2022-03-30.12:16:00.133>
    created_at = <Date 2020-02-13.07:09:49.352>
    labels = ['3.11', 'type-bug', 'expert-asyncio']
    title = 'Handle KeyboardInterrupt in asyncio'
    updated_at = <Date 2022-03-30.12:16:00.132>
    user = 'https://github.com/developerdong'

    bugs.python.org fields:

    activity = <Date 2022-03-30.12:16:00.132>
    actor = 'asvetlov'
    assignee = 'asvetlov'
    closed = True
    closed_date = <Date 2022-03-30.12:16:00.133>
    closer = 'asvetlov'
    components = ['asyncio']
    creation = <Date 2020-02-13.07:09:49.352>
    creator = 'Zhibin Dong'
    dependencies = []
    files = ['48893', '48906']
    hgrepos = []
    issue_num = 39622
    keywords = ['patch']
    message_count = 15.0
    messages = ['361939', '362267', '362295', '362341', '362346', '362377', '362542', '362544', '362586', '362610', '362668', '362670', '362751', '363679', '416345']
    nosy_count = 4.0
    nosy_names = ['asvetlov', 'yselivanov', 'kmaork', 'Zhibin Dong']
    pr_nums = ['18628', '32105']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue39622'
    versions = ['Python 3.11']

    @developerdong
    Copy link
    Mannequin Author

    developerdong mannequin commented Feb 13, 2020

    As shown in the code, when 0 is passed to asyncio.sleep function, sometimes the program does not exit when I press <Ctrl-C>. I must press <Ctrl-C> again to close the program. However, when a number, such as 0.01, which is bigger than 0, is passed to the sleep function, the program will exit as expected when I press <Ctrl-C> just once.

    Is there any bug or just a wrong way to use? Thanks.

    @developerdong developerdong mannequin added 3.7 (EOL) end of life topic-asyncio type-bug An unexpected behavior, bug, or error labels Feb 13, 2020
    @kmaork
    Copy link
    Mannequin

    kmaork mannequin commented Feb 19, 2020

    On which OS did that happen?

    @developerdong
    Copy link
    Mannequin Author

    developerdong mannequin commented Feb 20, 2020

    Ubuntu 18.04.4 with kernel 5.3.0-28-generic. Additionally Python version is 3.7.3.

    @kmaork
    Copy link
    Mannequin

    kmaork mannequin commented Feb 20, 2020

    This behaved similarly on my machine, also ubuntu. But it also happened (less often) with small numbers, like sleep(0.0000000000000000000001).

    Also, I tried it on my windows 10 and experienced another unexpected behavior - only when using sleep(0), Ctrl-C would not work *at all* from time to time. Most of the times it would stop the program immediately. This also happened when using 0.0000000000000000000001 instead of 0.

    Unless 0.0000000000000000000001 turns into 0 somewhere along the way, I'd suspect that there is a piece of code inside asyncio's core that doesn't handle interrupting correctly (both on linux and windows), and using sleep with small numbers just makes it much more probable for this piece of code to be running when the Ctrl-C signal is sent.

    I'll try to dig in more.

    @kmaork
    Copy link
    Mannequin

    kmaork mannequin commented Feb 20, 2020

    I wrote a small script that checks the behavior of asyncio.sleep when called with small amounts of time (starting from 0). For each amount I checked 500 times whether SleepTest has stopped when interrupted with SIGINT. Below are the results:

    SLEEP TIME SIGINT FAILURE PERCENT
    0 26.6%
    1e-06 9.8%
    2e-06 11.8%
    3e-06 11.2%
    4e-06 9.6%
    5e-06 13.8%
    6e-06 5.6%
    7e-06 2.6%
    8e-06 1.4%
    9e-06 1.6%
    1e-05 2.2%
    1.1e-05 2.2%
    1.2e-05 2.0%
    1.3e-05 1.8%
    1.4e-05 0.8%

    It is worth mentioning that the failure amount increased when my CPU was busier. Maybe it is because of false positives in my script (although I think I used relatively generous timeouts when waiting for the ioloop to start running and for the process to die after a SIGINT) and maybe it is because of the nature of the bug in asyncio. I didn't put a lot of effort into distinguishing between the two, as increasing the timeouts also made my CPU less busy.

    Anyway, there is definitely a weird behavior here, and I still think there is a specific piece of code in asyncio that's causing it. Maybe I'll look at the code a bit, or try to think of other ways to approach this.

    @developerdong
    Copy link
    Mannequin Author

    developerdong mannequin commented Feb 21, 2020

    Thank you for your investigation, I will also try to see what went wrong.

    @kmaork
    Copy link
    Mannequin

    kmaork mannequin commented Feb 23, 2020

    After digging into asyncio, I stumbled upon this particularly suspicious block in BaseEventLoop._run_once: https://github.com/python/cpython/blob/v3.9.0a3/Lib/asyncio/base_events.py#L1873

    handle = self._ready.popleft()
    if handle._cancelled:
        continue
    if self._debug:
        ...
        handle._run()
        ...
    else:
        handle._run()

    As you can see, a callback is popped from the dequeue of ready callbacks, and only after a couple of lines that callback is called. The question arises, what happens if an exception is raised in between? Or more specifically, What happens to that callback if a KeyboardInterrupt is raised before it is called?
    Well, appparently it dies and becomes one with the universe. The chances of it happening are the highest when the ioloop is running very short coroutines (like sleep(0)), and are increased when debug is on (because more code is executed in between).

    This is how the bug we've been experiencing came to life:
    When SIGINT is received it raises a KeyboardInterrupt in the running frame. If the running frame is a coroutine, it stops, the exception climbs up the stack, and the ioloop shuts down. Otherwise, the KeyboardInterrupt is probably raised inside asyncio's code, somewhere inside run_forever. In that case, the ioloop stops and proceeds to cancel all of the running tasks. After cancelling all the tasks, asyncio actually reruns the ioloop so all tasks receive the CancelledError and handle it or just die (see asyncio.runners._cancel_all_tasks).
    Enter our bug; sometimes, randomly, the loop gets stuck waiting for all the cancelled tasks to finish. This behavior is caused by the flaw I described earlier - if the KeyboardInterrupt was raised after a callback was popped and before it was run, the callback is lost and the task that was waiting for it will wait forever.
    Depending on the running tasks, the event loop might hang on the select call (until a interrupted by a signal, like SIGINT). This is what happens in SleepTest.py. Another case might be that only a part of the ioloop gets stuck, and other parts that are not dependent on the lost call still run correctly (and run into a CancelledError). This behavior is demonstrated in the script I added to this thread, asyncio_bug_demo.py.

    I see two possible solutions:

    1. Make all the code inside run_forever signal safe
    2. Override the default SIGINT handler in asyncio.run with one more fitting the way asyncio works

    I find the second solution much easier to implement well, and I think it makes more sense. I think python's default SIGINT handler fits normal single-threaded applications very well, but not so much an event loop. When using an event loop it makes sense to handle a signal as an event an process it along with the other running tasks. This is fully supported by the ioloop with the help of signal.set_wakeup_fd.
    I have implemented the second solution and opened a PR, please review it and tell me what you think!

    @kmaork
    Copy link
    Mannequin

    kmaork mannequin commented Feb 23, 2020

    After opening the PR, I see that add_signal_handler is not supported on non-unix event loop. After looking at the code, it seems there is no problem to implement it for other platforms, as it relies on signal.set_wakeup_fd() which seems to support all platforms. Should I open an enhancement issue for implementing add_signal_handler for all platforms?

    @asvetlov
    Copy link
    Contributor

    The second solution doesn't help with breaking infinite loops like
    while True:
    pass

    We need a more robust idea.

    @developerdong
    Copy link
    Mannequin Author

    developerdong mannequin commented Feb 25, 2020

    Yes, I think you are right. In the second solution, the callback of SIGINT will be append to the end of one task loop. If that code is defined in a task by some user, KeyboardInterrupt will not be raised.

    @kmaork
    Copy link
    Mannequin

    kmaork mannequin commented Feb 25, 2020

    Damn, good catch. How about the following idea - register a normal signal handler (using signal.signal) that does something like:

    def sigint_handler():
        get_running_loop().interrupt()
    
    # in class BaseEventLoop
    def interrupt(self):
        # Might be a generally useful thread-safe way to interrupt a loop
        if self._is_inside_callback():
            _thread.interrupt_main() # All this behavior is only relevant to the main thread anyway
        else:
            self._interrupted = True

    And inside BaseEventLoop._run_once() add the following check:

    # in class BaseEventLoop
    def _check_interrupted(self):
        # This will be called by BaseEventLoop._run_once() before calling select,
        # and before popping any handle from the ready queue
        if self._interrupted:
            raise KeyboardInterrupt

    @kmaork
    Copy link
    Mannequin

    kmaork mannequin commented Feb 25, 2020

    While looking into this proposal, I realized that this will not wake up the loop from the select call. I think the safest solution would be to use the wakeup fd mechanism.

    An additional easy, but less secure solution would be to define an internal 'signal safe' context manager that will be used only in the critical parts.

    What do you think?

    @kmaork
    Copy link
    Mannequin

    kmaork mannequin commented Feb 26, 2020

    I have pushed an update to my PR. While implementing the new solution I made the following list:

    Examples for code segments where KeyboardInterrupt must not be raised:

    • Between popping and running a handle from the ready queue (a handle is lost)
    • After running a handle and before a subsequent call_soon finished running (the handle that should have been added by call_soon is lost)
    • After a handle is popped from the _scheduled heap and added to the _ready queue (a handle is lost)

    Code segments where KeyboardInterrupt must be raised:

    • During the select call (can be achieved using signal.default_int_handler or signal.set_wakeup_fd)
    • During a running callback (can be achieved using signal.default_int_handler)

    I think that while the loop is running, the signal handler should not raise a KeyboardInterrupt by default, as there are at least 3 unsafe code segments, and more might be added in the future. Currently, the only two segments that have to be directly interrupted by a SIGINT are the ones listed above, and I think this behavior should be allowed explicitly.
    This is what I did in the new revision of my PR.

    @kmaork kmaork mannequin added 3.8 only security fixes 3.9 only security fixes labels Feb 28, 2020
    @kmaork
    Copy link
    Mannequin

    kmaork mannequin commented Mar 8, 2020

    Hey there, it's been more than a week since I pushed the last changes, can anyone review them?

    @asvetlov
    Copy link
    Contributor

    New changeset f08a191 by Andrew Svetlov in branch 'main':
    bpo-39622: Interrupt the main asyncio task on Ctrl+C (GH-32105)
    f08a191

    @asvetlov asvetlov added 3.11 only security fixes and removed 3.7 (EOL) end of life 3.8 only security fixes labels Mar 30, 2022
    @asvetlov asvetlov removed the 3.9 only security fixes label Mar 30, 2022
    @asvetlov asvetlov changed the title KeyboardInterrupt is ignored when await asyncio.sleep(0) Handle KeyboardInterrupt in asyncio Mar 30, 2022
    @asvetlov asvetlov added 3.11 only security fixes and removed 3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes labels Mar 30, 2022
    @asvetlov asvetlov changed the title KeyboardInterrupt is ignored when await asyncio.sleep(0) Handle KeyboardInterrupt in asyncio Mar 30, 2022
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    potiuk added a commit to potiuk/airflow that referenced this issue Apr 26, 2022
    There is a bug in CPython (fixed in March 2022 but not yet released) that
    makes async.io handle SIGTERM improperly by using async unsafe
    functions and hanging the triggerer receive SIGPIPE while handling
    SIGTERN/SIGINT and deadlocking itself. Until the bug is handled
    we should rather rely on standard handling of the signals rather than
    adding our own signal handlers. Seems that even if our signal handler
    just run exit(0) - it caused a race condition that led to the hanging.
    
    More details:
       * https://bugs.python.org/issue39622
       * python/cpython#83803
    
    Fixes: apache#19260
    potiuk added a commit to apache/airflow that referenced this issue Apr 27, 2022
    There is a bug in CPython (fixed in March 2022 but not yet released) that
    makes async.io handle SIGTERM improperly by using async unsafe
    functions and hanging the triggerer receive SIGPIPE while handling
    SIGTERN/SIGINT and deadlocking itself. Until the bug is handled
    we should rather rely on standard handling of the signals rather than
    adding our own signal handlers. Seems that even if our signal handler
    just run exit(0) - it caused a race condition that led to the hanging.
    
    More details:
       * https://bugs.python.org/issue39622
       * python/cpython#83803
    
    Fixes: #19260
    ephraimbuddy pushed a commit to apache/airflow that referenced this issue May 8, 2022
    There is a bug in CPython (fixed in March 2022 but not yet released) that
    makes async.io handle SIGTERM improperly by using async unsafe
    functions and hanging the triggerer receive SIGPIPE while handling
    SIGTERN/SIGINT and deadlocking itself. Until the bug is handled
    we should rather rely on standard handling of the signals rather than
    adding our own signal handlers. Seems that even if our signal handler
    just run exit(0) - it caused a race condition that led to the hanging.
    
    More details:
       * https://bugs.python.org/issue39622
       * python/cpython#83803
    
    Fixes: #19260
    (cherry picked from commit 6bdbed6)
    ephraimbuddy pushed a commit to apache/airflow that referenced this issue May 21, 2022
    There is a bug in CPython (fixed in March 2022 but not yet released) that
    makes async.io handle SIGTERM improperly by using async unsafe
    functions and hanging the triggerer receive SIGPIPE while handling
    SIGTERN/SIGINT and deadlocking itself. Until the bug is handled
    we should rather rely on standard handling of the signals rather than
    adding our own signal handlers. Seems that even if our signal handler
    just run exit(0) - it caused a race condition that led to the hanging.
    
    More details:
       * https://bugs.python.org/issue39622
       * python/cpython#83803
    
    Fixes: #19260
    (cherry picked from commit 6bdbed6)
    leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this issue Oct 4, 2022
    There is a bug in CPython (fixed in March 2022 but not yet released) that
    makes async.io handle SIGTERM improperly by using async unsafe
    functions and hanging the triggerer receive SIGPIPE while handling
    SIGTERN/SIGINT and deadlocking itself. Until the bug is handled
    we should rather rely on standard handling of the signals rather than
    adding our own signal handlers. Seems that even if our signal handler
    just run exit(0) - it caused a race condition that led to the hanging.
    
    More details:
       * https://bugs.python.org/issue39622
       * python/cpython#83803
    
    Fixes: #19260
    (cherry picked from commit 6bdbed6c43df3c5473b168a75c50e0139cc13e88)
    GitOrigin-RevId: a0a2ddf60da6bb65865fd2bdc2260f342ac11c7d
    leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this issue Dec 7, 2022
    There is a bug in CPython (fixed in March 2022 but not yet released) that
    makes async.io handle SIGTERM improperly by using async unsafe
    functions and hanging the triggerer receive SIGPIPE while handling
    SIGTERN/SIGINT and deadlocking itself. Until the bug is handled
    we should rather rely on standard handling of the signals rather than
    adding our own signal handlers. Seems that even if our signal handler
    just run exit(0) - it caused a race condition that led to the hanging.
    
    More details:
       * https://bugs.python.org/issue39622
       * python/cpython#83803
    
    Fixes: #19260
    (cherry picked from commit 6bdbed6c43df3c5473b168a75c50e0139cc13e88)
    GitOrigin-RevId: a0a2ddf60da6bb65865fd2bdc2260f342ac11c7d
    leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this issue Jan 27, 2023
    There is a bug in CPython (fixed in March 2022 but not yet released) that
    makes async.io handle SIGTERM improperly by using async unsafe
    functions and hanging the triggerer receive SIGPIPE while handling
    SIGTERN/SIGINT and deadlocking itself. Until the bug is handled
    we should rather rely on standard handling of the signals rather than
    adding our own signal handlers. Seems that even if our signal handler
    just run exit(0) - it caused a race condition that led to the hanging.
    
    More details:
       * https://bugs.python.org/issue39622
       * python/cpython#83803
    
    Fixes: #19260
    GitOrigin-RevId: 6bdbed6c43df3c5473b168a75c50e0139cc13e88
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.11 only security fixes topic-asyncio type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant