classification
Title: KeyboardInterrupt is ignored when await asyncio.sleep(0)
Type: behavior Stage: patch review
Components: asyncio Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Zhibin Dong, asvetlov, kmaork, yselivanov
Priority: normal Keywords: patch

Created on 2020-02-13 07:09 by Zhibin Dong, last changed 2020-02-26 23:00 by kmaork.

Files
File name Uploaded Description Edit
SleepTest.py Zhibin Dong, 2020-02-13 07:09
asyncio_bug_demo.py kmaork, 2020-02-23 21:49
Pull Requests
URL Status Linked Edit
PR 18628 open kmaork, 2020-02-23 21:42
Messages (13)
msg361939 - (view) Author: Zhibin Dong (Zhibin Dong) * Date: 2020-02-13 07:09
As shown in the code, when 0 is passed to asyncio.sleep function, sometimes the program does not exit when I press <Ctrl-C>. I must press <Ctrl-C> again to close the program. However, when a number, such as 0.01, which is bigger than 0, is passed to the sleep function, the program will exit as expected when I press <Ctrl-C> just once.

Is there any bug or just a wrong way to use? Thanks.
msg362267 - (view) Author: Maor Kleinberger (kmaork) * Date: 2020-02-19 12:22
On which OS did that happen?
msg362295 - (view) Author: Zhibin Dong (Zhibin Dong) * Date: 2020-02-20 01:39
Ubuntu 18.04.4 with kernel 5.3.0-28-generic. Additionally Python version is 3.7.3.
msg362341 - (view) Author: Maor Kleinberger (kmaork) * Date: 2020-02-20 18:43
This behaved similarly on my machine, also ubuntu. But it also happened (less often) with small numbers, like sleep(0.0000000000000000000001).

Also, I tried it on my windows 10 and experienced another unexpected behavior - only when using sleep(0), Ctrl-C would not work *at all* from time to time. Most of the times it would stop the program immediately. This also happened when using 0.0000000000000000000001 instead of 0.

Unless 0.0000000000000000000001 turns into 0 somewhere along the way, I'd suspect that there is a piece of code inside asyncio's core that doesn't handle interrupting correctly (both on linux and windows), and using sleep with small numbers just makes it much more probable for this piece of code to be running when the Ctrl-C signal is sent.

I'll try to dig in more.
msg362346 - (view) Author: Maor Kleinberger (kmaork) * Date: 2020-02-20 20:20
I wrote a small script that checks the behavior of asyncio.sleep when called with small amounts of time (starting from 0). For each amount I checked 500 times whether SleepTest has stopped when interrupted with SIGINT. Below are the results:

SLEEP TIME  SIGINT FAILURE PERCENT
0           26.6%
1e-06       9.8%
2e-06       11.8%
3e-06       11.2%
4e-06       9.6%
5e-06       13.8%
6e-06       5.6%
7e-06       2.6%
8e-06       1.4%
9e-06       1.6%
1e-05       2.2%
1.1e-05     2.2%
1.2e-05     2.0%
1.3e-05     1.8%
1.4e-05     0.8%

It is worth mentioning that the failure amount increased when my CPU was busier. Maybe it is because of false positives in my script (although I think I used relatively generous timeouts when waiting for the ioloop to start running and for the process to die after a SIGINT) and maybe it is because of the nature of the bug in asyncio. I didn't put a lot of effort into distinguishing between the two, as increasing the timeouts also made my CPU less busy.

Anyway, there is definitely a weird behavior here, and I still think there is a specific piece of code in asyncio that's causing it. Maybe I'll look at the code a bit, or try to think of other ways to approach this.
msg362377 - (view) Author: Zhibin Dong (Zhibin Dong) * Date: 2020-02-21 02:38
Thank you for your investigation, I will also try to see what went wrong.
msg362542 - (view) Author: Maor Kleinberger (kmaork) * Date: 2020-02-23 21:49
After digging into asyncio, I stumbled upon this particularly suspicious block in BaseEventLoop._run_once: https://github.com/python/cpython/blob/v3.9.0a3/Lib/asyncio/base_events.py#L1873

handle = self._ready.popleft()
if handle._cancelled:
    continue
if self._debug:
    ...
    handle._run()
    ...
else:
    handle._run()

As you can see, a callback is popped from the dequeue of ready callbacks, and only after a couple of lines that callback is called. The question arises, what happens if an exception is raised in between? Or more specifically, What happens to that callback if a KeyboardInterrupt is raised before it is called?
Well, appparently it dies and becomes one with the universe. The chances of it happening are the highest when the ioloop is running very short coroutines (like sleep(0)), and are increased when debug is on (because more code is executed in between).

This is how the bug we've been experiencing came to life:
When SIGINT is received it raises a KeyboardInterrupt in the running frame. If the running frame is a coroutine, it stops, the exception climbs up the stack, and the ioloop shuts down. Otherwise, the KeyboardInterrupt is probably raised inside asyncio's code, somewhere inside run_forever. In that case, the ioloop stops and proceeds to cancel all of the running tasks. After cancelling all the tasks, asyncio actually reruns the ioloop so all tasks receive the CancelledError and handle it or just die (see asyncio.runners._cancel_all_tasks).
Enter our bug; sometimes, randomly, the loop gets stuck waiting for all the cancelled tasks to finish. This behavior is caused by the flaw I described earlier - if the KeyboardInterrupt was raised after a callback was popped and before it was run, the callback is lost and the task that was waiting for it will wait forever.
Depending on the running tasks, the event loop might hang on the select call (until a interrupted by a signal, like SIGINT). This is what happens in SleepTest.py. Another case might be that only a part of the ioloop gets stuck, and other parts that are not dependent on the lost call still run correctly (and run into a CancelledError). This behavior is demonstrated in the script I added to this thread, asyncio_bug_demo.py.

I see two possible solutions:
1. Make all the code inside run_forever signal safe
2. Override the default SIGINT handler in asyncio.run with one more fitting the way asyncio works

I find the second solution much easier to implement well, and I think it makes more sense. I think python's default SIGINT handler fits normal single-threaded applications very well, but not so much an event loop. When using an event loop it makes sense to handle a signal as an event an process it along with the other running tasks. This is fully supported by the ioloop with the help of signal.set_wakeup_fd.
I have implemented the second solution and opened a PR, please review it and tell me what you think!
msg362544 - (view) Author: Maor Kleinberger (kmaork) * Date: 2020-02-23 22:18
After opening the PR, I see that add_signal_handler is not supported on non-unix event loop. After looking at the code, it seems there is no problem to implement it for other platforms, as it relies on signal.set_wakeup_fd() which seems to support all platforms. Should I open an enhancement issue for implementing add_signal_handler for all platforms?
msg362586 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2020-02-24 14:08
The second solution doesn't help with breaking infinite loops like 
while True:
    pass

We need a more robust idea.
msg362610 - (view) Author: Zhibin Dong (Zhibin Dong) * Date: 2020-02-25 01:09
Yes, I think you are right. In the second solution, the callback of SIGINT will be append to the end of one task loop. If that code is defined in a task by some user, KeyboardInterrupt will not be raised.
msg362668 - (view) Author: Maor Kleinberger (kmaork) * Date: 2020-02-25 22:35
Damn, good catch. How about the following idea - register a normal signal handler (using signal.signal) that does something like:

def sigint_handler():
    get_running_loop().interrupt()

# in class BaseEventLoop
def interrupt(self):
    # Might be a generally useful thread-safe way to interrupt a loop
    if self._is_inside_callback():
        _thread.interrupt_main() # All this behavior is only relevant to the main thread anyway
    else:
        self._interrupted = True

And inside BaseEventLoop._run_once() add the following check:

# in class BaseEventLoop
def _check_interrupted(self):
    # This will be called by BaseEventLoop._run_once() before calling select,
    # and before popping any handle from the ready queue
    if self._interrupted:
        raise KeyboardInterrupt
msg362670 - (view) Author: Maor Kleinberger (kmaork) * Date: 2020-02-25 23:13
While looking into this proposal, I realized that this will not wake up the loop from the select call. I think the safest solution would be to use the wakeup fd mechanism.

An additional easy, but less secure solution would be to define an internal 'signal safe' context manager that will be used only in the critical parts.

What do you think?
msg362751 - (view) Author: Maor Kleinberger (kmaork) * Date: 2020-02-26 23:00
I have pushed an update to my PR. While implementing the new solution I made the following list:

Examples for code segments where KeyboardInterrupt must not be raised:
 - Between popping and running a handle from the ready queue (a handle is lost)
 - After running a handle and before a subsequent call_soon finished running (the handle that should have been added by call_soon is lost)
 - After a handle is popped from the _scheduled heap and added to the _ready queue (a handle is lost)

Code segments where KeyboardInterrupt must be raised:
 - During the select call (can be achieved using signal.default_int_handler or signal.set_wakeup_fd)
 - During a running callback (can be achieved using signal.default_int_handler)

I think that while the loop is running, the signal handler should not raise a KeyboardInterrupt by default, as there are at least 3 unsafe code segments, and more might be added in the future. Currently, the only two segments that have to be directly interrupted by a SIGINT are the ones listed above, and I think this behavior should be allowed explicitly.
This is what I did in the new revision of my PR.
History
Date User Action Args
2020-02-26 23:00:35kmaorksetmessages: + msg362751
2020-02-25 23:13:31kmaorksetmessages: + msg362670
2020-02-25 22:35:29kmaorksetmessages: + msg362668
2020-02-25 01:09:18Zhibin Dongsetmessages: + msg362610
2020-02-24 14:08:39asvetlovsetmessages: + msg362586
2020-02-23 22:18:07kmaorksetmessages: + msg362544
2020-02-23 21:49:07kmaorksetfiles: + asyncio_bug_demo.py

messages: + msg362542
2020-02-23 21:42:19kmaorksetkeywords: + patch
stage: patch review
pull_requests: + pull_request17992
2020-02-21 02:38:46Zhibin Dongsetmessages: + msg362377
2020-02-20 20:20:31kmaorksetmessages: + msg362346
2020-02-20 18:43:19kmaorksetmessages: + msg362341
2020-02-20 01:39:34Zhibin Dongsetmessages: + msg362295
2020-02-19 12:22:47kmaorksetnosy: + kmaork
messages: + msg362267
2020-02-13 07:09:49Zhibin Dongcreate