Message 362542 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	kmaork
Recipients	Zhibin Dong, asvetlov, kmaork, yselivanov
Date	2020-02-23.21:49:07
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1582494547.48.0.927626445308.issue39622@roundup.psfhosted.org>
In-reply-to

Content
After digging into asyncio, I stumbled upon this particularly suspicious block in BaseEventLoop._run_once: https://github.com/python/cpython/blob/v3.9.0a3/Lib/asyncio/base_events.py#L1873 handle = self._ready.popleft() if handle._cancelled: continue if self._debug: ... handle._run() ... else: handle._run() As you can see, a callback is popped from the dequeue of ready callbacks, and only after a couple of lines that callback is called. The question arises, what happens if an exception is raised in between? Or more specifically, What happens to that callback if a KeyboardInterrupt is raised before it is called? Well, appparently it dies and becomes one with the universe. The chances of it happening are the highest when the ioloop is running very short coroutines (like sleep(0)), and are increased when debug is on (because more code is executed in between). This is how the bug we've been experiencing came to life: When SIGINT is received it raises a KeyboardInterrupt in the running frame. If the running frame is a coroutine, it stops, the exception climbs up the stack, and the ioloop shuts down. Otherwise, the KeyboardInterrupt is probably raised inside asyncio's code, somewhere inside run_forever. In that case, the ioloop stops and proceeds to cancel all of the running tasks. After cancelling all the tasks, asyncio actually reruns the ioloop so all tasks receive the CancelledError and handle it or just die (see asyncio.runners._cancel_all_tasks). Enter our bug; sometimes, randomly, the loop gets stuck waiting for all the cancelled tasks to finish. This behavior is caused by the flaw I described earlier - if the KeyboardInterrupt was raised after a callback was popped and before it was run, the callback is lost and the task that was waiting for it will wait forever. Depending on the running tasks, the event loop might hang on the select call (until a interrupted by a signal, like SIGINT). This is what happens in SleepTest.py. Another case might be that only a part of the ioloop gets stuck, and other parts that are not dependent on the lost call still run correctly (and run into a CancelledError). This behavior is demonstrated in the script I added to this thread, asyncio_bug_demo.py. I see two possible solutions: 1. Make all the code inside run_forever signal safe 2. Override the default SIGINT handler in asyncio.run with one more fitting the way asyncio works I find the second solution much easier to implement well, and I think it makes more sense. I think python's default SIGINT handler fits normal single-threaded applications very well, but not so much an event loop. When using an event loop it makes sense to handle a signal as an event an process it along with the other running tasks. This is fully supported by the ioloop with the help of signal.set_wakeup_fd. I have implemented the second solution and opened a PR, please review it and tell me what you think!

After digging into asyncio, I stumbled upon this particularly suspicious block in BaseEventLoop._run_once: https://github.com/python/cpython/blob/v3.9.0a3/Lib/asyncio/base_events.py#L1873

handle = self._ready.popleft()
if handle._cancelled:
continue
if self._debug:
...
handle._run()
...
else:
handle._run()

As you can see, a callback is popped from the dequeue of ready callbacks, and only after a couple of lines that callback is called. The question arises, what happens if an exception is raised in between? Or more specifically, What happens to that callback if a KeyboardInterrupt is raised before it is called?
Well, appparently it dies and becomes one with the universe. The chances of it happening are the highest when the ioloop is running very short coroutines (like sleep(0)), and are increased when debug is on (because more code is executed in between).

This is how the bug we've been experiencing came to life:
When SIGINT is received it raises a KeyboardInterrupt in the running frame. If the running frame is a coroutine, it stops, the exception climbs up the stack, and the ioloop shuts down. Otherwise, the KeyboardInterrupt is probably raised inside asyncio's code, somewhere inside run_forever. In that case, the ioloop stops and proceeds to cancel all of the running tasks. After cancelling all the tasks, asyncio actually reruns the ioloop so all tasks receive the CancelledError and handle it or just die (see asyncio.runners._cancel_all_tasks).
Enter our bug; sometimes, randomly, the loop gets stuck waiting for all the cancelled tasks to finish. This behavior is caused by the flaw I described earlier - if the KeyboardInterrupt was raised after a callback was popped and before it was run, the callback is lost and the task that was waiting for it will wait forever.
Depending on the running tasks, the event loop might hang on the select call (until a interrupted by a signal, like SIGINT). This is what happens in SleepTest.py. Another case might be that only a part of the ioloop gets stuck, and other parts that are not dependent on the lost call still run correctly (and run into a CancelledError). This behavior is demonstrated in the script I added to this thread, asyncio_bug_demo.py.

I see two possible solutions:
1. Make all the code inside run_forever signal safe
2. Override the default SIGINT handler in asyncio.run with one more fitting the way asyncio works

I find the second solution much easier to implement well, and I think it makes more sense. I think python's default SIGINT handler fits normal single-threaded applications very well, but not so much an event loop. When using an event loop it makes sense to handle a signal as an event an process it along with the other running tasks. This is fully supported by the ioloop with the help of signal.set_wakeup_fd.
I have implemented the second solution and opened a PR, please review it and tell me what you think!

History
Date	User	Action	Args
2020-02-23 21:49:07	kmaork	set	recipients: + kmaork, asvetlov, yselivanov, Zhibin Dong
2020-02-23 21:49:07	kmaork	set	messageid: <1582494547.48.0.927626445308.issue39622@roundup.psfhosted.org>
2020-02-23 21:49:07	kmaork	link	issue39622 messages
2020-02-23 21:49:07	kmaork	create