classification
Title: Cancellation ignored by asyncio.wait_for can hang application
Type: behavior
Stage:
Components: asyncio
Versions: Python 3.9, Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: asvetlov, nmatravolgyi, yselivanov
Priority: normal
Keywords:

Created on 2021-03-03 16:50 by nmatravolgyi, last changed 2021-03-03 21:33 by nmatravolgyi.

Files
File name Uploaded Description
aio_wait_for_me.py nmatravolgyi, 2021-03-03 16:50 Demonstration of stuck task.
Messages (3)
msg388032 - Author: nmatravolgyi (nmatravolgyi) Date: 2021-03-03 16:50
I have found myself debugging a *very* unintuitive behavior of asyncio.wait_for that I'd consider a bug/deficiency. The problem, put very simply: wait_for will return the wrapped future's result even when it is being cancelled, ignoring the cancellation as if it had never happened.

This makes code that cancels tasks and then waits for them hang forever if some simple conditions are met. The snippet below assumes that every cancelled task will exit, so it simply waits for them. I know cancellation *can* be ignored, but the documentation discourages it for exactly this reason.

tasks = [...]
for t in tasks:
    t.cancel()
results = await asyncio.gather(*tasks, return_exceptions=True)

I already know that this behavior has been chosen because otherwise the returned value would be lost. But for many applications, losing an explicit cancellation error/event is just as bad.

Ignoring the cancellation is critical because the cancelling (arbiter) task cannot reliably work around it. In most cases, repeating the cancellation in a polling wait solves this, but it is ugly, and it does not work if the original wait_for construct is in a loop and always ignores the cancellation.
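A minimal sketch of the repeated-cancellation polling wait mentioned above. The names cancel_until_done and stubborn_worker are hypothetical, and the worker swallows its first cancellation with an explicit try/except as a stand-in for the wait_for behavior, so the example does not depend on the Python version:

```python
import asyncio


async def stubborn_worker(log):
    # Stand-in for a task whose first cancellation gets swallowed.
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        log.append("swallowed")
    # A second cancellation is not caught and propagates normally.
    await asyncio.sleep(3600)


async def cancel_until_done(tasks, poll_interval=0.01):
    """Cancel the tasks again after every poll until all of them have finished."""
    pending = set(tasks)
    rounds = 0
    while pending:
        for task in pending:
            task.cancel()
        rounds += 1
        # Wait a short while, then re-check which tasks are still running.
        _, pending = await asyncio.wait(pending, timeout=poll_interval)
    return rounds


async def main():
    log = []
    task = asyncio.ensure_future(stubborn_worker(log))
    await asyncio.sleep(0)  # let the worker start and reach its first sleep
    rounds = await cancel_until_done([task])
    return log, rounds, task.cancelled()


log, rounds, cancelled = asyncio.run(main())
print(log, rounds, cancelled)
```

As noted above, this loop never terminates if the task swallows every cancellation, for example when the swallowing call sits inside a loop.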

The most sensible solution would be to let the user handle both the return value and the cancellation when they happen at once. This can be done by subclassing CancelledError as CancelledWithResultError and raising that instead. If the user code does not handle that exception specifically, then the user "chose" to ignore the result. Even if this is not intuitive, it would give the user control over what is really happening. Right now, the user cannot choose to handle the cancellation, or both.
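A rough sketch of how that could look, assuming a wait_for-like helper built on asyncio.wait. The helper name wait_for_reporting_cancel and the result attribute are hypothetical, and this is not the code from any existing library. The sketch only relies on catching the exception in the coroutine that awaits the helper directly:

```python
import asyncio


class CancelledWithResultError(asyncio.CancelledError):
    """Cancellation that also carries the result the awaitable produced."""

    def __init__(self, result):
        super().__init__()
        self.result = result


async def wait_for_reporting_cancel(aw, timeout):
    """Hypothetical wait_for-like helper that never hides a cancellation."""
    inner = asyncio.ensure_future(aw)
    try:
        done, _ = await asyncio.wait({inner}, timeout=timeout)
    except asyncio.CancelledError:
        finished_ok = (
            inner.done() and not inner.cancelled() and inner.exception() is None
        )
        if finished_ok:
            # Cancelled, but a result is already available: report both.
            raise CancelledWithResultError(inner.result()) from None
        inner.cancel()
        raise
    if inner in done:
        return inner.result()
    # Timed out: cancel the inner future and wait for it to finish.
    inner.cancel()
    await asyncio.gather(inner, return_exceptions=True)
    raise asyncio.TimeoutError()


async def worker(inner, seen):
    try:
        return await wait_for_reporting_cancel(inner, timeout=5.0)
    except CancelledWithResultError as exc:
        seen.append(exc.result)  # the result is not lost
        raise  # and neither is the cancellation


async def main():
    seen = []
    inner = asyncio.get_running_loop().create_future()
    task = asyncio.ensure_future(worker(inner, seen))
    await asyncio.sleep(0)  # let the worker start waiting
    inner.set_result(42)  # the result arrives...
    task.cancel()  # ...in the same loop iteration as the cancellation
    outcome = await asyncio.gather(task, return_exceptions=True)
    return seen, task.cancelled(), outcome


seen, was_cancelled, outcome = asyncio.run(main())
print(seen, was_cancelled)
```

Code that catches only asyncio.CancelledError still sees an ordinary cancellation, which matches the "chose to ignore the result" case described above.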

Lastly, I may have overlooked something trivial to make this work well. Right now I'm considering replacing all of the asyncio.wait_for constructs with asyncio.wait constructs. I can fully control all tasks and cancellations with that. I've made a simple demonstration of my problem, maybe someone can shed some light onto it.
msg388059 - Author: nmatravolgyi (nmatravolgyi) Date: 2021-03-03 21:18
I wanted a quick solution for myself, so I made a small library with an asyncio.wait_for()-like function built on asyncio.wait(). The prototype worked, so I put together a small project. When I ran tox, I realized that this issue with wait_for is only present on py38 and py39 (possibly py310). wait_for does not get stuck on py36, py37 and pypy3.

The repo is a little bare bones, but you can run tox after checkout: https://github.com/Traktormaster/wait-for2

Right now the tests are set up to expect wait_for to get stuck, so only py38 and py39 pass.

I'm pretty sure the side-effect of returning the future's result when handling cancellation is not desired. However I'm not sure how to handle it correctly. The repo holds a demo of what I suggested in the beginning of this thread (CancelledWithResultError). It works but it is limited.
msg388061 - Author: nmatravolgyi (nmatravolgyi) Date: 2021-03-03 21:33
One more thing. I've figured out that I can fix the cancellation around the asyncio.wait_for() with asyncio.shield() like:

try:
    await asyncio.shield(wf := asyncio.ensure_future(asyncio.wait_for(self.event.wait(), timeout=60.0)))
except asyncio.CancelledError:
    wf.cancel()
    result = await asyncio.gather(wf, return_exceptions=True)
    # here I know there is a cancellation AND I might have a result as well!
    raise

However I don't like the idea of writing all that boilerplate for every wait_for usage. I still might be overlooking something, but at least I have adequate workarounds.
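One way to limit the boilerplate might be to wrap the shield pattern in a single helper. The sketch below is only an illustration; the name wait_for_cancellable is hypothetical and not part of asyncio:

```python
import asyncio


async def wait_for_cancellable(aw, timeout):
    """Hypothetical wrapper: like wait_for, but a cancellation always propagates."""
    wf = asyncio.ensure_future(asyncio.wait_for(aw, timeout))
    try:
        # shield() lets the cancellation reach this coroutine first,
        # before wait_for gets a chance to swallow it.
        return await asyncio.shield(wf)
    except asyncio.CancelledError:
        wf.cancel()
        # Wait for wait_for to wind down; it may finish with a result,
        # an exception, or a cancellation of its own.
        await asyncio.gather(wf, return_exceptions=True)
        raise


async def main():
    event = asyncio.Event()
    task = asyncio.ensure_future(
        wait_for_cancellable(event.wait(), timeout=60.0)
    )
    await asyncio.sleep(0.01)  # let everything start waiting
    event.set()  # the result becomes available...
    task.cancel()  # ...at the same time as the cancellation
    await asyncio.gather(task, return_exceptions=True)
    return task.cancelled()


task_was_cancelled = asyncio.run(main())
print(task_was_cancelled)
```

The caller then writes a single await per wait, at the cost of one extra task per call.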

I'm curious what the consensus will be on this issue. I'm certain it should be documented though. Right now there is no mention of ignoring/eating a cancellation.
History
Date User Action Args
2021-03-03 21:33:14 nmatravolgyi set messages: + msg388061
2021-03-03 21:18:09 nmatravolgyi set messages: + msg388059
2021-03-03 16:50:58 nmatravolgyi create