New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
asyncio.gather drops cancellation #71110
Comments
In a very specific case, asyncio.gather causes a cancel() call on the Task waiting on the _GatheringFuture to return a false positive. An example program demonstrating this is attached. The context: asyncio.gather creates a _GatheringFuture with a list of child Futures. Each child Future carries a done callback that updates the _GatheringFuture, and once the last child done callback is executed, the _GatheringFuture sets its result. The specific situation: When the last child future changes state to done it schedules its done callbacks, but does not immediately execute them. If the Task waiting on the gather is then cancelled before the last child done callback has run, the cancel method in asyncio/tasks.py:578 determines that the _GatheringFuture inspects itself and sees that it is not done() yet, and proceeds to cancel all child futures - which has no effect, since all of them are already done(). It still returns True, so the Tasks thinks all is well, and proceeds with its execution. The behaviour I would expect is that if cancel() is called on the _GatheringFuture, and all children return False on cancel(), then _GatheringFuture.cancel() should also return False, i.e.: def cancel(self):
if self.done():
return False
at_least_one_child_cancelled = False
for child in self._children:
if child.cancel():
at_least_one_child_cancelled = True
return at_least_one_child_cancelled If I replace _GatheringFuture.cancel with the above variant, the bug disappears for me. More context: We hit this bug sporadically in an integration test of aioredis, where some timings conspired to make it appear with a chance of about 1 in 10. The minimal example calls the cancellation in a done_callback, so that it always hits the window. This was not the way the bug was discovered. |
The doc: (Sorry, I didn't read your issue yet.) |
Thank you for providing the relevant documentation link. I just noticed that it should probably be clarified that in case the outer Future is cancelled, and all children that are not already done ignore the cancellation (a.k.a. catch the CancelledError), the cancellation of the outer Future does not continue. This is different to the behaviour of asyncio.wait_for, which always raises a CancelledError. |
I don't think this is a mere documentation problem: If a future cannot be cancelled because it is already done, cancel must return False. As Johannes' example demonstrates, a wrong return value from cancel might lead to a cancelled task being continued as if nothing happened: If Task.cancel receives a false positive from its _fut_waiter, it will not throw a CancelledError into the task (_must_cancel=True), but simply continue the task. |
I think I agree with Johannes. If all children refuse to be cancelled because they are already done, the outer _GatheringFuture might as well refuse to be cancelled as well. However I'm not sure I actually understand the mechanism whereby the calling Task ends up surviving, and Johannes' description appears garbled. Can anyone add some print statements to various parts and explain it here? |
On rereading my original description, it really is not clearly described why the calling Task ends up surviving. Attached is a patch to the 3.5.1 asyncio/tasks.py which adds some print statements in Task.cancel(). If I execute the cancellation_test.py with the patch applied, the output looks like this: ------------------------ The Task keeps on running because Task.cancel() trusts its _fut_waiter task to handle the cancellation correctly if its cancel() method returns True. If it returns False, it handles the Cancellation itself. In this case, that _fut_waiter continues on, and proceeds to set results etc. so that the calling tasks cannot distinguish this from a CancellationError which has been deliberately caught. I hope this explanation is a bit clearer than the first one. |
Thanks! I had eventually pieced together the same explanation. So yes, I ret = False
for child in self._children:
ret |= child.cancel()
return ret # True if at least one child.cancel() call returned True It would also be nice if there was a test for this behavior. I keep worrying a bit -- a similar bug could exist in other pieces of the |
Right, that's neater. Attached is a patch with your version and a test. I checked that it fails with the old version of cancel() and passes with the new one. Concerning possible other bugs, I've had a look in the standard library, but could not find another instance where Future.cancel() is overwritten and has special handling. I also had a look at the try/except Exception blocks in lib/asyncio, but possible Cancellations are handled correctly there. I believe the main source of bugs in this context will probably be other asyncio-based libraries. Either by inadvertently catching CancellationErrors in a try/except Exception block and treating them like other errors, or by not protecting resources with try/finally across yield points which might throw a CancellationError. Not all libraries use cancel() internally, so the authors might not be aware that they have to write "cancellation-safe" code. |
@yury, any comments? The fix looks fine to me. |
I've left a comment in code review. Other than that, the patch/approach looks good. |
Attached is a new version of the patch incorporating the review results. |
New changeset b1aa485fad1b by Yury Selivanov in branch '3.5': New changeset b0af901b1e2a by Yury Selivanov in branch '3.6': New changeset 2a228b03ea16 by Yury Selivanov in branch 'default': |
Thank you, Johannes! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: