Title: asyncio.gather drops cancellation
Type: behavior Stage: resolved
Components: asyncio Versions: Python 3.5
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: JohannesEbke, MartinAltmayer, gvanrossum, python-dev, tatellos, vstinner, yselivanov
Priority: normal Keywords: patch

Created on 2016-05-03 12:50 by JohannesEbke, last changed 2016-10-21 21:26 by yselivanov. This issue is now closed.

File name Uploaded Description Edit JohannesEbke, 2016-05-03 12:50 Test case demonstrating ineffective cancellation.
asyncio_task_prints.patch JohannesEbke, 2016-06-15 08:21
fix_and_test_26923.patch JohannesEbke, 2016-06-20 11:27 Correct return value of _GatheringFuture.cancel with test review
fix_and_test_26923_reviewed.patch JohannesEbke, 2016-06-22 09:00 Correct return value of _GatheringFuture.cancel with test review
Messages (13)
msg264719 - (view) Author: Johannes Ebke (JohannesEbke) * Date: 2016-05-03 12:50
In a very specific case, asyncio.gather causes a cancel() call on the Task waiting on the _GatheringFuture to return a false positive. An example program demonstrating this is attached.

The context: asyncio.gather creates a _GatheringFuture with a list of child Futures. Each child Future carries a done callback that updates the _GatheringFuture, and once the last child done callback is executed, the _GatheringFuture sets its result.

The specific situation: When the last child future changes state to done it schedules its done callbacks, but does not immediately execute them. If the Task waiting on the gather is then cancelled before the last child done callback has run, the cancel method in asyncio/ determines that the _GatheringFuture inspects itself and sees that it is not done() yet, and proceeds to cancel all child futures - which has no effect, since all of them are already done(). It still returns True, so the Tasks thinks all is well, and proceeds with its execution.

The behaviour I would expect is that if cancel() is called on the _GatheringFuture, and all children return False on cancel(), then _GatheringFuture.cancel() should also return False, i.e.:

def cancel(self):
    if self.done():
        return False
    at_least_one_child_cancelled = False
    for child in self._children:
        if child.cancel():
            at_least_one_child_cancelled = True
    return at_least_one_child_cancelled

If I replace _GatheringFuture.cancel with the above variant, the bug disappears for me.

More context: We hit this bug sporadically in an integration test of aioredis, where some timings conspired to make it appear with a chance of about 1 in 10. The minimal example calls the cancellation in a done_callback, so that it always hits the window. This was not the way the bug was discovered.
msg264722 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-05-03 13:44
The doc:
"Cancellation: if the outer Future is cancelled, all children (that have not completed yet) are also cancelled. If any child is cancelled, this is treated as if it raised CancelledError – the outer Future is not cancelled in this case. (This is to prevent the cancellation of one child to cause other children to be cancelled.)"

(Sorry, I didn't read your issue yet.)
msg264727 - (view) Author: Johannes Ebke (JohannesEbke) * Date: 2016-05-03 14:02
Thank you for providing the relevant documentation link.

I just noticed that it should probably be clarified that in case the outer Future is cancelled, and all children that are not already done ignore the cancellation (a.k.a. catch the CancelledError), the cancellation of the outer Future does not continue.

This is different to the behaviour of asyncio.wait_for, which always raises a CancelledError.
msg268537 - (view) Author: Martin Altmayer (MartinAltmayer) * Date: 2016-06-14 07:31
I don't think this is a mere documentation problem: If a future cannot be cancelled because it is already done, cancel must return False.

As Johannes' example demonstrates, a wrong return value from cancel might lead to a cancelled task being continued as if nothing happened:  If Task.cancel receives a false positive from its _fut_waiter, it will not throw a CancelledError into the task (_must_cancel=True), but simply continue the task.
msg268596 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-06-15 00:55
I think I agree with Johannes. If all children refuse to be cancelled because they are already done, the outer _GatheringFuture might as well refuse to be cancelled as well.

However I'm not sure I actually understand the mechanism whereby the calling Task ends up surviving, and Johannes' description appears garbled.

Can anyone add some print statements to various parts and explain it here?
msg268609 - (view) Author: Johannes Ebke (JohannesEbke) * Date: 2016-06-15 08:21
On rereading my original description, it really is not clearly described why the calling Task ends up surviving. 

Attached is a patch to the 3.5.1 asyncio/ which adds some print statements in Task.cancel().

If I execute the with the patch applied, the output looks like this:

Cancelling the task: <Task pending coro=<gather_test() running at> wait_for=<_GatheringFuture pending cb=[Task._wakeup()]> cb=[Task._wakeup()]>
Entered Task.cancel() for task <Task pending coro=<gather_test() running at> wait_for=<_GatheringFuture pending cb=[Task._wakeup()]> cb=[Task._wakeup()]>
Task is not done() and we have a _fut_waiter: Cancelling fut_waiter <_GatheringFuture pending cb=[Task._wakeup()]>
Entered Task.cancel() for task <Task finished coro=<sleep() done, defined at /home/ebkej/.virtualenvs/kialo/lib/python3.5/asyncio/> result=None>
Task is done(), refusing to cancel
Great, _fut_waiter has agreed to be cancelled. We can now also return True
Cancellation returned:  True
All children finished OK, setting _GatheringFuture results!
Finished gathering.
Proof that this is running even though it was cancelled

The Task keeps on running because Task.cancel() trusts its _fut_waiter task to handle the cancellation correctly if its cancel() method returns True. If it returns False, it handles the Cancellation itself.

In this case, that _fut_waiter continues on, and proceeds to set results etc. so that the calling tasks cannot distinguish this from a CancellationError which has been deliberately caught.

I hope this explanation is a bit clearer than the first one.
msg268743 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-06-17 21:40
Thanks! I had eventually pieced together the same explanation. So yes, I
think your fix is right, though I would write it like this:

        ret = False
        for child in self._children:
            ret |= child.cancel()
        return ret  # True if at least one child.cancel() call returned True

It would also be nice if there was a test for this behavior.

I keep worrying a bit -- a similar bug could exist in other pieces of the
code, or in other libraries. But I guess we can't do much about that.
msg268893 - (view) Author: Johannes Ebke (JohannesEbke) * Date: 2016-06-20 11:27
Right, that's neater. Attached is a patch with your version and a test. I checked that it fails with the old version of cancel() and passes with the new one.

Concerning possible other bugs, I've had a look in the standard library, but could not find another instance where Future.cancel() is overwritten and has special handling. I also had a look at the try/except Exception blocks in lib/asyncio, but possible Cancellations are handled correctly there.

I believe the main source of bugs in this context will probably be other asyncio-based libraries. Either by inadvertently catching CancellationErrors in a try/except Exception block and treating them like other errors, or by not protecting resources with try/finally across yield points which might throw a CancellationError.

Not all libraries use cancel() internally, so the authors might not be aware that they have to write "cancellation-safe" code.
msg268928 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-06-20 20:33
@Yury, any comments? The fix looks fine to me.
msg268930 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-06-20 20:42
> @Yury, any comments? The fix looks fine to me.

I've left a comment in code review. Other than that, the patch/approach looks good.
msg269063 - (view) Author: Johannes Ebke (JohannesEbke) * Date: 2016-06-22 09:00
Attached is a new version of the patch incorporating the review results.
msg279157 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-10-21 21:25
New changeset b1aa485fad1b by Yury Selivanov in branch '3.5':
Issue #26923: Fix asyncio.Gather to refuse being cancelled once all children are done.

New changeset b0af901b1e2a by Yury Selivanov in branch '3.6':
Merge 3.5 (issue #26923)

New changeset 2a228b03ea16 by Yury Selivanov in branch 'default':
Merge 3.6 (issue #26923)
msg279158 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2016-10-21 21:26
Thank you, Johannes!
Date User Action Args
2016-10-21 21:26:49yselivanovsetstatus: open -> closed
resolution: fixed
messages: + msg279158

stage: resolved
2016-10-21 21:25:13python-devsetnosy: + python-dev
messages: + msg279157
2016-06-22 09:00:21JohannesEbkesetfiles: + fix_and_test_26923_reviewed.patch

messages: + msg269063
2016-06-20 20:42:24yselivanovsetmessages: + msg268930
2016-06-20 20:33:13gvanrossumsetmessages: + msg268928
2016-06-20 11:27:20JohannesEbkesetfiles: + fix_and_test_26923.patch

messages: + msg268893
2016-06-17 21:40:05gvanrossumsetmessages: + msg268743
2016-06-15 08:21:09JohannesEbkesetfiles: + asyncio_task_prints.patch
keywords: + patch
messages: + msg268609
2016-06-15 00:55:15gvanrossumsetmessages: + msg268596
2016-06-14 07:31:14MartinAltmayersetnosy: + MartinAltmayer
messages: + msg268537
2016-05-04 07:29:45tatellossetnosy: + tatellos
2016-05-03 14:02:54JohannesEbkesetmessages: + msg264727
2016-05-03 13:44:21vstinnersetmessages: + msg264722
2016-05-03 12:50:06JohannesEbkecreate