classification
Title: async generators aclose() behavior in 3.8
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.9, Python 3.8
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: asvetlov, gvanrossum, lukasz.langa, ncoghlan, njs, yselivanov
Priority: normal Keywords:

Created on 2019-10-22 21:48 by yselivanov, last changed 2019-10-24 04:03 by yselivanov. This issue is now closed.

Messages (5)
msg355158 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2019-10-22 21:48
I believe I might have discovered a problem with asynchronous generators in 3.8.


# Prelude

In Python prior to 3.8 it was possible to overlap running of the "asend()" and "athrow()" methods for the same asynchronous generator.

In plain English, it was possible to await on "agen.asend()" while some other coroutine was already awaiting on "agen.asend()".  That created all kinds of problems with asynchronous generators when people used them to implement channels, trying to push values into the same asynchronous generator from different coroutines.  Regular code that used asynchronous generators in a plain, non-sophisticated way was not affected.

For classic synchronous generators we have a check for preventing overlapping use of "send()" and "throw()" -- the "gi_running" flag.  If the flag is set, both methods raise a RuntimeError saying that "generator already executing".
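The sync-generator check can be demonstrated with a small, self-contained sketch (the `gen`/`g` names are illustrative; note that CPython actually raises ValueError with this message for plain generators):

```python
def gen():
    # Re-enter the same generator while its frame is still executing.
    yield next(g)

g = gen()
try:
    next(g)
    msg = ""
except (ValueError, RuntimeError) as e:  # CPython raises ValueError here
    msg = str(e)

print(msg)  # generator already executing
```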


# Python 3.8

As was discussed in issues #30773 and #32526, we decided to replicate the same behavior for asynchronous generators, namely:

* add an "ag_running" flag;

* set "ag_running" when "anext()" or "athrow()" begins to run, and clear it when the call finishes executing;

* if the flag is already set when we are about to run "anext()" or "athrow()", it means that another coroutine is using the same generator object in parallel, so we raise a RuntimeError.
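A minimal sketch of the new check (names like `slow_gen` are illustrative; on 3.8+ the second concurrent "__anext__" fails with a RuntimeError while the first completes normally):

```python
import asyncio

result = {}

async def slow_gen():
    await asyncio.sleep(0.05)   # stay "running" long enough to overlap
    yield 1

async def main():
    g = slow_gen()
    t1 = asyncio.ensure_future(g.__anext__())
    t2 = asyncio.ensure_future(g.__anext__())
    await asyncio.sleep(0)      # let t1 take its first step; ag_running is now set
    try:
        await t2                # overlapping use of the same async generator
    except RuntimeError as e:
        result["error"] = str(e)
    result["first"] = await t1  # the first iteration still completes normally

asyncio.run(main())
print(result["error"])  # anext(): asynchronous generator is already running
```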


# Problem

Closing a generator involves throwing a GeneratorExit exception into it.  Throwing the exception is done via calling "throw()" for sync generators, and "athrow()" for async generators.

As shown in https://gist.github.com/1st1/d9860cbf6fe2e5d243e695809aea674c, it's an error to close a synchronous generator while it is being iterated.  This is how async generators *started to behave* in 3.8.

The problem is that asyncio's "loop.shutdown_asyncgens()" method tries to shut down orphan asynchronous generators by calling "aclose()" on them.  The method is public API and is called by "asyncio.run()" automatically.

Prior to 3.8, calling "aclose()" worked (though perhaps not in the cleanest way): a GeneratorExit was thrown into an asynchronous generator regardless of whether it was running or not, aborting the execution.

In 3.8, calling "aclose()" can fail with a RuntimeError.  It's no longer possible to *reliably cancel* a running asynchronous generator.
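A sketch of the failure mode (illustrative names; one task is blocked inside "__anext__" while another coroutine calls "aclose()"):

```python
import asyncio

result = {}

async def slow_gen():
    await asyncio.sleep(0.05)
    yield 1

async def main():
    g = slow_gen()
    t = asyncio.ensure_future(g.__anext__())
    await asyncio.sleep(0)      # the generator is now running, blocked in sleep()
    try:
        await g.aclose()        # on 3.8+ this raises instead of closing
    except RuntimeError as e:
        result["error"] = str(e)
    result["value"] = await t   # the pending __anext__ still completes

asyncio.run(main())
print(result["error"])  # aclose(): asynchronous generator is already running
```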


# Dilemma

Replicating the behavior of synchronous generators in asynchronous generators seems like the right thing.  But it seems that the requirements for asynchronous generators are different, and 3.8 breaks backwards compatibility.


# Proposed Solution

We keep "asend()" and "athrow()" as is in 3.8.  They will continue to raise RuntimeError if used in parallel on the same async generator.

We modify "aclose()" to allow it being called in parallel with "asend()" or "athrow()".  This will restore the <3.8 behavior and fix the "loop.shutdown_asyncgens()" method.


Thoughts?
msg355160 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2019-10-22 22:09
Just discussed this issue off-list with Nathaniel.  Maybe this isn't as severe as I thought it was.

If we cancel all asyncio tasks before calling `loop.shutdown_asyncgens()` this *probably* becomes a non-issue.  And "asyncio.run()" does just that [1].

Any asynchronous generator ultimately needs a task to iterate it, so if we cancel all tasks, we cancel all running asynchronous generators too.  Calling `loop.shutdown_asyncgens()` then ensures that there are no orphan async generators left.  The only way (at least known to me) that orphans could remain is if some task ignores cancellation.
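For code that drives the loop manually (instead of via "asyncio.run()"), the safe order is: cancel and drain all tasks first, then call "loop.shutdown_asyncgens()".  A sketch of that ordering (the `ticker` generator and `orphans` list are illustrative; the reference in `orphans` just keeps the abandoned generator from being GC'd before shutdown):

```python
import asyncio

closed = []
orphans = []  # keep a reference so the abandoned generator isn't GC'd early

async def ticker():
    try:
        while True:
            yield 1
            await asyncio.sleep(1)
    finally:
        closed.append(True)   # runs when GeneratorExit is thrown in

async def main():
    g = ticker()
    orphans.append(g)
    await g.__anext__()       # start iterating, then abandon the generator

loop = asyncio.new_event_loop()
try:
    loop.run_until_complete(main())
finally:
    # 1. Cancel and wait for all tasks: no task can now be blocked in __anext__.
    pending = asyncio.all_tasks(loop)
    for t in pending:
        t.cancel()
    if pending:
        loop.run_until_complete(
            asyncio.gather(*pending, return_exceptions=True))
    # 2. Only now is it safe to finalize orphan async generators.
    loop.run_until_complete(loop.shutdown_asyncgens())
    loop.close()

print(closed)  # [True]
```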

Nathaniel, what do you think about this in the context of Trio?

If anything, we can update asyncio docs explaining the new behavior of async generators and how to use "loop.shutdown_asyncgens()".

[1] https://github.com/python/cpython/blob/91528f40c30717563a478920861c81d18da5bf63/Lib/asyncio/runners.py#L46
msg355163 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2019-10-22 22:35
I think conceptually, cancelling all tasks and waiting for them to exit is the right thing to do. That way you run as much shutdown code as possible in regular context, and also avoid this problem – if there are no tasks, then there can't be any tasks blocked in __anext__/asend/athrow/aclose, so calling aclose is safe.

This is also the right way to handle it from Trio's perspective – we always wait for all tasks to exit before exiting our run loop. For asyncio it might be a bit more complicated though so we should think through the trade-offs.

> The only way (at least known to me) that that could be the case if some tasks ignores cancellation.  

Looking at runners.py, it seems like asyncio.run doesn't just call task.cancel(), but actually waits for all tasks to exit. (See the call to gather() inside _cancel_all_tasks.) So I think that does guarantee that by the time we hit shutdown_asyncgens(), we can be certain that all tasks have exited. (If a task ignores cancellation, then that will cause _cancel_all_tasks to hang, so we never reach shutdown_asyncgens at all. Not ideal, but not really our problem either – if someone complains we just tell them they need to stop ignoring cancellation if they want their program to shut down cleanly.)

So I think asyncio.run is actually totally fine with the 3.8.0 behavior. It's only explicit calls to shutdown_asyncgens that might run into this, and I think that's probably OK? If you're calling shutdown_asyncgens by hand then you're kind of taking responsibility for dealing with any subtle issues around shutdown.

And if we did allow aclose() to run at any time, then I worry that that could cause serious problems. Users can call aclose() at will, and it's generally good practice to always call aclose() on your async generators explicitly. So it's entirely possible that users will accidentally call aclose() themselves while another task is blocked in __anext__. And in that case... what do we do?

The aclose call has to consume the generator's frame, by definition. But in the meantime, we still have this __anext__ coroutine object hanging around pointing to the consumed frame, that the task scheduler thinks needs to be resumed at some point in response to some event, and that can cause all kinds of weird situations. The __anext__ might try to resume while the aclose() is running. The __anext__ might get stuck and never resume, because the code run by aclose() accidentally unregistered the event that was going to wake it up, so you just get a deadlock instead of a useful error. It might cause some other code entirely to crash – like if the __anext__ was blocked inside a call to 'await queue.get()', then aclose() interrupting that might corrupt the queue's internal state, so the next time some other task calls queue.put() they get an exception or other incorrect behavior. It all seems really tricky and messy to me.

So... this is super subtle and confusing, but I *think* the conclusion is that yeah, 3.8.0 is fine and there's no urgent action needed.
msg355165 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2019-10-22 22:44
> So I think asyncio.run is actually totally fine with the 3.8.0 behavior.

Yes.  Looks like it.

> It's only explicit calls to shutdown_asyncgens that might run into this, and I think that's probably OK?

Yes.  Calling shutdown_asyncgens after all tasks are cancelled is still useful to clean up asynchronous generators that were created but not yet GCed or iterated.

> And if we did allow aclose() to run at any time, then I worry that that could cause serious problems. Users can call aclose() at will, and it's generally good practice to always call aclose() on your async generators explicitly. So it's entirely possible that users will accidentally call aclose() themselves while they have another task is blocked in __anext__. And in that case.... what do we do?

I agree.  Avoiding that kind of mess was the main motivation to fix ag_running.

> So... this is super subtle and confusing, but I *think* the conclusion is that yeah, 3.8.0 is fine and there's no urgent action needed.

Let's see if anyone else thinks this might be a problem, but yeah, it seems that the only programs that can suffer from this change are those that don't use 'asyncio.run'.  If so, they should be fixed to explicitly cancel all tasks before calling shutdown_asyncgens.
msg355276 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2019-10-24 04:03
The more I think about this the more I like new 3.8 behavior. I'm going to close this issue; if anything comes up I'll reopen it later. Sorry for the noise.
History
Date User Action Args
2019-10-24 04:03:21  yselivanov  set     status: open -> closed
                                         resolution: not a bug
                                         messages: + msg355276
                                         stage: resolved
2019-10-22 22:44:45  yselivanov  set     messages: + msg355165
2019-10-22 22:35:10  njs         set     messages: + msg355163
2019-10-22 22:09:37  yselivanov  set     messages: + msg355160
2019-10-22 21:48:33  yselivanov  create