This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author mbussonn
Recipients chris.jerdonek, giampaolo.rodola, mbussonn, ncoghlan, njs, yselivanov
Date 2017-12-13.19:14:36
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1513192476.96.0.213398074469.issue30491@psf.upfronthosting.co.za>
In-reply-to
Content
Let me try to explain better, I'm pretty sure there is just a misunderstanding
from some of use in the vocabulary or presupposition we start from. I have the
impression that you feel like the API will automatically make coroutine raise
when they are not awaited, while I have the impression that in only provide a
tool for frameworks to detect unawaited coros and raise As soon as possible in
the right place.


We have the scheduler/eventloop that advance users tasks. Each time a user task
yield to the scheduler directly, or indirectly, the scheduler want to check that
since last time it was called in the same task no non-awaited coroutines are
pending, that is to say

    await sleep(0) # yield to scheduler
    c1 = coro() 
    await c1

    c2 = coro()

    c3 = coro()
    del c3
    gc.collect()

    await sleep(0) # yield again.


We want to make sure that between the two `sleep(0)` no coro is unawaited. Let's
assume `await coro()` does not itself yield to the scheduler for simplicity. In
above case you can see that c2 is not awaited, so according to Nathaniel
proposal, `sys.collect_unawaited_coroutines()` would returns `[c2, c3]`, So the
second sleep(0) can raise NonAwaitedCoroutineError, (and if tracemalloc is
enabled give more informations). 



You propose to use 
  weakref.ref(coro, trio._untrack_coro)

And I suppose you want us to do

  def _untrack_coro(self, ref):
      coro = ref()
      if inspect.getcoroutinestate(c) == 'CORO_STARTED'
        .. do what's necessary for unawaited coroutine. 
      else:
        # just discard from the tracked list.


Unfortunately:

> the weak reference object will be passed as the only parameter to the
> callback; *the referent will no longer be available*.

Thus we cannot check the state of the collected coroutine, according to my
understanding and experiments (Also we need to keep ref alive, but that's a
minor inconvenience). My guess is the referent is not around for users not to
suddenly add a ref to it...

So such a tracked list could only be use to know if coroutines where GC'd or
not.


If you are using wekrefs, you may also think about
`sys.collect_unawaited_coroutines()` returning weakref list, but that would lead
to `WeakRefList[c2]` because c3 was collected :-(. While we still raise because of
c2, a user logic only containing c3 would pass the "no unawaited coroutines".

To respond to your answer more specifically:

> 1. How will trio handle situations like:

>    c = coro()
>    await ...
>    nursery.start_soon(c)


It depends what your `...` is, if its yielding to the trio scheduler, then the
scheduler will check unawaited coroutines, see that the list is non-empty and
raise a NonAwaitedCoroutineError at your `await ...`. In the case your `await
...` does not yield.. then nothing. 

> Maybe creating a coroutine and not immediately passing it to 'start_soon' or similar API is an anti-pattern in Trio, but it is an OK thing to do in asyncio/curio/twisted.

Yes, and the trio/curio/asyncio library is responsible to check
`sys.collect_unawaited_coroutines()`, and react appropriately. Trio will likely
decide to raise. Asyncio can just forget this option exists.

> 2. What if another framework (say asyncio that you want to interoperate with) calls "sys.set_unawaited_coroutine_tracking(False)"?  

> The API you propose has the *same* pitfall that 'sys.set_coroutine_wrapper()' has, when used by several frameworks simultaneously.

As far as I understand you'll need to yield back to the other framework at some
point. Then it's IMHO to the author to make sure to save the state of
`sys.collect_unawaited_coroutines()`, set
`sys.set_unawaited_coroutine_tracking(enabled: bool)` to what the other
framework expect, and  do the opposite when they get control back. 


> 3. Given (2), why don't you have a coroutine wrapper like this:

>  def trio_coroutine_wrapper(coro):
>      ref = weakref.ref(coro, trio._untrack_coro)
>      trio._track_coro(ref)
>      return coro

We can't use weakref (cf issue with c3/weakref above).


We could track all coroutines, with a normal container, which is exactly what we
do here[1], except now everytime you need to check
`inspect.getcoroutinestate(coro) == 'CORO_CREATED'` on every item. You can't
use weakref callback as you can't check the coro state in this case. The check
on all coro everytime does add potentially significant overhead, AND block the
GC of awaited couroutines in between checks. While we _do want_ to  block GC of
coroutine only if `sys.collect_unawaited_coroutines()` is set to true and
coroutine is not awaited.

Also we like to keep the actual non-awaited coro around for error messages, but
I guess we can do without. 

One alternative would be the ability to add a callback when coros change
states... But I though coroutines where one way to avoid callbacks...


Now you have way more understanding of core CPython than me, and I may just not
figure out what you would write in your `trio._untrack_coro`, or miss something
else. I hope that this exposition at least help to clarify the various sue case
and attempt. Sorry for the long response. 

1: https://github.com/python-trio/trio/pull/176/files#diff-ce66495f4b73d01ba38f06ae87955d3aR54
History
Date User Action Args
2017-12-13 19:14:36mbussonnsetrecipients: + mbussonn, ncoghlan, giampaolo.rodola, njs, chris.jerdonek, yselivanov
2017-12-13 19:14:36mbussonnsetmessageid: <1513192476.96.0.213398074469.issue30491@psf.upfronthosting.co.za>
2017-12-13 19:14:36mbussonnlinkissue30491 messages
2017-12-13 19:14:36mbussonncreate