
Author mpaolini
Recipients Guido.van.Rossum, Richard.Kiss, giampaolo.rodola, gvanrossum, mpaolini, pitrou, python-dev, richard.kiss, vstinner, yselivanov
Date 2014-08-18.14:20:57
Message-id <1408371658.76.0.102770170532.issue21163@psf.upfronthosting.co.za>
I finally wrapped my head around this. I wrote a (simpler) script to get a better picture.

What happens
-------------

When a consumer task is first instantiated, the loop holds a strong reference to it (through `loop._ready`).

Later, once the loop starts, the consumer task runs, yields, and waits on an unreachable future. At that point the last strong reference to it (the entry in `loop._ready`) is gone.

It is not collected immediately because it has just become part of a reference cycle
(task -> coroutine -> stack -> future -> task) that is broken only when the task completes.

gc.collect() called *before* the tasks are ever run has the odd side effect of shifting the point at which the next automatic gc collection happens.
Automatic gc triggers after a few (but not all) consumers have become unreachable, depending on how many objects were allocated before the loop started running.

gc.collect() called after all the consumers are waiting on the unreachable future reaps all consumer tasks as expected. No bug in garbage collection.
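
The behaviour above can be reproduced with a small script. This is a minimal sketch written against modern asyncio (`asyncio.run` and `asyncio.create_task`, Python 3.7+), not the 2014 API, and `consumer` and `main` are illustrative names. It disables automatic collection so the result does not depend on when the collector happens to fire, drops the only reference to a task blocked on an unreachable future, and checks the task through a weak reference before and after an explicit `gc.collect()`.

```python
import asyncio
import gc
import weakref


async def consumer():
    # Block forever on a future that only this coroutine's frame references.
    await asyncio.get_running_loop().create_future()


async def main():
    gc.disable()  # keep the automatic collector from running at an arbitrary moment
    try:
        task = asyncio.create_task(consumer())
        ref = weakref.ref(task)
        del task                # drop our only strong reference
        await asyncio.sleep(0)  # let consumer() start and suspend on its future
        alive_before = ref() is not None  # only the reference cycle keeps it alive
        gc.collect()            # the cyclic collector reclaims the whole cycle
        alive_after = ref() is not None
    finally:
        gc.enable()
    return alive_before, alive_after


if __name__ == "__main__":
    print(asyncio.run(main()))
```

When the task is reclaimed, asyncio logs "Task was destroyed but it is pending!", which is the symptom that makes these disappearances visible at all.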

Yielding from `asyncio.sleep()` prevents the consumers from being
collected: it creates a strong reference to the future from the loop (through the scheduled timer).
I suspect all network-related asyncio coroutines behave the same way.
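
For contrast, here is the same check with a task waiting in `asyncio.sleep()`. Again this is a sketch assuming modern asyncio; the reference chain named in the comments (timer handle -> sleep future -> task wake-up callback) reflects CPython's current implementation and is not a documented guarantee. `sleeper` and `main` are illustrative names.

```python
import asyncio
import contextlib
import gc
import weakref


async def sleeper():
    # asyncio.sleep() schedules a timer on the loop; the timer handle holds
    # the sleep future, and that future's done-callback holds this task.
    await asyncio.sleep(3600)


async def main():
    task = asyncio.create_task(sleeper())
    ref = weakref.ref(task)
    del task                # drop our only strong reference
    await asyncio.sleep(0)  # let sleeper() start and register its timer
    gc.collect()
    survivor = ref()        # still reachable through the loop's timer queue
    if survivor is None:
        return False
    survivor.cancel()       # clean up so the loop can shut down quietly
    with contextlib.suppress(asyncio.CancelledError):
        await survivor
    return True


if __name__ == "__main__":
    print(asyncio.run(main()))
```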

Summing up: tasks that have no strong refs may be garbage collected unexpectedly, or not at all, depending on which future they yield to. It is very difficult to debug and understand why these tasks disappear.

Side note: the patches submitted and merged in this issue do emit the relevant warnings when PYTHONASYNCIODEBUG is set. This is very useful.

Proposed enhancements
---------------------

1. Document that you should always keep strong refs to tasks, or to the futures/coroutines the tasks yield from. This knowledge is currently passed around among brave asyncio users like oral tradition.

2. Alternatively, keep strong references to all futures that make it through `Task._step`. We already keep strong refs to *some* of the asyncio built-in coroutines (`asyncio.sleep` is one of those). We also keep strong references to tasks that are ready to run (the ones that simply `yield` or the ones that have not started yet).

If you agree that 1. or 2. is needed, let me know and I'll try to cook up a patch.

Sorry for the noise