classification
Title: asyncio.all_tasks() crashes if asyncio is used in multiple threads
Type: behavior
Stage:
Components: asyncio
Versions: Python 3.9, Python 3.8, Python 3.7

process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Nick Davies, asvetlov, lukasz.langa, miss-islington, sdunster, thatch, yselivanov
Priority: normal
Keywords: patch

Created on 2019-04-11 18:12 by Nick Davies, last changed 2019-06-12 17:26 by asvetlov.

Pull Requests
PR 13971: merged (asvetlov, 2019-06-11 10:54)
PR 13975: merged (miss-islington, 2019-06-11 15:27)
PR 13976: merged (miss-islington, 2019-06-11 15:28)
Messages (11)
msg339991 - Author: Nick Davies (Nick Davies) Date: 2019-04-11 18:12
This problem was identified in https://bugs.python.org/issue34970 but I think the fix might have been incorrect. The theory in issue34970 was that GC was causing the WeakSet behind `all_tasks` to change during iteration. However, WeakSet uses an `_IterationGuard` class to keep GC from changing the set during iteration, which prevents this problem in a single thread.

My thoughts on this problem are:
 - `asyncio.tasks._all_tasks` is shared by all users of asyncio (https://github.com/python/cpython/blob/3.7/Lib/asyncio/tasks.py#L818)
 - Any newly constructed Task mutates `_all_tasks` (https://github.com/python/cpython/blob/3.7/Lib/asyncio/tasks.py#L117)
 - `_IterationGuard` won't protect iteration in this case, because `WeakSet.add` always commits its change immediately, even if something is iterating (https://github.com/python/cpython/blob/3.6/Lib/_weakrefset.py#L83)
 - Calls to `asyncio.all_tasks` or `asyncio.tasks.Task.all_tasks` crash if any task is started on any thread during the iteration.
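The third bullet can be seen without threads at all: `_IterationGuard` only defers the *removal* of dead references, while `WeakSet.add` mutates the underlying set immediately. A small single-threaded illustration (not from the original report):

```python
import weakref

class Obj:
    pass

objs = [Obj() for _ in range(10)]   # strong refs keep the items alive
ws = weakref.WeakSet(objs)

# _IterationGuard defers removal of dead references during iteration,
# but add() commits to the underlying set right away.
try:
    for _ in ws:
        ws.add(Obj())
    raised = None
except RuntimeError as exc:
    raised = exc

print(raised)  # Set changed size during iteration
```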

Repro code:

```
import asyncio
from threading import Thread


async def do_nothing():
    await asyncio.sleep(0)


async def loop_tasks():
    loop = asyncio.get_event_loop()
    while True:
        loop.create_task(do_nothing())
        await asyncio.sleep(0.01)


def old_thread():
    loop = asyncio.new_event_loop()
    while True:
        asyncio.tasks.Task.all_tasks(loop=loop)


def new_thread():
    loop = asyncio.new_event_loop()
    while True:
        asyncio.all_tasks(loop=loop)


old_t = Thread(target=old_thread)
new_t = Thread(target=new_thread)
old_t.start()
new_t.start()
asyncio.run(loop_tasks())
```

Output:

```
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "tmp/test_asyncio.py", line 25, in new_thread
    asyncio.all_tasks(loop=loop)
  File "/usr/lib/python3.7/asyncio/tasks.py", line 40, in all_tasks
    return {t for t in list(_all_tasks)
  File "/usr/lib/python3.7/_weakrefset.py", line 60, in __iter__
    for itemref in self.data:
RuntimeError: Set changed size during iteration

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.7/threading.py", line 865, in run
    self._target(*self._args, **self._kwargs)
  File "tmp/test_asyncio.py", line 19, in old_thread
    asyncio.tasks.Task.all_tasks(loop=loop)
  File "/usr/lib/python3.7/asyncio/tasks.py", line 52, in _all_tasks_compat
    return {t for t in list(_all_tasks) if futures._get_loop(t) is loop}
  File "/usr/lib/python3.7/_weakrefset.py", line 60, in __iter__
    for itemref in self.data:
RuntimeError: Set changed size during iteration
```
msg340424 - Author: Andrew Svetlov (asvetlov) (Python committer) Date: 2019-04-17 18:25
Thanks for the report!

I see three ways to fix the bug:
1. Guard `_all_tasks` with a threading.Lock. This hurts performance significantly.
2. Retry the `list(_all_tasks)` call in a loop if a RuntimeError is raised. The chance of a collision is very low, so the strategy is good enough.
3. Change `_all_tasks` from a weak set of tasks to a WeakDict[AbstractEventLoop, WeakSet[Task]]. This eliminates the collision but is a little more complicated. The loop would also have to be hashable (probably fine, but I'm not sure all third-party loops support it).

Thus I'm inclined toward option 2.
Thoughts?
msg340425 - Author: Andrew Svetlov (asvetlov) (Python committer) Date: 2019-04-17 18:26
The fix can be applied to 3.7 and 3.8 only, sorry.
Python 3.6 is in security mode now.
msg340428 - Author: Nick Davies (Nick Davies) Date: 2019-04-17 18:41
My preference would actually be number 3 because:

 1: I agree this isn't really a safe option, because it could slow things down (possibly a lot).
 2: I haven't found this to be rare in my situation, but I'm not sure how common my setup is. We have a threaded server with a mix of sync and asyncio code, so we call run in a bunch of places inside threads. Any time the server gets busy, any task creation that occurs while run is returning crashes. My two main reservations about this approach are:
    - There is a potentially unbounded number of retries.
    - It also papers over a thread-unsafe operation; we are pretty lucky that the current implementation fails in a consistent and sane way that we can catch and retry.
 3: The loop is already expected to be hashable in 3.7 as far as I can tell (https://github.com/python/cpython/blob/3.7/Lib/asyncio/tasks.py#L822), so other than the slightly higher complexity this feels like the cleanest solution.


> The fix can be applied to 3.7 and 3.8 only, sorry. Python 3.6 is in security mode now.

That's fine; with some care, you can work around the issue by using `loop.set_task_factory` to install a factory that tracks the tasks per loop.
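That workaround could be sketched as below; `install_task_tracker` is a hypothetical helper name, and the factory simply wraps the stock `asyncio.Task` while recording each task in a per-loop WeakSet instead of relying on the shared module-level set:

```python
import asyncio
import weakref

def install_task_tracker(loop):
    """Install a task factory that records this loop's tasks in its own
    WeakSet, so snapshots never touch a set shared with other threads."""
    tracked = weakref.WeakSet()

    def factory(loop, coro, **kwargs):
        # Delegate to the stock Task; only the bookkeeping is custom.
        task = asyncio.Task(coro, loop=loop, **kwargs)
        tracked.add(task)
        return task

    loop.set_task_factory(factory)
    return tracked
```

Iterating `tracked` is safe from the loop's own thread; the "some care" part is that snapshots taken from other threads can still race with that loop's `add` calls.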
msg340527 - Author: Andrew Svetlov (asvetlov) (Python committer) Date: 2019-04-19 08:14
Sorry, I missed that the loop already has a hashable requirement.
Would you prepare a patch for number 3?
I am afraid we could add another hard-to-debug multi-threading problem by complicating the data structure.

I'm just curious why you call `all_tasks()` at all.
In my mind, the only non-debug usage is `asyncio.run()`.
msg340544 - Author: Nick Davies (Nick Davies) Date: 2019-04-19 13:54
> Would you prepare a patch for number 3?

I will give it a try and see what I come up with.

> I am afraid we can add another hard-to-debug multi-threaded problem by complicating the data structure.

Yeah, this was my concern too: the adding and removing from the WeakDict[AbstractEventLoop, WeakSet[Task]] for `_all_tasks` could still cause issues. Specifically, the whole WeakSet class is not thread-safe, and I would assume WeakDict is the same, so there may not be a nice way of ensuring that a combination of GC and the _IterationGuard doesn't come along and mess up the dict, even if I wrap it in a threading lock.

Another option would be to attach the WeakSet[Task] to the loop itself; since using the same loop in multiple threads is not at all thread-safe already, that would contain the problem. You mentioned "third-party loops", though, which may make this option impossible.

> I'm just curious why do you call `all_tasks()` at all?
> In my mind, the only non-debug usage is `asyncio.run()`

In reality we aren't using `all_tasks()` directly. We are calling `asyncio.run()` from multiple threads, which triggers the issue; the repro I provided was just a more reliable way of triggering it. I will paste a slightly more real-world example of how this happened below. This version is a little messier and it's harder to see exactly what the problem is, which is why I started with the other one.

```
import asyncio
from threading import Thread


async def do_nothing(n=0):
    await asyncio.sleep(n)


async def loop_tasks():
    loop = asyncio.get_event_loop()
    while True:
        loop.create_task(do_nothing())
        await asyncio.sleep(0.01)


async def make_tasks(n):
    loop = asyncio.get_event_loop()
    for i in range(n):
        loop.create_task(do_nothing(1))
    await asyncio.sleep(1)


def make_lots_of_tasks():
    while True:
        asyncio.run(make_tasks(10000))


for i in range(10):
    t = Thread(target=make_lots_of_tasks)
    t.start()

asyncio.run(loop_tasks())
```
msg345239 - Author: miss-islington (miss-islington) Date: 2019-06-11 15:27
New changeset 65aa64fae89a24491aae84ba0329eb8f3c68c389 by Miss Islington (bot) (Andrew Svetlov) in branch 'master':
bpo-36607: Eliminate RuntimeError raised by asyncio.all_tasks() (GH-13971)
https://github.com/python/cpython/commit/65aa64fae89a24491aae84ba0329eb8f3c68c389
msg345247 - Author: miss-islington (miss-islington) Date: 2019-06-11 16:01
New changeset 5d1d4e3179ffd4d2d72462d870cf86dcc11450ce by Miss Islington (bot) in branch '3.7':
bpo-36607: Eliminate RuntimeError raised by asyncio.all_tasks() (GH-13971)
https://github.com/python/cpython/commit/5d1d4e3179ffd4d2d72462d870cf86dcc11450ce
msg345266 - Author: miss-islington (miss-islington) Date: 2019-06-11 20:32
New changeset 83abd9658b4845299452be06c9ce92cbceee944d by Miss Islington (bot) in branch '3.8':
bpo-36607: Eliminate RuntimeError raised by asyncio.all_tasks() (GH-13971)
https://github.com/python/cpython/commit/83abd9658b4845299452be06c9ce92cbceee944d
msg345380 - Author: Łukasz Langa (lukasz.langa) (Python committer) Date: 2019-06-12 16:54
Note: there's a discussion on GitHub on PR 13971 being a bad solution: https://github.com/python/cpython/pull/13971#issuecomment-500908198

They should be reverted (or fixed forward) for 3.8 and 3.9.  I'm OK with this modifying the AbstractEventLoop API if need be for 3.8.  Just have a PR ready in the next two weeks.
msg345389 - Author: Andrew Svetlov (asvetlov) (Python committer) Date: 2019-06-12 17:26
Łukasz, please don't rush.
The applied PR makes asyncio more stable, not worse.

I see only one way to make it perfect: get rid of weak refs entirely.
Otherwise there is always a chance that a GC run deletes a set element during iteration over the set. Sorry, that's how weakrefs work.

There is one such possibility: call asyncio._unregister_task() explicitly when the task is done (finished with success or failure, or cancelled).
With this, we can replace the weak set with a regular set.

The only requirement is that the task must call this _unregister_task() function.
No public API change is needed.

At the time of the work on 3.7, Yuri and I considered this implementation but rejected it because there was a (very low) chance that somebody might implement their own task, register a custom task factory, and not call _unregister_task().

I have never seen code that implements an asyncio task from scratch; people always reuse the existing asyncio.Task.

So maybe the idea is not so bad. It can be implemented easily. I volunteer to make a PR demonstrating the proposal in a day or two, depending on my free time.
History
Date User Action Args
2019-06-12 17:26:52  asvetlov        set  messages: + msg345389
2019-06-12 16:54:54  lukasz.langa    set  nosy: + lukasz.langa; messages: + msg345380
2019-06-11 20:32:28  miss-islington  set  messages: + msg345266
2019-06-11 20:15:54  asvetlov        set  status: closed -> open; resolution: fixed -> ; stage: resolved ->
2019-06-11 16:01:35  miss-islington  set  messages: + msg345247
2019-06-11 15:33:40  asvetlov        set  status: open -> closed; resolution: fixed; stage: patch review -> resolved
2019-06-11 15:28:01  miss-islington  set  pull_requests: + pull_request13843
2019-06-11 15:27:53  miss-islington  set  pull_requests: + pull_request13842
2019-06-11 15:27:35  miss-islington  set  nosy: + miss-islington; messages: + msg345239
2019-06-11 10:56:12  asvetlov        set  versions: + Python 3.9
2019-06-11 10:54:31  asvetlov        set  keywords: + patch; stage: patch review; pull_requests: + pull_request13838
2019-04-19 13:54:20  Nick Davies     set  messages: + msg340544
2019-04-19 08:14:08  asvetlov        set  messages: + msg340527
2019-04-17 18:41:17  Nick Davies     set  messages: + msg340428
2019-04-17 18:26:59  asvetlov        set  messages: + msg340425; versions: + Python 3.8, - Python 3.6
2019-04-17 18:25:01  asvetlov        set  messages: + msg340424
2019-04-17 17:10:16  thatch          set  nosy: + thatch
2019-04-16 15:53:56  Nick Davies     set  type: behavior; versions: + Python 3.6
2019-04-12 16:06:23  sdunster        set  nosy: + sdunster
2019-04-11 18:12:48  Nick Davies     create