Issue36607
Created on 2019-04-11 18:12 by Nick Davies, last changed 2019-06-12 17:26 by asvetlov.
Pull Requests | |||
---|---|---|---|
URL | Status | Linked | Edit |
PR 13971 | merged | asvetlov, 2019-06-11 10:54 | |
PR 13975 | merged | miss-islington, 2019-06-11 15:27 | |
PR 13976 | merged | miss-islington, 2019-06-11 15:28 |
Messages (11) | |||
---|---|---|---|
msg339991 - (view) | Author: Nick Davies (Nick Davies) | Date: 2019-04-11 18:12 | |
This problem was identified in https://bugs.python.org/issue34970 but I think the fix might have been incorrect. The theory in issue34970 was that GC was causing the weakrefset for `all_tasks` to change during iteration. However Weakset provides an `_IterationGuard` class to prevent GC from changing the set during iteration and hence preventing this problem in a single thread. My thoughts on this problem are: - `asyncio.tasks._all_tasks` is shared for all users of asyncio (https://github.com/python/cpython/blob/3.7/Lib/asyncio/tasks.py#L818) - Any new Task constructed mutates `_all_tasks` (https://github.com/python/cpython/blob/3.7/Lib/asyncio/tasks.py#L117) - _IterationGuard won't protect iterations in this case because calls to Weakset.add will always commit changes even if there is something iterating (https://github.com/python/cpython/blob/3.6/Lib/_weakrefset.py#L83) - calls to `asyncio.all_tasks` or `asyncio.tasks.Task.all_tasks` crash if any task is started on any thread during iteration. Repro code: ``` import asyncio from threading import Thread async def do_nothing(): await asyncio.sleep(0) async def loop_tasks(): loop = asyncio.get_event_loop() while True: loop.create_task(do_nothing()) await asyncio.sleep(0.01) def old_thread(): loop = asyncio.new_event_loop() while True: asyncio.tasks.Task.all_tasks(loop=loop) def new_thread(): loop = asyncio.new_event_loop() while True: asyncio.all_tasks(loop=loop) old_t = Thread(target=old_thread) new_t = Thread(target=new_thread) old_t.start() new_t.start() asyncio.run(loop_tasks()) ``` Output: ``` Exception in thread Thread-2: Traceback (most recent call last): File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner self.run() File "/usr/lib/python3.7/threading.py", line 865, in run self._target(*self._args, **self._kwargs) File "tmp/test_asyncio.py", line 25, in new_thread asyncio.all_tasks(loop=loop) File "/usr/lib/python3.7/asyncio/tasks.py", line 40, in all_tasks return {t for t in list(_all_tasks) File "/usr/lib/python3.7/_weakrefset.py", line 60, in __iter__ for itemref in self.data: RuntimeError: Set changed size during iteration Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib/python3.7/threading.py", line 917, in _bootstrap_inner self.run() File "/usr/lib/python3.7/threading.py", line 865, in run self._target(*self._args, **self._kwargs) File "tmp/test_asyncio.py", line 19, in old_thread asyncio.tasks.Task.all_tasks(loop=loop) File "/usr/lib/python3.7/asyncio/tasks.py", line 52, in _all_tasks_compat return {t for t in list(_all_tasks) if futures._get_loop(t) is loop} File "/usr/lib/python3.7/_weakrefset.py", line 60, in __iter__ for itemref in self.data: RuntimeError: Set changed size during iteration ``` |
|||
msg340424 - (view) | Author: Andrew Svetlov (asvetlov) * ![]() |
Date: 2019-04-17 18:25 | |
Thanks for the report! I see 3 ways to fix the bug: 1. Guard _all_tasks with threading.Lock. It hurts performance significantly. 2. Retry list(_all_tasks) call in a loop if RuntimeError was raised. A chance of collision is very low, the strategy is good enough 3. Change _all_tasks from weak set of tasks to WeakDict[AbstractEventLoop, WeakSet[Task]]. This realization eliminates the collision it is a little complicated. Plus loop should be hashable now (perhaps ok but I'm not sure if all third-party loops support it). Thus I'm inclining to bullet 2. THoughts? |
|||
msg340425 - (view) | Author: Andrew Svetlov (asvetlov) * ![]() |
Date: 2019-04-17 18:26 | |
The fix can be applied to 3.7 and 3.8 only, sorry. Python 3.6 is in security mode now. |
|||
msg340428 - (view) | Author: Nick Davies (Nick Davies) | Date: 2019-04-17 18:41 | |
My preference would actually be number 3 because: 1: I agree that this isn't really a safe option because it could slow things down (possibly a lot) 2: I haven't found this to be rare in my situation but I am not sure how common my setup is. We have a threaded server with a mix of sync and asyncio so we use run in a bunch of places inside threads. Any time the server gets busy any task creation that occurs during the return of run crashes. My two main reservations about this approach are: - There is a potentially unbounded number of times that this could need to retry. - Also this is covering up a thread unsafe operation and we are pretty lucky based on the current implementation that it explodes in a consistent and sane way that we can catch and retry. 3: Loop is already expected to be hashable in 3.7 as far as I can tell (https://github.com/python/cpython/blob/3.7/Lib/asyncio/tasks.py#L822) so other than the slightly higher complexity this feels like the cleanest solution. > The fix can be applied to 3.7 and 3.8 only, sorry. Python 3.6 is in security mode now. Thats fine, you can work around the issue using asyncio.set_task_factory to something that tracks the tasks per loop with some care. |
|||
msg340527 - (view) | Author: Andrew Svetlov (asvetlov) * ![]() |
Date: 2019-04-19 08:14 | |
Sorry, I've missed that the loop has hashable requirement already. Would you prepare a patch for number 3? I am afraid we can add another hard-to-debug multi-threaded problem by complicating the data structure. I'm just curious why do you call `all_tasks()` at all? In my mind, the only non-debug usage is `asyncio.run()` |
|||
msg340544 - (view) | Author: Nick Davies (Nick Davies) | Date: 2019-04-19 13:54 | |
> Would you prepare a patch for number 3? I will give it a try and see what I come up with. > I am afraid we can add another hard-to-debug multi-threaded problem by complicating the data structure. Yeah this was my concern too, the adding and removing from the WeakDict[AbstractEventLoop, WeakSet[Task]] for `_all_tasks` could still cause issues. Specifically the whole WeakSet class is not threadsafe so I would assume WeakDict is the same, there may not be a nice way of ensuring a combination of GC + the IterationGuard doesn't come and mess up the dict even if I wrap it in a threading lock. Another option would be to have the WeakSet[Task] attached to the loop itself then because using the same loop in multiple threads not at all thread safe already that would contain the problem. You mentioned "third-party loops" which may make this option impossible. > I'm just curious why do you call `all_tasks()` at all? > In my mind, the only non-debug usage is `asyncio.run()` In reality we aren't using `all_tasks()` directly. We are calling `asyncio.run()` from multiple threads which triggers the issue. The repro I provided was just a more reliable way of triggering the issue. I will paste a slightly more real-world example of how this happened below. This version is a little more messy and harder to see exactly what the problem is which is why I started with the other one. ``` import asyncio from threading import Thread async def do_nothing(n=0): await asyncio.sleep(n) async def loop_tasks(): loop = asyncio.get_event_loop() while True: loop.create_task(do_nothing()) await asyncio.sleep(0.01) async def make_tasks(n): loop = asyncio.get_event_loop() for i in range(n): loop.create_task(do_nothing(1)) await asyncio.sleep(1) def make_lots_of_tasks(): while True: asyncio.run(make_tasks(10000)) for i in range(10): t = Thread(target=make_lots_of_tasks) t.start() asyncio.run(loop_tasks()) ``` |
|||
msg345239 - (view) | Author: miss-islington (miss-islington) | Date: 2019-06-11 15:27 | |
New changeset 65aa64fae89a24491aae84ba0329eb8f3c68c389 by Miss Islington (bot) (Andrew Svetlov) in branch 'master': bpo-36607: Eliminate RuntimeError raised by asyncio.all_tasks() (GH-13971) https://github.com/python/cpython/commit/65aa64fae89a24491aae84ba0329eb8f3c68c389 |
|||
msg345247 - (view) | Author: miss-islington (miss-islington) | Date: 2019-06-11 16:01 | |
New changeset 5d1d4e3179ffd4d2d72462d870cf86dcc11450ce by Miss Islington (bot) in branch '3.7': bpo-36607: Eliminate RuntimeError raised by asyncio.all_tasks() (GH-13971) https://github.com/python/cpython/commit/5d1d4e3179ffd4d2d72462d870cf86dcc11450ce |
|||
msg345266 - (view) | Author: miss-islington (miss-islington) | Date: 2019-06-11 20:32 | |
New changeset 83abd9658b4845299452be06c9ce92cbceee944d by Miss Islington (bot) in branch '3.8': bpo-36607: Eliminate RuntimeError raised by asyncio.all_tasks() (GH-13971) https://github.com/python/cpython/commit/83abd9658b4845299452be06c9ce92cbceee944d |
|||
msg345380 - (view) | Author: Ćukasz Langa (lukasz.langa) * ![]() |
Date: 2019-06-12 16:54 | |
Note: there's a discussion on GitHub on PR 13971 being a bad solution: https://github.com/python/cpython/pull/13971#issuecomment-500908198 They should be reverted (or fixed forward) for 3.8 and 3.9. I'm OK with this modifying the AbstractEventLoop API if need be for 3.8. Just have a PR ready in the next two weeks. |
|||
msg345389 - (view) | Author: Andrew Svetlov (asvetlov) * ![]() |
Date: 2019-06-12 17:26 | |
Lukasz, please don't rush. Applied PR makes asyncio more stable, not worse. I see the only way to make it perfect: get rid of weak refs entirely. Otherwise there is always a chance to catch GC run and set element deletion on iteration over the set. Sorry, that's how weakrefs work. There is such possibility: call asyncio._unregister_task() explicitly when the task is done (finished with success or failure or cancelled). By this, we can replace weakset with a regular set. The only requirement is that task should call this _unregister_task() method. No public API change is needed. At the time of work on 3.7, Yuri and I considered this implementation but rejected it because there was a (very low) chance that somebody may implement own task, register custom task factory and don't call _unregister_task(). I never see a code that implements asyncio task from scratch, people always reuse existing asyncio.Task. So, maybe the idea is not such bad. It can be implemented easily. I'm volunteered to make a PR with the proposal demonstration in a day or two, depending on my free time. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2019-06-12 17:26:52 | asvetlov | set | messages: + msg345389 |
2019-06-12 16:54:54 | lukasz.langa | set | nosy:
+ lukasz.langa messages: + msg345380 |
2019-06-11 20:32:28 | miss-islington | set | messages: + msg345266 |
2019-06-11 20:15:54 | asvetlov | set | status: closed -> open resolution: fixed -> stage: resolved -> |
2019-06-11 16:01:35 | miss-islington | set | messages: + msg345247 |
2019-06-11 15:33:40 | asvetlov | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
2019-06-11 15:28:01 | miss-islington | set | pull_requests: + pull_request13843 |
2019-06-11 15:27:53 | miss-islington | set | pull_requests: + pull_request13842 |
2019-06-11 15:27:35 | miss-islington | set | nosy:
+ miss-islington messages: + msg345239 |
2019-06-11 10:56:12 | asvetlov | set | versions: + Python 3.9 |
2019-06-11 10:54:31 | asvetlov | set | keywords:
+ patch stage: patch review pull_requests: + pull_request13838 |
2019-04-19 13:54:20 | Nick Davies | set | messages: + msg340544 |
2019-04-19 08:14:08 | asvetlov | set | messages: + msg340527 |
2019-04-17 18:41:17 | Nick Davies | set | messages: + msg340428 |
2019-04-17 18:26:59 | asvetlov | set | messages:
+ msg340425 versions: + Python 3.8, - Python 3.6 |
2019-04-17 18:25:01 | asvetlov | set | messages: + msg340424 |
2019-04-17 17:10:16 | thatch | set | nosy:
+ thatch |
2019-04-16 15:53:56 | Nick Davies | set | type: behavior versions: + Python 3.6 |
2019-04-12 16:06:23 | sdunster | set | nosy:
+ sdunster |
2019-04-11 18:12:48 | Nick Davies | create |