Issue 37658: In some cases asyncio.wait_for can lead to socket leak.

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/81839

classification

Title:	In some cases asyncio.wait_for can lead to socket leak.
Type:	resource usage	Stage:	patch review
Components:	asyncio	Versions:	Python 3.7, Python 3.6

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	Elvis.Pranskevichus, Nikita Ilyasov, aaliddell, asvetlov, chris.jerdonek, dreamsorcerer, kumaraditya, lukasz.langa, miss-islington, nmatravolgyi, ods, yselivanov
Priority:	normal	Keywords:	patch

Created on 2019-07-23 12:16 by Nikita Ilyasov, last changed 2022-04-11 14:59 by admin.

Files
File name	Uploaded	Description	Edit
example.py	Nikita Ilyasov, 2019-07-23 12:16	script that can reproduce the problem
ssl_circular_refs.png	ods, 2021-05-24 13:02

Pull Requests
URL	Status	Linked	Edit
PR 21894	merged	Elvis.Pranskevichus, 2020-08-15 22:23
PR 21964	merged	miss-islington, 2020-08-26 16:43
PR 21965	merged	miss-islington, 2020-08-26 16:43
PR 21969	closed	Elvis.Pranskevichus, 2020-08-26 19:06
PR 26097	open	ods, 2021-05-13 10:41
PR 29202	merged	dreamsorcerer, 2021-10-24 15:43
PR 29832	merged	miss-islington, 2021-11-29 08:13
PR 29831	merged	miss-islington, 2021-11-29 08:13

Messages (18)
msg348328 - (view)	Author: Nikita Ilyasov (Nikita Ilyasov)	Date: 2019-07-23 12:16
In some cases `asyncio.wait_for` can lead to socket leak. Condensed example: ```python async def _create_connection(timeout=60, ssl_obj): loop = asyncio.get_event_loop() connector = loop.create_connection(MyEchoClientProtocol, '127.0.0.1', 5000, ssl=ssl_obj) connector = asyncio.ensure_future(connector) tr, pr = await asyncio.wait_for(connector, timeout=timeout, loop=loop) return tr, pr async def main(): ... res = await asyncio.wait_for(_acquire_impl(), timeout=timeout, loop=loop) ``` If my understanding is correct `wait_for` should work in exactly 2 ways 1. the inner task is completed and the outer task will receive the result – transport and protocol in this case 2. The inner task is cancelled and no connection was established I provided source code for client and server so the problem can be easily reproduced on your system. certificate and key can be easily generated with `minica` I found out that if I catch `CancelledError` and add a `done_callback` to the inner task, like so: ```python try: tr, pr = await asyncio.wait_for(connector, timeout=timeout, loop=loop) return tr, pr except asyncio.CancelledError as e: connector.add_done_callback(_done_callback) raise e ``` then inside of `_done_callback` I can access the transport and protocol object and close the transport manually to prevent leaking. I run `netstat -a \| grep 5000 \| grep ESTAB \| awk '{ print $5 }' \| sort \| uniq -c \| grep 5000` after the script is done and there are many unclosed connections. The output depends on your hardware so you might need to tweak the timeout parameter
msg348329 - (view)	Author: Nikita Ilyasov (Nikita Ilyasov)	Date: 2019-07-23 12:19
server example: https://gist.github.com/hexrain/bc92aa70eebc229365f0ce4bcccf7fc4
msg375937 - (view)	Author: Yury Selivanov (yselivanov) *	Date: 2020-08-26 16:42
New changeset a2118a14627256197bddcf4fcecad4c264c1e39d by Elvis Pranskevichus in branch 'master': bpo-37658: Fix asyncio.wait_for() to respect waited task status (#21894) https://github.com/python/cpython/commit/a2118a14627256197bddcf4fcecad4c264c1e39d
msg375942 - (view)	Author: Łukasz Langa (lukasz.langa) *	Date: 2020-08-26 17:15
New changeset 9de6be4e2ae605a1deb6fa72d5c5f66b07817e4c by Miss Islington (bot) in branch '3.9': bpo-37658: Fix asyncio.wait_for() to respect waited task status (GH-21894) (GH-21964) https://github.com/python/cpython/commit/9de6be4e2ae605a1deb6fa72d5c5f66b07817e4c
msg375947 - (view)	Author: Yury Selivanov (yselivanov) *	Date: 2020-08-26 18:26
New changeset 6e1954cd8286e083e7f8d09516d91b6b15769a4e by Miss Islington (bot) in branch '3.8': bpo-37658: Fix asyncio.wait_for() to respect waited task status (GH-21894) (#21965) https://github.com/python/cpython/commit/6e1954cd8286e083e7f8d09516d91b6b15769a4e
msg379542 - (view)	Author: Chris Jerdonek (chris.jerdonek) *	Date: 2020-10-24 21:22
Issue #42130 that was recently filed appears related to this change.
msg393566 - (view)	Author: Denis S. Otkidach (ods) *	Date: 2021-05-13 08:39
The current solutions doesn't look correct. It swallows cancelling causing task to hang: https://bugs.python.org/issue42130 . Proposed test case calls cancel for inner future and set_result for outer task in the same loop step. The old (prior to this patch) behaviour in this case looks correct to me. We have to elaborate on original problem to find the source of actual race. Therefore I've tried to reproduce the problem with original code sample and published the adapted version of it here: https://github.com/ods/bpo-37658 . Nikita Ilyasov, please correct me if I've misunderstood what you meant. My result is that proposed solution actually doesn't solve the problem, but rather lowers the chance to get it. Namely, I get the traceback quoted in README both with 3.8.5, 3.9.4, and current code in main.
msg393567 - (view)	Author: Adam Liddell (aaliddell)	Date: 2021-05-13 08:53
Some discussion leading up to that change is here https://github.com/MagicStack/asyncpg/pull/548 and in the issues it links.
msg393584 - (view)	Author: Denis S. Otkidach (ods) *	Date: 2021-05-13 14:53
The original problem can be fixed by wrapping await into try-except block: ``` async def create_connection(ssl_obj): loop = asyncio.get_event_loop() connector = loop.create_connection(MyEchoClientProtocol, '127.0.0.1', 5000, ssl=ssl_obj) connector = asyncio.ensure_future(connector) try: tr, pr = await connector except asyncio.CancelledError: if connector.done(): tr, pr = connector.result() # Uncommenting the following line fixes the problem: # tr.close() raise return tr, pr ``` I've updated my example to reproduce this: https://github.com/ods/bpo-37658/commit/eca3d81d60cbe129ce687674e6451836d567f6b9 I believe it's general problem with maintaining atomicity in async code, and not a bug in `wait_for`. Probably having an interface like `async with loop.create_connection(…) as transport, protocol` might simplify correct usage for this particular case.
msg393586 - (view)	Author: Adam Liddell (aaliddell)	Date: 2021-05-13 15:20
Wrapping every resource allocating call like that is what we were trying to avoid, since it makes wait_for go from a simple one-line helper to something you have to be very careful with. Conceptually, a user should expect that wait_for should behave the exact same as awaiting the underlying awaitable, just with auto-cancellation. The problem with the current wait_for is that there is a gap where the underlying task may have completed but a cancellation arrives. In this case, we need to raise the cancellation to be a good asyncio citizen, but the underlying task has no opportunity to act on the cancellation (to free the resource) since it is already complete and cannot be re-entered. So the resource returned by the completed task gets stuck in limbo, since we can't return it and we can't assume a generic 'close' behaviour. See my comment in the PR for a suggestion about an alternative structure for wait_for, which may avoid this gap and hence prevent the leak (but I have not tested it!)
msg393687 - (view)	Author: nmatravolgyi (nmatravolgyi) *	Date: 2021-05-14 19:41
I did a little research on how the `wait_for` could be improved. There are four alternate implementations you can find here: https://github.com/Traktormaster/wait-for2/blob/issue37658/wait_for2/impl.py The repository (on the linked branch) has tox set-up and a test that asserts the behaviour of builtin and alternate implementations. Quick summary: - wait_for (builtin): either leaks resources or ignores cancellation - wait_for (preferred alt): works as expected (I'm going to discuss this below) - wait_for_special_raise: works, but has pitfalls - wait_for_no_waiter*: These were proposed by @aaliddell on the related PR: https://github.com/python/cpython/pull/26097#issuecomment-840497455 I could not make them work, but I might be missing something. I think the fact that the inner coroutine gets wrapped into a Future introduces the race-condition, but I don't know how to test that. The general idea of this would be the best if an implementation was possible. About the actually working alternate implementation I made: In my opinion there is no way to implicitly handle losing a result properly in case of a cancellation, since it arbitrarily breaks the flow of execution. I'd look into having `wait_for(...)` support cleanup callbacks when a cancellation and completion happens at the same time. Something like: ```python async def create_something(): # this acquires some resource that needs explicit cleanup after some work return object() def cleanup_something(inst): inst.close() t = asyncio.ensure_future(create_something()) x = await asyncio.wait_for(t, timeout=10, cancel_handler=cleanup_something) try: pass # normal work with x finally: cleanup_something(x) # cleanup at normal execution ``` The inner task is still responsible for handling the resource before it returns, which means if the inner task is cancelled, there must be no leak if the implementation is correct. If no cancellation happens, everything is "fine". Finally, the waiter task would be able to handle its cancellation and the simultaneous completion of the inner task if the caller code provides a callback. Unfortunately this requires the user of `wait_for` to be aware of this race-condition. However it makes it possible to be handled properly when the waited future's result requires a cleanup.
msg394249 - (view)	Author: Denis S. Otkidach (ods) *	Date: 2021-05-24 13:02
Re: msg393586 > See my comment in the PR for a suggestion about an alternative structure for wait_for, which may avoid this gap and hence prevent the leak (but I have not tested it!) Unfortunately this didn't close the gap, so the leak persisted. I did some research on the source of the error. There are circular references between instances of _SelectorSocketTransport and _SSLProtocolTransport (see attached image), therefore these objects are destroyed by garbage collector. The order might be arbitrary. When _SelectorSocketTransport is destroyed after _SSLProtocolTransport everything is ok, but with the opposite order we get the error printed. Here is description on how to reproduce this: https://github.com/ods/bpo-37658#circular-references-when-using-ssltls
msg404998 - (view)	Author: Sam Bull (dreamsorcerer) *	Date: 2021-10-25 20:03
Can I get a review? https://github.com/python/cpython/pull/29202 Seems like a simple mistake given the original description of this issue: > 1. the inner task is completed and the outer task will receive the result – transport and protocol in this case > 2. The inner task is cancelled and no connection was established The try/except blocks clearly add a 3rd condition, where the inner task is completed and a TimeoutError is raised without returning the result.
msg407248 - (view)	Author: Andrew Svetlov (asvetlov) *	Date: 2021-11-29 08:13
New changeset 934a82623793e9d52b85f74d5395d65927a52205 by Sam Bull in branch 'main': bpo-37658: Actually return result in race condition (GH-29202) https://github.com/python/cpython/commit/934a82623793e9d52b85f74d5395d65927a52205
msg407252 - (view)	Author: miss-islington (miss-islington)	Date: 2021-11-29 08:37
New changeset 305236e03a274850be8ed399ea3390ee71519ef4 by Miss Islington (bot) in branch '3.10': bpo-37658: Actually return result in race condition (GH-29202) https://github.com/python/cpython/commit/305236e03a274850be8ed399ea3390ee71519ef4
msg407373 - (view)	Author: Andrew Svetlov (asvetlov) *	Date: 2021-11-30 13:39
New changeset 99a9b343316172f049a52b39a406f0c0c42c106f by Miss Islington (bot) in branch '3.9': bpo-37658: Actually return result in race condition (GH-29202) (GH-29832) https://github.com/python/cpython/commit/99a9b343316172f049a52b39a406f0c0c42c106f
msg414797 - (view)	Author: Kumar Aditya (kumaraditya) *	Date: 2022-03-09 12:12
@asvetlov Anything left to do here or can this be closed ?
msg414925 - (view)	Author: Sam Bull (dreamsorcerer) *	Date: 2022-03-11 17:35
There is still an issue here. I proposed a possible solution at: https://github.com/python/cpython/pull/28149#issuecomment-1007823782 You can also scroll up through the lengthy discussion to see how I reached that conclusion. I've not had time yet to actually try implementing something yet.

History
Date	User	Action	Args
2022-04-11 14:59:18	admin	set	github: 81839
2022-03-11 17:35:41	dreamsorcerer	set	messages: + msg414925
2022-03-09 12:12:34	kumaraditya	set	nosy: + kumaraditya messages: + msg414797
2021-11-30 13:39:31	asvetlov	set	messages: + msg407373
2021-11-29 08:37:46	miss-islington	set	messages: + msg407252
2021-11-29 08:13:39	miss-islington	set	pull_requests: + pull_request28063
2021-11-29 08:13:26	miss-islington	set	pull_requests: + pull_request28062
2021-11-29 08:13:13	asvetlov	set	messages: + msg407248
2021-10-25 20:03:44	dreamsorcerer	set	messages: + msg404998
2021-10-24 15:43:16	dreamsorcerer	set	nosy: + dreamsorcerer pull_requests: + pull_request27471
2021-05-24 13:02:10	ods	set	files: + ssl_circular_refs.png messages: + msg394249
2021-05-14 19:41:13	nmatravolgyi	set	nosy: + nmatravolgyi messages: + msg393687
2021-05-13 15:20:09	aaliddell	set	messages: + msg393586
2021-05-13 14:53:36	ods	set	messages: + msg393584
2021-05-13 10:41:25	ods	set	pull_requests: + pull_request24738
2021-05-13 08:53:43	aaliddell	set	messages: + msg393567
2021-05-13 08:39:05	ods	set	nosy: + ods messages: + msg393566
2020-10-24 21:22:53	chris.jerdonek	set	nosy: + chris.jerdonek messages: + msg379542
2020-08-26 19:06:35	Elvis.Pranskevichus	set	pull_requests: + pull_request21078
2020-08-26 18:26:37	yselivanov	set	messages: + msg375947
2020-08-26 17:15:38	lukasz.langa	set	nosy: + lukasz.langa messages: + msg375942
2020-08-26 16:43:21	miss-islington	set	pull_requests: + pull_request21075
2020-08-26 16:43:13	miss-islington	set	nosy: + miss-islington pull_requests: + pull_request21074
2020-08-26 16:42:51	yselivanov	set	messages: + msg375937
2020-08-16 14:25:32	aaliddell	set	nosy: + aaliddell
2020-08-15 22:23:50	Elvis.Pranskevichus	set	keywords: + patch nosy: + Elvis.Pranskevichus pull_requests: + pull_request21013 stage: patch review
2019-07-23 12:19:39	Nikita Ilyasov	set	messages: + msg348329
2019-07-23 12:16:17	Nikita Ilyasov	create