msg348328 - (view) |
Author: Nikita Ilyasov (Nikita Ilyasov) |
Date: 2019-07-23 12:16 |
In some cases `asyncio.wait_for` can lead to a socket leak.
Condensed example:
```python
async def _create_connection(ssl_obj, timeout=60):
    loop = asyncio.get_event_loop()
    connector = loop.create_connection(MyEchoClientProtocol, '127.0.0.1', 5000, ssl=ssl_obj)
    connector = asyncio.ensure_future(connector)
    tr, pr = await asyncio.wait_for(connector, timeout=timeout, loop=loop)
    return tr, pr

async def main():
    ...
    res = await asyncio.wait_for(_create_connection(ssl_obj), timeout=timeout, loop=loop)
```
If my understanding is correct, `wait_for` should behave in exactly one of two ways:
1. The inner task completes and the outer task receives the result (the transport and protocol in this case), or
2. The inner task is cancelled and no connection is established.
I provided source code for the client and server so the problem can be easily reproduced on your system.
The certificate and key can be generated with `minica`.
I found out that if I catch `CancelledError` and add a `done_callback` to the inner task, like so:
```python
try:
    tr, pr = await asyncio.wait_for(connector, timeout=timeout, loop=loop)
    return tr, pr
except asyncio.CancelledError as e:
    connector.add_done_callback(_done_callback)
    raise e
```
then inside `_done_callback` I can access the transport and protocol objects and close the transport manually to prevent the leak.
I run `netstat -a | grep 5000 | grep ESTAB | awk '{ print $5 }' | sort | uniq -c | grep 5000` after the script is done, and there are many unclosed connections.
The output depends on your hardware, so you may need to tweak the timeout parameter.
|
msg348329 - (view) |
Author: Nikita Ilyasov (Nikita Ilyasov) |
Date: 2019-07-23 12:19 |
server example: https://gist.github.com/hexrain/bc92aa70eebc229365f0ce4bcccf7fc4
|
msg375937 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2020-08-26 16:42 |
New changeset a2118a14627256197bddcf4fcecad4c264c1e39d by Elvis Pranskevichus in branch 'master':
bpo-37658: Fix asyncio.wait_for() to respect waited task status (#21894)
https://github.com/python/cpython/commit/a2118a14627256197bddcf4fcecad4c264c1e39d
|
msg375942 - (view) |
Author: Łukasz Langa (lukasz.langa) * |
Date: 2020-08-26 17:15 |
New changeset 9de6be4e2ae605a1deb6fa72d5c5f66b07817e4c by Miss Islington (bot) in branch '3.9':
bpo-37658: Fix asyncio.wait_for() to respect waited task status (GH-21894) (GH-21964)
https://github.com/python/cpython/commit/9de6be4e2ae605a1deb6fa72d5c5f66b07817e4c
|
msg375947 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2020-08-26 18:26 |
New changeset 6e1954cd8286e083e7f8d09516d91b6b15769a4e by Miss Islington (bot) in branch '3.8':
bpo-37658: Fix asyncio.wait_for() to respect waited task status (GH-21894) (#21965)
https://github.com/python/cpython/commit/6e1954cd8286e083e7f8d09516d91b6b15769a4e
|
msg379542 - (view) |
Author: Chris Jerdonek (chris.jerdonek) * |
Date: 2020-10-24 21:22 |
Issue #42130, which was recently filed, appears to be related to this change.
|
msg393566 - (view) |
Author: Denis S. Otkidach (ods) * |
Date: 2021-05-13 08:39 |
The current solution doesn't look correct. It swallows the cancellation, causing the task to hang: https://bugs.python.org/issue42130 .
The proposed test case calls cancel for the inner future and set_result for the outer task in the same loop step. The old behaviour (prior to this patch) looks correct to me in this case. We have to elaborate on the original problem to find the source of the actual race.
Therefore I've tried to reproduce the problem with the original code sample and published an adapted version of it here: https://github.com/ods/bpo-37658 . Nikita Ilyasov, please correct me if I've misunderstood what you meant. My result is that the proposed solution doesn't actually solve the problem, but rather lowers the chance of hitting it. Namely, I get the traceback quoted in the README with 3.8.5, 3.9.4, and the current code in main.
|
msg393567 - (view) |
Author: Adam Liddell (aaliddell) |
Date: 2021-05-13 08:53 |
Some of the discussion leading up to that change is here: https://github.com/MagicStack/asyncpg/pull/548 and in the issues it links.
|
msg393584 - (view) |
Author: Denis S. Otkidach (ods) * |
Date: 2021-05-13 14:53 |
The original problem can be fixed by wrapping await into try-except block:
```python
async def create_connection(ssl_obj):
    loop = asyncio.get_event_loop()
    connector = loop.create_connection(MyEchoClientProtocol, '127.0.0.1', 5000, ssl=ssl_obj)
    connector = asyncio.ensure_future(connector)
    try:
        tr, pr = await connector
    except asyncio.CancelledError:
        if connector.done():
            tr, pr = connector.result()
            # Uncommenting the following line fixes the problem:
            # tr.close()
        raise
    return tr, pr
```
I've updated my example to reproduce this: https://github.com/ods/bpo-37658/commit/eca3d81d60cbe129ce687674e6451836d567f6b9
I believe it's a general problem with maintaining atomicity in async code, and not a bug in `wait_for`. Probably having an interface like `async with loop.create_connection(…) as (transport, protocol)` would simplify correct usage for this particular case.
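For illustration, the interface hinted at above could be sketched with `contextlib.asynccontextmanager`. This wrapper is hypothetical (no such helper exists in asyncio); it guarantees the transport is closed on every exit path out of the `async with` body, though it does not by itself remove the cancellation window inside the initial `await`:

```python
import asyncio
from contextlib import asynccontextmanager

@asynccontextmanager
async def open_connection_cm(loop, protocol_factory, host, port, **kwargs):
    # Hypothetical helper: hand out (transport, protocol) and always
    # close the transport when the block exits, including on exceptions.
    transport, protocol = await loop.create_connection(
        protocol_factory, host, port, **kwargs)
    try:
        yield transport, protocol
    finally:
        transport.close()
```

Usage would then read `async with open_connection_cm(loop, MyEchoClientProtocol, '127.0.0.1', 5000, ssl=ssl_obj) as (tr, pr): ...`.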
|
msg393586 - (view) |
Author: Adam Liddell (aaliddell) |
Date: 2021-05-13 15:20 |
Wrapping every resource-allocating call like that is what we were trying to avoid, since it turns wait_for from a simple one-line helper into something you have to be very careful with.
Conceptually, a user should expect wait_for to behave exactly the same as awaiting the underlying awaitable, just with auto-cancellation. The problem with the current wait_for is that there is a gap where the underlying task may have completed just as a cancellation arrives. In this case, we need to raise the cancellation to be a good asyncio citizen, but the underlying task has no opportunity to act on the cancellation (to free the resource), since it is already complete and cannot be re-entered. So the resource returned by the completed task gets stuck in limbo, since we can't return it and we can't assume a generic 'close' behaviour.
See my comment in the PR for a suggestion about an alternative structure for wait_for, which may avoid this gap and hence prevent the leak (but I have not tested it!)
|
msg393687 - (view) |
Author: nmatravolgyi (nmatravolgyi) * |
Date: 2021-05-14 19:41 |
I did a little research on how `wait_for` could be improved. There are four alternate implementations, which you can find here: https://github.com/Traktormaster/wait-for2/blob/issue37658/wait_for2/impl.py
The repository (on the linked branch) has tox set-up and a test that asserts the behaviour of builtin and alternate implementations.
Quick summary:
- wait_for (builtin): either leaks resources or ignores cancellation
- wait_for (preferred alt): works as expected (I'm going to discuss this below)
- wait_for_special_raise: works, but has pitfalls
- wait_for_no_waiter*: these were proposed by @aaliddell on the related PR: https://github.com/python/cpython/pull/26097#issuecomment-840497455 I could not make them work, but I might be missing something. I think the fact that the inner coroutine gets wrapped into a Future introduces the race condition, but I don't know how to test that. The general idea would be the best approach if an implementation were possible.
About the working alternate implementation I made:
In my opinion, there is no way to implicitly handle losing a result on cancellation properly, since the cancellation arbitrarily breaks the flow of execution. I'd look into having `wait_for(...)` support a cleanup callback for when a cancellation and a completion happen at the same time. Something like:
```python
async def create_something():
    # this acquires some resource that needs explicit cleanup after some work
    return object()

def cleanup_something(inst):
    inst.close()

t = asyncio.ensure_future(create_something())
x = await asyncio.wait_for(t, timeout=10, cancel_handler=cleanup_something)
try:
    pass  # normal work with x
finally:
    cleanup_something(x)  # cleanup at normal execution
```
The inner task is still responsible for handling the resource before it returns, which means that if the inner task is cancelled, there must be no leak if its implementation is correct. If no cancellation happens, everything is fine. Finally, the waiter task would be able to handle its own cancellation coinciding with the completion of the inner task, provided the caller supplies a callback.
Unfortunately, this requires the user of `wait_for` to be aware of the race condition. However, it makes it possible to handle the race properly when the waited future's result requires cleanup.
|
msg394249 - (view) |
Author: Denis S. Otkidach (ods) * |
Date: 2021-05-24 13:02 |
Re: msg393586
> See my comment in the PR for a suggestion about an alternative structure for wait_for, which may avoid this gap and hence prevent the leak (but I have not tested it!)
Unfortunately this didn't close the gap, so the leak persisted.
I did some research on the source of the error. There are circular references between instances of _SelectorSocketTransport and _SSLProtocolTransport (see the attached image), so these objects are destroyed by the garbage collector, and the destruction order can be arbitrary. When _SelectorSocketTransport is destroyed after _SSLProtocolTransport everything is OK, but with the opposite order we get the error printed. Here is a description of how to reproduce this:
https://github.com/ods/bpo-37658#circular-references-when-using-ssltls
|
msg404998 - (view) |
Author: Sam Bull (dreamsorcerer) * |
Date: 2021-10-25 20:03 |
Can I get a review?
https://github.com/python/cpython/pull/29202
Seems like a simple mistake given the original description of this issue:
> 1. The inner task completes and the outer task receives the result (the transport and protocol in this case), or
> 2. The inner task is cancelled and no connection is established.
The try/except blocks clearly add a third condition, in which the inner task completes and a TimeoutError is raised without returning the result.
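The corrected control flow can be pictured with a simplified model (my sketch built on `asyncio.wait`, not the actual `wait_for` implementation, which uses a waiter future and callbacks): when the timeout fires but the inner task has already completed, the result is returned instead of being discarded.

```python
import asyncio

async def wait_for_model(aw, timeout):
    # Simplified model of the fixed behaviour; NOT the real asyncio.wait_for.
    fut = asyncio.ensure_future(aw)
    done, _pending = await asyncio.wait({fut}, timeout=timeout)
    if fut in done:
        # Covers the third condition above: if completion and timeout race,
        # a finished task's result must be returned, never dropped.
        return fut.result()
    fut.cancel()
    try:
        await fut  # let the inner task act on the cancellation
    except asyncio.CancelledError:
        pass
    raise asyncio.TimeoutError()
```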
|
msg407248 - (view) |
Author: Andrew Svetlov (asvetlov) * |
Date: 2021-11-29 08:13 |
New changeset 934a82623793e9d52b85f74d5395d65927a52205 by Sam Bull in branch 'main':
bpo-37658: Actually return result in race condition (GH-29202)
https://github.com/python/cpython/commit/934a82623793e9d52b85f74d5395d65927a52205
|
msg407252 - (view) |
Author: miss-islington (miss-islington) |
Date: 2021-11-29 08:37 |
New changeset 305236e03a274850be8ed399ea3390ee71519ef4 by Miss Islington (bot) in branch '3.10':
bpo-37658: Actually return result in race condition (GH-29202)
https://github.com/python/cpython/commit/305236e03a274850be8ed399ea3390ee71519ef4
|
msg407373 - (view) |
Author: Andrew Svetlov (asvetlov) * |
Date: 2021-11-30 13:39 |
New changeset 99a9b343316172f049a52b39a406f0c0c42c106f by Miss Islington (bot) in branch '3.9':
bpo-37658: Actually return result in race condition (GH-29202) (GH-29832)
https://github.com/python/cpython/commit/99a9b343316172f049a52b39a406f0c0c42c106f
|
msg414797 - (view) |
Author: Kumar Aditya (kumaraditya) * |
Date: 2022-03-09 12:12 |
@asvetlov Anything left to do here, or can this be closed?
|
msg414925 - (view) |
Author: Sam Bull (dreamsorcerer) * |
Date: 2022-03-11 17:35 |
There is still an issue here. I proposed a possible solution at: https://github.com/python/cpython/pull/28149#issuecomment-1007823782
You can also scroll up through the lengthy discussion to see how I reached that conclusion. I've not yet had time to actually try implementing something.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:59:18 | admin | set | github: 81839 |
2022-03-11 17:35:41 | dreamsorcerer | set | messages:
+ msg414925 |
2022-03-09 12:12:34 | kumaraditya | set | nosy:
+ kumaraditya messages:
+ msg414797
|
2021-11-30 13:39:31 | asvetlov | set | messages:
+ msg407373 |
2021-11-29 08:37:46 | miss-islington | set | messages:
+ msg407252 |
2021-11-29 08:13:39 | miss-islington | set | pull_requests:
+ pull_request28063 |
2021-11-29 08:13:26 | miss-islington | set | pull_requests:
+ pull_request28062 |
2021-11-29 08:13:13 | asvetlov | set | messages:
+ msg407248 |
2021-10-25 20:03:44 | dreamsorcerer | set | messages:
+ msg404998 |
2021-10-24 15:43:16 | dreamsorcerer | set | nosy:
+ dreamsorcerer pull_requests:
+ pull_request27471
|
2021-05-24 13:02:10 | ods | set | files:
+ ssl_circular_refs.png
messages:
+ msg394249 |
2021-05-14 19:41:13 | nmatravolgyi | set | nosy:
+ nmatravolgyi messages:
+ msg393687
|
2021-05-13 15:20:09 | aaliddell | set | messages:
+ msg393586 |
2021-05-13 14:53:36 | ods | set | messages:
+ msg393584 |
2021-05-13 10:41:25 | ods | set | pull_requests:
+ pull_request24738 |
2021-05-13 08:53:43 | aaliddell | set | messages:
+ msg393567 |
2021-05-13 08:39:05 | ods | set | nosy:
+ ods messages:
+ msg393566
|
2020-10-24 21:22:53 | chris.jerdonek | set | nosy:
+ chris.jerdonek messages:
+ msg379542
|
2020-08-26 19:06:35 | Elvis.Pranskevichus | set | pull_requests:
+ pull_request21078 |
2020-08-26 18:26:37 | yselivanov | set | messages:
+ msg375947 |
2020-08-26 17:15:38 | lukasz.langa | set | nosy:
+ lukasz.langa messages:
+ msg375942
|
2020-08-26 16:43:21 | miss-islington | set | pull_requests:
+ pull_request21075 |
2020-08-26 16:43:13 | miss-islington | set | nosy:
+ miss-islington pull_requests:
+ pull_request21074
|
2020-08-26 16:42:51 | yselivanov | set | messages:
+ msg375937 |
2020-08-16 14:25:32 | aaliddell | set | nosy:
+ aaliddell
|
2020-08-15 22:23:50 | Elvis.Pranskevichus | set | keywords:
+ patch nosy:
+ Elvis.Pranskevichus
pull_requests:
+ pull_request21013 stage: patch review |
2019-07-23 12:19:39 | Nikita Ilyasov | set | messages:
+ msg348329 |
2019-07-23 12:16:17 | Nikita Ilyasov | create | |