classification
Title: ProactorEventLoop raises unhandled ConnectionResetError
Type: Stage: patch review
Components: asyncio Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Ben.Darnell, Burak Yiğit Kaya, Jonathan Slenders, PeterL777, Rustam S., Segev Finer, asvetlov, cjrh, cmeyer, db3l, miss-islington, robbiecares, steve.dower, yselivanov
Priority: normal Keywords: patch

Created on 2019-12-09 20:47 by Jonathan Slenders, last changed 2021-01-29 11:59 by robbiecares.

Pull Requests
URL Status Linked Edit
PR 20525 closed cmeyer, 2020-05-30 00:41
PR 22017 merged Ben.Darnell, 2020-08-30 14:26
PR 22034 merged miss-islington, 2020-08-31 19:58
PR 22035 merged miss-islington, 2020-08-31 19:58
PR 22066 merged Ben.Darnell, 2020-09-02 14:49
PR 22082 merged miss-islington, 2020-09-03 19:39
PR 22083 merged miss-islington, 2020-09-03 19:39
Messages (22)
msg358140 - (view) Author: Jonathan Slenders (Jonathan Slenders) Date: 2019-12-09 20:47
We have a snippet of code that runs perfectly fine using the `SelectorEventLoop`, but crashes *sometimes* using the `ProactorEventLoop`.

The traceback is the following. The exception cannot be caught within the asyncio application itself (e.g., it is not attached to any Future or propagated in a coroutine). It probably propagates in `run_until_complete()`.

  File "C:\Python38\lib\asyncio\proactor_events.py", line 768, in _loop_self_reading
    f.result()  # may raise
  File "C:\Python38\lib\asyncio\windows_events.py", line 808, in _poll
    value = callback(transferred, key, ov)
  File "C:\Python38\lib\asyncio\windows_events.py", line 457, in finish_recv
    raise ConnectionResetError(*exc.args)

I can see that in `IocpProactor._poll`, `OSError` is caught and attached to the future, but not `ConnectionResetError`. I would expect that `ConnectionResetError` too will be attached to the future.

In order to reproduce, run the following snippet on Python 3.8:

from prompt_toolkit import prompt  # pip install prompt_toolkit
while 1:
    prompt('>')

Hold down the enter key, and it'll trigger quickly.

See also: https://github.com/prompt-toolkit/python-prompt-toolkit/issues/1023
msg358141 - (view) Author: Jonathan Slenders (Jonathan Slenders) Date: 2019-12-09 20:57
Suppressing `ConnectionResetError` in `BaseProactorEventLoop._loop_self_reading`, like we do with `CancelledError` seems to fix it.

Although I'm not sure what it causing the error, and whether we need to handle it somehow.
msg358144 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-09 21:34
ConnectionResetError means that the pipe socket is closed. Was the event loop closed? Can you provide a reproducer? Can you try to get debug logs to see what's going on?
https://docs.python.org/dev/library/asyncio-dev.html#debug-mode
msg358199 - (view) Author: Jonathan Slenders (Jonathan Slenders) Date: 2019-12-10 16:18
Thanks Victor for the reply.

It looks like it's the self-socket in the BaseProactorEventLoop that gets closed. It's exactly this FD for which the exception is raised.

We don't close the event loop anywhere. I also don't see `_close_self_pipe` being called anywhere.

Debug logs don't provide any help. I'm looking into a reproducer.
msg358208 - (view) Author: Jonathan Slenders (Jonathan Slenders) Date: 2019-12-10 17:49
It looks like the following code will reproduce the issue:

```
import asyncio
import threading

loop = asyncio.get_event_loop()

while True:
    def test():
        loop.call_soon_threadsafe(loop.stop)

    threading.Thread(target=test).start()
    loop.run_forever()

```

Leave it running on Windows, in Python 3.8 for a few seconds, then it starts spawning `ConnectionResetError`s.
msg358226 - (view) Author: Jonathan Slenders (Jonathan Slenders) Date: 2019-12-10 21:18
Even simpler, the following code will crash after so many iterations:


```
import asyncio
loop = asyncio.get_event_loop()

while True:
    loop.call_soon_threadsafe(loop.stop)
    loop.run_forever()
```

Adding a little sleep of 0.01s after `run_forever()` prevents the issue.

So, to me it looks like the cancellation of the `_OverlappedFuture` that wraps around the `_recv()` call from the self-pipe did not complete before we start `_recv()` again in the next `run_forever()` call. No idea if that makes sense...
msg362076 - (view) Author: Ben Darnell (Ben.Darnell) * Date: 2020-02-16 15:05
I just spent some time digging into this. Each call to `run_forever` starts a call to `_loop_self_reading`, then attempts to cancel it before returning:

https://github.com/python/cpython/blob/1ed61617a4a6632905ad6a0b440cd2cafb8b6414/Lib/asyncio/windows_events.py#L312-L325

The comment at line 321 is not entirely accurate: the future will not resolve in the future, but it may have *already* resolved, and added its callback to the call_soon queue. This callback will run if the event loop is restarted again. Since `_loop_self_reading` calls itself, this results in two copies of the "loop" running concurrently and stepping on each other's `_self_reading_futures`. 

This appears to be fairly harmless except for noise in the logs when only one of the loops is stopped cleanly.

I believe the simplest fix is for `_loop_self_reading` to compare its argument to `self._self_reading_future` to determine if it is the "current" loop and if not, don't reschedule anything.
msg366002 - (view) Author: Rustam S. (Rustam S.) Date: 2020-04-08 19:12
Please take a look at this as well:
(ipython #12049 'Unhandled exception in event loop' (WinError 995))
https://github.com/ipython/ipython/issues/12049#issuecomment-586544339 and below
msg368692 - (view) Author: Chris Meyer (cmeyer) * Date: 2020-05-12 01:20
Here is another way to reproduce this (or an extremely similar) error without a loop. Since may be a race condition, I'm not sure this works 100% of the time on all machines - but it did on several machines I tried.

```
import asyncio

loop = asyncio.get_event_loop()

def func():
    pass

f = loop.run_in_executor(None, func)
loop.stop()
loop.run_forever()
loop.stop()
loop.run_forever()
loop.stop()
loop.run_forever()
```

```
Error on reading from the event loop self pipe
loop: <ProactorEventLoop running=True closed=False debug=False>
Traceback (most recent call last):
  File "C:\Miniconda3\envs\py38\lib\asyncio\windows_events.py", line 453, in finish_recv
    return ov.getresult()
OSError: [WinError 995] The I/O operation has been aborted because of either a thread exit or an application request

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Miniconda3\envs\py38\lib\asyncio\proactor_events.py", line 768, in _loop_self_reading
    f.result()  # may raise
  File "C:\Miniconda3\envs\py38\lib\asyncio\windows_events.py", line 808, in _poll
    value = callback(transferred, key, ov)
  File "C:\Miniconda3\envs\py38\lib\asyncio\windows_events.py", line 457, in finish_recv
    raise ConnectionResetError(*exc.args)
ConnectionResetError: [WinError 995] The I/O operation has been aborted because of either a thread exit or an application request
```
msg375861 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-08-24 20:41
Any input from the asyncio experts?

I don't have an issue with handling the exception in this case, and hopefully when someone is up to the task of dealing with the range of edge cases throughout this loop implementation, hopefully they can get the ordering of waits right.
msg376108 - (view) Author: Ben Darnell (Ben.Darnell) * Date: 2020-08-30 14:40
I've posted a pull request with a test and fix: https://github.com/python/cpython/pull/22017. It's a more targeted fix than cmeyer's PR (which I didn't even notice until now due to unfamiliarity with the BPO UI)
msg376166 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2020-08-31 19:58
New changeset ea5a6363c3f8cc90b7c0cc573922b10f296073b6 by Ben Darnell in branch 'master':
bpo-39010: Fix errors logged on proactor loop restart (#22017)
https://github.com/python/cpython/commit/ea5a6363c3f8cc90b7c0cc573922b10f296073b6
msg376173 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-08-31 23:13
Thanks, Ben!
msg376212 - (view) Author: David Bolen (db3l) * Date: 2020-09-01 21:32
I've been seeing failures on the Win10 buildbot 3.x branch that seem to correlate with the timing of this change - could there be some further work needed on Windows?  Or, if it's a test-only artifact and the warnings are innocuous, something to ignore the output during the tests?

Specifically, the following warnings occur during test_asyncio:
   Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2)
   Warning -- Dangling thread: <_MainThread(MainThread, started 5220)>

See https://buildbot.python.org/all/#/builders/146/builds/23 for the first failing build.
msg376213 - (view) Author: Ben Darnell (Ben.Darnell) * Date: 2020-09-02 02:02
I can confirm that those warnings appear to be coming from the test I added here. I'm not sure how to interpret them, though - what does it mean for the main thread to be dangling?
msg376215 - (view) Author: David Bolen (db3l) * Date: 2020-09-02 03:41
I'm guessing the warning appears odd as we're seeing a thread shutdown data race.  The message is produced by threading_cleanup in support/threading_helper.py, and it appears that between the first and second lines one of the initially dangling threads goes away, so the only one left to be enumerated is the main thread, at which point the function is simply listing all threads currently alive.  But by that point it's just the main thread remaining.

I notice in the test that you have a comment about needing to wait for f to complete or you get a warning about thread not shutting down cleanly.  Was that a similar warning?  The run_until_complete(f) line seems to have no effect on the buildbot.

If I added a small sleep at the end of the test the warnings go away on the buildbot.  The buildbot is a fairly fast machine, so perhaps the test just needs to wait somehow for the event loop to fully shut down or something.

The most direct cause of the warnings seems to be the self.loop.close() call - if I just comment that out the test runs warning-free without any extra delay needed.

I don't know much about asyncio tests, but it would appear the close call in the test defeats some of the logic in the close_loop teardown code that runs under TestCase (in utils.py), which under Windows is just a call to run_until_complete(loop.shutdown_default_executor()).  If I add that same call to the test prior to the close it also passes cleanly.

So if closing the loop in the test itself is crucial, I'd suggest including the extra run_until_complete call.  If closing isn't crucial to the test, simply removing it seems to address the issue.  I'm not sure if its removal then has any implications for the extra run_until_complete(f) call in the test, as I can't see any impact from that on the buildbot.
msg376247 - (view) Author: Ben Darnell (Ben.Darnell) * Date: 2020-09-02 14:51
I've fixed the test and added some commentary about the different levels of clean shutdown we must do to quiet all the warnings: https://github.com/python/cpython/pull/22066
msg376284 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2020-09-03 04:58
New changeset be435ae2b064dc64f04475bec632862e1dbf605f by Ben Darnell in branch 'master':
bpo-39010: Improve test shutdown (#22066)
https://github.com/python/cpython/commit/be435ae2b064dc64f04475bec632862e1dbf605f
msg376314 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2020-09-03 19:38
New changeset 49571c0b0e57e20d85727c738d9a0fe342dd2938 by Miss Islington (bot) in branch '3.9':
bpo-39010: Fix errors logged on proactor loop restart (GH-22017) (#22034)
https://github.com/python/cpython/commit/49571c0b0e57e20d85727c738d9a0fe342dd2938
msg376315 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2020-09-03 19:38
New changeset a986b061a3cd4661eb9f857e2936291f1847bd15 by Miss Islington (bot) in branch '3.8':
bpo-39010: Fix errors logged on proactor loop restart (GH-22017) (#22035)
https://github.com/python/cpython/commit/a986b061a3cd4661eb9f857e2936291f1847bd15
msg376321 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2020-09-03 20:54
New changeset 40e2444c364eede59f193979df5a02ed958f56e9 by Miss Islington (bot) in branch '3.8':
bpo-39010: Improve test shutdown (GH-22066) (#22083)
https://github.com/python/cpython/commit/40e2444c364eede59f193979df5a02ed958f56e9
msg376322 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2020-09-03 20:54
New changeset 8d68e59f11267ded765d4af6d71a49784a12fad5 by Miss Islington (bot) in branch '3.9':
bpo-39010: Improve test shutdown (GH-22066) (#22082)
https://github.com/python/cpython/commit/8d68e59f11267ded765d4af6d71a49784a12fad5
History
Date User Action Args
2021-01-29 11:59:02robbiecaressetnosy: + robbiecares
2020-09-03 20:54:47yselivanovsetmessages: + msg376322
2020-09-03 20:54:40yselivanovsetmessages: + msg376321
2020-09-03 19:39:32miss-islingtonsetpull_requests: + pull_request21169
2020-09-03 19:39:24miss-islingtonsetpull_requests: + pull_request21168
2020-09-03 19:38:38yselivanovsetmessages: + msg376315
2020-09-03 19:38:37yselivanovsetmessages: + msg376314
2020-09-03 04:58:53yselivanovsetmessages: + msg376284
2020-09-02 14:51:40Ben.Darnellsetmessages: + msg376247
2020-09-02 14:49:33Ben.Darnellsetpull_requests: + pull_request21156
2020-09-02 03:41:14db3lsetmessages: + msg376215
2020-09-02 02:02:52Ben.Darnellsetmessages: + msg376213
2020-09-01 21:32:55db3lsetnosy: + db3l
messages: + msg376212
2020-08-31 23:13:08steve.dowersetmessages: + msg376173
2020-08-31 19:58:29miss-islingtonsetpull_requests: + pull_request21133
2020-08-31 19:58:20miss-islingtonsetnosy: + miss-islington
pull_requests: + pull_request21132
2020-08-31 19:58:01yselivanovsetmessages: + msg376166
2020-08-30 14:40:52Ben.Darnellsetmessages: + msg376108
2020-08-30 14:26:59Ben.Darnellsetpull_requests: + pull_request21117
2020-08-24 20:41:41steve.dowersetnosy: + steve.dower

messages: + msg375861
versions: + Python 3.9, Python 3.10
2020-05-30 00:41:18cmeyersetkeywords: + patch
stage: patch review
pull_requests: + pull_request19769
2020-05-22 17:49:40Burak Yiğit Kayasetnosy: + Burak Yiğit Kaya
2020-05-12 01:20:02cmeyersetmessages: + msg368692
2020-04-26 16:42:13Segev Finersetnosy: + Segev Finer
2020-04-20 15:07:59cmeyersetnosy: + cmeyer
2020-04-09 00:42:21PeterL777setnosy: + PeterL777
2020-04-08 19:12:07Rustam S.setnosy: + Rustam S.
messages: + msg366002
2020-02-16 15:05:06Ben.Darnellsetnosy: + Ben.Darnell
messages: + msg362076
2020-02-02 01:36:57cjrhsetnosy: + cjrh
2020-01-15 21:54:41vstinnersetnosy: - vstinner
2019-12-10 21:18:18Jonathan Slenderssetmessages: + msg358226
2019-12-10 17:49:24Jonathan Slenderssetmessages: + msg358208
2019-12-10 16:18:23Jonathan Slenderssetmessages: + msg358199
2019-12-09 21:34:59vstinnersetnosy: + vstinner
messages: + msg358144
2019-12-09 20:57:39Jonathan Slenderssetmessages: + msg358141
2019-12-09 20:47:38Jonathan Slenderscreate