Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ProactorEventLoop raises unhandled ConnectionResetError #83191

Closed
JonathanSlenders mannequin opened this issue Dec 9, 2019 · 22 comments
Closed

ProactorEventLoop raises unhandled ConnectionResetError #83191

JonathanSlenders mannequin opened this issue Dec 9, 2019 · 22 comments
Labels
3.8 only security fixes 3.9 only security fixes 3.10 only security fixes topic-asyncio

Comments

@JonathanSlenders
Copy link
Mannequin

JonathanSlenders mannequin commented Dec 9, 2019

BPO 39010
Nosy @db3l, @asvetlov, @cjrh, @bdarnell, @1st1, @zooba, @cmeyer, @segevfiner, @miss-islington, @robbiecares
PRs
  • bpo-39010: Ignore error on cancelled future after end of loop. #20525
  • bpo-39010: Fix errors logged on proactor loop restart #22017
  • [3.9] bpo-39010: Fix errors logged on proactor loop restart (GH-22017) #22034
  • [3.8] bpo-39010: Fix errors logged on proactor loop restart (GH-22017) #22035
  • bpo-39010: Improve test shutdown #22066
  • [3.9] bpo-39010: Improve test shutdown (GH-22066) #22082
  • [3.8] bpo-39010: Improve test shutdown (GH-22066) #22083
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-12-09.20:47:38.611>
    labels = ['3.8', '3.9', '3.10', 'expert-asyncio']
    title = 'ProactorEventLoop raises unhandled ConnectionResetError'
    updated_at = <Date 2021-01-29.11:59:02.884>
    user = 'https://bugs.python.org/JonathanSlenders'

    bugs.python.org fields:

    activity = <Date 2021-01-29.11:59:02.884>
    actor = 'robbiecares'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['asyncio']
    creation = <Date 2019-12-09.20:47:38.611>
    creator = 'Jonathan Slenders'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 39010
    keywords = ['patch']
    message_count = 22.0
    messages = ['358140', '358141', '358144', '358199', '358208', '358226', '362076', '366002', '368692', '375861', '376108', '376166', '376173', '376212', '376213', '376215', '376247', '376284', '376314', '376315', '376321', '376322']
    nosy_count = 14.0
    nosy_names = ['db3l', 'asvetlov', 'cjrh', 'Ben.Darnell', 'yselivanov', 'steve.dower', 'cmeyer', 'Segev Finer', 'miss-islington', 'PeterL777', 'Jonathan Slenders', 'Rustam S.', 'Burak Yi\xc4\x9fit Kaya', 'robbiecares']
    pr_nums = ['20525', '22017', '22034', '22035', '22066', '22082', '22083']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue39010'
    versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

    @JonathanSlenders
    Copy link
    Mannequin Author

    JonathanSlenders mannequin commented Dec 9, 2019

    We have a snippet of code that runs perfectly fine using the SelectorEventLoop, but crashes sometimes using the ProactorEventLoop.

    The traceback is the following. The exception cannot be caught within the asyncio application itself (e.g., it is not attached to any Future or propagated in a coroutine). It probably propagates in run_until_complete().

    File "C:\Python38\lib\asyncio\proactor_events.py", line 768, in _loop_self_reading
    f.result() # may raise
    File "C:\Python38\lib\asyncio\windows_events.py", line 808, in _poll
    value = callback(transferred, key, ov)
    File "C:\Python38\lib\asyncio\windows_events.py", line 457, in finish_recv
    raise ConnectionResetError(*exc.args)

    I can see that in IocpProactor._poll, OSError is caught and attached to the future, but not ConnectionResetError. I would expect that ConnectionResetError too will be attached to the future.

    In order to reproduce, run the following snippet on Python 3.8:

    from prompt_toolkit import prompt  # pip install prompt_toolkit
    while 1:
        prompt('>')

    Hold down the enter key, and it'll trigger quickly.

    See also: prompt-toolkit/python-prompt-toolkit#1023

    @JonathanSlenders JonathanSlenders mannequin added 3.8 only security fixes topic-asyncio labels Dec 9, 2019
    @JonathanSlenders
    Copy link
    Mannequin Author

    JonathanSlenders mannequin commented Dec 9, 2019

    Suppressing ConnectionResetError in BaseProactorEventLoop._loop_self_reading, like we do with CancelledError seems to fix it.

    Although I'm not sure what it causing the error, and whether we need to handle it somehow.

    @vstinner
    Copy link
    Member

    vstinner commented Dec 9, 2019

    ConnectionResetError means that the pipe socket is closed. Was the event loop closed? Can you provide a reproducer? Can you try to get debug logs to see what's going on?
    https://docs.python.org/dev/library/asyncio-dev.html#debug-mode

    @JonathanSlenders
    Copy link
    Mannequin Author

    JonathanSlenders mannequin commented Dec 10, 2019

    Thanks Victor for the reply.

    It looks like it's the self-socket in the BaseProactorEventLoop that gets closed. It's exactly this FD for which the exception is raised.

    We don't close the event loop anywhere. I also don't see _close_self_pipe being called anywhere.

    Debug logs don't provide any help. I'm looking into a reproducer.

    @JonathanSlenders
    Copy link
    Mannequin Author

    JonathanSlenders mannequin commented Dec 10, 2019

    It looks like the following code will reproduce the issue:

    import asyncio
    import threading
    
    loop = asyncio.get_event_loop()
    
    while True:
        def test():
            loop.call_soon_threadsafe(loop.stop)
    
        threading.Thread(target=test).start()
        loop.run_forever()
    
    

    Leave it running on Windows, in Python 3.8 for a few seconds, then it starts spawning ConnectionResetErrors.

    @JonathanSlenders
    Copy link
    Mannequin Author

    JonathanSlenders mannequin commented Dec 10, 2019

    Even simpler, the following code will crash after so many iterations:

    import asyncio
    loop = asyncio.get_event_loop()
    
    while True:
        loop.call_soon_threadsafe(loop.stop)
        loop.run_forever()
    

    Adding a little sleep of 0.01s after run_forever() prevents the issue.

    So, to me it looks like the cancellation of the _OverlappedFuture that wraps around the _recv() call from the self-pipe did not complete before we start _recv() again in the next run_forever() call. No idea if that makes sense...

    @bdarnell
    Copy link
    Mannequin

    bdarnell mannequin commented Feb 16, 2020

    I just spent some time digging into this. Each call to run_forever starts a call to _loop_self_reading, then attempts to cancel it before returning:

    def run_forever(self):
    try:
    assert self._self_reading_future is None
    self.call_soon(self._loop_self_reading)
    super().run_forever()
    finally:
    if self._self_reading_future is not None:
    ov = self._self_reading_future._ov
    self._self_reading_future.cancel()
    # self_reading_future was just cancelled so it will never be signalled
    # Unregister it otherwise IocpProactor.close will wait for it forever
    if ov is not None:
    self._proactor._unregister(ov)
    self._self_reading_future = None

    The comment at line 321 is not entirely accurate: the future will not resolve in the future, but it may have already resolved, and added its callback to the call_soon queue. This callback will run if the event loop is restarted again. Since _loop_self_reading calls itself, this results in two copies of the "loop" running concurrently and stepping on each other's _self_reading_futures.

    This appears to be fairly harmless except for noise in the logs when only one of the loops is stopped cleanly.

    I believe the simplest fix is for _loop_self_reading to compare its argument to self._self_reading_future to determine if it is the "current" loop and if not, don't reschedule anything.

    @RustamS
    Copy link
    Mannequin

    RustamS mannequin commented Apr 8, 2020

    Please take a look at this as well:
    (ipython bpo-12049 'Unhandled exception in event loop' (WinError 995))
    ipython/ipython#12049 (comment) and below

    @cmeyer
    Copy link
    Mannequin

    cmeyer mannequin commented May 12, 2020

    Here is another way to reproduce this (or an extremely similar) error without a loop. Since may be a race condition, I'm not sure this works 100% of the time on all machines - but it did on several machines I tried.

    import asyncio
    
    loop = asyncio.get_event_loop()
    
    def func():
        pass
    
    f = loop.run_in_executor(None, func)
    loop.stop()
    loop.run_forever()
    loop.stop()
    loop.run_forever()
    loop.stop()
    loop.run_forever()
    
    Error on reading from the event loop self pipe
    loop: <ProactorEventLoop running=True closed=False debug=False>
    Traceback (most recent call last):
      File "C:\Miniconda3\envs\py38\lib\asyncio\windows_events.py", line 453, in finish_recv
        return ov.getresult()
    OSError: [WinError 995] The I/O operation has been aborted because of either a thread exit or an application request
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "C:\Miniconda3\envs\py38\lib\asyncio\proactor_events.py", line 768, in _loop_self_reading
        f.result()  # may raise
      File "C:\Miniconda3\envs\py38\lib\asyncio\windows_events.py", line 808, in _poll
        value = callback(transferred, key, ov)
      File "C:\Miniconda3\envs\py38\lib\asyncio\windows_events.py", line 457, in finish_recv
        raise ConnectionResetError(*exc.args)
    ConnectionResetError: [WinError 995] The I/O operation has been aborted because of either a thread exit or an application request
    

    @zooba
    Copy link
    Member

    zooba commented Aug 24, 2020

    Any input from the asyncio experts?

    I don't have an issue with handling the exception in this case, and hopefully when someone is up to the task of dealing with the range of edge cases throughout this loop implementation, hopefully they can get the ordering of waits right.

    @zooba zooba added 3.9 only security fixes 3.10 only security fixes labels Aug 24, 2020
    @bdarnell
    Copy link
    Mannequin

    bdarnell mannequin commented Aug 30, 2020

    I've posted a pull request with a test and fix: #22017. It's a more targeted fix than cmeyer's PR (which I didn't even notice until now due to unfamiliarity with the BPO UI)

    @1st1
    Copy link
    Member

    1st1 commented Aug 31, 2020

    New changeset ea5a636 by Ben Darnell in branch 'master':
    bpo-39010: Fix errors logged on proactor loop restart (bpo-22017)
    ea5a636

    @zooba
    Copy link
    Member

    zooba commented Aug 31, 2020

    Thanks, Ben!

    @db3l
    Copy link
    Contributor

    db3l commented Sep 1, 2020

    I've been seeing failures on the Win10 buildbot 3.x branch that seem to correlate with the timing of this change - could there be some further work needed on Windows? Or, if it's a test-only artifact and the warnings are innocuous, something to ignore the output during the tests?

    Specifically, the following warnings occur during test_asyncio:
    Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2)
    Warning -- Dangling thread: <_MainThread(MainThread, started 5220)>

    See https://buildbot.python.org/all/#/builders/146/builds/23 for the first failing build.

    @bdarnell
    Copy link
    Mannequin

    bdarnell mannequin commented Sep 2, 2020

    I can confirm that those warnings appear to be coming from the test I added here. I'm not sure how to interpret them, though - what does it mean for the main thread to be dangling?

    @db3l
    Copy link
    Contributor

    db3l commented Sep 2, 2020

    I'm guessing the warning appears odd as we're seeing a thread shutdown data race. The message is produced by threading_cleanup in support/threading_helper.py, and it appears that between the first and second lines one of the initially dangling threads goes away, so the only one left to be enumerated is the main thread, at which point the function is simply listing all threads currently alive. But by that point it's just the main thread remaining.

    I notice in the test that you have a comment about needing to wait for f to complete or you get a warning about thread not shutting down cleanly. Was that a similar warning? The run_until_complete(f) line seems to have no effect on the buildbot.

    If I added a small sleep at the end of the test the warnings go away on the buildbot. The buildbot is a fairly fast machine, so perhaps the test just needs to wait somehow for the event loop to fully shut down or something.

    The most direct cause of the warnings seems to be the self.loop.close() call - if I just comment that out the test runs warning-free without any extra delay needed.

    I don't know much about asyncio tests, but it would appear the close call in the test defeats some of the logic in the close_loop teardown code that runs under TestCase (in utils.py), which under Windows is just a call to run_until_complete(loop.shutdown_default_executor()). If I add that same call to the test prior to the close it also passes cleanly.

    So if closing the loop in the test itself is crucial, I'd suggest including the extra run_until_complete call. If closing isn't crucial to the test, simply removing it seems to address the issue. I'm not sure if its removal then has any implications for the extra run_until_complete(f) call in the test, as I can't see any impact from that on the buildbot.

    @bdarnell
    Copy link
    Mannequin

    bdarnell mannequin commented Sep 2, 2020

    I've fixed the test and added some commentary about the different levels of clean shutdown we must do to quiet all the warnings: #22066

    @1st1
    Copy link
    Member

    1st1 commented Sep 3, 2020

    New changeset be435ae by Ben Darnell in branch 'master':
    bpo-39010: Improve test shutdown (bpo-22066)
    be435ae

    @1st1
    Copy link
    Member

    1st1 commented Sep 3, 2020

    New changeset 49571c0 by Miss Islington (bot) in branch '3.9':
    bpo-39010: Fix errors logged on proactor loop restart (GH-22017) (bpo-22034)
    49571c0

    @1st1
    Copy link
    Member

    1st1 commented Sep 3, 2020

    New changeset a986b06 by Miss Islington (bot) in branch '3.8':
    bpo-39010: Fix errors logged on proactor loop restart (GH-22017) (bpo-22035)
    a986b06

    @1st1
    Copy link
    Member

    1st1 commented Sep 3, 2020

    New changeset 40e2444 by Miss Islington (bot) in branch '3.8':
    bpo-39010: Improve test shutdown (GH-22066) (bpo-22083)
    40e2444

    @1st1
    Copy link
    Member

    1st1 commented Sep 3, 2020

    New changeset 8d68e59 by Miss Islington (bot) in branch '3.9':
    bpo-39010: Improve test shutdown (GH-22066) (bpo-22082)
    8d68e59

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes 3.9 only security fixes 3.10 only security fixes topic-asyncio
    Projects
    Status: Done
    Development

    No branches or pull requests

    5 participants