Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

asyncio: BaseEventLoop.close() shutdowns the executor without waiting causing leak of dangling threads #78218

Closed
vstinner opened this issue Jul 3, 2018 · 30 comments
Labels
3.9 only security fixes tests Tests in the Lib/test dir topic-asyncio

Comments

@vstinner
Copy link
Member

vstinner commented Jul 3, 2018

BPO 34037
Nosy @vstinner, @encukou, @asvetlov, @cjrh, @ambv, @1st1, @cmeyer, @aeros
PRs
  • Revert "bpo-34037, asyncio: add BaseEventLoop.wait_executor_on_close (GH-13786)" #13802
  • bpo-34037: Fix test_asyncio failure and add loop.shutdown_default_executor() #15735
  • bpo-34037: test_asyncio uses shutdown_default_executor() #16284
  • bpo-34037: Add Python API whatsnew for loop.shutdown_default_executor() #19634
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-04-21.21:31:38.061>
    created_at = <Date 2018-07-03.22:57:26.558>
    labels = ['tests', '3.9', 'expert-asyncio']
    title = 'asyncio: BaseEventLoop.close() shutdowns the executor without waiting causing leak of dangling threads'
    updated_at = <Date 2020-04-21.21:31:38.060>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2020-04-21.21:31:38.060>
    actor = 'aeros'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-04-21.21:31:38.061>
    closer = 'aeros'
    components = ['Tests', 'asyncio']
    creation = <Date 2018-07-03.22:57:26.558>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 34037
    keywords = ['patch']
    message_count = 30.0
    messages = ['321006', '321534', '326050', '326052', '326259', '344475', '344476', '344477', '344545', '344547', '344548', '344552', '344686', '351231', '351232', '351349', '351359', '351384', '351394', '351396', '352788', '352789', '352797', '352798', '352822', '366812', '366821', '366936', '366939', '366943']
    nosy_count = 8.0
    nosy_names = ['vstinner', 'petr.viktorin', 'asvetlov', 'cjrh', 'lukasz.langa', 'yselivanov', 'cmeyer', 'aeros']
    pr_nums = ['13802', '15735', '16284', '19634']
    priority = 'high'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue34037'
    versions = ['Python 3.9']

    @vstinner
    Copy link
    Member Author

    vstinner commented Jul 3, 2018

    AMD64 FreeBSD 10.x Shared 3.7:
    http://buildbot.python.org/all/#/builders/124/builds/410

    ...
    test_remove_fds_after_closing (test.test_asyncio.test_events.KqueueEventLoopTests) ... ok
    test_run_in_executor (test.test_asyncio.test_events.KqueueEventLoopTests) ... ok
    test_run_in_executor_cancel (test.test_asyncio.test_events.KqueueEventLoopTests) ...

    Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2)
    Dangling thread: <Thread(ThreadPoolExecutor-13_0, started daemon 34471091200)>
    Dangling thread: <_MainThread(MainThread, started 34393318400)>

    ok
    test_run_until_complete (test.test_asyncio.test_events.KqueueEventLoopTests) ... ok
    test_run_until_complete_nesting (test.test_asyncio.test_events.KqueueEventLoopTests) ... ok
    ...
    1 test altered the execution environment:
    test_asyncio

    @vstinner vstinner added 3.7 (EOL) end of life tests Tests in the Lib/test dir topic-asyncio labels Jul 3, 2018
    @vstinner
    Copy link
    Member Author

    It's easy to reproduce the issue on Linux:

    diff --git a/Lib/test/test_asyncio/test_events.py b/Lib/test/test_asyncio/test_events.py
    index 11cd950df1..df4c2b9849 100644
    --- a/Lib/test/test_asyncio/test_events.py
    +++ b/Lib/test/test_asyncio/test_events.py
    @@ -359,7 +359,7 @@ class EventLoopTestsMixin:
                 called = True
     
             def run():
    -            time.sleep(0.05)
    +            time.sleep(1.0)
     
             f2 = self.loop.run_in_executor(None, run)
             f2.cancel()

    The problem is that BaseEventLoop.close() shutdowns its default executor without waiting:

        def close(self):
            ...
            executor = self._default_executor
            if executor is not None:
                self._default_executor = None
                executor.shutdown(wait=True)

    I fixed a similar issue in socketserver:

    • bpo-31233: for socketserver.ThreadingMixIn
    • bpo-31151: for socketserver.ForkingMixIn
    • bpo-33540: add block_on_close attr to socketserver

    I suggest to wait by default, but maybe also add a block_on_close attribute to BaseEventLoop (default: False) just for backward compatibility.

    What do you think Yury, Andrew, and Guido?

    @vstinner vstinner changed the title test_asyncio: test_run_in_executor_cancel() leaked a dangling thread on AMD64 FreeBSD 10.x Shared 3.7 asyncio: BaseEventLoop.close() shutdowns the executor without waiting causing leak of dangling threads Jul 12, 2018
    @vstinner
    Copy link
    Member Author

    Yury, Andrew: Do you know if the executor doesn't wait on purpose? Would it be possible to change that in Python 3.8?

    @1st1
    Copy link
    Member

    1st1 commented Sep 21, 2018

    Yury, Andrew: Do you know if the executor doesn't wait on purpose? Would it be possible to change that in Python 3.8?

    Maybe. At least we need to add a "timeout" argument to asyncio.run() to let it wait for executor jobs.

    I'm also thinking about making OS threads cancellable/interruptable in Python 3.8 (if they run pure Python code).

    @vstinner
    Copy link
    Member Author

    Maybe. At least we need to add a "timeout" argument to asyncio.run() to let it wait for executor jobs.

    The shutdown() method of concurrent.futures.Executor API doesn't accept a timeout. It waits for multiple things.

    I added "block_on_close = True" class attribute to socketserver.ForkingMixIn and socketserver.ThreadingMixIn. By default, server_close() waits until all children complete, but the wait is non-blocking if block_on_close is false.

    @vstinner
    Copy link
    Member Author

    vstinner commented Jun 3, 2019

    I wrote PR 13786 to fix this issue.

    @vstinner vstinner added 3.8 only security fixes and removed 3.7 (EOL) end of life labels Jun 3, 2019
    @vstinner
    Copy link
    Member Author

    vstinner commented Jun 3, 2019

    "executor.shutdown(wait=False)" was added by this change:

    commit 4ca7355
    Author: Antoine Pitrou <solipsis@pitrou.net>
    Date: Sun Oct 20 00:54:10 2013 +0200

    Issue bpo-19299: fix refleak test failures in test_asyncio
    

    Guido van Rossum proposed to wait for the executor. Guido proposed " executor.shutdown(wait=False)", but he didn't comment why wait=False:

    https://bugs.python.org/issue19299#msg200508

    @vstinner
    Copy link
    Member Author

    vstinner commented Jun 3, 2019

    New changeset 0f0a30f by Victor Stinner in branch 'master':
    bpo-34037, asyncio: add BaseEventLoop.wait_executor_on_close (GH-13786)
    0f0a30f

    @ambv
    Copy link
    Contributor

    ambv commented Jun 4, 2019

    New changeset 7f9a2ae by Łukasz Langa in branch 'master':
    Revert "bpo-34037, asyncio: add BaseEventLoop.wait_executor_on_close (GH-13786)" (bpo-13802)
    7f9a2ae

    @vstinner
    Copy link
    Member Author

    vstinner commented Jun 4, 2019

    Łukasz Langa reverted my change (PR 13786). I pushed my change without any review to try to fix https://bugs.python.org/issue37137 which blocked the Python 3.8 beta1. Sadly, my change didn't fix this issue. The root cause was differently (and it's now fixed).

    Andrew Svetlov asked to make some changes on my API. Yury Selivanov added "I don't like this last minute api decision. Please don't merge this.". See PR 13786 for the discussion. In short, my change was reverted because of the API.

    But I still consider that calling executor.shutdown(wait=False) in loop.close() in wrong: we must wait until it finishes. Otherwise, the code "leaks" threads which continue to run and so can cause troubles. I would prefer to wait by default.

    Andrew, Yury: Would you be interested to rewrite my change in the "asyncio way"?

    IMHO this change can wait Python 3.9, it's not a critical bug.

    @asvetlov
    Copy link
    Contributor

    asvetlov commented Jun 4, 2019

    I still think that this is a valuable change but we need to discuss public API.

    In fact, in my job we used to override default executor and wait for its shutdown with explicit waiting to make the system robust.

    The problem is that loop.close() can "hang" for minutes if there are pending background jobs or slow DNS resolution requests.

    We controlled our tasks running in the thread pool carefully and used aiodns for DNS resolving.

    @vstinner
    Copy link
    Member Author

    vstinner commented Jun 4, 2019

    The problem is that loop.close() can "hang" for minutes if there are pending background jobs or slow DNS resolution requests.

    Maybe concurrent.futures should be enhanced for that?

    Or from asyncio, is it possible to poll the executor and emit a warning if some threads take longer than <threshold in esconds>? It would be great if we could display repr() of the slow job in that case.

    Yury also asked for a timeout, but it's not supported by concurrent.futures yet, and I'm not sure what should be done in case of timeout? Retry? Log a warning?

    @asvetlov
    Copy link
    Contributor

    asvetlov commented Jun 5, 2019

    I prefer postponing to think about it until all required 3.8 tasks are done :)

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 6, 2019

    This bug still occurs randomly on the CI. Example on AppVeyor:
    https://ci.appveyor.com/project/python/cpython/builds/27209898

    test_reader_callback (test.test_asyncio.test_events.SelectEventLoopTests) ... ok
    test_remove_fds_after_closing (test.test_asyncio.test_events.SelectEventLoopTests) ... ok
    test_run_in_executor (test.test_asyncio.test_events.SelectEventLoopTests) ... ok
    test_run_in_executor_cancel (test.test_asyncio.test_events.SelectEventLoopTests) ... Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2)
    Dangling thread: <Thread(ThreadPoolExecutor-19_0, started daemon 2728)>
    Dangling thread: <_MainThread(MainThread, started 1916)>
    ok

    Any plan to reapply my change, or fix it?

    @aeros
    Copy link
    Contributor

    aeros commented Sep 6, 2019

    Any plan to reapply my change, or fix it?

    I can try to help with this. I'm not the most familiar with the internals of asyncio, but I think it would provide a good learning experience.

    @aeros
    Copy link
    Contributor

    aeros commented Sep 9, 2019

    I've opened PR-15735 which applies the same functionality as Victor's PR-13786, but adds the public getter and setter methods (for both AbstractEventLoop and BaseEventLoop) as requested by Andrew.

    Since this is still causing intermittent CI failures from test_asyncio, I think it's important to implement these changes in some form in the near future.

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 9, 2019

    I change the version to Python 3.9: the 3.8 branch don't accept new features anymore.

    @vstinner vstinner added 3.9 only security fixes and removed 3.8 only security fixes labels Sep 9, 2019
    @1st1
    Copy link
    Member

    1st1 commented Sep 9, 2019

    Here's the API I propose to solve this problem: #15735 (review)

    Summary:

    • Add a new loop.shutdown_threadpool() method. Just like with shutdown_asyncgens() -- it would be invalid to call loop.run_in_executer() after loop.shutdown_threadpool() is called.

    • Make asyncio.run() to call the new loop.shutdown_threadpool().

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 9, 2019

    "shutdown_threadpool()" name

    What do you think of the "shutdown_default_executor()" name?

    The default executor can be overriden by set_default_executor():

        def set_default_executor(self, executor):
            if not isinstance(executor, concurrent.futures.ThreadPoolExecutor):
                warnings.warn(
                    'Using the default executor that is not an instance of '
                    'ThreadPoolExecutor is deprecated and will be prohibited '
                    'in Python 3.9',
                    DeprecationWarning, 2)
            self._default_executor = executor

    The default executor should always be a thread pool, so "shutdown_threadpool()" name is also correct. I have no strong preference.

    @1st1
    Copy link
    Member

    1st1 commented Sep 9, 2019

    What do you think of the "shutdown_default_executor()" name?

    Yeah, I think it's a better name!

    @asvetlov
    Copy link
    Contributor

    New changeset 9fdc64c by Andrew Svetlov (Kyle Stanley) in branch 'master':
    bpo-34037: Fix test_asyncio failure and add loop.shutdown_default_executor() (GH-15735)
    9fdc64c

    @asvetlov
    Copy link
    Contributor

    Thnks, Kyle!

    @vstinner
    Copy link
    Member Author

    New changeset 079931d by Victor Stinner in branch 'master':
    bpo-34037: test_asyncio uses shutdown_default_executor() (GH-16284)
    079931d

    @vstinner
    Copy link
    Member Author

    asyncio.run() now shutdowns the default executor: nice, thanks! That was my request. IMHO it will make asyncio more reliable, especially for tests on the CI.

    If it becomes an issue in Python 3.9 (executor hangs forever), we can add new parameters (to run()?) to put a timeout for example.

    @aeros
    Copy link
    Contributor

    aeros commented Sep 20, 2019

    Thanks, Kyle!

    No problem, and thanks for all of the help from Andrew, Yury, and Victor!

    IMHO it will make asyncio more reliable, especially for tests on the CI.

    Awesome, that was my primary intention. (:

    If it becomes an issue in Python 3.9 (executor hangs forever),

    Feel free to add me to the nosy list if you have to open an issue for it, I'd be glad to help out with this if it becomes an issue.

    @encukou
    Copy link
    Member

    encukou commented Apr 20, 2020

    I got a report from a library that ties together asyncio and some other async libraries, getting errors like this:

    tests/test_taskgroups.py:60: in test_run_natively
    module.run(testfunc())
    /usr/lib64/python3.9/asyncio/runners.py:48: in run
    loop.run_until_complete(loop.shutdown_default_executor())
    uvloop/loop.pyx:1451: in uvloop.loop.Loop.run_until_complete
    ???
    /usr/lib64/python3.9/asyncio/events.py:254: in shutdown_default_executor
    raise NotImplementedError
    E NotImplementedError

    (more at: https://bugzilla.redhat.com/show_bug.cgi?id=1817681#c1 )

    I'm not all that familiar with asyncio, but it seems to me that all event loops inherited from BaseEventLoop must be updated to implement this new method to correctly work with run() in Python 3.9. Is that right? If it is, this needs at least a much more prominent What's New entry. Or the hard NotImplementedError should turn into a warning.

    @vstinner
    Copy link
    Member Author

    I reopen the issue.

    /usr/lib64/python3.9/asyncio/events.py:254: in shutdown_default_executor
    raise NotImplementedError

    That means that the project (python-anyio) implements its own event loop which inherits from AbstractEventLoop, it doesn't use BaseEventLoop which implements shutdown_default_executor().

    AbstractEventLoop documentation says:

    https://docs.python.org/dev/library/asyncio-eventloop.html#asyncio.AbstractEventLoop

    "The Event Loop Methods section lists all methods that an alternative implementation of AbstractEventLoop should have defined."

    It points to the following documentation which lists shutdown_asyncgens():

    It points to https://docs.python.org/dev/library/asyncio-eventloop.html#asyncio-event-loop

    I'm not all that familiar with asyncio, but it seems to me that all event loops inherited from BaseEventLoop must be updated to implement this new method to correctly work with run() in Python 3.9. Is that right?

    Yes

    If it is, this needs at least a much more prominent What's New entry. Or the hard NotImplementedError should turn into a warning.

    Raising NotImplementedError is a deliberate design choice.

    Documentation the new requirement in the following section sounds like a good idea.

    https://docs.python.org/dev/whatsnew/3.9.html#changes-in-the-python-api

    Kyle: Can you please add a short sentence there to document the new requirement?

    @vstinner vstinner reopened this Apr 20, 2020
    @aeros
    Copy link
    Contributor

    aeros commented Apr 21, 2020

    Kyle: Can you please add a short sentence there to document the new requirement?

    Yep, I'll open a PR soon. I'm glad this is coming up now rather than post-release, thanks for bringing attention to it.

    @aeros
    Copy link
    Contributor

    aeros commented Apr 21, 2020

    New changeset 9c82ea7 by Kyle Stanley in branch 'master':
    bpo-34037: Add Python API whatsnew for loop.shutdown_default_executor() (bpo-19634)
    9c82ea7

    @aeros
    Copy link
    Contributor

    aeros commented Apr 21, 2020

    PR-19634 should address the primary issue of documenting in the 3.9 whatsnew that alternative event loops should implement loop.shutdown_default_executor().

    For reference, here's the implementation in BaseEventLoop:

    async def shutdown_default_executor(self):
    """Schedule the shutdown of the default executor."""
    self._executor_shutdown_called = True
    if self._default_executor is None:
    return
    future = self.create_future()
    thread = threading.Thread(target=self._do_shutdown, args=(future,))
    thread.start()
    try:
    await future
    finally:
    thread.join()
    def _do_shutdown(self, future):
    try:
    self._default_executor.shutdown(wait=True)
    self.call_soon_threadsafe(future.set_result, None)
    except Exception as ex:
    self.call_soon_threadsafe(future.set_exception, ex)
    . I believe it could be used in alternative event loops without too much modification.

    @aeros aeros closed this as completed Apr 21, 2020
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes tests Tests in the Lib/test dir topic-asyncio
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants