classification
Title: asyncio: BaseEventLoop.close() shutdowns the executor without waiting causing leak of dangling threads
Type: Stage: resolved
Components: asyncio, Tests Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: aeros, asvetlov, cjrh, cmeyer, lukasz.langa, petr.viktorin, vstinner, yselivanov
Priority: high Keywords: patch

Created on 2018-07-03 22:57 by vstinner, last changed 2020-04-21 21:31 by aeros. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 13802 merged lukasz.langa, 2019-06-04 10:43
PR 15735 merged aeros, 2019-09-09 04:26
PR 16284 merged vstinner, 2019-09-19 14:12
PR 19634 merged aeros, 2020-04-21 20:03
Messages (30)
msg321006 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-07-03 22:57
AMD64 FreeBSD 10.x Shared 3.7:
http://buildbot.python.org/all/#/builders/124/builds/410

...
test_remove_fds_after_closing (test.test_asyncio.test_events.KqueueEventLoopTests) ... ok
test_run_in_executor (test.test_asyncio.test_events.KqueueEventLoopTests) ... ok
test_run_in_executor_cancel (test.test_asyncio.test_events.KqueueEventLoopTests) ...

  Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2)
  Dangling thread: <Thread(ThreadPoolExecutor-13_0, started daemon 34471091200)>
  Dangling thread: <_MainThread(MainThread, started 34393318400)>

ok
test_run_until_complete (test.test_asyncio.test_events.KqueueEventLoopTests) ... ok
test_run_until_complete_nesting (test.test_asyncio.test_events.KqueueEventLoopTests) ... ok
...
1 test altered the execution environment:
    test_asyncio
msg321534 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-07-12 10:20
It's easy to reproduce the issue on Linux:

diff --git a/Lib/test/test_asyncio/test_events.py b/Lib/test/test_asyncio/test_events.py
index 11cd950df1..df4c2b9849 100644
--- a/Lib/test/test_asyncio/test_events.py
+++ b/Lib/test/test_asyncio/test_events.py
@@ -359,7 +359,7 @@ class EventLoopTestsMixin:
             called = True
 
         def run():
-            time.sleep(0.05)
+            time.sleep(1.0)
 
         f2 = self.loop.run_in_executor(None, run)
         f2.cancel()


The problem is that BaseEventLoop.close() shutdowns its default executor  without waiting:

    def close(self):
        ...
        executor = self._default_executor
        if executor is not None:
            self._default_executor = None
            executor.shutdown(wait=True)

I fixed a similar issue in socketserver:

* bpo-31233: for socketserver.ThreadingMixIn
* bpo-31151: for socketserver.ForkingMixIn 
* bpo-33540: add block_on_close attr to socketserver

I suggest to wait by default, but maybe also add a block_on_close attribute to BaseEventLoop (default: False) just for backward compatibility.

What do you think Yury, Andrew, and Guido?
msg326050 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-21 22:19
Yury, Andrew: Do you know if the executor doesn't wait on purpose? Would it be possible to change that in Python 3.8?
msg326052 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-09-21 22:29
> Yury, Andrew: Do you know if the executor doesn't wait on purpose? Would it be possible to change that in Python 3.8?

Maybe.  At least we need to add a "timeout" argument to asyncio.run() to let it wait for executor jobs.

I'm also thinking about making OS threads cancellable/interruptable in Python 3.8 (if they run pure Python code).
msg326259 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-24 15:58
> Maybe.  At least we need to add a "timeout" argument to asyncio.run() to let it wait for executor jobs.

The shutdown() method of concurrent.futures.Executor API doesn't accept a timeout. It waits for multiple things.

I added "block_on_close = True" class attribute to socketserver.ForkingMixIn and socketserver.ThreadingMixIn. By default, server_close() waits until all children complete, but the wait is non-blocking if block_on_close is false.
msg344475 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-03 21:00
I wrote PR 13786 to fix this issue.
msg344476 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-03 21:09
"executor.shutdown(wait=False)" was added by this change:

commit 4ca7355901049ff9a873cf236091e6780d7a8126
Author: Antoine Pitrou <solipsis@pitrou.net>
Date:   Sun Oct 20 00:54:10 2013 +0200

    Issue #19299: fix refleak test failures in test_asyncio

Guido van Rossum proposed to wait for the executor. Guido proposed " executor.shutdown(wait=False)", but he didn't comment why wait=False:

https://bugs.python.org/issue19299#msg200508
msg344477 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-03 21:31
New changeset 0f0a30f4da4b529e0f7df857b9f575b231b32758 by Victor Stinner in branch 'master':
bpo-34037, asyncio: add BaseEventLoop.wait_executor_on_close (GH-13786)
https://github.com/python/cpython/commit/0f0a30f4da4b529e0f7df857b9f575b231b32758
msg344545 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2019-06-04 11:03
New changeset 7f9a2ae78051877f4d966119e2fcd27ec77eda1d by Łukasz Langa in branch 'master':
Revert "bpo-34037, asyncio: add BaseEventLoop.wait_executor_on_close (GH-13786)" (#13802)
https://github.com/python/cpython/commit/7f9a2ae78051877f4d966119e2fcd27ec77eda1d
msg344547 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-04 11:50
Łukasz Langa reverted my change (PR 13786). I pushed my change without any review to try to fix https://bugs.python.org/issue37137 which blocked the Python 3.8 beta1. Sadly, my change didn't fix this issue. The root cause was differently (and it's now fixed).

Andrew Svetlov asked to make some changes on my API. Yury Selivanov added "I don't like this last minute api decision. Please don't merge this.". See PR 13786 for the discussion. In short, my change was reverted because of the API.

But I still consider that calling executor.shutdown(wait=False) in loop.close() in wrong: we must wait until it finishes. Otherwise, the code "leaks" threads which continue to run and so can cause troubles. I would prefer to wait by default.

Andrew, Yury: Would you be interested to rewrite my change in the "asyncio way"?

IMHO this change can wait Python 3.9, it's not a critical bug.
msg344548 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2019-06-04 11:58
I still think that this is a valuable change but we need to discuss public API.

In fact, in my job we used to override default executor and wait for its shutdown with explicit waiting to make the system robust.

The problem is that loop.close() can "hang" for minutes if there are pending background jobs or slow DNS resolution requests.

We controlled our tasks running in the thread pool carefully and used aiodns for DNS resolving.
msg344552 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-04 12:08
> The problem is that loop.close() can "hang" for minutes if there are pending background jobs or slow DNS resolution requests.

Maybe concurrent.futures should be enhanced for that?

Or from asyncio, is it possible to poll the executor and emit a warning if some threads take longer than <threshold in esconds>? It would be great if we could display repr() of the slow job in that case.

Yury also asked for a timeout, but it's not supported by concurrent.futures yet, and I'm not sure what should be done in case of timeout? Retry? Log a warning?
msg344686 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2019-06-05 08:58
I prefer postponing to think about it until all required 3.8 tasks are done :)
msg351231 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-06 07:04
This bug still occurs randomly on the CI. Example on AppVeyor:
https://ci.appveyor.com/project/python/cpython/builds/27209898

test_reader_callback (test.test_asyncio.test_events.SelectEventLoopTests) ... ok
test_remove_fds_after_closing (test.test_asyncio.test_events.SelectEventLoopTests) ... ok
test_run_in_executor (test.test_asyncio.test_events.SelectEventLoopTests) ... ok
test_run_in_executor_cancel (test.test_asyncio.test_events.SelectEventLoopTests) ... Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2)
Dangling thread: <Thread(ThreadPoolExecutor-19_0, started daemon 2728)>
Dangling thread: <_MainThread(MainThread, started 1916)>
ok

Any plan to reapply my change, or fix it?
msg351232 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2019-09-06 07:19
> Any plan to reapply my change, or fix it?

I can try to help with this. I'm not the most familiar with the internals of asyncio, but I think it would provide a good learning experience.
msg351349 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2019-09-09 05:50
I've opened PR-15735 which applies the same functionality as Victor's PR-13786, but adds the public getter and setter methods (for both AbstractEventLoop and BaseEventLoop) as requested by Andrew. 

Since this is still causing intermittent CI failures from test_asyncio, I think it's important to implement these changes in some form in the near future.
msg351359 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-09 08:34
I change the version to Python 3.9: the 3.8 branch don't accept new features anymore.
msg351384 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2019-09-09 09:43
Here's the API I propose to solve this problem: https://github.com/python/cpython/pull/15735#pullrequestreview-285389412

Summary:

* Add a new loop.shutdown_threadpool() method. Just like with shutdown_asyncgens() -- it would be invalid to call loop.run_in_executer() after loop.shutdown_threadpool() is called.

* Make asyncio.run() to call the new loop.shutdown_threadpool().
msg351394 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-09 09:56
"shutdown_threadpool()" name

What do you think of the "shutdown_default_executor()" name?

The default executor can be overriden by set_default_executor():

    def set_default_executor(self, executor):
        if not isinstance(executor, concurrent.futures.ThreadPoolExecutor):
            warnings.warn(
                'Using the default executor that is not an instance of '
                'ThreadPoolExecutor is deprecated and will be prohibited '
                'in Python 3.9',
                DeprecationWarning, 2)
        self._default_executor = executor

The default executor should always be a thread pool, so "shutdown_threadpool()" name is also correct. I have no strong preference.
msg351396 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2019-09-09 09:57
> What do you think of the "shutdown_default_executor()" name?

Yeah, I think it's a better name!
msg352788 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2019-09-19 12:47
New changeset 9fdc64cf1266b6d5bf0503847b5c38e5edc53a14 by Andrew Svetlov (Kyle Stanley) in branch 'master':
bpo-34037: Fix test_asyncio failure and add loop.shutdown_default_executor() (GH-15735)
https://github.com/python/cpython/commit/9fdc64cf1266b6d5bf0503847b5c38e5edc53a14
msg352789 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2019-09-19 12:48
Thnks, Kyle!
msg352797 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-19 14:45
New changeset 079931d12223ec98cbf53185b90db48efa61f93f by Victor Stinner in branch 'master':
bpo-34037: test_asyncio uses shutdown_default_executor() (GH-16284)
https://github.com/python/cpython/commit/079931d12223ec98cbf53185b90db48efa61f93f
msg352798 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-19 14:47
asyncio.run() now shutdowns the default executor: nice, thanks! That was my request. IMHO it will make asyncio more reliable, especially for tests on the CI.

If it becomes an issue in Python 3.9 (executor hangs forever), we can add new parameters (to run()?) to put a timeout for example.
msg352822 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2019-09-20 02:08
> Thanks, Kyle!

No problem, and thanks for all of the help from Andrew, Yury, and Victor!

> IMHO it will make asyncio more reliable, especially for tests on the CI.

Awesome, that was my primary intention. (:

> If it becomes an issue in Python 3.9 (executor hangs forever),

Feel free to add me to the nosy list if you have to open an issue for it, I'd be glad to help out with this if it becomes an issue.
msg366812 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2020-04-20 08:33
I got a report from a library that ties together asyncio and some other async libraries, getting errors like this:

tests/test_taskgroups.py:60: in test_run_natively
    module.run(testfunc())
/usr/lib64/python3.9/asyncio/runners.py:48: in run
    loop.run_until_complete(loop.shutdown_default_executor())
uvloop/loop.pyx:1451: in uvloop.loop.Loop.run_until_complete
    ???
/usr/lib64/python3.9/asyncio/events.py:254: in shutdown_default_executor
    raise NotImplementedError
E   NotImplementedError

(more at: https://bugzilla.redhat.com/show_bug.cgi?id=1817681#c1 )

I'm not all that familiar with asyncio, but it seems to me that all event loops inherited from BaseEventLoop must be updated to implement this new method to correctly work with run() in Python 3.9. Is that right? If it is, this needs at least a much more prominent What's New entry. Or the hard NotImplementedError should turn into a warning.
msg366821 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-04-20 12:59
I reopen the issue.

> /usr/lib64/python3.9/asyncio/events.py:254: in shutdown_default_executor
>     raise NotImplementedError

That means that the project (python-anyio) implements its own event loop which inherits from AbstractEventLoop, it doesn't use BaseEventLoop which implements shutdown_default_executor().

AbstractEventLoop documentation says:

https://docs.python.org/dev/library/asyncio-eventloop.html#asyncio.AbstractEventLoop

"The Event Loop Methods section lists all methods that an alternative implementation of AbstractEventLoop should have defined."

It points to the following documentation which lists shutdown_asyncgens():

It points to https://docs.python.org/dev/library/asyncio-eventloop.html#asyncio-event-loop


> I'm not all that familiar with asyncio, but it seems to me that all event loops inherited from BaseEventLoop must be updated to implement this new method to correctly work with run() in Python 3.9. Is that right?

Yes

> If it is, this needs at least a much more prominent What's New entry. Or the hard NotImplementedError should turn into a warning.

Raising NotImplementedError is a deliberate design choice.

Documentation the new requirement in the following section sounds like a good idea.

https://docs.python.org/dev/whatsnew/3.9.html#changes-in-the-python-api

Kyle: Can you please add a short sentence there to document the new requirement?
msg366936 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2020-04-21 19:19
> Kyle: Can you please add a short sentence there to document the new requirement?

Yep, I'll open a PR soon. I'm glad this is coming up now rather than post-release, thanks for bringing attention to it.
msg366939 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2020-04-21 20:50
New changeset 9c82ea7868a1c5ecf88891c627b5c399357eb05e by Kyle Stanley in branch 'master':
bpo-34037: Add Python API whatsnew for loop.shutdown_default_executor() (#19634)
https://github.com/python/cpython/commit/9c82ea7868a1c5ecf88891c627b5c399357eb05e
msg366943 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2020-04-21 21:31
PR-19634 should address the primary issue of documenting in the 3.9 whatsnew that alternative event loops should implement loop.shutdown_default_executor().

For reference, here's the implementation in BaseEventLoop: https://github.com/python/cpython/blob/9c82ea7868a1c5ecf88891c627b5c399357eb05e/Lib/asyncio/base_events.py#L556-L574. I believe it could be used in alternative event loops without too much modification.
History
Date User Action Args
2020-04-21 21:31:38aerossetstatus: open -> closed
resolution: fixed
messages: + msg366943

stage: patch review -> resolved
2020-04-21 20:50:55aerossetmessages: + msg366939
2020-04-21 20:03:31aerossetstage: resolved -> patch review
pull_requests: + pull_request18960
2020-04-21 19:19:59aerossetmessages: + msg366936
2020-04-20 15:10:19cmeyersetnosy: + cmeyer
2020-04-20 12:59:25vstinnersetstatus: closed -> open
resolution: fixed -> (no value)
messages: + msg366821
2020-04-20 08:33:48petr.viktorinsetnosy: + petr.viktorin
messages: + msg366812
2019-09-20 02:08:48aerossetmessages: + msg352822
2019-09-19 14:47:52vstinnersetmessages: + msg352798
2019-09-19 14:45:09vstinnersetmessages: + msg352797
2019-09-19 14:12:39vstinnersetpull_requests: + pull_request15869
2019-09-19 12:48:29asvetlovsetstatus: open -> closed
resolution: fixed
messages: + msg352789

stage: patch review -> resolved
2019-09-19 12:47:29asvetlovsetmessages: + msg352788
2019-09-11 13:56:06cjrhsetnosy: + cjrh
2019-09-09 09:57:20yselivanovsetmessages: + msg351396
2019-09-09 09:56:33vstinnersetmessages: + msg351394
2019-09-09 09:43:07yselivanovsetmessages: + msg351384
2019-09-09 08:34:39vstinnersetmessages: + msg351359
versions: + Python 3.9, - Python 3.8
2019-09-09 05:50:36aerossetpriority: normal -> high

messages: + msg351349
2019-09-09 04:26:15aerossetpull_requests: + pull_request15390
2019-09-06 07:19:54aerossetnosy: + aeros
messages: + msg351232
2019-09-06 07:04:51vstinnersetmessages: + msg351231
2019-06-05 08:58:01asvetlovsetmessages: + msg344686
2019-06-04 12:08:53vstinnersetmessages: + msg344552
2019-06-04 11:58:42asvetlovsetmessages: + msg344548
2019-06-04 11:50:37vstinnersetmessages: + msg344547
2019-06-04 11:03:36lukasz.langasetnosy: + lukasz.langa
messages: + msg344545
2019-06-04 10:43:42lukasz.langasetkeywords: + patch
stage: patch review
pull_requests: + pull_request13688
2019-06-03 21:31:27vstinnersetmessages: + msg344477
2019-06-03 21:09:09vstinnersetmessages: + msg344476
2019-06-03 21:00:25vstinnersetmessages: + msg344475
versions: + Python 3.8, - Python 3.7
2018-09-24 15:58:45vstinnersetmessages: + msg326259
2018-09-21 22:29:20yselivanovsetmessages: + msg326052
2018-09-21 22:19:51vstinnersetmessages: + msg326050
2018-07-13 21:46:20gvanrossumsetnosy: - gvanrossum
2018-07-12 10:21:26vstinnersettitle: test_asyncio: test_run_in_executor_cancel() leaked a dangling thread on AMD64 FreeBSD 10.x Shared 3.7 -> asyncio: BaseEventLoop.close() shutdowns the executor without waiting causing leak of dangling threads
2018-07-12 10:20:53vstinnersetnosy: + gvanrossum
messages: + msg321534
2018-07-03 22:57:26vstinnercreate