classification
Title: asyncio: BaseEventLoop.close() shutdowns the executor without waiting causing leak of dangling threads
Type: Stage: patch review
Components: asyncio, Tests Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: asvetlov, lukasz.langa, vstinner, yselivanov
Priority: normal Keywords: patch

Created on 2018-07-03 22:57 by vstinner, last changed 2019-06-05 08:58 by asvetlov.

Pull Requests
URL Status Linked Edit
PR 13802 merged lukasz.langa, 2019-06-04 10:43
Messages (13)
msg321006 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-07-03 22:57
AMD64 FreeBSD 10.x Shared 3.7:
http://buildbot.python.org/all/#/builders/124/builds/410

...
test_remove_fds_after_closing (test.test_asyncio.test_events.KqueueEventLoopTests) ... ok
test_run_in_executor (test.test_asyncio.test_events.KqueueEventLoopTests) ... ok
test_run_in_executor_cancel (test.test_asyncio.test_events.KqueueEventLoopTests) ...

  Warning -- threading_cleanup() failed to cleanup 1 threads (count: 1, dangling: 2)
  Dangling thread: <Thread(ThreadPoolExecutor-13_0, started daemon 34471091200)>
  Dangling thread: <_MainThread(MainThread, started 34393318400)>

ok
test_run_until_complete (test.test_asyncio.test_events.KqueueEventLoopTests) ... ok
test_run_until_complete_nesting (test.test_asyncio.test_events.KqueueEventLoopTests) ... ok
...
1 test altered the execution environment:
    test_asyncio
msg321534 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-07-12 10:20
It's easy to reproduce the issue on Linux:

diff --git a/Lib/test/test_asyncio/test_events.py b/Lib/test/test_asyncio/test_events.py
index 11cd950df1..df4c2b9849 100644
--- a/Lib/test/test_asyncio/test_events.py
+++ b/Lib/test/test_asyncio/test_events.py
@@ -359,7 +359,7 @@ class EventLoopTestsMixin:
             called = True
 
         def run():
-            time.sleep(0.05)
+            time.sleep(1.0)
 
         f2 = self.loop.run_in_executor(None, run)
         f2.cancel()


The problem is that BaseEventLoop.close() shutdowns its default executor  without waiting:

    def close(self):
        ...
        executor = self._default_executor
        if executor is not None:
            self._default_executor = None
            executor.shutdown(wait=True)

I fixed a similar issue in socketserver:

* bpo-31233: for socketserver.ThreadingMixIn
* bpo-31151: for socketserver.ForkingMixIn 
* bpo-33540: add block_on_close attr to socketserver

I suggest to wait by default, but maybe also add a block_on_close attribute to BaseEventLoop (default: False) just for backward compatibility.

What do you think Yury, Andrew, and Guido?
msg326050 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-21 22:19
Yury, Andrew: Do you know if the executor doesn't wait on purpose? Would it be possible to change that in Python 3.8?
msg326052 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-09-21 22:29
> Yury, Andrew: Do you know if the executor doesn't wait on purpose? Would it be possible to change that in Python 3.8?

Maybe.  At least we need to add a "timeout" argument to asyncio.run() to let it wait for executor jobs.

I'm also thinking about making OS threads cancellable/interruptable in Python 3.8 (if they run pure Python code).
msg326259 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-09-24 15:58
> Maybe.  At least we need to add a "timeout" argument to asyncio.run() to let it wait for executor jobs.

The shutdown() method of concurrent.futures.Executor API doesn't accept a timeout. It waits for multiple things.

I added "block_on_close = True" class attribute to socketserver.ForkingMixIn and socketserver.ThreadingMixIn. By default, server_close() waits until all children complete, but the wait is non-blocking if block_on_close is false.
msg344475 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-03 21:00
I wrote PR 13786 to fix this issue.
msg344476 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-03 21:09
"executor.shutdown(wait=False)" was added by this change:

commit 4ca7355901049ff9a873cf236091e6780d7a8126
Author: Antoine Pitrou <solipsis@pitrou.net>
Date:   Sun Oct 20 00:54:10 2013 +0200

    Issue #19299: fix refleak test failures in test_asyncio

Guido van Rossum proposed to wait for the executor. Guido proposed " executor.shutdown(wait=False)", but he didn't comment why wait=False:

https://bugs.python.org/issue19299#msg200508
msg344477 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-03 21:31
New changeset 0f0a30f4da4b529e0f7df857b9f575b231b32758 by Victor Stinner in branch 'master':
bpo-34037, asyncio: add BaseEventLoop.wait_executor_on_close (GH-13786)
https://github.com/python/cpython/commit/0f0a30f4da4b529e0f7df857b9f575b231b32758
msg344545 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2019-06-04 11:03
New changeset 7f9a2ae78051877f4d966119e2fcd27ec77eda1d by Łukasz Langa in branch 'master':
Revert "bpo-34037, asyncio: add BaseEventLoop.wait_executor_on_close (GH-13786)" (#13802)
https://github.com/python/cpython/commit/7f9a2ae78051877f4d966119e2fcd27ec77eda1d
msg344547 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-04 11:50
Łukasz Langa reverted my change (PR 13786). I pushed my change without any review to try to fix https://bugs.python.org/issue37137 which blocked the Python 3.8 beta1. Sadly, my change didn't fix this issue. The root cause was differently (and it's now fixed).

Andrew Svetlov asked to make some changes on my API. Yury Selivanov added "I don't like this last minute api decision. Please don't merge this.". See PR 13786 for the discussion. In short, my change was reverted because of the API.

But I still consider that calling executor.shutdown(wait=False) in loop.close() in wrong: we must wait until it finishes. Otherwise, the code "leaks" threads which continue to run and so can cause troubles. I would prefer to wait by default.

Andrew, Yury: Would you be interested to rewrite my change in the "asyncio way"?

IMHO this change can wait Python 3.9, it's not a critical bug.
msg344548 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2019-06-04 11:58
I still think that this is a valuable change but we need to discuss public API.

In fact, in my job we used to override default executor and wait for its shutdown with explicit waiting to make the system robust.

The problem is that loop.close() can "hang" for minutes if there are pending background jobs or slow DNS resolution requests.

We controlled our tasks running in the thread pool carefully and used aiodns for DNS resolving.
msg344552 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-06-04 12:08
> The problem is that loop.close() can "hang" for minutes if there are pending background jobs or slow DNS resolution requests.

Maybe concurrent.futures should be enhanced for that?

Or from asyncio, is it possible to poll the executor and emit a warning if some threads take longer than <threshold in esconds>? It would be great if we could display repr() of the slow job in that case.

Yury also asked for a timeout, but it's not supported by concurrent.futures yet, and I'm not sure what should be done in case of timeout? Retry? Log a warning?
msg344686 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2019-06-05 08:58
I prefer postponing to think about it until all required 3.8 tasks are done :)
History
Date User Action Args
2019-06-05 08:58:01asvetlovsetmessages: + msg344686
2019-06-04 12:08:53vstinnersetmessages: + msg344552
2019-06-04 11:58:42asvetlovsetmessages: + msg344548
2019-06-04 11:50:37vstinnersetmessages: + msg344547
2019-06-04 11:03:36lukasz.langasetnosy: + lukasz.langa
messages: + msg344545
2019-06-04 10:43:42lukasz.langasetkeywords: + patch
stage: patch review
pull_requests: + pull_request13688
2019-06-03 21:31:27vstinnersetmessages: + msg344477
2019-06-03 21:09:09vstinnersetmessages: + msg344476
2019-06-03 21:00:25vstinnersetmessages: + msg344475
versions: + Python 3.8, - Python 3.7
2018-09-24 15:58:45vstinnersetmessages: + msg326259
2018-09-21 22:29:20yselivanovsetmessages: + msg326052
2018-09-21 22:19:51vstinnersetmessages: + msg326050
2018-07-13 21:46:20gvanrossumsetnosy: - gvanrossum
2018-07-12 10:21:26vstinnersettitle: test_asyncio: test_run_in_executor_cancel() leaked a dangling thread on AMD64 FreeBSD 10.x Shared 3.7 -> asyncio: BaseEventLoop.close() shutdowns the executor without waiting causing leak of dangling threads
2018-07-12 10:20:53vstinnersetnosy: + gvanrossum
messages: + msg321534
2018-07-03 22:57:26vstinnercreate