classification
Title: Memory leak in ThreadPoolExecutor + run_in_executor
Type: resource usage Stage: patch review
Components: asyncio Versions: Python 3.9, Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Evgeny Nizhibitsky, asvetlov, eamanu, yselivanov
Priority: normal Keywords: patch

Created on 2019-10-10 09:41 by Evgeny Nizhibitsky, last changed 2019-11-05 01:20 by vstinner.

Files
File name Uploaded Description Edit
python-leak.zip Evgeny Nizhibitsky, 2019-10-10 09:41 Minimal python script + dockerfile/makefile for reproducibility
Pull Requests
URL Status Linked Edit
PR 17038 closed eamanu, 2019-11-03 23:21
Messages (14)
msg354351 - (view) Author: Evgeny Nizhibitsky (Evgeny Nizhibitsky) Date: 2019-10-10 09:41
I have run into a memory leak caused by using run_in_executor + ThreadPoolExecutor while running some stability tests with custom web services.

It was 1 MB leaked for 1k requests made for my case and I've extracted the root cause and converted it into minimal script with both mentioned parts + just NOP function to "run".

The script can easily eat up to 1 GB of memory in less then 1 minute now. It uses external psutil library to report the memory allocated but it can be easily commented out and the leak will stay anyway.

One can found that script attached + Dockerfile/Makefile for reproducibility. I've also reproduced it in my own conda-based 3.7 environment as well as the master branch of cpython.
msg354352 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-10 09:52
Well, that's a common issue when using asyncio: you forgot await.

async def main(_loop):
    while True:
        with futures.ThreadPoolExecutor() as pool:
            _loop.run_in_executor(pool, nop)
        sys.stdout.write(f'\r{get_mem():0.3f}MB')

It should be: "await _loop.run_in_executor(pool, nop)" ;-)

Sadly, PYTHONASYNCIODEBUG=1 env var doesn't complain on this bug.

See: https://docs.python.org/dev/library/asyncio-dev.html#debug-mode
msg354353 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2019-10-10 09:54
You MUST await a future returned from `loop.run_in_executor()` to avoid the leak.

Yuri, what should we do with the issue? I see the second similar report in the last half of the year.
Sure, we can add weakrefs somewhere in futures._chain_future() but I pretty sure that the proper fix is just enforcing `await` here, e.g.

1. rename existing run_in_executor() into a private _run_in_executor() function.

2. write async trampoline:
async def run_in_executor(self, executor, func, *args):
    return await self._run_in_executor(executor, func, *args)
msg354354 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2019-10-10 09:54
Victor answered the first :)
msg354355 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-10 09:55
> Yuri, what should we do with the issue?

A Task emits a warning when it's not awaited. Can a Task be used instead of a Future in run_in_executor()?
msg354356 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2019-10-10 10:00
> Can a Task be used instead of a Future in run_in_executor()?

I don't think that the task is required here. The problem is that run_in_executor is a function that returns asyncio future; that is in turn a wrapper around concurrent future object.

If we convert run_in_executor() into async function we'll get a warning about unawaited coroutine even without asyncio debug mode.
msg354357 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-10 10:02
> If we convert run_in_executor() into async function we'll get a warning about unawaited coroutine even without asyncio debug mode.

That sounds like a good solution :-)
msg354358 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2019-10-10 10:02
The change is slightly not backward compatible but
1. It's pretty visible. In the worst-case instead of the memory leak people see a RuntimeWarning
2. We did it already for a part of API, e.g. loop.sock_read() and family
msg354360 - (view) Author: Evgeny Nizhibitsky (Evgeny Nizhibitsky) Date: 2019-10-10 10:38
Oh, I see that in the initial code with leakage (it was heavy ThreadPoolExecutor + xgboost thing) there was an await but I must have lost it somewhere while reducing it to the minimal example and finished in the wrong direction.

Glad too see that it raised a discussion to prevent others from getting into this silent trap.
msg355835 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2019-11-01 19:08
> The change is slightly not backward compatible but

Yeah, that's my main problem with converting `loop.run_in_executor()` to a coroutine. When I attempted doing that I discovered that there's code that expects the method to return a Future, and so expects it have the `cancel()` method.

If we convert it to a coroutine a lot of code will break, which might be OK if it's really necessary. Is it though? Can we return a special Future subclass that complains if it's not awaited?  Would that fix the problem?
msg355836 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2019-11-01 19:18
I thought about returning a special subclass.
Future has too rich API: get_loop(), done(), result() etc.

What's about returning the proxy object with future instance embedded; the object raises DeprecationWarning for everythin except __repr__, __del__ and __await__, __getattr__ redirects to getattr(self._fut, name) for all other attributes access. 

It is a more complex solution but definitely 100% backward compatible; plus the solution we can prepare people for removing the deprecated code eventually.
msg355843 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2019-11-01 21:12
> It is a more complex solution but definitely 100% backward compatible; plus the solution we can prepare people for removing the deprecated code eventually.

Yeah.  Do you think it's worth it bothering with this old low-level API instead of making a new high-level one?  I don't, but if you do the feel free to change it.
msg355864 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2019-11-02 09:55
The API exists, people use it and get the memory leak.
We should either remove the API (not realistic dream at least for many years) or fix it.  There is no choice actually.
msg355882 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2019-11-02 21:26
> We should either remove the API (not realistic dream at least for many years) or fix it.  There is no choice actually.

I don't understand. What happens if we don't await the future that run_in_executor returns? Does it get GCed eventually? Why is memory leaking?
History
Date User Action Args
2019-11-05 01:20:32vstinnersetnosy: - vstinner
2019-11-03 23:21:28eamanusetkeywords: + patch
stage: patch review
pull_requests: + pull_request16551
2019-11-03 00:38:13eamanusetnosy: + eamanu
2019-11-02 21:26:06yselivanovsetmessages: + msg355882
2019-11-02 09:55:09asvetlovsetmessages: + msg355864
2019-11-01 21:12:46yselivanovsetmessages: + msg355843
2019-11-01 19:18:48asvetlovsetmessages: + msg355836
2019-11-01 19:08:11yselivanovsetmessages: + msg355835
2019-10-10 10:38:17Evgeny Nizhibitskysetmessages: + msg354360
2019-10-10 10:02:49asvetlovsetmessages: + msg354358
2019-10-10 10:02:05vstinnersetmessages: + msg354357
2019-10-10 10:00:52asvetlovsetmessages: + msg354356
2019-10-10 09:55:34vstinnersetnosy: + vstinner
messages: + msg354355
2019-10-10 09:54:51asvetlovsetmessages: + msg354354
2019-10-10 09:54:09asvetlovsetstatus: closed -> open

nosy: - vstinner
messages: + msg354353

resolution: not a bug ->
stage: resolved -> (no value)
2019-10-10 09:52:38vstinnersetstatus: open -> closed

nosy: + vstinner
messages: + msg354352

resolution: not a bug
stage: resolved
2019-10-10 09:41:34Evgeny Nizhibitskycreate