Issue 38430: Memory leak in ThreadPoolExecutor + run_in_executor

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/82611

classification

Title:	Memory leak in ThreadPoolExecutor + run_in_executor
Type:	resource usage	Stage:	patch review
Components:	asyncio	Versions:	Python 3.9, Python 3.7, Python 3.6

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	Evgeny Nizhibitsky, asvetlov, eamanu, yselivanov
Priority:	normal	Keywords:	patch

Created on 2019-10-10 09:41 by Evgeny Nizhibitsky, last changed 2022-04-11 14:59 by admin.

Files
File name	Uploaded	Description	Edit
python-leak.zip	Evgeny Nizhibitsky, 2019-10-10 09:41	Minimal python script + dockerfile/makefile for reproducibility

Pull Requests
URL	Status	Linked	Edit
PR 17038	closed	eamanu, 2019-11-03 23:21

Messages (14)
msg354351 - (view)	Author: Evgeny Nizhibitsky (Evgeny Nizhibitsky)	Date: 2019-10-10 09:41
I have run into a memory leak caused by using run_in_executor + ThreadPoolExecutor while running some stability tests with custom web services. It was 1 MB leaked for 1k requests made for my case and I've extracted the root cause and converted it into minimal script with both mentioned parts + just NOP function to "run". The script can easily eat up to 1 GB of memory in less then 1 minute now. It uses external psutil library to report the memory allocated but it can be easily commented out and the leak will stay anyway. One can found that script attached + Dockerfile/Makefile for reproducibility. I've also reproduced it in my own conda-based 3.7 environment as well as the master branch of cpython.
msg354352 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-10-10 09:52
Well, that's a common issue when using asyncio: you forgot await. async def main(_loop): while True: with futures.ThreadPoolExecutor() as pool: _loop.run_in_executor(pool, nop) sys.stdout.write(f'\r{get_mem():0.3f}MB') It should be: "await _loop.run_in_executor(pool, nop)" ;-) Sadly, PYTHONASYNCIODEBUG=1 env var doesn't complain on this bug. See: https://docs.python.org/dev/library/asyncio-dev.html#debug-mode
msg354353 - (view)	Author: Andrew Svetlov (asvetlov) *	Date: 2019-10-10 09:54
You MUST await a future returned from `loop.run_in_executor()` to avoid the leak. Yuri, what should we do with the issue? I see the second similar report in the last half of the year. Sure, we can add weakrefs somewhere in futures._chain_future() but I pretty sure that the proper fix is just enforcing `await` here, e.g. 1. rename existing run_in_executor() into a private _run_in_executor() function. 2. write async trampoline: async def run_in_executor(self, executor, func, args): return await self._run_in_executor(executor, func, args)
msg354354 - (view)	Author: Andrew Svetlov (asvetlov) *	Date: 2019-10-10 09:54
Victor answered the first :)
msg354355 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-10-10 09:55
> Yuri, what should we do with the issue? A Task emits a warning when it's not awaited. Can a Task be used instead of a Future in run_in_executor()?
msg354356 - (view)	Author: Andrew Svetlov (asvetlov) *	Date: 2019-10-10 10:00
> Can a Task be used instead of a Future in run_in_executor()? I don't think that the task is required here. The problem is that run_in_executor is a function that returns asyncio future; that is in turn a wrapper around concurrent future object. If we convert run_in_executor() into async function we'll get a warning about unawaited coroutine even without asyncio debug mode.
msg354357 - (view)	Author: STINNER Victor (vstinner) *	Date: 2019-10-10 10:02
> If we convert run_in_executor() into async function we'll get a warning about unawaited coroutine even without asyncio debug mode. That sounds like a good solution :-)
msg354358 - (view)	Author: Andrew Svetlov (asvetlov) *	Date: 2019-10-10 10:02
The change is slightly not backward compatible but 1. It's pretty visible. In the worst-case instead of the memory leak people see a RuntimeWarning 2. We did it already for a part of API, e.g. loop.sock_read() and family
msg354360 - (view)	Author: Evgeny Nizhibitsky (Evgeny Nizhibitsky)	Date: 2019-10-10 10:38
Oh, I see that in the initial code with leakage (it was heavy ThreadPoolExecutor + xgboost thing) there was an await but I must have lost it somewhere while reducing it to the minimal example and finished in the wrong direction. Glad too see that it raised a discussion to prevent others from getting into this silent trap.
msg355835 - (view)	Author: Yury Selivanov (yselivanov) *	Date: 2019-11-01 19:08
> The change is slightly not backward compatible but Yeah, that's my main problem with converting `loop.run_in_executor()` to a coroutine. When I attempted doing that I discovered that there's code that expects the method to return a Future, and so expects it have the `cancel()` method. If we convert it to a coroutine a lot of code will break, which might be OK if it's really necessary. Is it though? Can we return a special Future subclass that complains if it's not awaited? Would that fix the problem?
msg355836 - (view)	Author: Andrew Svetlov (asvetlov) *	Date: 2019-11-01 19:18
I thought about returning a special subclass. Future has too rich API: get_loop(), done(), result() etc. What's about returning the proxy object with future instance embedded; the object raises DeprecationWarning for everythin except __repr__, __del__ and __await__, __getattr__ redirects to getattr(self._fut, name) for all other attributes access. It is a more complex solution but definitely 100% backward compatible; plus the solution we can prepare people for removing the deprecated code eventually.
msg355843 - (view)	Author: Yury Selivanov (yselivanov) *	Date: 2019-11-01 21:12
> It is a more complex solution but definitely 100% backward compatible; plus the solution we can prepare people for removing the deprecated code eventually. Yeah. Do you think it's worth it bothering with this old low-level API instead of making a new high-level one? I don't, but if you do the feel free to change it.
msg355864 - (view)	Author: Andrew Svetlov (asvetlov) *	Date: 2019-11-02 09:55
The API exists, people use it and get the memory leak. We should either remove the API (not realistic dream at least for many years) or fix it. There is no choice actually.
msg355882 - (view)	Author: Yury Selivanov (yselivanov) *	Date: 2019-11-02 21:26
> We should either remove the API (not realistic dream at least for many years) or fix it. There is no choice actually. I don't understand. What happens if we don't await the future that run_in_executor returns? Does it get GCed eventually? Why is memory leaking?

History
Date	User	Action	Args
2022-04-11 14:59:21	admin	set	github: 82611
2019-11-05 01:20:32	vstinner	set	nosy: - vstinner
2019-11-03 23:21:28	eamanu	set	keywords: + patch stage: patch review pull_requests: + pull_request16551
2019-11-03 00:38:13	eamanu	set	nosy: + eamanu
2019-11-02 21:26:06	yselivanov	set	messages: + msg355882
2019-11-02 09:55:09	asvetlov	set	messages: + msg355864
2019-11-01 21:12:46	yselivanov	set	messages: + msg355843
2019-11-01 19:18:48	asvetlov	set	messages: + msg355836
2019-11-01 19:08:11	yselivanov	set	messages: + msg355835
2019-10-10 10:38:17	Evgeny Nizhibitsky	set	messages: + msg354360
2019-10-10 10:02:49	asvetlov	set	messages: + msg354358
2019-10-10 10:02:05	vstinner	set	messages: + msg354357
2019-10-10 10:00:52	asvetlov	set	messages: + msg354356
2019-10-10 09:55:34	vstinner	set	nosy: + vstinner messages: + msg354355
2019-10-10 09:54:51	asvetlov	set	messages: + msg354354
2019-10-10 09:54:09	asvetlov	set	status: closed -> open nosy: - vstinner messages: + msg354353 resolution: not a bug -> stage: resolved -> (no value)
2019-10-10 09:52:38	vstinner	set	status: open -> closed nosy: + vstinner messages: + msg354352 resolution: not a bug stage: resolved
2019-10-10 09:41:34	Evgeny Nizhibitsky	create