
Author aeros
Recipients aeros, asvetlov, dralley, sophia2, yselivanov
Date 2020-10-30.01:36:49
Content
In the snippet provided, at least part of the resources are not finalized because executor.shutdown() was never called in the program. This should be done whenever a local executor instance is created, either explicitly or by using the executor as a context manager. For the event loop's default thread pool (used with loop.run_in_executor(None, ...)), I added a coroutine method, loop.shutdown_default_executor(), in 3.9 that handles this; it is called in asyncio.run().
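
To make the three shutdown paths above concrete, here is a minimal sketch. The `work` function and the wrapper names are made up for illustration; only `executor.shutdown()`, the context manager protocol, `loop.run_in_executor()`, `loop.shutdown_default_executor()` (3.9+) and `asyncio.run()` come from the standard library.

```python
import asyncio
import concurrent.futures as cf


def work(n):
    # Hypothetical workload, only here so the executors have something to run.
    return sum(range(n))


def explicit_shutdown():
    executor = cf.ThreadPoolExecutor(max_workers=2)
    try:
        return executor.submit(work, 10).result()
    finally:
        # Joins the worker threads and releases their resources.
        executor.shutdown(wait=True)


def context_manager():
    # __exit__ calls executor.shutdown(wait=True).
    with cf.ThreadPoolExecutor(max_workers=2) as executor:
        return executor.submit(work, 10).result()


async def default_executor():
    loop = asyncio.get_running_loop()
    # None selects the event loop's default thread pool.
    return await loop.run_in_executor(None, work, 10)


def manual_loop():
    # Without asyncio.run(), the default executor has to be shut down by hand.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(default_executor())
    finally:
        loop.run_until_complete(loop.shutdown_default_executor())
        loop.close()


if __name__ == "__main__":
    print(explicit_shutdown(), context_manager())
    # asyncio.run() shuts down the default executor before closing the loop.
    print(asyncio.run(default_executor()), manual_loop())
```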

Without ever calling executor.shutdown(), the worker threads/processes and their associated resources are not finalized until interpreter shutdown. Some additional finalization also occurs in `_python_exit()` for both ThreadPoolExecutor (TPE) and ProcessPoolExecutor (PPE) (see https://github.com/python/cpython/blob/3317466061509c83dce257caab3661d52571cab1/Lib/concurrent/futures/thread.py#L23 or https://github.com/python/cpython/blob/3317466061509c83dce257caab3661d52571cab1/Lib/concurrent/futures/process.py#L87), which is called at interpreter shutdown, just before all non-daemon threads are joined.

However, even considering the above, there still seems to be a significant additional difference in RSS between using ThreadPoolExecutor directly and using loop.run_in_executor() that I can't account for (before and after asyncio.run()):

```
import asyncio
import concurrent.futures as cf
import os
import gc
import argparse

from concurrent.futures.thread import _python_exit

def leaker(n):
    list(range(n))

def func_TPE(n):
    with cf.ThreadPoolExecutor() as executor:
        for i in range(10_000):
            executor.submit(leaker, n)

async def func_run_in_executor(n):
    loop = asyncio.get_running_loop()
    for i in range(10_000):
        loop.run_in_executor(None, leaker, n)

def display_rss():
    os.system(f"grep ^VmRSS /proc/{os.getpid()}/status")

def main(n=100, asyncio_=False):
    try:
        if asyncio_:
            asyncio.run(func_run_in_executor(n))
        else:
            func_TPE(n)
    finally:
        _python_exit()
        gc.collect()
        print(f"after 10_000 iterations of {n} element lists:")
        display_rss()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-n", type=int, default=100)
    parser.add_argument("--asyncio", action=argparse.BooleanOptionalAction)

    print("start RSS memory:")
    display_rss()

    args = parser.parse_args()
    main(args.n, args.asyncio)
```
Results (on latest commit to master, 3.10):
asyncio -
```
[aeros:~/repos/cpython]$ ./python ~/programming/python/asyncio_run_in_exec_leak.py -n=10000 --asyncio
start RSS memory:
VmRSS:     16948 kB
after 10_000 iterations of 10000 element lists:
VmRSS:     27080 kB
```
concurrent.futures -
```
[aeros:~/repos/cpython]$ ./python ~/programming/python/asyncio_run_in_exec_leak.py -n=10000 --no-asyncio
start RSS memory:
VmRSS:     17024 kB
after 10_000 iterations of 10000 element lists:
VmRSS:     19572 kB
```
When using await before loop.run_in_executor(), the results are more similar to using ThreadPoolExecutor directly:
```
[aeros:~/repos/cpython]$ ./python ~/programming/python/asyncio_run_in_exec_leak.py -n=10000 --asyncio  
start RSS memory:
VmRSS:     16940 kB
after 10_000 iterations of 10000 element lists:
VmRSS:     17756 kB
```
However, as mentioned by the OP, if the futures are stored in a container and awaited later (such as with asyncio.gather()), a substantial memory difference is present, and it increases with list size:
```
[aeros:~/repos/cpython]$ ./python ~/programming/python/asyncio_run_in_exec_leak.py -n=10000 --asyncio
start RSS memory:
VmRSS:     16980 kB
after 10_000 iterations of 10000 element lists:
VmRSS:     29744 kB
```
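The code for the two modified variants is not included in this message. As a rough sketch only, and not the exact code that was run, they might look like the following; the function names and the `iterations` parameter are made up for illustration, while `leaker` matches the script above.

```python
import asyncio


def leaker(n):
    list(range(n))


async def fire_and_forget(n, iterations=10_000):
    # As in the script above: the returned futures are never awaited.
    loop = asyncio.get_running_loop()
    for _ in range(iterations):
        loop.run_in_executor(None, leaker, n)


async def await_each(n, iterations=10_000):
    # Each future is awaited immediately, so at most one job is in flight.
    loop = asyncio.get_running_loop()
    for _ in range(iterations):
        await loop.run_in_executor(None, leaker, n)


async def gather_later(n, iterations=10_000):
    # All futures are created first, kept in a list, and awaited together.
    loop = asyncio.get_running_loop()
    futures = [loop.run_in_executor(None, leaker, n) for _ in range(iterations)]
    return await asyncio.gather(*futures)
```

Swapping one of these in for func_run_in_executor() in the script would be one way to reproduce the corresponding measurement.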

Based on the above results, I think there may be a smaller leak occurring in concurrent.futures (potentially related to the linked bpo-41588) and a somewhat larger leak occurring in loop.run_in_executor(), so they can remain as separate issues IMO.

At the moment, my best guess is that there's some memory leak that occurs from the future not being fully cleaned up, but I'm not certain about that. This will likely require some further investigation.
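
One possible way to probe that guess is to keep weak references to the futures returned by loop.run_in_executor() and count how many are still alive after asyncio.run() and a gc.collect(). This is only a sketch: the helper names are hypothetical, it tracks only the asyncio-side futures (not the underlying concurrent.futures.Future objects), and a count of zero would not by itself rule out RSS growth, since freed memory is not necessarily returned to the OS.

```python
import asyncio
import gc
import weakref


def noop():
    pass


async def submit_and_track(iterations):
    loop = asyncio.get_running_loop()
    refs = []
    pending = []
    for _ in range(iterations):
        fut = loop.run_in_executor(None, noop)
        # Weak references do not keep the futures alive.
        refs.append(weakref.ref(fut))
        pending.append(fut)
    await asyncio.gather(*pending)
    return refs


def count_live_futures(iterations=1000):
    refs = asyncio.run(submit_and_track(iterations))
    gc.collect()
    # Any reference that still resolves points at a future something is holding on to.
    return sum(1 for ref in refs if ref() is not None)


if __name__ == "__main__":
    print("futures still alive:", count_live_futures())
```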

Input from Yury and/or Andrew would definitely be appreciated. Is there something that I'm potentially missing here?