This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Title: Memory leak/high usage on copy in different thread
Type: Stage: resolved
Components: Interpreter Core Versions: Python 3.7, Python 3.6, Python 3.5
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: MultiSosnooley, bquinlan, pitrou, whitespacer
Priority: normal Keywords:

Created on 2018-05-08 11:26 by MultiSosnooley, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files: (file name not shown) uploaded by MultiSosnooley, 2018-05-08 11:26
Messages (3)
msg316285 - (view) Author: MultiSosnooley (MultiSosnooley) Date: 2018-05-08 11:26
On Linux (Ubuntu 16.04 and 18.04 tested) with Python 3.6.5, 3.5.5 and 3.7-dev (Windows is not affected), about 850 MB of memory is used by the Python process at the sleep point. Replacing the `submit`/`result` calls with a plain function call results in a normal ~75 MB at the sleep point.
Maybe it is a Linux-side issue (or normal behavior).
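The original attachment is not shown here; the following is a hypothetical reconstruction of the kind of workload described (the function names, sizes, and task counts are assumptions), contrasting `submit`/`result` with plain function calls:

```python
from concurrent.futures import ThreadPoolExecutor

def make_garbage(n):
    # Allocate and discard many small objects; only a small int survives.
    data = [bytearray(64) for _ in range(n)]
    return len(data)

def run(via_executor, tasks=100, n=1000):
    if via_executor:
        # Submitting from one thread and collecting results, as in the report;
        # the allocations then happen in worker threads.
        with ThreadPoolExecutor(max_workers=8) as ex:
            futures = [ex.submit(make_garbage, n) for _ in range(tasks)]
            return sum(f.result() for f in futures)
    # Plain function calls in the main thread: RSS stays near the baseline
    # (~75 MB in the report) at the sleep point.
    return sum(make_garbage(n) for _ in range(tasks))

if __name__ == "__main__":
    # Both variants compute the same result; only the memory profile differs.
    print(run(True), run(False))
```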
msg316289 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-05-08 17:45
Yes, this looks surprising, but there is no memory leak here, just memory fragmentation in the glibc allocator.  This is the program I used to diagnose it:

At the end, the program prints glibc allocator stats as returned by mallinfo(). On my system, the process takes 480 MB RSS and "fordblks" (the total number of bytes in free blocks) is 478 MB. However, "keepcost" (the releasable free space) is only 30 MB. The rest is probably interspersed with internal interpreter structures that have stayed alive.
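The diagnostic program itself is not attached here; a minimal sketch (assuming Linux/glibc, since mallinfo() is a glibc extension) of reading the same counters via ctypes might look like:

```python
import ctypes

class Mallinfo(ctypes.Structure):
    # Field layout of glibc's classic `struct mallinfo` (ten ints).
    _fields_ = [(name, ctypes.c_int) for name in (
        "arena", "ordblks", "smblks", "hblks", "hblkhd",
        "usmblks", "fsmblks", "uordblks", "fordblks", "keepcost")]

# CDLL(None) loads the running process's own symbols, which include glibc's
# mallinfo() on Linux; this will not work on non-glibc platforms.
libc = ctypes.CDLL(None)
libc.mallinfo.restype = Mallinfo

def allocator_stats():
    mi = libc.mallinfo()
    return {name: getattr(mi, name) for name, _ in Mallinfo._fields_}

if __name__ == "__main__":
    stats = allocator_stats()
    print("fordblks:", stats["fordblks"], "keepcost:", stats["keepcost"])
```

Note that the classic mallinfo() fields are plain ints and can wrap in very large processes; newer glibc offers mallinfo2() with size_t fields.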

The fragmentation seems to depend on the number of threads.  If you start the executor with only one thread, memory consumption is much lower.  This makes sense: by ensuring all operations happen in order with little concurrency, we minimize the chance that short-lived data gets interspersed with longer-lived data.
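For instance (a toy workload, not the original script), the single-thread variant is just a matter of the `max_workers` argument:

```python
from concurrent.futures import ThreadPoolExecutor

def work(n):
    # Short-lived allocations that are all freed before the task returns.
    return sum(len(bytearray(64)) for _ in range(n))

# With max_workers=1 every task runs in the same worker thread, in order,
# so short-lived blocks are less likely to end up interleaved in the same
# arena with longer-lived interpreter data.
with ThreadPoolExecutor(max_workers=1) as ex:
    total = sum(f.result() for f in [ex.submit(work, 1000) for _ in range(10)])

print(total)  # 640000
```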
msg316359 - (view) Author: whitespacer (whitespacer) Date: 2018-05-10 11:06
pitrou, thanks for the great diagnostic program! You are right that there is no memory leak.

We have investigated this issue a little bit more - it looks like the real reason for the large memory consumption in the end is not fragmentation; glibc simply does not release all free memory back to the OS.

So what happens in this example:
1) More threads lead to the creation of more arenas (i.e. more sub-heaps). This can be tested by setting the environment variable MALLOC_ARENA_MAX to 1 - memory consumption in the example is reduced significantly. But this can hardly be used in real code, as performance will degrade.
2) The heap trim threshold (M_TRIM_THRESHOLD) is calculated dynamically. Creation and deletion of small arrays drives it up to twice the mmap threshold maximum, i.e. 2*4*1024*1024*sizeof(long) bytes on 64-bit systems. So each sub-heap in each arena can keep up to 64 MB (if long is 8 bytes) of untrimmed free space.
This can be tested by setting the environment variable MALLOC_TRIM_THRESHOLD_ to 128*1024 - memory consumption in the example is reduced significantly because the dynamic calculation of the trim threshold is turned off. Again, I doubt this can be used in real code.
3) The maximum number of arenas varies with the number of CPUs. On my system it appears to be 16, so I get around 16 * 64 MB of untrimmed space.
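The two environment-variable experiments above can be sketched with a small harness (hypothetical, Unix-only since it uses the `resource` module; the child workload and sizes are assumptions):

```python
import os
import subprocess
import sys

# A child workload that makes many short-lived allocations, then reports its
# own peak RSS (ru_maxrss is in kilobytes on Linux).
CHILD = """
import resource
data = [bytearray(64) for _ in range(200000)]
del data
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
"""

def peak_rss(extra_env):
    # Glibc reads MALLOC_* tunables at startup, so they must be set in the
    # environment of a fresh child process.
    env = dict(os.environ, **extra_env)
    proc = subprocess.run([sys.executable, "-c", CHILD], env=env,
                          capture_output=True, text=True, check=True)
    return int(proc.stdout)

# The two tunables from points 1) and 2); in the original example each one
# reduced consumption significantly. This toy workload may show a smaller gap.
default = peak_rss({})
tuned = peak_rss({"MALLOC_ARENA_MAX": "1",
                  "MALLOC_TRIM_THRESHOLD_": str(128 * 1024)})
print(default, tuned)
```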

To conclude:
There is a similar bug in the Python tracker; the message there states that nothing can be done in some cases (like in the attached example).

One possible solution is to use jemalloc - we have tested that it does not exhibit these problems. Limiting the maximum number of arenas also helps, but can degrade performance.
Date User Action Args
2022-04-11 14:59:00  admin  set  github: 77625
2021-08-09 16:35:39  benjamin.peterson  link  issue44871 superseder
2018-05-10 11:06:21  whitespacer  set  nosy: + whitespacer
                                       messages: + msg316359
2018-05-08 17:45:48  pitrou  set  status: open -> closed
                                  resolution: not a bug
                                  stage: resolved
2018-05-08 17:45:33  pitrou  set  messages: + msg316289
2018-05-08 12:08:44  serhiy.storchaka  set  nosy: + bquinlan, pitrou
2018-05-08 11:26:46  MultiSosnooley  create