This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: glibc allocator doesn't release all free()ed memory
Type: resource usage
Stage: resolved
Components: XML
Versions: Python 3.3

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: bkline, dmalcolm, eli.bendersky, flox, kaifeng, methane, neologix, pitrou, python-dev, tim.peters, vstinner
Priority: normal
Keywords: patch

Created on 2011-04-15 09:08 by kaifeng, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name                Uploaded                    Description
test.py                  kaifeng, 2011-04-15 09:08
issue11849_test.py       flox, 2011-04-15 11:39      raw benchmark test
issue11849_test2.py      kaifeng, 2011-04-18 00:37
valgrind.log             kaifeng, 2011-04-25 08:01
pymalloc_threshold.diff  neologix, 2011-05-02 16:57  patch increasing pymalloc threshold
pymalloc_frag.diff       neologix, 2011-05-02 21:59  final patch with pymalloc threshold
arenas_mmap.diff         neologix, 2011-11-25 22:45
Messages (44)
msg133797 - (view) Author: kaifeng (kaifeng) Date: 2011-04-15 09:08
I'm using xml.etree.ElementTree to parse a large XML file, and the memory usage keeps increasing steadily.

You can run the attached test script to reproduce it.  In 'top' on Linux or 'Task Manager' on Windows, the memory usage of python does not drop as expected when 'Done' is printed.

Tested with Python 2.5/3.1 on Windows 7, and Python 2.5 on CentOS 5.3.
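
The attached test.py is not reproduced here; a minimal script in the same spirit (the document size, element layout and the /proc-based RSS readout below are assumptions, not the original file) looks like this:

    import gc
    import xml.etree.ElementTree as ET

    def gen_xml(n=200000):
        # Build a large, flat XML document in memory.
        items = "".join('<item key="%d">value %d</item>' % (i, i) for i in range(n))
        return "<root>%s</root>" % items

    def rss_kb():
        # Linux-only helper: read the resident set size from /proc.
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return line.split()[1]
        return "?"

    for i in range(10):
        root = ET.fromstring(gen_xml())
        del root
        print("%d  RSS: %s kB" % (i, rss_kb()))

    gc.collect()
    print("Done  RSS: %s kB" % rss_kb())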
msg133799 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2011-04-15 09:33
Do you experience the same issue with current versions of Python (3.2 or 2.7)?
The package was upgraded in the latest versions.
msg133800 - (view) Author: kaifeng (kaifeng) Date: 2011-04-15 09:52
Yes. I just tested with Python 2.7 and 3.2 on Windows 7; the memory usage is still unexpectedly high after 'Done' is printed.
msg133808 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2011-04-15 11:39
I've tested a small variant of your script on OS X.
It seems to behave correctly (with 2.5, 2.6, 2.7 and 3.1).

You can force Python to release memory immediately by calling "gc.collect()".
msg133809 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2011-04-15 11:41
This is the output for 2.7.1:

 $ python2.7 issue11849_test.py 
*** Python 2.7.1 final
---   PID STAT      TIME  SL  RE PAGEIN      VSZ    RSS   LIM     TSIZ  %CPU %MEM COMMAND
  0  2754 S+     0:00.07   0   0      0  2441472   5372     -        0  11,7  0,1 python2.7 issue11849_test.py
  1  2754 S+     0:02.36   0   0      0  2520740  83720     -        0 100,0  2,0 python2.7 issue11849_test.py
  2  2754 S+     0:04.89   0   0      0  2596784 158888     -        0 100,0  3,8 python2.7 issue11849_test.py
  3  2754 S+     0:07.28   0   0      0  2668740 230972     -        0 100,0  5,5 python2.7 issue11849_test.py
  4  2754 S+     0:10.11   0   0      0  2740932 303200     -        0 100,0  7,2 python2.7 issue11849_test.py
  5  2754 S+     0:12.85   0   0      0  2812876 375276     -        0  98,4  8,9 python2.7 issue11849_test.py
  6  2754 R+     0:14.95   0   0      0  2885868 447740     -        0  98,9 10,7 python2.7 issue11849_test.py
  7  2754 S+     0:17.91   0   0      0  2962156 522560     -        0  99,1 12,5 python2.7 issue11849_test.py
  8  2754 S+     0:21.08   0   0      0  3034092 594620     -        0  98,3 14,2 python2.7 issue11849_test.py
  9  2754 S+     0:23.20   0   0      0  3106028 667004     -        0 100,0 15,9 python2.7 issue11849_test.py
END  2754 S+     0:27.50   0   0      0  2551160 114480     -        0  96,3  2,7 python2.7 issue11849_test.py
 GC  2754 S+     0:27.75   0   0      0  2454904  18992     -        0  97,2  0,5 python2.7 issue11849_test.py
***  2754 S+     0:27.75   0   0      0  2454904  18992     -        0   3,0  0,5 python2.7 issue11849_test.py
msg133813 - (view) Author: kaifeng (kaifeng) Date: 2011-04-15 12:32
Python 3.2 On Linux (CentOS 5.3)

*** Python 3.2.0 final
---   PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
  0 15116 pts/0    S+     0:00      1  1316 11055  6452  0.6 python3.2 issue11849_test.py
  1 15116 pts/0    S+     0:02      1  1316 53155 47340  4.5 python3.2 issue11849_test.py
  2 15116 pts/0    S+     0:05      1  1316 91051 86364  8.3 python3.2 issue11849_test.py
  3 15116 pts/0    S+     0:08      1  1316 129067 124232 12.0 python3.2 issue11849_test.py
  4 15116 pts/0    S+     0:10      1  1316 166587 162096 15.6 python3.2 issue11849_test.py
  5 15116 pts/0    S+     0:13      1  1316 204483 198824 19.2 python3.2 issue11849_test.py
  6 15116 pts/0    S+     0:17      1  1316 242375 236692 22.8 python3.2 issue11849_test.py
  7 15116 pts/0    S+     0:19      1  1316 284383 277528 26.8 python3.2 issue11849_test.py
  8 15116 pts/0    S+     0:23      1  1316 318371 312452 30.1 python3.2 issue11849_test.py
  9 15116 pts/0    S+     0:25      1  1316 360235 353288 34.1 python3.2 issue11849_test.py
END 15116 pts/0    S+     0:30      1  1316 393975 388176 37.4 python3.2 issue11849_test.py
 GC 15116 pts/0    S+     0:30      1  1316 352035 347656 33.5 python3.2 issue11849_test.py
*** 15116 pts/0    S+     0:30      1  1316 352035 347656 33.5 python3.2 issue11849_test.py
msg133929 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-04-17 14:39
The "problem" is not with Python, but with your libc.
When a program - such as Python - returns memory, it uses the free(3) library call.
But the libc is free to either return the memory immediately to the kernel using the relevant syscall (brk, munmap), or keep it around just in case (to simplify).
It seems that RHEL5 and later tend to keep a lot of memory around, at least in this case (probably because of the allocation pattern).

To sum up, python is returning memory, but your libc is not.
You can force it using malloc_trim; see the attached patch (I'm not at all suggesting its inclusion, it's just an illustration).
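
For anyone who wants to try the same call without rebuilding Python, it can also be made through ctypes. This is a sketch that assumes a glibc-based system; the attached patch presumably does the equivalent from C:

    import ctypes
    import ctypes.util

    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    # Ask glibc to hand back as much free heap memory to the kernel as it can.
    # malloc_trim(0) returns 1 if some memory was released, 0 otherwise.
    released = libc.malloc_trim(0)
    print("memory released: %s" % bool(released))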

Results with current code:

*** Python 3.3.0 alpha
---   PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
  0 29823 pts/0    S+     0:00      1  1607 168176 8596  0.2 ./python /tmp/issue11849_test.py
  1 29823 pts/0    S+     0:01      1  1607 249400 87088  2.2 ./python /tmp/issue11849_test.py
  2 29823 pts/0    S+     0:03      1  1607 324080 161704  4.1 ./python /tmp/issue11849_test.py
  3 29823 pts/0    S+     0:04      1  1607 398960 235036  5.9 ./python /tmp/issue11849_test.py
  4 29823 pts/0    S+     0:06      1  1607 473356 309464  7.8 ./python /tmp/issue11849_test.py
  5 29823 pts/0    S+     0:07      1  1607 548120 384624  9.8 ./python /tmp/issue11849_test.py
  6 29823 pts/0    S+     0:09      1  1607 622884 458332 11.6 ./python /tmp/issue11849_test.py
  7 29823 pts/0    S+     0:10      1  1607 701864 535736 13.6 ./python /tmp/issue11849_test.py
  8 29823 pts/0    S+     0:12      1  1607 772440 607988 15.5 ./python /tmp/issue11849_test.py
  9 29823 pts/0    S+     0:13      1  1607 851156 685384 17.4 ./python /tmp/issue11849_test.py
END 29823 pts/0    S+     0:16      1  1607 761712 599400 15.2 ./python /tmp/issue11849_test.py
 GC 29823 pts/0    S+     0:16      1  1607 680900 519280 13.2 ./python /tmp/issue11849_test.py
*** 29823 pts/0    S+     0:16      1  1607 680900 519288 13.2 ./python /tmp/issue11849_test.py


Results with the malloc_trim:

*** Python 3.3.0 alpha
---   PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
  0 30020 pts/0    S+     0:00      1  1607 168180 8596  0.2 ./python /tmp/issue11849_test.py
  1 30020 pts/0    S+     0:01      1  1607 249404 86160  2.1 ./python /tmp/issue11849_test.py
  2 30020 pts/0    S+     0:03      1  1607 324084 160596  4.0 ./python /tmp/issue11849_test.py
  3 30020 pts/0    S+     0:04      1  1607 398964 235036  5.9 ./python /tmp/issue11849_test.py
  4 30020 pts/0    S+     0:06      1  1607 473360 309808  7.9 ./python /tmp/issue11849_test.py
  5 30020 pts/0    S+     0:07      1  1607 548124 383896  9.7 ./python /tmp/issue11849_test.py
  6 30020 pts/0    S+     0:09      1  1607 622888 458716 11.7 ./python /tmp/issue11849_test.py
  7 30020 pts/0    S+     0:10      1  1607 701868 536124 13.6 ./python /tmp/issue11849_test.py
  8 30020 pts/0    S+     0:12      1  1607 772444 607212 15.4 ./python /tmp/issue11849_test.py
  9 30020 pts/0    S+     0:14      1  1607 851160 684608 17.4 ./python /tmp/issue11849_test.py
END 30020 pts/0    S+     0:16      1  1607 761716 599524 15.3 ./python /tmp/issue11849_test.py
 GC 30020 pts/0    S+     0:16      1  1607 680776 10744  0.2 ./python /tmp/issue11849_test.py
*** 30020 pts/0    S+     0:16      1  1607 680776 10752  0.2 ./python /tmp/issue11849_test.py
msg133940 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-04-17 22:27
> To sum up, python is returning memory, but your libc is not.
> You can force it using malloc_trim, see the attached patch (I'm not at 
> all suggesting its inclusion, it's just an illustration).

That's an interesting thing, perhaps you want to open a feature request as a separate issue?
msg133946 - (view) Author: kaifeng (kaifeng) Date: 2011-04-18 00:37
I added 'malloc_trim' to the test code and reran the test with Python 2.5 / 3.2 on CentOS 5.3.  The problem still exists.


*** Python 2.5.5 final
---   PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
  0  2567 pts/0    S+     0:00      0     1  8206  4864  0.4 /home/zkf/.programs/python/bin/python issue11849_test.py
  1  2567 pts/0    S+     0:03      0     1 44558 41140  3.9 /home/zkf/.programs/python/bin/python issue11849_test.py
  2  2567 pts/0    S+     0:07      0     1 81166 77728  7.5 /home/zkf/.programs/python/bin/python issue11849_test.py
  3  2567 pts/0    S+     0:12      0     1 117798 114316 11.0 /home/zkf/.programs/python/bin/python issue11849_test.py
  4  2567 pts/0    S+     0:17      0     1 154402 150912 14.5 /home/zkf/.programs/python/bin/python issue11849_test.py
  5  2567 pts/0    S+     0:23      0     1 191018 187500 18.1 /home/zkf/.programs/python/bin/python issue11849_test.py
  6  2567 pts/0    S+     0:29      0     1 227630 224084 21.6 /home/zkf/.programs/python/bin/python issue11849_test.py
  7  2567 pts/0    S+     0:36      0     1 264242 260668 25.1 /home/zkf/.programs/python/bin/python issue11849_test.py
  8  2567 pts/0    S+     0:44      0     1 300882 297288 28.7 /home/zkf/.programs/python/bin/python issue11849_test.py
  9  2567 pts/0    S+     0:53      0     1 337230 333860 32.2 /home/zkf/.programs/python/bin/python issue11849_test.py
END  2567 pts/0    S+     1:02      0     1 373842 370444 35.7 /home/zkf/.programs/python/bin/python issue11849_test.py
 GC  2567 pts/0    S+     1:02      0     1 373842 370444 35.7 /home/zkf/.programs/python/bin/python issue11849_test.py
***  2567 pts/0    S+     1:02      0     1 373714 370436 35.7 /home/zkf/.programs/python/bin/python issue11849_test.py


*** Python 3.2.0 final
---   PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
  0  2633 pts/0    S+     0:00      1  1316 11051  6448  0.6 python3.2 issue11849_test.py
  1  2633 pts/0    S+     0:02      1  1316 53151 47340  4.5 python3.2 issue11849_test.py
  2  2633 pts/0    S+     0:05      1  1316 91051 85216  8.2 python3.2 issue11849_test.py
  3  2633 pts/0    S+     0:08      1  1316 128943 124228 12.0 python3.2 issue11849_test.py
  4  2633 pts/0    S+     0:11      1  1316 166803 162296 15.6 python3.2 issue11849_test.py
  5  2633 pts/0    S+     0:14      1  1316 204475 199972 19.3 python3.2 issue11849_test.py
  6  2633 pts/0    S+     0:17      1  1316 243831 238180 23.0 python3.2 issue11849_test.py
  7  2633 pts/0    S+     0:20      1  1316 284371 277532 26.8 python3.2 issue11849_test.py
  8  2633 pts/0    S+     0:23      1  1316 318187 312456 30.1 python3.2 issue11849_test.py
  9  2633 pts/0    S+     0:26      1  1316 360231 353296 34.1 python3.2 issue11849_test.py
END  2633 pts/0    S+     0:30      1  1316 393971 388184 37.4 python3.2 issue11849_test.py
 GC  2633 pts/0    S+     0:30      1  1316 352031 347652 33.5 python3.2 issue11849_test.py
***  2633 pts/0    S+     0:31      1  1316 351903 347524 33.5 python3.2 issue11849_test.py
msg133956 - (view) Author: kaifeng (kaifeng) Date: 2011-04-18 10:01
Found a minor defect in Python 3.2 / 3.3: line 1676 of xml/etree/ElementTree.py
was:
    del self.target, self._parser # get rid of circular references
should be:
    del self.target, self._target, self.parser, self._parser # get rid of circular references

It doesn't help with this issue, though...
msg133980 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-04-18 16:41
> kaifeng <cafeeee@gmail.com> added the comment:
>
> I added 'malloc_trim' to the test code and reran the test with Python 2.5 / 3.2 on CentOS 5.3.  The problem still exists.
>

Well, malloc_trim can fail, but how did you "add" it? Did you use
patch to apply the diff?
Also, could you post the output of:
ltrace -e malloc_trim python <test script>

For info, the sample outputs I posted above come from a RHEL6 box.

Anyway, I'm 99% sure this isn't a leak but a malloc issue (valgrind
--tool=memcheck could confirm this if you want to try, I could be
wrong, it wouldn't be the first time ;-) ).
By the way, look at what I just found:
http://mail.gnome.org/archives/xml/2008-February/msg00003.html

> Antoine Pitrou <pitrou@free.fr> added the comment:
> That's an interesting thing, perhaps you want to open a feature request as a separate issue?

Dunno.
Memory management is a domain which belongs to the operating
system/libc, and I don't think applications should mess with it (apart
from specific cases).
I don't have time to look at this precise problem in greater detail
right now, but AFAICT, this looks either like a glibc bug, or at least
a corner case with default malloc parameters (M_TRIM_THRESHOLD and
friends), affecting only RHEL and derived distributions.
malloc_trim should be called automatically by free if the amount of
memory that could be released is above M_TRIM_THRESHOLD.
Calling it systematically can have a non-negligible performance impact.
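
For quick experiments, glibc also reads these parameters from environment variables at startup, so the trim threshold can be lowered without touching any code; the value below is only an example:

    $ MALLOC_TRIM_THRESHOLD_=65536 python issue11849_test.py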
msg134008 - (view) Author: kaifeng (kaifeng) Date: 2011-04-19 02:41
I applied your patch to Python 3.2, and I also added a call to 'malloc_trim' via ctypes, as you can see in issue11849_test2.py.

In fact I have a daemon written in Python 2.5 that parses a 10+ MB XML file every 5 minutes; after 16+ hours of running, the program finally exhausted 4 GB of memory and died.  I simplified the logic of the daemon and found that ElementTree eats too much memory.  Hence the attached test script.

BTW, after switching to lxml instead of ElementTree, the increasing memory usage disappeared.


$ ltrace -e malloc_trim python3.2 Issue11849_test2.py
--- SIGCHLD (Child exited) ---
--- SIGCHLD (Child exited) ---
*** Python 3.2.0 final
--- SIGCHLD (Child exited) ---
---   PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
--- SIGCHLD (Child exited) ---
  0 13708 pts/1    S+     0:00      1    65  1742   636  0.0 ltrace -e malloc_trim python3.2 Issue11849_test2.py
13709 pts/1    S+     0:00      1  1316 11055  6440  0.6 python3.2 Issue11849_test2.py
--- SIGCHLD (Child exited) ---
  1 13708 pts/1    S+     0:00      1    65  1742   636  0.0 ltrace -e malloc_trim python3.2 Issue11849_test2.py
13709 pts/1    S+     0:03      1  1316 53155 47332  4.5 python3.2 Issue11849_test2.py
--- SIGCHLD (Child exited) ---
  2 13708 pts/1    S+     0:00      1    65  1742   636  0.0 ltrace -e malloc_trim python3.2 Issue11849_test2.py
13709 pts/1    S+     0:06      1  1316 91055 85204  8.2 python3.2 Issue11849_test2.py
--- SIGCHLD (Child exited) ---
  3 13708 pts/1    S+     0:01      1    65  1742   636  0.0 ltrace -e malloc_trim python3.2 Issue11849_test2.py
13709 pts/1    S+     0:10      1  1316 128947 124212 11.9 python3.2 Issue11849_test2.py
--- SIGCHLD (Child exited) ---
  4 13708 pts/1    S+     0:01      1    65  1742   636  0.0 ltrace -e malloc_trim python3.2 Issue11849_test2.py
13709 pts/1    S+     0:13      1  1316 166807 162280 15.6 python3.2 Issue11849_test2.py
--- SIGCHLD (Child exited) ---
  5 13708 pts/1    S+     0:01      1    65  1742   636  0.0 ltrace -e malloc_trim python3.2 Issue11849_test2.py
13709 pts/1    S+     0:16      1  1316 204483 198808 19.2 python3.2 Issue11849_test2.py
--- SIGCHLD (Child exited) ---
  6 13708 pts/1    S+     0:02      1    65  1742   636  0.0 ltrace -e malloc_trim python3.2 Issue11849_test2.py
13709 pts/1    S+     0:20      1  1316 242379 236672 22.8 python3.2 Issue11849_test2.py
--- SIGCHLD (Child exited) ---
  7 13708 pts/1    S+     0:02      1    65  1742   636  0.0 ltrace -e malloc_trim python3.2 Issue11849_test2.py
13709 pts/1    S+     0:23      1  1316 284383 277508 26.8 python3.2 Issue11849_test2.py
--- SIGCHLD (Child exited) ---
  8 13708 pts/1    S+     0:03      1    65  1742   636  0.0 ltrace -e malloc_trim python3.2 Issue11849_test2.py
13709 pts/1    S+     0:27      1  1316 318191 312436 30.1 python3.2 Issue11849_test2.py
--- SIGCHLD (Child exited) ---
  9 13708 pts/1    S+     0:03      1    65  1742   636  0.0 ltrace -e malloc_trim python3.2 Issue11849_test2.py
13709 pts/1    S+     0:29      1  1316 360199 353272 34.1 python3.2 Issue11849_test2.py
--- SIGCHLD (Child exited) ---
END 13708 pts/1    S+     0:03      1    65  1742   636  0.0 ltrace -e malloc_trim python3.2 Issue11849_test2.py
13709 pts/1    S+     0:34      1  1316 393975 388164 37.4 python3.2 Issue11849_test2.py
malloc_trim(0, 0, 0x818480a, 0x81a0114, 0xbfb6c940)                                              = 1
--- SIGCHLD (Child exited) ---
 GC 13708 pts/1    S+     0:03      1    65  1742   648  0.0 ltrace -e malloc_trim python3.2 Issue11849_test2.py
13709 pts/1    S+     0:35      1  1316 351871 347480 33.5 python3.2 Issue11849_test2.py
--- SIGCHLD (Child exited) ---
*** 13708 pts/1    S+     0:03      1    65  1742   648  0.0 ltrace -e malloc_trim python3.2 Issue11849_test2.py
13709 pts/1    S+     0:35      1  1316 351871 347480 33.5 python3.2 Issue11849_test2.py
+++ exited (status 0) +++
msg134083 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-04-19 17:26
> BTW, after switching to lxml instead of ElementTree, the increasing memory usage disappeared.

If you look at the link I posted, you'll see that lxml had some similar issues and solved them by calling malloc_trim systematically when freeing memory.
It could also be heap fragmentation, though.

To go further, it'd be nice if you could provide the output of
valgrind --tool=memcheck --leak-check=full --suppressions=Misc/valgrind-python.supp python <test script>
after uncommenting relevant lines in Misc/valgrind-python.supp (see http://svn.python.org/projects/python/trunk/Misc/README.valgrind ).
It will either confirm a memory leak or a malloc issue (I still favour the latter).

By the way, does

while True:
    XML(gen_xml())

lead to a constant memory usage increase?
msg134358 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-04-24 22:58
This is definitely a malloc bug.
Test with default malloc on a Debian box:

cf@neobox:~/cpython$ ./python ../issue11849_test.py 
*** Python 3.3.0 alpha
---   PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
  0  3778 pts/2    S+     0:00      1  1790  8245  7024  0.5 ./python ../issue11849_test.py
  1  3778 pts/2    S+     0:17      1  1790 61937 60404  4.6 ./python ../issue11849_test.py
  2  3778 pts/2    S+     0:35      1  1790 110841 108300  8.3 ./python ../issue11849_test.py
  3  3778 pts/2    S+     0:53      1  1790 159885 158540 12.2 ./python ../issue11849_test.py
  4  3778 pts/2    S+     1:10      1  1790 209369 206724 15.9 ./python ../issue11849_test.py
  5  3778 pts/2    S+     1:28      1  1790 258505 255956 19.7 ./python ../issue11849_test.py
  6  3778 pts/2    S+     1:46      1  1790 307669 304964 23.5 ./python ../issue11849_test.py
  7  3778 pts/2    S+     2:02      1  1790 360705 356952 27.5 ./python ../issue11849_test.py
  8  3778 pts/2    S+     2:21      1  1790 405529 404172 31.2 ./python ../issue11849_test.py
  9  3778 pts/2    S+     2:37      1  1790 458789 456128 35.2 ./python ../issue11849_test.py
END  3778 pts/2    S+     3:00      1  1790 504189 501624 38.7 ./python ../issue11849_test.py
 GC  3778 pts/2    S+     3:01      1  1790 454689 453476 35.0 ./python ../issue11849_test.py
***  3778 pts/2    S+     3:01      1  1790 454689 453480 35.0 ./python ../issue11849_test.py
[56426 refs]


The heap is not trimmed, even after GC collection.
Now, using a smaller mmap threshold so that malloc uses mmap instead of brk:

cf@neobox:~/cpython$ MALLOC_MMAP_THRESHOLD_=1024 ./python ../issue11849_test.py 
*** Python 3.3.0 alpha
---   PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
  0  3843 pts/2    S+     0:00      1  1790  8353  7036  0.5 ./python ../issue11849_test.py
  1  3843 pts/2    S+     0:17      1  1790 62593 59240  4.5 ./python ../issue11849_test.py
  2  3843 pts/2    S+     0:35      1  1790 112321 108304  8.3 ./python ../issue11849_test.py
  3  3843 pts/2    S+     0:53      1  1790 162313 157372 12.1 ./python ../issue11849_test.py
  4  3843 pts/2    S+     1:11      1  1790 212057 206456 15.9 ./python ../issue11849_test.py
  5  3843 pts/2    S+     1:29      1  1790 261749 255484 19.7 ./python ../issue11849_test.py
  6  3843 pts/2    S+     1:47      1  1790 311669 304484 23.5 ./python ../issue11849_test.py
  7  3843 pts/2    S+     2:03      1  1790 365485 356488 27.5 ./python ../issue11849_test.py
  8  3843 pts/2    S+     2:22      1  1790 411341 402568 31.1 ./python ../issue11849_test.py
  9  3843 pts/2    S+     2:38      1  1790 465141 454552 35.1 ./python ../issue11849_test.py
END  3843 pts/2    S+     3:02      1  1790 67173 63892  4.9 ./python ../issue11849_test.py
 GC  3843 pts/2    S+     3:03      1  1790  9925  8664  0.6 ./python ../issue11849_test.py
***  3843 pts/2    S+     3:03      1  1790  9925  8668  0.6 ./python ../issue11849_test.py
[56428 refs]

Just to be sure, with ptmalloc3 malloc implementation:

cf@neobox:~/cpython$ LD_PRELOAD=../ptmalloc3/libptmalloc3.so ./python ../issue11849_test.py 
*** Python 3.3.0 alpha
---   PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
  0  3898 pts/2    S+     0:00      1  1790  8369  7136  0.5 ./python ../issue11849_test.py
  1  3898 pts/2    S+     0:17      1  1790 62825 60264  4.6 ./python ../issue11849_test.py
  2  3898 pts/2    S+     0:34      1  1790 112641 110176  8.5 ./python ../issue11849_test.py
  3  3898 pts/2    S+     0:52      1  1790 162689 160048 12.3 ./python ../issue11849_test.py
  4  3898 pts/2    S+     1:09      1  1790 212285 209732 16.2 ./python ../issue11849_test.py
  5  3898 pts/2    S+     1:27      1  1790 261881 259460 20.0 ./python ../issue11849_test.py
  6  3898 pts/2    S+     1:45      1  1790 311929 309332 23.9 ./python ../issue11849_test.py
  7  3898 pts/2    S+     2:01      1  1790 365625 362004 27.9 ./python ../issue11849_test.py
  8  3898 pts/2    S+     2:19      1  1790 411445 408812 31.5 ./python ../issue11849_test.py
  9  3898 pts/2    S+     2:35      1  1790 465205 461536 35.6 ./python ../issue11849_test.py
END  3898 pts/2    S+     2:58      1  1790 72141 69688  5.3 ./python ../issue11849_test.py
 GC  3898 pts/2    S+     2:59      1  1790 15001 13748  1.0 ./python ../issue11849_test.py
***  3898 pts/2    S+     2:59      1  1790 15001 13752  1.0 ./python ../issue11849_test.py
[56428 refs]

So the problem is really that glibc/eglibc malloc implementations don't automatically trim memory upon free (this happens if you're only allocating/deallocating small chunks < 64B that come from fastbins, but that's not the case here).
By the way, I noticed that dictionaries are never allocated through pymalloc, since a new dictionary takes more than 256B...
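
A fixed mmap threshold, like the one set with MALLOC_MMAP_THRESHOLD_ above, can also be set from inside the process with mallopt(), e.g. through ctypes. This is a sketch for glibc only; the value -3 for M_MMAP_THRESHOLD comes from glibc's malloc.h and, like the 1024-byte threshold, is an assumption here:

    import ctypes
    import ctypes.util

    libc = ctypes.CDLL(ctypes.util.find_library("c"))
    M_MMAP_THRESHOLD = -3  # value from glibc's malloc.h (assumed here)

    # Route allocations above 1 KB to mmap() so that freeing them returns the
    # pages to the kernel instead of leaving holes in the brk heap.
    # mallopt() returns 1 on success and 0 on failure.
    if not libc.mallopt(M_MMAP_THRESHOLD, 1024):
        print("mallopt(M_MMAP_THRESHOLD) failed")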
msg134359 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-04-24 23:20
The MALLOC_MMAP_THRESHOLD improvement is less visible here:

$ MALLOC_MMAP_THRESHOLD_=1024 ../opt/python issue11849_test.py 
*** Python 3.3.0 alpha
--- USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
  0 antoine   7703  0.0  0.1  57756  8560 pts/2    S+   01:16   0:00 ../opt/python issue11849_test.py
  1 antoine   7703 62.0  1.0 138892 86100 pts/2    S+   01:16   0:01 ../opt/python issue11849_test.py
  2 antoine   7703 84.6  2.0 213580 160552 pts/2   S+   01:16   0:02 ../opt/python issue11849_test.py
  3 antoine   7703 97.0  2.9 288080 234972 pts/2   S+   01:16   0:03 ../opt/python issue11849_test.py
  4 antoine   7703 85.6  3.9 362852 309408 pts/2   S+   01:16   0:05 ../opt/python issue11849_test.py
  5 antoine   7703 93.4  4.8 437616 383844 pts/2   S+   01:16   0:06 ../opt/python issue11849_test.py
  6 antoine   7703 99.0  5.7 512380 458276 pts/2   S+   01:16   0:07 ../opt/python issue11849_test.py
  7 antoine   7703 89.6  6.7 591360 535672 pts/2   S+   01:16   0:08 ../opt/python issue11849_test.py
  8 antoine   7703 94.9  7.6 661676 607156 pts/2   S+   01:16   0:10 ../opt/python issue11849_test.py
  9 antoine   7703 95.5  8.6 740652 684556 pts/2   S+   01:16   0:11 ../opt/python issue11849_test.py
END antoine   7703 96.1  7.5 650432 597736 pts/2   S+   01:16   0:13 ../opt/python issue11849_test.py
 GC antoine   7703 97.2  6.5 570316 519228 pts/2   S+   01:16   0:13 ../opt/python issue11849_test.py
*** antoine   7703 90.8  6.5 569876 518792 pts/2   S+   01:16   0:13 ../opt/python issue11849_test.py


By the way, an easy fix is to use cElementTree instead of ElementTree. It still won't release all memory but it will eat a lot less of it, and be much faster as well.
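
The swap is essentially a one-line change in the import (a sketch; the fallback keeps the code working where the C accelerator is unavailable):

    try:
        import xml.etree.cElementTree as ElementTree  # C accelerator
    except ImportError:
        import xml.etree.ElementTree as ElementTree

    root = ElementTree.XML("<root><item>1</item></root>")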
msg134360 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-04-24 23:22
> By the way, I noticed that dictionaries are never allocated through
> pymalloc, since a new dictionary takes more than 256B...

On 64-bit builds indeed. pymalloc could be improved to handle allocations up to 512B. Want to try and write a patch?
msg134375 - (view) Author: kaifeng (kaifeng) Date: 2011-04-25 08:01
Sorry for the late update.

Valgrind shows there is no memory leak (see attached valgrind.log).

The following code,
    while True:
        XML(gen_xml())
shows increasing memory usage in the first 5-8 iterations, and hovers around a constant level afterwards.

So I guess some component (maybe libc, the Python interpreter, the ElementTree/pyexpat module, or something else) holds some memory until the process ends.
msg134380 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-04-25 12:36
> The MALLOC_MMAP_THRESHOLD improvement is less visible here:
>

Are you running on 64-bit?
If yes, it could be that you're exhausting M_MMAP_MAX (malloc falls
back to brk when there are too many mmap mappings).
You could try with
MALLOC_MMAP_THRESHOLD_=1024 MALLOC_MMAP_MAX_=16777216 ../opt/python
issue11849_test.py

By the way, never do that in real life, it's a CPU and memory hog ;-)

I think the root cause is that glibc's malloc coalescing of free
chunks is called far less often than in the original ptmalloc version,
but I still have to dig some more.

>> By the way, I noticed that dictionaries are never allocated through
>> pymalloc, since a new dictionary takes more than 256B...
>
> On 64-bit builds indeed. pymalloc could be improved to handle allocations up
> to 512B. Want to try and write a patch?

Sure.
I'll open another issue.
msg134388 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-04-25 14:55
> > The MALLOC_MMAP_THRESHOLD improvement is less visible here:
> >
> 
> Are you running on 64-bit ?

Yes.

> If yes, it could be that you're exhausting M_MMAP_MAX (malloc falls
> back to brk when there are too many mmap mappings).
> You could try with
> MALLOC_MMAP_THRESHOLD_=1024 MALLOC_MMAP_MAX_=16777216 ../opt/python
> issue11849_test.py

It isn't better.
msg134392 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-04-25 15:57
> It isn't better.

Requests above 256B are directly handled by malloc, so MALLOC_MMAP_THRESHOLD_ should in fact be set to 256 (with 1024 I guess that on 64-bit every mid-sized dictionary gets allocated with brk).
msg134992 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-05-02 16:57
I've had some time to look at this, and I've written a quick demo
patch that should - hopefully - fix this, and reduce memory
fragmentation.
 A little bit of background first:
 - a couple years ago (probably true when pymalloc was designed and
merged), glibc's malloc used brk for small and medium allocations, and
mmap for large allocations, to reduce memory fragmentation (also,
because of the processes' VM layout in older Linux 32-bit kernels, you
couldn't have a heap bigger than 1GB). The threshold for routing
requests to mmap was fixed, and had a default of 256KB (exactly the
size of a pymalloc arena). Thus, all arenas were allocated with mmap
 - in 2006, a patch was merged to make this mmap threshold dynamic,
see http://sources.redhat.com/ml/libc-alpha/2006-03/msg00033.html for
more details
 - as a consequence, with modern glibc/eglibc versions, the first
arenas will be allocated through mmap, but as soon as one of them is
freed, subsequent arenas will be allocated from the heap
through brk, and not mmap
 - imagine the following happens:
   1) program creates many objects
   2) to store those objects, many arenas are allocated from the heap
through brk
   3) program destroys all the objects created, except 1 which is in
the last allocated arena
   4) since the arena has at least one object in it, it's not
deallocated, and thus the heap doesn't shrink, and the memory usage
remains high (with a huge hole between the base of the heap and its
top)
 Note that 3) can be a single leaked reference, or just a variable
that doesn't get deallocated immediately. As an example, here's a demo
program that should exhibit this behaviour:

 """
 import sys
 import gc

 # allocate/de-allocate/re-allocate the array to make sure that arenas are
 # allocated through brk
 tab = []
 for i in range(1000000):
    tab.append(i)
 tab = []
 for i in range(1000000):
    tab.append(i)

 print('after allocation')
 sys.stdin.read(1)

 # allocate a dict at the top of the heap (actually it works even without this)
 a = {}

 # deallocate the big array
 del tab
 print('after deallocation')
 sys.stdin.read(1)

 # collect
 gc.collect()
 print('after collection')
 sys.stdin.read(1)
 """

 You should see that even after the big array has been deallocated and
collected, the memory usage doesn't decrease.

 Also, there's another factor coming into play, the linked list of
arenas ("arenas" variable in Object/obmalloc.c), which is expanded
when there are not enough arenas allocated: if this variable is
realloc()ed while the heap is really large and without a hole in it, it
will be allocated from the top of the heap, and since it's not resized
when the number of used arenas goes down, it will remain at the top of
the heap and will also prevent the heap from shrinking.

 My demo patch (pymem.diff) thus does two things:
 1) use mallopt to fix the mmap threshold so that arenas are allocated
through mmap
 2) increase the maximum size of requests handled by pymalloc from
256B to 512B (as discussed above with Antoine). The reason is that if
a PyObject_Malloc request is not handled by pymalloc from an arena
(i.e. greater than 256B) and is less than the mmap threshold, then we
can't do anything if it's not freed and remains in the middle of the
heap. That's exactly what's happening in the OP case, some
dictionnaries aren't deallocated even after the collection (I couldn't
quite identify them, but there seems to be some UTF-8 codecs and other
stuff)

 To sum up, this patch greatly increases the likelihood of Python
objects being allocated from arenas, which should reduce fragmentation
(and seems to speed up certain operations quite a bit), and ensures
that arenas are allocated from mmap so that a single dangling object
doesn't prevent the heap from being trimmed.

 I've tested it on RHEL6 64-bit and Debian 32-bit, but it'd be great
if someone else could try it - and of course comment on the above
explanation/proposed solution.
Here's the result on Debian 32-bit:

Without patch:

*** Python 3.3.0 alpha
---   PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
  0  1843 pts/1    S+     0:00      1  1795  9892  7528  0.5 ./python
/home/cf/issue11849_test.py
  1  1843 pts/1    S+     0:16      1  1795 63584 60928  4.7 ./python
/home/cf/issue11849_test.py
  2  1843 pts/1    S+     0:33      1  1795 112772 109064  8.4
./python /home/cf/issue11849_test.py
  3  1843 pts/1    S+     0:50      1  1795 162140 159424 12.3
./python /home/cf/issue11849_test.py
  4  1843 pts/1    S+     1:06      1  1795 211376 207608 16.0
./python /home/cf/issue11849_test.py
END  1843 pts/1    S+     1:25      1  1795 260560 256888 19.8
./python /home/cf/issue11849_test.py
 GC  1843 pts/1    S+     1:26      1  1795 207276 204932 15.8
./python /home/cf/issue11849_test.py

With patch:

*** Python 3.3.0 alpha
---   PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
  0  1996 pts/1    S+     0:00      1  1795 10160  7616  0.5 ./python
/home/cf/issue11849_test.py
  1  1996 pts/1    S+     0:16      1  1795 64168 59836  4.6 ./python
/home/cf/issue11849_test.py
  2  1996 pts/1    S+     0:33      1  1795 114160 108908  8.4
./python /home/cf/issue11849_test.py
  3  1996 pts/1    S+     0:50      1  1795 163864 157944 12.2
./python /home/cf/issue11849_test.py
  4  1996 pts/1    S+     1:07      1  1795 213848 207008 15.9
./python /home/cf/issue11849_test.py
END  1996 pts/1    S+     1:26      1  1795 68280 63776  4.9 ./python
/home/cf/issue11849_test.py
 GC  1996 pts/1    S+     1:26      1  1795 12112  9708  0.7 ./python
/home/cf/issue11849_test.py

Antoine: since increasing the pymalloc threshold is part of the
solution to this problem, I'm attaching a standalone patch here
(pymalloc_threshold.diff). It's included in pymem.diff.
I'll try to post some pybench results tomorrow.
msg134995 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-05-02 17:50
This is a very interesting patch, thank you.
I've tested it on Mandriva 64-bit and it indeed fixes the free() issue on the XML workload. I see no regression on pybench, stringbench or json/pickle benchmarks.

I guess the final patch will have to guard the mallopt() call with some #ifdef?
(also, I suppose for a portable solution we would have to call mmap() ourselves for the allocation of arenas, but that would probably be a bit more involved)
msg135010 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-05-02 21:59
> I guess the final patch will have to guard the mallopt() call with some #ifdef?

Yes. See attached patch pymalloc_frag.diff
It's the first time I'm playing with autotools, so please review this part really carefully ;-)

> (also, I suppose a portable solution would have to call mmap() ourselves
> for allocation of arenas, but that would probably be a bit more involved)

Yes. But since it probably only affects glibc/eglibc malloc versions, I guess that target implementations are likely to provide mallopt(M_MMAP_THRESHOLD).
Also, performing anonymous mappings varies even among Unices (the mmapmodule code is scary). I'm not talking about Windows, which I don't know at all.
msg135023 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-05-03 10:00
Patch looks fine to me, thank you.
msg135049 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-05-03 16:19
New changeset f8a697bc3ca8 by Antoine Pitrou in branch 'default':
Issue #11849: Make it more likely for the system allocator to release
http://hg.python.org/cpython/rev/f8a697bc3ca8
msg148293 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-25 00:30
For the record, this seems to make large allocations slower:

-> with patch:
$ ./python -m timeit "b'x'*200000"
10000 loops, best of 3: 27.2 usec per loop

-> without patch:
$ ./python -m timeit "b'x'*200000"
100000 loops, best of 3: 7.4 usec per loop

Not sure we should care, though. It's still very fast.
(noticed in http://mail.python.org/pipermail/python-dev/2011-November/114610.html )
msg148297 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-25 00:52
More surprising is that, even ignoring the allocation cost, other operations on the memory area seem more expensive:

$ ./python -m timeit -s "b=bytearray(500000)" "b[:] = b"
-> python 3.3:
1000 loops, best of 3: 367 usec per loop
-> python 3.2:
10000 loops, best of 3: 185 usec per loop

(note how this is just a dumb memcpy)
msg148308 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-11-25 08:17
> For the record, this seems to make large allocations slower:
>
> -> with patch:
> $ ./python -m timeit "b'x'*200000"
> 10000 loops, best of 3: 27.2 usec per loop
>
> -> without patch:
> $ ./python -m timeit "b'x'*200000"
> 100000 loops, best of 3: 7.4 usec per loop
>

Yes, IIRC, I warned it could be a possible side effect: since we're
now using mmap() instead of brk() for large allocations (between 256B
and 32/64MB), it can be slower (that's the reason the adaptive mmap
threshold was introduced in the first place).

> More surprising is that, even ignoring the allocation cost, other operations on the memory area seem more expensive:

Hum, this is strange.
I see you're comparing 3.2 and default: could you run the same
benchmark on default with and without the patch ?
msg148313 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-25 12:18
> I see you're comparing 3.2 and default: could you run the same
> benchmark on default with and without the patch ?

Same results:
-> default branch:
1000 loops, best of 3: 364 usec per loop
-> default branch with patch reverted:
10000 loops, best of 3: 185 usec per loop

(with kernel 2.6.38.8-desktop-8.mga and glibc-2.12.1-11.2.mga1)

And I can reproduce on another machine:

-> default branch:
1000 loops, best of 3: 224 usec per loop
-> default branch with patch reverted:
10000 loops, best of 3: 88 usec per loop

(Debian stable with kernel 2.6.32-5-686 and glibc 2.11.2-10)
msg148314 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-25 12:52
Ah, sorry, false alarm. "b[:] = b" actually makes a temporary copy of the bytearray when assigning to itself (!).

However, there's still another strange regression:

$ ./python -m timeit \
  -s "n=300000; f=open('10MB.bin', 'rb', buffering=0); b=bytearray(n)" \
  "f.seek(0);f.readinto(b)"

-> default branch:
10000 loops, best of 3: 43 usec per loop
-> default branch with patch reverted:
10000 loops, best of 3: 27.5 usec per loop

FileIO.readinto executes a single read() into the passed buffer.
msg148363 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-11-25 21:51
> However, there's still another strange regression:
>
> $ ./python -m timeit \
>   -s "n=300000; f=open('10MB.bin', 'rb', buffering=0); b=bytearray(n)" \
>   "f.seek(0);f.readinto(b)"
>
> -> default branch:
> 10000 loops, best of 3: 43 usec per loop
> -> default branch with patch reverted:
> 10000 loops, best of 3: 27.5 usec per loop
>
> FileIO.readinto executes a single read() into the passed buffer.

On my box:
default:
$ ./python -m timeit -s "n=300000; f=open('/tmp/10MB.bin', 'rb');
b=bytearray(n)" "f.seek(0);f.readinto(b)"
1000 loops, best of 3: 640 usec per loop

default without patch ("$ hg revert -r 68258 Objects/obmalloc.c && make"):
$ ./python -m timeit -s "n=300000; f=open('/tmp/10MB.bin', 'rb');
b=bytearray(n)" "f.seek(0);f.readinto(b)"
1000 loops, best of 3: 663 usec per loop

I'm just observing a random variance (but my computer is maybe too
slow to notice).
However, I really don't see how the patch could play a role here.

Concerning the slight performance regression, if it's a problem, I see
two options:
- revert the patch
- replace calls to malloc()/free() by mmap()/munmap() to allocate/free
arenas (but I'm not sure anonymous mappings are supported by every OS
out there, so this might lead to some ugly #ifdef's...)
msg148364 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-25 22:00
> On my box:
> default:
> $ ./python -m timeit -s "n=300000; f=open('/tmp/10MB.bin', 'rb');
> b=bytearray(n)" "f.seek(0);f.readinto(b)"
> 1000 loops, best of 3: 640 usec per loop
> 
> default without patch ("$ hg revert -r 68258 Objects/obmalloc.c && make"):
> $ ./python -m timeit -s "n=300000; f=open('/tmp/10MB.bin', 'rb');
> b=bytearray(n)" "f.seek(0);f.readinto(b)"
> 1000 loops, best of 3: 663 usec per loop
> 
> I'm just observing a random variance (but my computer is maybe too
> slow to notice).

Hmm, quite slow indeed, are you sure you're not running in debug mode?

> However, I really don't see how the patch could play a role here.
> 
> Concerning the slight performance regression, if it's a problem, I see
> two options:
> - revert the patch
> - replace calls to malloc()/free() by mmap()/munmap() to allocate/free
> arenas (but I'm not sure anonymous mappings are supported by every OS
> out there, so this might lead to some ugly #ifdef's...)

If the performance regression is limited to read(), I don't think it's
really an issue, but using mmap/munmap explicitly would probably be nicer
anyway (1° because it lets the glibc choose whatever heuristic is best,
2° because it would help release memory on more systems than just glibc
systems). I think limiting ourselves to systems which have
MAP_ANONYMOUS is good enough.

Here is what the glibc malloc does btw:

/*
   Nearly all versions of mmap support MAP_ANONYMOUS,
   so the following is unlikely to be needed, but is
   supplied just in case.
*/

#ifndef MAP_ANONYMOUS

static int dev_zero_fd = -1; /* Cached file descriptor for /dev/zero. */

#define MMAP(addr, size, prot, flags) ((dev_zero_fd < 0) ? \
 (dev_zero_fd = open("/dev/zero", O_RDWR), \
  mmap((addr), (size), (prot), (flags), dev_zero_fd, 0)) : \
   mmap((addr), (size), (prot), (flags), dev_zero_fd, 0))

#else

#define MMAP(addr, size, prot, flags) \
 (mmap((addr), (size), (prot), (flags)|MAP_ANONYMOUS, -1, 0))

#endif
msg148366 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-11-25 22:45
> Hmm, quite slow indeed, are you sure you're not running in debug mode?
>

Well, yes, but it's no faster with a non-debug build: my laptop is
really crawling :-)

> If the performance regression is limited to read(), I don't think it's
> really an issue, but using mmap/munmap explicitly would probably benicer
> anyway (1° because it lets the glibc choose whatever heuristic is best,
> 2° because it would help release memory on more systems than just glibc
> systems). I think limiting ourselves to systems which have
> MMAP_ANONYMOUS is good enough.
>

Agreed.
Here's a patch.
msg148374 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-26 00:23
New changeset e7aa72e6aad4 by Antoine Pitrou in branch 'default':
Better resolution for issue #11849: Ensure that free()d memory arenas are really released
http://hg.python.org/cpython/rev/e7aa72e6aad4
msg202458 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-09 04:05
I just found this issue from this article:
http://python.dzone.com/articles/diagnosing-memory-leaks-python

Great job! Using mmap() for arenas is the best solution for this issue. I did something similar on a completely different project (also using its own dedicated memory allocator) to work around heap memory fragmentation.
msg202459 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2013-11-09 04:54
[@haypo]
> http://python.dzone.com/articles/diagnosing-memory-leaks-python
> Great job! Using mmap() for arenas is the best solution for this issue.

?  I read the article, and they stopped when they found "there seemed to be a ton of tiny little objects around, like integers.".  Ints aren't allocated from arenas to begin with - they have their own (immortal & unbounded) free list in Python 2.  No change to pymalloc could make any difference to that.
msg202478 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-09 09:28
Extract of the "workaround" section:
"You could also run your Python jobs using Jython, which uses the Java JVM
and does not exhibit this behavior. Likewise, you could upgrade to Python
3.3 <http://bugs.python.org/issue11849>,"

Which contains a link to this issue.
msg310052 - (view) Author: Bob Kline (bkline) * Date: 2018-01-16 08:58
Would it be inappropriate for this fix to be applied to 2.7?
msg310053 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-01-16 08:59
It's not really a fix, it's an improvement, and as such doesn't belong in 2.7.  Using malloc() and free() is not a bug in itself.
msg310055 - (view) Author: Bob Kline (bkline) * Date: 2018-01-16 09:08
Sorry, I should have used the language of the patch author ("the resolution"). Without the resolution, Python 2.7 eventually runs out of memory and crashes for some correctly written user code.
msg310058 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-01-16 09:11
Well, memory fragmentation can happen with any allocation scheme, and it's possible even Python 3 isn't immune to this.  Backporting performance improvements is a strain on our resources and also constitutes a maintenance threat (what if a bug hides in the new code?).  And Python 2.7 is really nearing its end-of-life more and more every day.  So IMHO it's a no-no.
msg310065 - (view) Author: Bob Kline (bkline) * Date: 2018-01-16 09:43
Thanks for your responses to my comments. I'm working as hard as I can to get my customer's systems migrated into the Python 3 world, and I appreciate the efforts of the community to provide incentives (such as the resolution for this failure) for developers to upgrade. However, it's a delicate balancing act sometimes, given that we have critical places in our system for which the same code runs more than twice as slowly on Python 3.6 as on Python 2.7.
msg310068 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2018-01-16 10:00
FYI, jemalloc can reduce memory usage, especially when the application
is multithreaded.

https://www.speedshop.co/2017/12/04/malloc-doubles-ruby-memory.html
https://zapier.com/engineering/celery-python-jemalloc/
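
On Linux, jemalloc is usually brought in by preloading the shared library, with no change to the Python code; the path below is an example and varies by distribution:

    $ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 python app.py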
msg310086 - (view) Author: Bob Kline (bkline) * Date: 2018-01-16 12:51
> ... jemalloc can reduce memory usage ...

Thanks for the tip. I downloaded the source and successfully built the DLL, then went looking for a way to get it loaded. Unfortunately, DLL injection, which is needed to use this allocator in Python, seems to be much better supported on Linux than on Windows. Basically, Microsoft's documentation [1] for AppInit_DLLs, the shim for DLL injection on Windows, says (in effect) "here's how to use this technique, but we don't recommend using it, so here's a link [2] for what we recommend you do instead." That link takes you to "Try searching for what you need. This page doesn’t exist."

[1] https://support.microsoft.com/en-us/help/197571/working-with-the-appinit-dlls-registry-value
[2] https://support.microsoft.com/en-us/help/134655
History
Date                 User        Action  Args
2022-04-11 14:57:16  admin       set     github: 56058
2018-01-16 12:51:42  bkline      set     messages: + msg310086
2018-01-16 10:00:01  methane     set     nosy: + methane; messages: + msg310068
2018-01-16 09:43:00  bkline      set     messages: + msg310065
2018-01-16 09:11:38  pitrou      set     messages: + msg310058
2018-01-16 09:08:40  bkline      set     messages: + msg310055
2018-01-16 08:59:08  pitrou      set     messages: + msg310053
2018-01-16 08:58:08  bkline      set     nosy: + bkline; messages: + msg310052
2013-11-09 09:28:25  vstinner    set     messages: + msg202478
2013-11-09 04:54:21  tim.peters  set     nosy: + tim.peters; messages: + msg202459
2013-11-09 04:05:24  vstinner    set     nosy: + vstinner; messages: + msg202458
2011-11-26 00:23:42  python-dev  set     messages: + msg148374
2011-11-25 22:45:18  neologix    set     files: + arenas_mmap.diff; messages: + msg148366
2011-11-25 22:00:00  pitrou      set     messages: + msg148364
2011-11-25 21:51:17  neologix    set     messages: + msg148363
2011-11-25 12:52:55  pitrou      set     messages: + msg148314
2011-11-25 12:18:12  pitrou      set     messages: + msg148313
2011-11-25 08:17:07  neologix    set     messages: + msg148308
2011-11-25 00:52:32  pitrou      set     messages: + msg148297
2011-11-25 00:30:00  pitrou      set     nosy: + eli.bendersky; messages: + msg148293
2011-05-03 16:20:57  pitrou      set     status: open -> closed; resolution: fixed; stage: patch review -> resolved
2011-05-03 16:19:20  python-dev  set     nosy: + python-dev; messages: + msg135049
2011-05-03 10:00:59  pitrou      set     stage: patch review; messages: + msg135023; versions: - Python 3.1, Python 2.7, Python 3.2
2011-05-02 21:59:48  neologix    set     files: - pymem.diff
2011-05-02 21:59:34  neologix    set     files: - gc_trim.diff
2011-05-02 21:59:21  neologix    set     files: + pymalloc_frag.diff; messages: + msg135010
2011-05-02 17:50:30  pitrou      set     messages: + msg134995
2011-05-02 16:57:55  neologix    set     files: + pymem.diff, pymalloc_threshold.diff; messages: + msg134992
2011-04-25 19:02:09  dmalcolm    set     nosy: + dmalcolm
2011-04-25 15:57:06  neologix    set     messages: + msg134392
2011-04-25 14:55:11  pitrou      set     messages: + msg134388
2011-04-25 12:36:05  neologix    set     messages: + msg134380
2011-04-25 08:01:32  kaifeng     set     files: + valgrind.log; messages: + msg134375
2011-04-24 23:22:16  pitrou      set     messages: + msg134360
2011-04-24 23:20:06  pitrou      set     title: ElementTree memory leak -> glibc allocator doesn't release all free()ed memory; messages: + msg134359; versions: + Python 3.3, - Python 2.5
2011-04-24 22:58:44  neologix    set     messages: + msg134358
2011-04-19 17:26:47  neologix    set     messages: + msg134083
2011-04-19 02:41:37  kaifeng     set     messages: + msg134008
2011-04-18 16:41:02  neologix    set     messages: + msg133980
2011-04-18 10:01:24  kaifeng     set     messages: + msg133956
2011-04-18 00:37:29  kaifeng     set     files: + issue11849_test2.py; messages: + msg133946; versions: + Python 2.7, Python 3.2
2011-04-17 22:27:53  pitrou      set     nosy: + pitrou; messages: + msg133940
2011-04-17 14:41:41  neologix    set     files: + gc_trim.diff; keywords: + patch
2011-04-17 14:39:39  neologix    set     nosy: + neologix; messages: + msg133929
2011-04-15 12:32:47  kaifeng     set     messages: + msg133813
2011-04-15 11:41:26  flox        set     messages: + msg133809
2011-04-15 11:39:27  flox        set     files: + issue11849_test.py; messages: + msg133808
2011-04-15 09:52:26  kaifeng     set     messages: + msg133800
2011-04-15 09:33:32  flox        set     nosy: + flox; messages: + msg133799
2011-04-15 09:08:38  kaifeng     create