Issue 3526
Created on 2008-08-08 10:11 by sable, last changed 2008-09-10 16:33 by sable.
Files

| File name | Uploaded | Description |
|---|---|---|
| customized_malloc_SUN.pdf | sable, 2008-08-08 10:11 | |
| customized_malloc_AIX.pdf | sable, 2008-08-08 10:13 | |
| patch_dlmalloc.diff | sable, 2008-08-08 10:15 | |
| patch_dlmalloc2.diff | sable, 2008-09-09 15:58 | |
| patch_dlmalloc3.diff | sable, 2008-09-10 16:33 | |
Messages (13)
msg70897 - Author: Sébastien Sablé (sable) - Date: 2008-08-08 10:11

Hi,

We run a big application written mostly in Python (with Pyrex/C extensions) on several systems, including Linux, SunOS and AIX. The memory footprint of our application on Linux is fine; however, we found that on AIX and SunOS any memory that our application allocates at some stage is never freed at the system level.

After some analysis (see the two attached PDF documents), we found that this is linked to the malloc implementation on these systems.

The malloc used on Linux (glibc) is based on dlmalloc, as described in this document: http://g.oswego.edu/dl/html/malloc.html

This implementation uses sbrk to allocate small chunks of memory, but mmap to allocate big chunks. This ensures that big chunks are actually returned to the system when free is called.

AIX and SunOS have a more naive malloc implementation, so memory allocated by an application through malloc is never actually released until the application exits. This behavior was confirmed by experts at IBM and Sun when we asked them for feedback on this problem; there is a 'memory disclaim' option on AIX, but it is disabled by default because it brings major performance penalties.

For long-running Python applications that may allocate a lot of memory at some stage, this is a major drawback.

To work around this limitation on AIX and SunOS, we have modified Python so that it uses the customized malloc implementation dlmalloc, as glibc does (see attached patch). dlmalloc is released into the public domain.

This patch adds a --enable-dlmalloc option to configure. With it enabled, we observed a dramatic reduction in the memory used by our application.

I think many AIX and SunOS Python users could be interested in such an improvement.

-- 
Sébastien Sablé
Sungard
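The sbrk/mmap split described in this message can be illustrated with a minimal sketch, assuming a POSIX system with anonymous `mmap`. It is not the code in patch_dlmalloc.diff: the names `sketch_malloc`, `sketch_free` and `sketch_is_mapped` are hypothetical, the 256KB cut-over `SKETCH_MMAP_THRESHOLD` is an assumption borrowed from the threshold mentioned later in this thread, and the C library's `malloc` stands in for dlmalloc's sbrk-managed heap. What it shows is the property the report relies on: a big block released with `munmap` goes straight back to the operating system.

```c
#define _DEFAULT_SOURCE            /* expose MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical cut-over point between the heap and mmap. */
#define SKETCH_MMAP_THRESHOLD ((size_t)256 * 1024)

/* Prefix stored before each block so sketch_free knows where it came from.
   The union with max_align_t keeps the returned payload suitably aligned. */
typedef union {
    struct {
        size_t total;      /* bytes obtained, including this header */
        int is_mapped;     /* 1 = obtained with mmap, 0 = from the heap */
    } h;
    max_align_t align;
} block_header;

void *sketch_malloc(size_t n)
{
    block_header *hdr;
    size_t total;

    if (n > SIZE_MAX - sizeof(block_header))
        return NULL;                       /* size would overflow */
    total = n + sizeof(block_header);

    if (n >= SKETCH_MMAP_THRESHOLD) {
        /* Big chunk: fresh anonymous pages straight from the kernel. */
        void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
        hdr = p;
        hdr->h.is_mapped = 1;
    } else {
        /* Small chunk: stand-in for dlmalloc's sbrk-grown heap. */
        hdr = malloc(total);
        if (hdr == NULL)
            return NULL;
        hdr->h.is_mapped = 0;
    }
    hdr->h.total = total;
    return hdr + 1;                        /* payload follows the header */
}

void sketch_free(void *ptr)
{
    block_header *hdr;

    if (ptr == NULL)
        return;
    hdr = (block_header *)ptr - 1;
    if (hdr->h.is_mapped)
        munmap(hdr, hdr->h.total);   /* pages return to the OS immediately */
    else
        free(hdr);                   /* usually kept in-process for reuse */
}

int sketch_is_mapped(const void *ptr)
{
    return ((const block_header *)ptr - 1)->h.is_mapped;
}
```

Real dlmalloc additionally trims the top of its sbrk heap when enough free space accumulates there; the sketch leaves that out to keep the two paths visible.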
msg70908 - Author: Antoine Pitrou (pitrou) - Date: 2008-08-08 19:11

This is very interesting, although it should probably go through discussion on python-dev since it involves integrating a big chunk of external code.
msg70920 - Author: Martin v. Löwis (loewis) - Date: 2008-08-08 22:46

I cannot quite see why the problem is serious: even though the memory is not returned to the system, it will be swapped out to the swap file, so it doesn't consume any real memory (just swap space).

I don't think Python should integrate a separate malloc implementation. Instead, Python's own memory allocator (obmalloc) should be changed to directly use the virtual memory interfaces of the operating system (i.e. mmap), bypassing the malloc of the C library.

So I'm -1 on this patch.
msg70929 - Author: Antoine Pitrou (pitrou) - Date: 2008-08-09 10:57

On Friday 08 August 2008 at 22:46 +0000, Martin v. Löwis wrote:
> Instead, Python's own memory allocator (obmalloc) should be changed to
> directly use the virtual memory interfaces of the operating system (i.e.
> mmap), bypassing the malloc of the C library.

How would that interact with fork()?
msg70940 - Author: Martin v. Löwis (loewis) - Date: 2008-08-09 17:25

>> Instead, Python's own memory allocator (obmalloc) should be changed to
>> directly use the virtual memory interfaces of the operating system (i.e.
>> mmap), bypassing the malloc of the C library.
>
> How would that interact with fork()?

Nicely, why do you ask? Any anonymous mapping will be copied (typically copy-on-write) to the child process; in fact, malloc itself uses anonymous mappings (at least on Linux).
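Martin's answer can be checked with a small POSIX sketch; the function `parent_view_after_child_write` is hypothetical and written only for illustration. It maps one anonymous `int`, forks, lets the child overwrite it, and reports what the parent sees afterwards. With `MAP_PRIVATE` the child's write lands in its own copy-on-write copy of the page, so the parent still reads 1; with `MAP_SHARED` both processes use the same page, so the parent reads 2.

```c
#define _DEFAULT_SOURCE            /* expose MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map one anonymous int, fork, let the child overwrite it, and return the
   value the parent sees afterwards (or -1 on error).
   share == 0 -> MAP_PRIVATE (copy-on-write), share != 0 -> MAP_SHARED. */
int parent_view_after_child_write(int share)
{
    int flags = (share ? MAP_SHARED : MAP_PRIVATE) | MAP_ANONYMOUS;
    int *value = mmap(NULL, sizeof *value, PROT_READ | PROT_WRITE,
                      flags, -1, 0);
    pid_t pid;
    int status, seen;

    if (value == MAP_FAILED)
        return -1;
    *value = 1;

    pid = fork();
    if (pid < 0) {
        munmap(value, sizeof *value);
        return -1;
    }
    if (pid == 0) {
        *value = 2;          /* child writes to its view of the mapping */
        _exit(0);            /* skip stdio flushing in the child */
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
        munmap(value, sizeof *value);
        return -1;
    }
    seen = *value;           /* 1 if private (child got a copy), 2 if shared */
    munmap(value, sizeof *value);
    return seen;
}
```

An arena obtained this way therefore behaves across fork() exactly like memory obtained from malloc, which is the point of Martin's reply.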
msg70945 - Author: Antoine Pitrou (pitrou) - Date: 2008-08-09 17:53

On Saturday 09 August 2008 at 17:28 +0000, Martin v. Löwis wrote:
> Martin v. Löwis <martin@v.loewis.de> added the comment:
>
> >> Instead, Python's own memory allocator (obmalloc) should be changed to
> >> directly use the virtual memory interfaces of the operating system (i.e.
> >> mmap), bypassing the malloc of the C library.
> >
> > How would that interact with fork()?
>
> Nicely, why do you ask?

Because I didn't know :)

But looking at the dlmalloc implementation bundled in the patch, it seems that using mmap/munmap (or VirtualAlloc/VirtualFree under Windows) should be OK.

Do you think we should create a separate issue for this improvement? It could also solve #3531.
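The portable layer Antoine alludes to could look roughly like the sketch below: `mmap`/`munmap` on POSIX systems and `VirtualAlloc`/`VirtualFree` on Windows. The function names `os_alloc_pages` and `os_free_pages` are hypothetical and are not part of CPython or of the patch. The sketch assumes callers always free with the same size they allocated, which `munmap` needs and `VirtualFree` (with `MEM_RELEASE`) ignores.

```c
#define _DEFAULT_SOURCE            /* expose MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

/* Obtain `size` bytes of zero-filled, read/write memory directly from the
   OS, bypassing the C library's malloc. Returns NULL on failure. */
void *os_alloc_pages(size_t size)
{
#ifdef _WIN32
    return VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
#else
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
#endif
}

/* Give the memory back to the OS. `size` must be the size passed to
   os_alloc_pages. Returns 0 on success, -1 on failure. */
int os_free_pages(void *p, size_t size)
{
#ifdef _WIN32
    (void)size;                    /* MEM_RELEASE requires a size of 0 */
    return VirtualFree(p, 0, MEM_RELEASE) ? 0 : -1;
#else
    return munmap(p, size);
#endif
}
```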
msg72382 - Author: Sébastien Sablé (sable) - Date: 2008-09-03 10:28

[Sorry for the late reply, I have been on holiday.]

Martin: you are right that this memory is moved to swap and does not consume any "real" memory; however, we decided to work on this patch because we observed some performance degradation in our application due to this memory not being released to the system.

Since then we have done some fairly extensive tests (with the help of a consultant at Sun). They have shown that this unnecessary swapping has a noticeable impact on performance and, at worst, when system memory is saturated, can bring a server to its knees for several minutes (we're talking about top-of-the-line SunOS and AIX servers with hundreds of GB of memory).

I will write a complete document explaining the tests and observations we made, but this memory issue was critical for us given the performance degradation it was causing on our production servers.

Concerning dlmalloc, you are right that it would be cleaner to improve obmalloc so that it uses mmap when necessary, instead of adding another layer with dlmalloc (even though that is effectively what currently happens on Linux systems, where dlmalloc is integrated into libc). I will try to write that patch in the coming weeks (obmalloc mostly allocates 256KB arenas, so it should nearly always use mmap).
msg72750 - Author: Martin v. Löwis (loewis) - Date: 2008-09-07 19:45

> I will try to write that patch in the coming weeks (obmalloc mostly allocates
> 256KB arenas, so it should nearly always use mmap).

Exactly so. If you can, please also consider supporting Windows in the same way.

Anything in obmalloc that is not arena space should continue to come from malloc, I believe.
msg72758 - Author: Tim Peters (tim_one) - Date: 2008-09-08 00:52

> Anything in obmalloc that is not arena space should continue to come
> from malloc, I believe.

Sorry, but I don't understand why arena space should be different. If a platform's libc implementers think mmap should be used to obtain 256KB chunks (i.e., arenas), then surely they implement the platform malloc to defer to mmap in such cases. If they don't but "should", then bugging the platform vendor to improve the system malloc in this respect is the best idea (then all apps on the platform benefit, and Python stays simpler).

OTOH, if for some compelling reason it's believed Python knows better than platform vendors, then obmalloc should be uglied-up on all paths to make the enlightened choice.
msg72761 - Author: Martin v. Löwis (loewis) - Date: 2008-09-08 03:21

> OTOH, if for some compelling reason it's believed Python knows better
> than platform vendors, then obmalloc should be uglied-up on all paths to
> make the enlightened choice.

I'm proposing that obmalloc be changed to know better than the system malloc on systems supporting anonymous mmap, and on Windows, and that the call malloc(ARENA_SIZE) be replaced by mmap. This has the advantage of doing better than the system malloc on Solaris, and it might also guarantee that arenas will be POOL_SIZE aligned.

OTOH, the calls

    realloc(arenas, nbytes)
    malloc(nbytes)

should continue to go to the system malloc, because the sizes they request are typically not multiples of the system page size.
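Martin's proposal, replacing `malloc(ARENA_SIZE)` with an anonymous mapping, can be sketched as follows. The 256KB `ARENA_SIZE` and 4KB `POOL_SIZE` values and the helpers `arena_alloc`, `arena_free` and `usable_pools` are assumptions made for illustration, not obmalloc's actual code. Because `mmap` returns page-aligned memory, every pool in such an arena starts on a `POOL_SIZE` boundary; a malloc'ed arena whose start is not pool-aligned has to skip ahead to the first boundary and loses one pool.

```c
#define _DEFAULT_SOURCE            /* expose MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical constants mirroring 256KB arenas made of 4KB pools. */
#define ARENA_SIZE ((size_t)256 * 1024)
#define POOL_SIZE  ((size_t)4 * 1024)

/* Get one arena directly from the OS instead of malloc(ARENA_SIZE). */
void *arena_alloc(void)
{
    void *p = mmap(NULL, ARENA_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hand the whole arena back to the OS once all its pools are empty. */
int arena_free(void *arena)
{
    return munmap(arena, ARENA_SIZE);
}

/* Number of whole, POOL_SIZE-aligned pools inside [base, base + ARENA_SIZE).
   A pool-aligned base yields all of them; any other base loses one. */
size_t usable_pools(uintptr_t base)
{
    uintptr_t first = (base + POOL_SIZE - 1) & ~(uintptr_t)(POOL_SIZE - 1);
    return (size_t)((base + ARENA_SIZE - first) / POOL_SIZE);
}
```

For example, `usable_pools(0x10000)` gives 64 pools, while `usable_pools(0x10010)` gives 63. The non-arena calls (`realloc(arenas, nbytes)`, `malloc(nbytes)`) would stay on the system malloc, as Martin suggests.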
msg72762 - Author: Tim Peters (tim_one) - Date: 2008-09-08 03:26

I have to admit that if Python /didn't/ know better than platform libc implementers in some cases, there would be no point to having obmalloc at all :-(

What you (Martin) suggest is reasonable enough.
msg72876 - Author: Sébastien Sablé (sable) - Date: 2008-09-09 15:58

Here is a new patch so that pymalloc can be combined with dlmalloc.

I first added the --with-pymalloc-mmap option to configure.in, which ensures that pymalloc arenas are allocated through mmap when possible.

However, I found this was not enough: PyObject_Malloc uses arenas only when handling objects smaller than 256 bytes. For bigger objects, it relies directly on the system malloc. There are also some big buffers that can be allocated directly through PyMem_MALLOC.

This patch can be activated by compiling Python with:

    --with-pymalloc --with-pymalloc-mmap --with-dlmalloc

The behavior is then as follows:

* PyObject_MALLOC will allocate arenas with mmap
* when allocating an object smaller than 256 bytes with PyObject_MALLOC, it will be stored in an arena (as before)
* when allocating an object bigger than 256 bytes with PyObject_MALLOC, it will be allocated by dlmalloc (if it is smaller than 256KB it will go in a dlmalloc pool, otherwise it will be mmapped)
* allocation through PyMem_MALLOC is handled by dlmalloc

I think this is a good compromise. On systems like Linux, where the system malloc is already clever enough, compiling with only --with-pymalloc should behave as before. On systems like SunOS and AIX, this patch ensures that Python benefits from the speed of pymalloc for small objects, while ensuring that most of the allocated memory can be correctly released at the system level.
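The routing described in this message can be summarized as a small decision function. This is a sketch under assumptions, not code from patch_dlmalloc2.diff: the names `route_object_request`, `route_mem_request`, `DL_MMAP_THRESHOLD` and the `route` enum are hypothetical. The 256-byte boundary follows obmalloc's `(nbytes - 1) < SMALL_REQUEST_THRESHOLD` test, so a request of exactly 256 bytes still goes to an arena and a 0-byte request does not. Real dlmalloc compares its padded internal chunk size, not the raw request, against its mmap threshold.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thresholds taken from the description above. */
#define SMALL_REQUEST_THRESHOLD ((size_t)256)
#define DL_MMAP_THRESHOLD       ((size_t)256 * 1024)

typedef enum { ROUTE_ARENA, ROUTE_DLMALLOC_POOL, ROUTE_MMAP } route;

/* Where a PyObject_MALLOC-style request would be served when built with
   --with-pymalloc --with-pymalloc-mmap --with-dlmalloc. */
route route_object_request(size_t nbytes)
{
    /* (nbytes - 1) wraps around for 0, so 0 is not treated as "small". */
    if (nbytes - 1 < SMALL_REQUEST_THRESHOLD)
        return ROUTE_ARENA;              /* pymalloc pool in an mmap'ed arena */
    return nbytes < DL_MMAP_THRESHOLD ? ROUTE_DLMALLOC_POOL : ROUTE_MMAP;
}

/* PyMem_MALLOC-style requests skip the arenas and go to dlmalloc. */
route route_mem_request(size_t nbytes)
{
    return nbytes < DL_MMAP_THRESHOLD ? ROUTE_DLMALLOC_POOL : ROUTE_MMAP;
}
```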
msg72975 - Author: Sébastien Sablé (sable) - Date: 2008-09-10 16:33

My previous patch has a small problem: I believed that dlmalloc always returns a non-NULL value, even when asked for 0 bytes. That turns out not to be the case, so here is a new patch (patch_dlmalloc3.diff), which must be applied after the previous one (patch_dlmalloc2.diff), to correct this problem.
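One way to guarantee a non-NULL result for zero-byte requests is never to pass 0 to the underlying allocator; CPython's `PyMem_MALLOC` macro uses the same `(n) ? (n) : 1` idea. The sketch below is an assumption about how such a fix might look, not the contents of patch_dlmalloc3.diff: `nonnull_malloc` and `underlying_malloc` are hypothetical names, and the C library's `malloc` stands in for dlmalloc.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the allocator underneath (dlmalloc in the patch). Here it is
   the C library's malloc, which is also allowed to return NULL for 0 bytes. */
static void *underlying_malloc(size_t n)
{
    return malloc(n);
}

/* Map a 0-byte request to 1 byte so that a successful call always returns
   a distinct, non-NULL pointer that can later be passed to free(). */
void *nonnull_malloc(size_t n)
{
    return underlying_malloc(n ? n : 1);
}
```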
History

| Date | User | Action | Args |
|---|---|---|---|
| 2008-09-10 16:33:03 | sable | set | files: + patch_dlmalloc3.diff; messages: + msg72975 |
| 2008-09-09 15:59:06 | sable | set | files: + patch_dlmalloc2.diff; messages: + msg72876 |
| 2008-09-08 03:26:34 | tim_one | set | messages: + msg72762 |
| 2008-09-08 03:21:22 | loewis | set | messages: + msg72761 |
| 2008-09-08 00:52:07 | tim_one | set | nosy: + tim_one; messages: + msg72758 |
| 2008-09-07 19:45:13 | loewis | set | messages: + msg72750 |
| 2008-09-03 10:28:09 | sable | set | messages: + msg72382 |
| 2008-08-09 17:53:52 | pitrou | set | messages: + msg70945 |
| 2008-08-09 17:25:56 | loewis | set | messages: + msg70940 |
| 2008-08-09 10:57:09 | pitrou | set | messages: + msg70929 |
| 2008-08-08 22:46:50 | loewis | set | nosy: + loewis; messages: + msg70920 |
| 2008-08-08 19:11:07 | pitrou | set | priority: normal; nosy: + pitrou; messages: + msg70908; components: + Interpreter Core; versions: + Python 3.1, Python 2.7 |
| 2008-08-08 10:15:35 | sable | set | files: + patch_dlmalloc.diff; keywords: + patch |
| 2008-08-08 10:13:45 | sable | set | files: + customized_malloc_AIX.pdf |
| 2008-08-08 10:11:58 | sable | create | |