msg148399 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-11-26 13:12 |
Similar to issue #11849, this patch proposes to use VirtualAlloc/VirtualFree to allocate the Python allocator's memory arenas (rather than malloc() / free()). It might help release more memory if there is some fragmentation, although I don't know how Microsoft's malloc() works.
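A minimal sketch of the idea, written in Python with ctypes rather than in the C of obmalloc itself. The names arena_alloc and arena_free are hypothetical, the 256 KiB arena size is an assumption, and the mmap fallback is only there so the sketch also runs outside Windows; it is not the patch attached to this issue.

```python
import ctypes
import mmap
import sys

ARENA_SIZE = 256 * 1024  # assumed arena size, for illustration only

# Win32 constants used with VirtualAlloc / VirtualFree.
MEM_COMMIT = 0x1000
MEM_RESERVE = 0x2000
MEM_RELEASE = 0x8000
PAGE_READWRITE = 0x04

if sys.platform == "win32":
    _k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    _k32.VirtualAlloc.restype = ctypes.c_void_p
    _k32.VirtualAlloc.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                                  ctypes.c_uint32, ctypes.c_uint32]
    _k32.VirtualFree.restype = ctypes.c_int
    _k32.VirtualFree.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                                 ctypes.c_uint32]

    def arena_alloc(size=ARENA_SIZE):
        """Reserve and commit one arena; returns its address as an int."""
        addr = _k32.VirtualAlloc(None, size, MEM_COMMIT | MEM_RESERVE,
                                 PAGE_READWRITE)
        if not addr:
            raise MemoryError(ctypes.get_last_error())
        return addr

    def arena_free(handle):
        """Release the whole region; the size must be 0 with MEM_RELEASE."""
        if not _k32.VirtualFree(handle, 0, MEM_RELEASE):
            raise OSError(ctypes.get_last_error())
else:
    def arena_alloc(size=ARENA_SIZE):
        """Fallback: an anonymous mapping, returned as an mmap object."""
        return mmap.mmap(-1, size)

    def arena_free(handle):
        """Unmap the anonymous mapping."""
        handle.close()
```

The handle is deliberately opaque: an address on Windows, an mmap object elsewhere. Either way the arena is handed back to the operating system as a unit instead of being returned to a general-purpose heap.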
|
msg148605 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2011-11-29 20:48 |
The patch looks good to me.
To study Microsoft's malloc, see VC\crt\src\malloc.c. Typically, it uses HeapAlloc from the CRT heap, unless it's in 32-bit mode, and __active_heap is either __V6_HEAP or __V5_HEAP. This is determined at startup by __heap_select, inspecting an environment variable __MSVCRT_HEAP_SELECT. If that's not set, the CRT heap is used.
The CRT heap, in turn, is created with HeapCreate (no flags).
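The selection rule described above can be restated as a small decision function. This is only a paraphrase of the prose in Python; crt_malloc_backend and the returned labels are hypothetical names, and nothing here is taken from malloc.c itself.

```python
def crt_malloc_backend(is_32bit, active_heap):
    """Return which allocator malloc() would use, per the description above.

    active_heap is the value chosen at startup by __heap_select, passed
    here as a plain string for illustration.
    """
    if is_32bit and active_heap in ("__V6_HEAP", "__V5_HEAP"):
        return "legacy V5/V6 heap"
    return "HeapAlloc on the CRT heap"
```

For example, crt_malloc_backend(False, "__V6_HEAP") falls through to the CRT heap, because the legacy heaps are only considered in 32-bit mode.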
As an alternative approach, Python could consider completely dropping obmalloc on Windows, and using a Windows Low Fragmentation Heap (LFH) instead, with HEAP_NO_SERIALIZE (as the heap would be protected by the GIL).
If we take the route proposed by this patch, I recommend also dropping all other CRT malloc() calls in Python, and making allocations from the process heap instead (that's a separate issue, though).
|
msg148611 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-11-29 21:02 |
> The patch looks good to me.
>
> To study Microsoft's malloc, see VC\crt\src\malloc.c. Typically, it
> uses HeapAlloc from the CRT heap, unless it's in 32-bit mode, and
> __active_heap is either __V6_HEAP or __V5_HEAP. This is determined at
> startup by __heap_select, inspecting an environment variable
> __MSVCRT_HEAP_SELECT. If that's not set, the CRT heap is used.
Ah, right, I guessed it was using HeapAlloc indeed. What would be more
interesting is how HeapAlloc works :)
I think it would be nice to know whether the patch has a chance of being
useful before committing it. I did it as a thought experiment after the
similar change was committed for Unix, but I'm not an expert in Windows
internals. Perhaps HeapAlloc deals fine with fragmentation? Tim, Brian,
do you know anything about this?
> As an alternative approach, Python could consider completely dropping
> obmalloc on Windows, and using a Windows Low Fragmentation Heap (LFH)
> instead, with HEAP_NO_SERIALIZE (as the heap would be protected by the
> GIL).
I'm not sure that would serve the same purpose as obmalloc, which
(AFAIU) is very fast at the expense of compactness.
|
msg148612 - (view) |
Author: Tim Golden (tim.golden) * |
Date: 2011-11-29 21:04 |
'fraid not. I've never had to dig into the allocation stuff at this level.
|
msg148621 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2011-11-29 22:13 |
> I think it would be nice to know whether the patch has a chance of being
> useful before committing it. I did it as a thought experiment after the
> similar change was committed for Unix, but I'm not an expert in Windows
> internals. Perhaps HeapAlloc deals fine with fragmentation?
Unfortunately, the implementation of HeapAlloc isn't really documented.
If Reactos is right, it looks like this: http://bit.ly/t2NPHh
Blocks < 1024 bytes are allocated from per-size free lists.
Blocks < Heap->VirtualMemoryThreshold are allocated through the free
list for variable-sized blocks of the heap.
Other blocks are allocated through ZwAllocateVirtualMemory, adding
sizeof(HEAP_VIRTUAL_ALLOC_ENTRY) in the beginning. I think this header
will cause malloc() to allocate one extra page in front of an arena.
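The three size tiers and the extra-page argument can be checked with a little arithmetic. In this sketch the threshold and the header size are parameters because the thread does not give their values; the 4096-byte page size, the 256 KiB arena size, and the function names are assumptions for illustration.

```python
PAGE_SIZE = 4096          # assumed page size
ARENA_SIZE = 256 * 1024   # assumed arena size


def heap_alloc_path(size, virtual_memory_threshold):
    """Classify a request according to the three tiers described above."""
    if size < 1024:
        return "per-size free list"
    if size < virtual_memory_threshold:
        return "variable-size free list"
    return "ZwAllocateVirtualMemory"


def pages_needed(size, header_size):
    """Pages needed for `size` bytes plus a header, rounded up."""
    total = size + header_size
    return (total + PAGE_SIZE - 1) // PAGE_SIZE
```

With no header, an arena fits in exactly 64 pages; any non-zero header, for example a hypothetical 32 bytes, pushes the request to 65 pages, which is the one extra page mentioned above.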
>> As an alternative approach, Python could consider completely dropping
>> obmalloc on Windows, and using a Windows Low Fragmentation Heap (LFH)
>> instead, with HEAP_NO_SERIALIZE (as the heap would be protected by the
>> GIL).
>
> I'm not sure that would serve the same purpose as obmalloc, which
> (AFAIU) is very fast at the expense of compactness.
I'd expect that LFH heaps are also very fast. The major difference I can
see is that blocks in the LFH heap still have an 8-byte header (possibly
more on a 64-bit system). So I wouldn't expect any speed savings, but
(possibly relevant) memory savings from obmalloc.
|
msg148623 - (view) |
Author: Brian Curtin (brian.curtin) * |
Date: 2011-11-29 22:39 |
> Tim, Brian, do you know anything about this?
Unfortunately, no. It's on my todo list of things to understand but I don't see that happening in the near future.
I'm willing to run tests or benchmarks for this issue, but that's likely the most I can provide.
|
msg148625 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-11-29 23:12 |
On Tuesday, 29 November 2011 at 22:39 +0000, Brian Curtin wrote:
> Brian Curtin <brian@python.org> added the comment:
>
> > Tim, Brian, do you know anything about this?
>
> Unfortunately, no. It's on my todo list of things to understand but I
> don't see that happening in the near future.
>
> I'm willing to run tests or benchmarks for this issue, but that's
> likely the most I can provide.
Benchmarks would be nice indeed.
|
msg163350 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-06-21 17:10 |
Here is a benchmark. Based on my assumption that this patch may reduce allocation overheads due to minimizing padding+fragmentation, it allocates a lot of memory, and then waits 20s so you can check in the process explorer what the "Commit Size" of the process is.
For the current 3.3 tree, in 32-bit mode, on a 64-bit Windows 7 installation, I get 464,756K for the unpatched version, and 450,436K for the patched version.
This is a 3% saving, which seems good enough for me.
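The percentage follows directly from the two commit sizes quoted above. A short check (saving_percent is a hypothetical helper name):

```python
def saving_percent(unpatched_kb, patched_kb):
    """Relative reduction in commit size, as a percentage."""
    return (unpatched_kb - patched_kb) / unpatched_kb * 100


# Commit sizes in KB, as reported above.
saving = saving_percent(464756, 450436)
print(f"{saving:.2f}%")
```

The difference is 14,320K out of 464,756K, which rounds to the 3% stated.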
|
msg163351 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2012-06-21 17:11 |
Here is an updated patch.
|
msg189760 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2013-05-21 14:21 |
Martin, do you think your latest patch can be committed?
|
msg189771 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2013-05-21 16:24 |
Antoine's request for benchmarks still stands. I continue to think that the patch should be applied even in the absence of benchmarks. However, in the absence of third opinions on this specific aspect, I don't think it can be applied.
|
msg189824 - (view) |
Author: Charles-François Natali (neologix) * |
Date: 2013-05-22 17:03 |
I can't speak for Antoine, but I guess that the result of pybench
would be enough to make sure it doesn't introduce any regression
(which would be *really* surprising).
As for the memory savings, the benchmark you posted earlier is
conclusive enough IMO (especially since it can be difficult to
come up with a scheme leading to heap fragmentation).
|
msg189825 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-05-22 17:24 |
I asked for benchmarks because I don't know anything about Windows virtual memory management, but if other people think this patch should go in then it's fine.
The main point of using VirtualAlloc/VirtualFree was, in my mind, to allow *releasing* memory in more cases than when relying on free() (assuming Windows uses some sbrk() equivalent). But perhaps Windows is already tuned to release memory on most free() calls.
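The reasoning can be illustrated with a toy model. It assumes, as the message does, an sbrk()-style heap that can only shrink from the top, and compares it with arenas that are each mapped and unmapped on their own. The function names and the list-of-booleans layout are hypothetical; this says nothing about how Windows actually behaves.

```python
def releasable_brk_style(arenas_in_use):
    """Arenas a top-trimming heap can give back.

    arenas_in_use lists arenas from low to high addresses; True means the
    arena still holds live objects. Only the free run at the top counts.
    """
    count = 0
    for in_use in reversed(arenas_in_use):
        if in_use:
            break
        count += 1
    return count


def releasable_mapped(arenas_in_use):
    """Arenas that can be given back when each one is mapped separately."""
    return sum(1 for in_use in arenas_in_use if not in_use)
```

For the layout [False, False, True, False], the top-trimming heap can release one arena, while separately mapped arenas allow all three free ones to be released.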
|
msg189850 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2013-05-23 06:34 |
Ah ok. I guess tuples.py then indeed demonstrates a saving. I'll apply the patch.
|
msg189876 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-05-23 20:07 |
See also issue #3329 which proposes an API to define memory allocators.
|
msg190427 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-05-31 23:29 |
I tested VirtualAlloc/VirtualFree versus malloc/free on Windows 7 SP1 64-bit. In my small test, when using VirtualAlloc/VirtualFree, the memory peak is lower (e.g. 58.1 MB vs 59.0 MB), and the memory usage is the same or sometimes lower. The difference is small; the malloc() implementation on Windows 7 is efficient! But I am in favor of using VirtualAlloc/VirtualFree because it is the native API and the gain may be bigger on a real application.
--
I used the following script for my test:
https://bitbucket.org/haypo/misc/raw/98eb42a3ed2144141d62c75e3d07933839fe2a0c/python/python_memleak.py
I reused get_process_mem_info() code from psutil to get current and peak memory usage (I failed to install psutil, I don't understand why).
I also replaced func() in my script with tuples.py to create many tuples.
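A rough sketch of how peak memory could be read without psutil. This is not the code from psutil or from python_memleak.py; peak_memory_bytes is a hypothetical name, the Windows branch assumes GetProcessMemoryInfo from psapi.dll, and the other branch assumes the Unix resource module so that the sketch runs elsewhere too.

```python
import ctypes
import sys


class PROCESS_MEMORY_COUNTERS(ctypes.Structure):
    """Layout of the Win32 PROCESS_MEMORY_COUNTERS structure."""
    _fields_ = [
        ("cb", ctypes.c_uint32),
        ("PageFaultCount", ctypes.c_uint32),
        ("PeakWorkingSetSize", ctypes.c_size_t),
        ("WorkingSetSize", ctypes.c_size_t),
        ("QuotaPeakPagedPoolUsage", ctypes.c_size_t),
        ("QuotaPagedPoolUsage", ctypes.c_size_t),
        ("QuotaPeakNonPagedPoolUsage", ctypes.c_size_t),
        ("QuotaNonPagedPoolUsage", ctypes.c_size_t),
        ("PagefileUsage", ctypes.c_size_t),
        ("PeakPagefileUsage", ctypes.c_size_t),
    ]


def peak_memory_bytes():
    """Return the peak resident memory of the current process, in bytes."""
    if sys.platform == "win32":
        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        psapi = ctypes.WinDLL("psapi", use_last_error=True)
        kernel32.GetCurrentProcess.restype = ctypes.c_void_p
        psapi.GetProcessMemoryInfo.restype = ctypes.c_int
        psapi.GetProcessMemoryInfo.argtypes = [
            ctypes.c_void_p,
            ctypes.POINTER(PROCESS_MEMORY_COUNTERS),
            ctypes.c_uint32,
        ]
        counters = PROCESS_MEMORY_COUNTERS()
        counters.cb = ctypes.sizeof(counters)
        ok = psapi.GetProcessMemoryInfo(kernel32.GetCurrentProcess(),
                                        ctypes.byref(counters), counters.cb)
        if not ok:
            raise ctypes.WinError(ctypes.get_last_error())
        return counters.PeakWorkingSetSize

    import resource  # Unix only
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    return peak if sys.platform == "darwin" else peak * 1024
```

Calling peak_memory_bytes() before and after the workload gives a comparison of the kind reported above, although working-set size and commit size are different metrics.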
--
Python < 3.3 wastes a lot of memory with python_memleak.py. Python 3.3 behaves much better thanks to the usage of mmap() on Linux, and the fixed threshold on 64-bit (min=512 bytes, instead of 256).
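The effect of the threshold change can be sketched as a simple predicate. It assumes that requests of 1 up to the threshold bytes are served by pymalloc and that everything else goes to the system allocator; handled_by_pymalloc is a hypothetical name, not a function in obmalloc.

```python
def handled_by_pymalloc(nbytes, threshold=512):
    """True if a request of nbytes would be served from pymalloc's pools."""
    return 0 < nbytes <= threshold


# A 300-byte object misses the old 256-byte limit but fits the new one.
print(handled_by_pymalloc(300, threshold=256))
print(handled_by_pymalloc(300, threshold=512))
```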
|
msg191252 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-06-16 02:05 |
Martin von Loewis: "If we take the route proposed by this patch, I recommend also dropping all other CRT malloc() calls in Python, and making allocations from the process heap instead (that's a separate issue, though)."
=> see issue #18203
|
msg191253 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-06-16 02:11 |
haypo> I tested VirtualAlloc/VirtualFree versus malloc/free
haypo> on Windows 7 SP1 64-bit. On my small test, ...
I realized that I was not precise: I tested the attached va.diff patch. I didn't try to replace malloc() completely.
|
msg191461 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-06-19 12:03 |
> Ah ok. I guess tuples.py then indeed demonstrates a saving. I'll apply the patch.
According to my test, the memory usage is a little bit better with the patch. So Martin, do you plan to commit the patch?
Or is a benchmark required? Or should we first check the Low Fragmentation Allocator?
I plan to test the Low Fragmentation Allocator, at least on Windows 7. But I prefer to do it later; I'm working on PEP 445 right now.
|
msg191504 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-06-20 08:24 |
> I plan to test the Low Fragmentation Allocator, at least on Windows 7.
I don't think it can be any better than raw mmap() / VirtualAlloc()...
|
msg191506 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2013-06-20 08:45 |
>> I plan to test the Low Fragmentation Allocator, at least on Windows 7.
> I don't think it can be any better than raw mmap() / VirtualAlloc()...
I mean using the Low Fragmentation Allocator for PyObject_Malloc()
instead of pymalloc.
Martin wrote (msg148605):
"As an alternative approach, Python could consider completely dropping
obmalloc on Windows, and using a Windows Low Fragmentation Heap (LFH)
instead, with HEAP_NO_SERIALIZE (as the heap would be protected by the
GIL)."
|
msg191507 - (view) |
Author: Martin v. Löwis (loewis) * |
Date: 2013-06-20 10:50 |
Ok, I'm going to commit this patch. Any further revisions (including reversions) can be done then.
|
msg191939 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2013-06-27 10:24 |
New changeset 44f455e6163d by Martin v. Löwis in branch 'default':
Issue #13483: Use VirtualAlloc in obmalloc on Windows.
http://hg.python.org/cpython/rev/44f455e6163d
|
|
Date | User | Action | Args |
2022-04-11 14:57:24 | admin | set | github: 57692 |
2013-06-27 10:24:52 | loewis | set | status: open -> closed resolution: fixed |
2013-06-27 10:24:20 | python-dev | set | nosy:
+ python-dev messages:
+ msg191939
|
2013-06-20 10:50:56 | loewis | set | messages:
+ msg191507 |
2013-06-20 08:45:06 | vstinner | set | messages:
+ msg191506 |
2013-06-20 08:24:56 | pitrou | set | messages:
+ msg191504 |
2013-06-19 12:03:01 | vstinner | set | messages:
+ msg191461 |
2013-06-17 22:42:29 | trent | set | nosy:
+ trent
|
2013-06-16 02:11:36 | vstinner | set | messages:
+ msg191253 |
2013-06-16 02:05:20 | vstinner | set | messages:
+ msg191252 |
2013-06-06 14:17:53 | giampaolo.rodola | set | nosy:
+ giampaolo.rodola
|
2013-05-31 23:29:07 | vstinner | set | messages:
+ msg190427 |
2013-05-23 20:07:33 | vstinner | set | nosy:
+ vstinner messages:
+ msg189876
|
2013-05-23 06:34:07 | loewis | set | messages:
+ msg189850 |
2013-05-22 17:24:57 | pitrou | set | stage: commit review messages:
+ msg189825 versions:
+ Python 3.4, - Python 3.3 |
2013-05-22 17:03:57 | neologix | set | messages:
+ msg189824 |
2013-05-21 16:24:28 | loewis | set | messages:
+ msg189771 |
2013-05-21 14:21:57 | neologix | set | messages:
+ msg189760 |
2012-06-21 17:11:04 | loewis | set | files:
+ va.diff
messages:
+ msg163351 |
2012-06-21 17:10:39 | loewis | set | files:
+ tuples.py
messages:
+ msg163350 |
2011-11-29 23:12:56 | pitrou | set | messages:
+ msg148625 |
2011-11-29 22:39:18 | brian.curtin | set | messages:
+ msg148623 |
2011-11-29 22:13:23 | loewis | set | messages:
+ msg148621 |
2011-11-29 21:04:01 | tim.golden | set | messages:
+ msg148612 |
2011-11-29 21:02:08 | pitrou | set | messages:
+ msg148611 |
2011-11-29 20:48:33 | loewis | set | nosy:
+ loewis messages:
+ msg148605
|
2011-11-26 13:12:42 | pitrou | create | |