classification
Title: Use "Low-fragmentation Heap" memory allocator on Windows
Type: performance Stage: resolved
Components: Windows Versions: Python 3.6
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal Keywords:

Created on 2016-01-31 17:55 by vstinner, last changed 2018-05-29 22:11 by vstinner. This issue is now closed.

Messages (8)
msg259293 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-31 17:55
Python has a memory allocator optimized for allocations <= 512 bytes: PyObject_Malloc(). Replacing it with the native "Low-fragmentation Heap" memory allocator on Windows has been discussed.

I'm not aware of anyone who has tried that. It would be nice to try it, especially to run benchmarks.

See also the issue #26249: "Change PyMem_Malloc to use PyObject_Malloc allocator?".
msg259294 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-31 17:56
"Low-fragmentation Heap":
https://msdn.microsoft.com/en-us/library/windows/desktop/aa366750%28v=vs.85%29.aspx
msg259296 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-31 17:57
The issue #19246 "high fragmentation of the memory heap on Windows" was rejected but discussed the Windows Low-fragmentation Heap.
msg297106 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-06-28 01:12
Is anyone interested in experimenting with writing such a change and running benchmarks with it?
msg297209 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2017-06-28 19:11
We tried it at one point, but it made very little difference because we don't use the Windows heap for most allocations. IIRC, replacing Python's optimised allocator with the LFH was a slight performance regression, but I'm not sure the benchmarks were reliable enough back then to be trusted. I'm also not sure what optimisations have been performed in Windows 8/10.

Since the LFH is the default though, it really should just be a case of replacing Py_Malloc with a simple HeapAlloc shim and testing it. The APIs are nearly the same (the result of GetProcessHeap() will be stable for the lifetime of the process, and there's little value in creating specific heaps unless you intend to destroy them rather than free each allocation individually).
msg297594 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-03 14:24
Steve: "We tried it at one point, but it made very little difference (...)"

Ok. Can I close the issue?
msg297610 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2017-07-03 19:27
I wouldn't be opposed to seeing it tried again, but I have no strong opinion. I don't think this is a major performance bottleneck right now.
msg318122 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-29 22:11
I failed to find the bandwidth to work on this issue over the last 2 years, so I am abandoning this idea. Moreover, the performance benefit is not obvious.
History
Date User Action Args
2018-05-29 22:11:35 vstinner set status: open -> closed
resolution: out of date
messages: + msg318122
stage: resolved
2017-07-03 19:27:50 steve.dower set messages: + msg297610
2017-07-03 14:24:01 vstinner set messages: + msg297594
2017-06-28 19:11:58 steve.dower set messages: + msg297209
2017-06-28 01:12:04 vstinner set messages: + msg297106
2016-01-31 17:57:54 vstinner set messages: + msg259296
2016-01-31 17:56:10 vstinner set messages: + msg259294
2016-01-31 17:55:40 vstinner create