This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Enhance obmalloc allocation strategy
Type: resource usage Stage:
Components: Interpreter Core Versions: Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: kristjan.jonsson, larry, neologix, pitrou, tim.peters, vstinner
Priority: normal Keywords: patch

Created on 2014-04-14 18:23 by kristjan.jonsson, last changed 2022-04-11 14:58 by admin.

Files
File name Uploaded Description Edit
obmalloc.patch kristjan.jonsson, 2014-04-14 18:23 review
obmalloc.patch kristjan.jonsson, 2014-04-15 15:45 review
Messages (12)
msg216156 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2014-04-14 18:23
A new allocation policy, the "lowest address strategy", reduces memory fragmentation in obmalloc.  Pools with available memory are chosen by lowest-address preference.  This increases the likelihood that unused pools are released back to their corresponding arenas.  Arenas with available pools are similarly chosen by lowest address.
This significantly reduces fragmentation in programs with dynamic memory usage, e.g. long-running programs.  Initial tests also indicate some minor performance benefits in pybench, probably due to better cache behaviour.
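The pool-selection mechanism described above can be sketched in miniature.  This is a hypothetical toy model, not the actual obmalloc C code (which uses linked lists of pools, not a heap): the set of pools that still have free blocks is kept ordered by address, so every allocation is served from the lowest-address pool with room, and a pool that fills up drops out of the usable set until a block in it is freed.

```python
import heapq

# Toy sketch of the "lowest address strategy" (NOT the real obmalloc code):
# keep pools that still have free blocks in a min-heap keyed by address, so
# each allocation comes from the lowest-address pool with room.

class Pool:
    def __init__(self, addr, capacity):
        self.addr = addr
        self.capacity = capacity
        self.used = 0

    def __lt__(self, other):            # heap order: lowest address first
        return self.addr < other.addr

def alloc(usable):
    """Take one block from the lowest-address pool that has room."""
    pool = usable[0]
    pool.used += 1
    if pool.used == pool.capacity:      # pool became full: leave usable set
        heapq.heappop(usable)
    return pool.addr

def free(usable, pool):
    """Return one block; a full pool re-enters the usable set by address."""
    was_full = pool.used == pool.capacity
    pool.used -= 1
    if was_full:
        heapq.heappush(usable, pool)

pools = [Pool(0x3000, 2), Pool(0x1000, 2), Pool(0x2000, 2)]
usable = list(pools)
heapq.heapify(usable)
addrs = [alloc(usable) for _ in range(4)]
# The first four blocks all come from the two lowest-address pools:
# [0x1000, 0x1000, 0x2000, 0x2000] -- the 0x3000 pool stays untouched
# and could be released back to its arena.
```

Freeing a block from a low pool and allocating again returns to that low address, which is the property that lets high-address pools drain over time.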
msg216316 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2014-04-15 15:45
Updated the patch with suggestions from Larry.
msg216338 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-04-15 16:59
In Python 3, arenas are allocated using mmap(), so wherever the arena ends up in the address space shouldn't matter, should it?
Also, the new strategy adds an O(n) component in the allocator.
msg216440 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-04-16 06:40
Replacing pymalloc with the Windows Low Fragmentation Heap (LFH) allocator was also discussed:
http://bugs.python.org/issue13483#msg148605
msg216441 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-04-16 06:44
It would also be interesting to compare the fragmentation and performance of Python with and without pymalloc, and maybe with other heap allocators like FreeBSD's jemalloc and Google's TCMalloc.
msg216443 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-04-16 07:03
> In Python 3, arenas are allocated using mmap(), so wherever the arena ends up in the address space shouldn't matter, should it?

Indeed, although the effect on cache locality isn't clear.
Also, I don't think this solves the problem of having a single object
allocated inside a high address arena preventing the heap from
shrinking (which was the original reason for having the arenas
allocated by mmap).

Anyway, we can only go that far with reference counting (I mean that
you'd need a proper moving garbage collector for this).
msg216529 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2014-04-16 18:01
Antoine:  The location of the arenas when they're individually allocated with mmap does not matter, no, but preferring low-address arenas reduces virtual memory fragmentation, since they end up clustered together in memory.  For the usable-arenas list there is no extra O(n) cost, because that list was kept ordered anyway.  The effect of ARENA_STRATEGY is minor, but it helps for it to be consistent with POOL_STRATEGY.

The real win, however, is with POOL_STRATEGY.  Fragmentation is dramatically reduced.  This is demonstrated with Tools/scripts/memcrunch.py, which you can use to experiment with it.  Performance, e.g. of unittests, also goes up.  The new O(n) sorted insertion when a pool becomes 'used' does not seem to matter for that.

Victor: I've tested using the Windows LFH many times before; Python's obmalloc is generally much faster than it.  Annoying :).  It is actually a very good allocator.

The innovation here is the "lowest address strategy", which I have never seen before (it might be known, but I'm not a computer scientist) and which I have experimented with often in the past.  It is surprisingly effective: when there is memory churn, memory usage tends to migrate towards low addresses, freeing up memory.  Go ahead, try the scripts and see what happens.  The proof is in the pudding :)
msg216530 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2014-04-16 18:02
Sorry, I meant of course "performance of pybench.py goes up".
msg216533 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-04-16 18:06
> sorry, I meant of course "performance of pybench.py goes up"

pybench is really a very poor (extremely low-level) benchmark.
It would be nice if you could test with the benchmark suite at
http://hg.python.org/benchmarks
msg216544 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2014-04-16 18:21
Sure.  I'm flying home from PyCon this afternoon.  I'll produce and tabulate data once I'm back at my workstation.
msg217230 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-04-27 00:22
I spent some nights trying to understand the memory usage of the following Python script:
https://bitbucket.org/haypo/misc/src/31bf03ace91db3998981ee56caf80f09c29991f5/memory/python_memleak.py?at=default

It looks like the weird memory usage (aka "memory fragmentation"?) was fixed in Python 3.3.

> This significantly helps fragmentation in programs with dynamic memory usage, e.g. long running programs.

On which programs? Memory fragmentation depends a lot on how the program allocates memory. For example, if a program has no temporary memory peaks, it should not be a victim of memory fragmentation.

To measure the improvement of such a memory allocator, more benchmarks (speed and fragmentation) should be run than the single test (memcrunch.py, included in the patch) that was written to benchmark the allocator.
msg217353 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2014-04-28 09:42
>> This significantly helps fragmentation in programs with dynamic memory usage, e.g. long running programs.

> On which programs? Memory fragmentation depends a lot on how the program allocates memory. For example, if a program has no temporary memory peaks, it should not be a victim of memory fragmentation.

"Long-running programs," e.g. web servers and so on, where there is churn in memory usage.  As objects are allocated and released with some churn, new objects will be allocated from lower-address pages, while higher-address pages become increasingly likely to be freed as no-longer-used objects are released.  This is the next best thing to a sliding-block (compacting) allocator, and is motivated by the same requirements that make such an allocator (such as PyPy uses) desirable in the first place.


> To measure the improvement of such a memory allocator, more benchmarks (speed and fragmentation) should be run than the single test (memcrunch.py, included in the patch) that was written to benchmark the allocator.

Yes.  memcrunch.py was specifically written to simulate a case where objects are continuously created and released, as might be expected in a server: a peak in memory usage followed by lower memory usage, demonstrating that the pages allocated during the peak are released later, as memory churn causes memory usage to migrate toward lower addresses.
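A peak-then-churn workload of this kind can be modelled in a few lines.  This is a hypothetical toy model, not memcrunch.py itself and not obmalloc's real data structures: 32 pools of 8 blocks each, with the pool index standing in for its address.  The "lowest" policy always allocates from the lowest-address pool with room; a "random" policy, shown for contrast, picks any pool with room.

```python
import random

# Toy model of a peak-then-churn workload (NOT memcrunch.py, and not
# obmalloc's real structures). Addresses are just pool indices.

def churn(policy, n_pools=32, cap=8, rounds=50, seed=1):
    rng = random.Random(seed)
    used = [0] * n_pools        # live blocks per pool
    live = []                   # pool index of every live block

    def alloc():
        avail = [i for i in range(n_pools) if used[i] < cap]
        i = min(avail) if policy == "lowest" else rng.choice(avail)
        used[i] += 1
        live.append(i)

    def free_one():
        used[live.pop(rng.randrange(len(live)))] -= 1

    for _ in range(64):         # memory peak: 64 live blocks
        alloc()
    for _ in range(rounds):     # churn at a lower steady state
        for _ in range(32):
            free_one()
        for _ in range(32):
            alloc()
    for _ in range(32):         # settle at 32 live blocks
        free_one()
    return sum(1 for u in used if u)   # pools still pinned by a live block

low = churn("lowest")
rnd = churn("random")
# Under the lowest-address policy the 32 surviving blocks fit in the lowest
# 8 pools at most (usually fewer), so the high-address pools are empty and
# releasable; random placement leaves many more pools pinned.
```

The point of the model is the invariant, not the numbers: with lowest-address preference, no block is ever placed above the pools needed to hold the peak, so once the peak subsides the high pools drain and stay empty.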

However, following Antoine's advice I ran the benchmarks test suite and found an adverse effect in the n-body benchmark.  That can have two causes:
a) the insertion cost when a pool moves from 'full' to 'used'.  This is a rare event and should be unimportant.  I will instrument this for this test and see if it is really the reason.
b) cache effects, because a newly 'used' pool is not immediately allocated from.  Again, it rarely happens that a pool is linked at the head, so this shouldn't be significant.

Because of this, the change isn't yet ready to be applied.
If, however, I manage to adjust the policy so that memory usage improves while still preserving the performance of the benchmark tests, I will report back :)
History
Date User Action Args
2022-04-11 14:58:01  admin  set  github: 65419
2014-04-28 09:42:20  kristjan.jonsson  set  messages: + msg217353
2014-04-27 00:22:27  vstinner  set  messages: + msg217230
2014-04-16 18:21:49  kristjan.jonsson  set  messages: + msg216544
2014-04-16 18:06:08  pitrou  set  messages: + msg216533
2014-04-16 18:02:41  kristjan.jonsson  set  messages: + msg216530
2014-04-16 18:01:58  kristjan.jonsson  set  messages: + msg216529
2014-04-16 07:03:37  neologix  set  messages: + msg216443
2014-04-16 06:44:08  vstinner  set  messages: + msg216441
2014-04-16 06:40:18  vstinner  set  nosy: + vstinner; messages: + msg216440
2014-04-15 16:59:54  pitrou  set  nosy: + tim.peters, neologix, pitrou; messages: + msg216338
2014-04-15 15:46:02  kristjan.jonsson  set  files: + obmalloc.patch; messages: + msg216316
2014-04-14 21:17:33  kristjan.jonsson  set  nosy: + larry
2014-04-14 18:23:06  kristjan.jonsson  create