classification
Title: Allow memory sections to be OS MERGEABLE
Type: enhancement
Components: Library (Lib)

process
Status: closed
Resolution: wont fix
Nosy List: Fry-kun, amaury.forgeotdarc, dmalcolm, georg.brandl, hunteke, loewis, pitrou, s7v7nislands
Priority: normal

Created on 2010-09-24 17:52 by hunteke, last changed 2011-05-23 04:54 by loewis. This issue is now closed.

Messages (11)
msg117317 - Author: Kevin Hunter (hunteke) Date: 2010-09-24 17:52
Should Python enable a way for folks to inform the OS of MADV_MERGEABLE memory?

I can't speak for other OSs, but Linux 2.6.32 added the ability for processes to inform the kernel that they have memory that is unlikely to change for a while.  This is done through the madvise syscall with MADV_MERGEABLE.

http://www.kernel.org/doc/Documentation/vm/ksm.txt

After initial conversations on IRC, it was suggested that this would be difficult to do at the Python layer, but the OS doesn't care which pages it is told are "mergeable".  Thus when I, as an application programmer, know that I have some objects that will be around for a while, and that won't change, I can let the OS know that it might be beneficial to merge them.

I suggest this might belong in a library because it may only be useful for certain projects.
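For concreteness, what the request is asking for could be sketched from pure Python via ctypes.  This is a hedged sketch, not a proposed implementation: the constant value 12 for MADV_MERGEABLE is taken from Linux's <asm-generic/mman-common.h>, and it assumes a Linux kernel built with CONFIG_KSM.

```python
import ctypes
import ctypes.util
import mmap

# Linux-specific constant; value is an assumption taken from
# <asm-generic/mman-common.h> and means something else (or nothing)
# on other platforms.
MADV_MERGEABLE = 12

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)

length = 4 * mmap.PAGESIZE
buf = mmap.mmap(-1, length)  # anonymous mapping: page-aligned by construction
addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))

# Returns 0 on success, or -1 (errno EINVAL) if the kernel lacks KSM.
ret = libc.madvise(ctypes.c_void_p(addr),
                   ctypes.c_size_t(length),
                   MADV_MERGEABLE)
```

Much later, Python 3.8 added exactly this capability to the stdlib as `mmap.mmap.madvise()` together with the `mmap.MADV_MERGEABLE` constant on Linux, making the ctypes detour unnecessary on modern versions.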
msg117318 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2010-09-24 18:04
With CPython, even objects that don't change have their reference counter modified quite frequently, just by being looked at.
What kind of memory would you mark this way?
msg117349 - Author: Kevin Hunter (hunteke) Date: 2010-09-25 05:41
My first thought is "Why is the reference counter stored with the object itself?"  I imagine there are very good reasons, however, and this is not an area in which I have much mastery.

Answering the question as best I can: I don't know how the reference counter is implemented in CPython, but if it's just a field in a struct, then madvise could be sent the memory location starting with the byte immediately following the reference counter.

If there's more to it than that, I'll have to back off with "I don't know."  I'm perhaps embarrassed that I'm not at all a Python developer, merely a Python application developer.  I have a few Python projects that are memory hungry, that at first glance I believe to be creating MERGEABLE objects.
msg117353 - Author: Georg Brandl (georg.brandl) (Python committer) Date: 2010-09-25 10:26
> My first thought is "Why is the reference counter stored with the object itself?"

Because if you move the reference counter out of the object, you a) add another indirection and b) depending on the implementation, need a certain amount of additional memory per object.

It's far from obvious that the possible benefits are worth this; it would need to be tested carefully, which nobody has done yet.
msg117355 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2010-09-25 10:34
> Answering the question as best I can: I don't know how the reference
> counter is implemented in CPython, but if it's just a field in a
> struct, then madvise could be sent the memory location starting with
> the byte immediately following the reference counter

Well, first, this would only work for large objects. Most objects in Python are quite small individually, unless you have very large (unicode or binary) strings, or very big integers.

Second, madvise() works at page granularity (4096 bytes on most systems), and it is very likely that a page will include the reference count of the current object.

Third, MADV_MERGEABLE will only be effective if you have actual duplication of whole memory pages (and, practically, if you have enough of them to make a real difference). Why do you think you might have such duplication in your workload?
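The granularity point above is easy to check numerically: madvise operates on whole pages, while a typical Python object, refcount header included, is far smaller than one page, so the refcount almost always shares a page with the data one would want to mark mergeable.  A small illustration:

```python
import mmap
import sys

# A page is typically 4096 bytes; small objects are a few dozen bytes,
# so ob_refcnt and the payload live in the same page.
page = mmap.PAGESIZE
small_int = sys.getsizeof(10 ** 5)
short_str = sys.getsizeof("hello")

assert small_int < page
assert short_str < page
```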
msg117371 - Author: Kevin Hunter (hunteke) Date: 2010-09-25 14:26
> Well, first, this would only work for large objects. [...]
> Why do you think you might have such duplication in your workload?

Some of the projects with which I work involve multiple manipulations of large datasets.  Often, we use Python scripts as "first and third" stages in a pipeline.  For example, in one current workflow, we read a large file into a cStringIO object, do a few manipulations with it, pass it off to a second process, and await the results.  Meanwhile, the large file is sitting around in memory because we need to do more manipulations after we get results back from the second application in the pipeline.  "Graphically":

Python Script A    ->    External App    ->    Python Script A
read large data          process data          more manipulations

Within a single process, I don't see any gain to be had.  However, in this use case, a number of copies of this pipeline are running concurrently with slightly different command-line parameters.
msg117372 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2010-09-25 14:31
> > Well, first, this would only work for large objects. [...]
> > Why do you think you might have such duplication in your workload?
> 
> Some of the projects with which I work involve multiple manipulations
> of large datasets.  Often, we use Python scripts as "first and third"
> stages in a pipeline.  For example, in one current workflow, we read a
> large file into a cStringIO object, do a few manipulations with it,
> pass it off to a second process, and await the results.

Why do you read it into a cStringIO? A cStringIO has the same interface
as a file, so you could simply operate on the file directly.

(you could also try mmap if you need quick random access to various
portions of the file)
msg117400 - Author: Kevin Hunter (hunteke) Date: 2010-09-26 04:43
> Why do you read it into a cStringIO? A cStringIO has the same interface
> as a file, so you could simply operate on the file directly.

In that particular case, because it isn't actually a file.  That workflow was my attempt at simplification to illustrate a point.

I think the point is moot however, as I've gotten what I needed from this feature request/discussion.  Not one, but three Python developers seem opposed to the idea, or at least skeptical.  That's enough to tell me that my first-order supposition that Python objects could be MERGEABLE is not on target.

Cheers.
msg119729 - Author: Konstantin Svist (Fry-kun) Date: 2010-10-27 20:06
This issue sounds very interesting to me for a somewhat different reason.
My problem is that I'm trying to run multiple processes on separate CPUs/cores with os.fork(). In short, the data set is the same (~2GB), the separate processes do whatever they need, and each forked process treats the data set as read-only.
Right after the fork, the data is shared and fits in RAM nicely, but after a few minutes each child process iterates over much of the data set (thereby modifying the ref counters) and those pages are copied into each process. RAM usage jumps from 15GB to 30GB and the advantage of the fork is gone.

It would be great if there was an option to separate out the ref counters for specific data structures, since it's obviously a bad idea to turn it on by default for everything and everyone.
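One workaround sketch for the copy-on-write problem described above: keep the big read-only payload as raw bytes in an anonymous shared mmap rather than as Python objects, so that child reads touch no per-object refcounts and never dirty the shared pages.  The payload size here is illustrative, and this obviously only helps when offset-based access to bytes is an acceptable substitute for a graph of Python objects.

```python
import mmap
import os

data = b"payload-" * 1024
# Anonymous mmap is MAP_SHARED by default, so the pages stay shared
# across fork() instead of being duplicated per process.
buf = mmap.mmap(-1, len(data))
buf[:] = data

pid = os.fork()
if pid == 0:
    # Child: slicing copies bytes out on demand, but never writes to the
    # shared region itself, so no pages are dirtied.
    ok = buf[:8] == b"payload-"
    os._exit(0 if ok else 1)

_, status = os.waitpid(pid, 0)
child_ok = os.WEXITSTATUS(status) == 0
```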
msg119737 - Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2010-10-27 20:46
One possible use for this: mark the "str" buffers of PyUnicodeObject instances when demarshalling docstrings from disk; in theory these ought not to change, and can be quite large: the bulk of the memory is stored in a separate allocation from the object, and thus isn't subject to the ob_refcnt twiddling.

No idea if it's worth it though; the syscall overhead might slow down module import; also, KSM works at the level of 4K pages, and it's not clear that the allocations would line up nicely with pages.

FWIW, various related ideas here:
  http://dmalcolm.livejournal.com/4183.html
Again, no idea if these are worthwhile, this was a brainstorm on my blog, and some of the ideas would involve major surgery to CPython to implement.
msg136589 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2011-05-23 04:54
In order to arrive at some resolution of this issue, I'm answering the original question ("Should Python enable a way for folks to inform the OS of MADV_MERGEABLE memory?"). The discussion has shown that the answer is "no"; there are no pages of memory where this would provide any advantage.

Closing as "won't fix". Anybody reopening it should

a) provide a patch with the actual change to be made, and
b) accompany it with a benchmark demonstrating some gain.
History
Date                 User                Action  Args
2011-05-23 04:54:20  loewis              set     status: open -> closed; nosy: + loewis; messages: + msg136589; resolution: wont fix
2011-05-22 15:18:08  s7v7nislands        set     nosy: + s7v7nislands
2010-10-27 20:46:33  dmalcolm            set     nosy: + dmalcolm; messages: + msg119737
2010-10-27 20:06:31  Fry-kun             set     nosy: + Fry-kun; messages: + msg119729
2010-09-26 04:43:32  hunteke             set     messages: + msg117400
2010-09-25 14:31:35  pitrou              set     messages: + msg117372
2010-09-25 14:26:03  hunteke             set     messages: + msg117371
2010-09-25 10:34:44  pitrou              set     nosy: + pitrou; messages: + msg117355
2010-09-25 10:26:45  georg.brandl        set     nosy: + georg.brandl; messages: + msg117353
2010-09-25 05:41:29  hunteke             set     messages: + msg117349
2010-09-24 18:04:05  amaury.forgeotdarc  set     nosy: + amaury.forgeotdarc; messages: + msg117318
2010-09-24 17:52:38  hunteke             create