Title: "mmap.flush()" is always synchronous, hurting performance
Type: enhancement
Stage:
Components: Extension Modules
Versions: Python 3.4
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: jcea, josh.r, neologix
Priority: normal
Keywords: easy

Created on 2013-08-23 04:55 by jcea, last changed 2014-03-07 02:57 by josh.r.

Messages (3)
msg195941 - Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2013-08-23 04:55
Currently, "mmap.flush()" does a synchronous write to the backing file. The call waits until the data is actually flushed to disk, because internally it calls "msync(MS_SYNC)".

But the purpose of "mmap.flush()" is to synchronize file and memory. You don't need a synchronous write in the general case.

I propose to add an optional keyword parameter with default value "SYNC" (compatibility) but that can be "ASYNC", "INVALIDATE" (can be "SYNC|INVALIDATE" and "ASYNC|INVALIDATE" too).

I am talking about UNIX MMAP. No idea about Windows.

Check "man msync" for useful cases.
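
For illustration only (this is not part of any patch attached to this issue): a minimal sketch of what the different flags would do, calling msync(2) directly through ctypes. The helper name "msync" and the numeric flag values are assumptions; the values are the Linux ones from <sys/mman.h> and differ on other platforms.

```python
import ctypes
import mmap
import os
import tempfile

# Assumption: Linux flag values from <sys/mman.h>; other platforms use different numbers.
MS_ASYNC = 1
MS_INVALIDATE = 2
MS_SYNC = 4

_libc = ctypes.CDLL(None, use_errno=True)
_libc.msync.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
_libc.msync.restype = ctypes.c_int


def msync(mm, flags=MS_SYNC):
    """Hypothetical helper: call msync(2) on a writable mmap object."""
    buf = ctypes.c_char.from_buffer(mm)  # gives access to the mapping's base address
    try:
        if _libc.msync(ctypes.addressof(buf), len(mm), flags) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
    finally:
        del buf  # drop the buffer export so mm.close() can succeed


with tempfile.TemporaryFile() as f:
    f.truncate(mmap.PAGESIZE)
    with mmap.mmap(f.fileno(), mmap.PAGESIZE) as mm:
        mm[:5] = b"hello"
        msync(mm, MS_ASYNC)  # schedule writeback and return without waiting
        msync(mm, MS_SYNC)   # wait for the write, as mm.flush() does today
```

With the proposed keyword parameter, the two explicit calls at the end would become variants of "mm.flush()" instead of going through ctypes.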
msg195948 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-23 07:48
> I propose to add an optional keyword parameter with default value "SYNC" (compatibility) but that can be "ASYNC", "INVALIDATE" (can be "SYNC|INVALIDATE" and "ASYNC|INVALIDATE" too).

AFAICT it's mostly useless on a modern OS.
MS_INVALIDATE is a no-op on systems with merged VM-buffer cache, i.e.
it's not needed for mmap() to reflect write() and vice-versa.

So nothing's normally needed to "synchronize file and memory".
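
A small sketch of that claim, assuming a system with a unified page cache (e.g. a recent Linux) and a shared, writable mapping: stores through the mapping are visible to reads on the file descriptor, and vice versa, without any flush() call. The variable names are illustrative only.

```python
import mmap
import os
import tempfile

with tempfile.TemporaryFile() as f:
    f.truncate(mmap.PAGESIZE)
    with mmap.mmap(f.fileno(), mmap.PAGESIZE) as mm:
        mm[:5] = b"hello"                      # store through the mapping, no flush()
        via_read = os.pread(f.fileno(), 5, 0)  # read back through the file descriptor
        os.pwrite(f.fileno(), b"world", 0)     # write through the file descriptor
        via_mmap = mm[:5]                      # load through the mapping

print(via_read, via_mmap)
```

On a system with a split VM and buffer cache, the two values could be stale, which is the case MS_INVALIDATE was meant for.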

As for MS_ASYNC, it actually doesn't do anything at all on recent OSes,
e.g. it has been a no-op on Linux for a couple of years, since modified
pages are written back as part of the normal writeback process.

The only thing a user might actually need for an mmap object is to
make sure data is actually committed to disk, and MS_SYNC covers this.

See e.g. this post by Andrew Morton:
msg195971 - Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2013-08-23 13:53
Depending on a specific OS implementation is not good. Linux is not the only OS out there, and I still have very old machines in production:

# uname -a
Linux 2.4.37 #4 Fri Dec 12 01:10:45 CET 2008 i686 unknown

I have been hit by the VM/file cache split in the past. Portability is important.

Anyway, the Python "mmap" manual says that "mmap.flush()" is needed to be sure that you do not "lose" changes made through the mmap. On "modern" OSs it is not actually needed, as you say, and the performance hit is significant enough for me to investigate and write this enhancement proposal :).
Date                 User      Action  Args
2014-03-07 02:57:27  josh.r    set     nosy: + josh.r
2013-08-23 13:53:06  jcea      set     messages: + msg195971
2013-08-23 07:48:45  neologix  set     nosy: + neologix; messages: + msg195948
2013-08-23 04:55:08  jcea      create