Author neologix
Recipients brian.curtin, neologix, pitrou, schmir, trent
Date 2010-04-08.20:21:01
Message-id <1270758063.69.0.503818505594.issue2643@psf.upfronthosting.co.za>
Content
Alright, the current behaviour is quite strange.
We don't call msync() when closing the object; we just munmap() it:
static PyObject *
mmap_close_method(mmap_object *self, PyObject *unused)
{
[...]
#ifdef UNIX
        if (0 <= self->fd)
                (void) close(self->fd);
        self->fd = -1;
        if (self->data != NULL) {
                munmap(self->data, self->size);
                self->data = NULL;
        }
#endif
[...]
}

Setting self->data to NULL there keeps the deallocator from calling munmap() a second time, but it also means the deallocator's msync() is skipped:
static void
mmap_object_dealloc(mmap_object *m_obj)
{
[...]
#ifdef UNIX
        if (m_obj->fd >= 0)
                (void) close(m_obj->fd);
        if (m_obj->data!=NULL) {
                msync(m_obj->data, m_obj->size, MS_SYNC);
                munmap(m_obj->data, m_obj->size);
        }
#endif /* UNIX */
[...]
}

So if the object is closed explicitly before being deallocated, msync() is _not_ called.
But if it is never closed, msync() is called at deallocation.
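
To make the two teardown paths concrete, here is a toy Python model of the two C functions quoted above (Unix branch only, fd handling left out). The class and function names are made up for illustration; only the order of the msync()/munmap() calls is taken from the C code, and nothing here touches a real mapping.

```python
class MmapModel:
    """Toy stand-in for mmap_object; records which calls each path would make."""

    def __init__(self):
        self.data = object()  # anything non-None plays the role of a non-NULL self->data
        self.calls = []

    def close(self):
        # Mirrors mmap_close_method(): unmap only, then clear the pointer.
        if self.data is not None:
            self.calls.append("munmap")
            self.data = None

    def dealloc(self):
        # Mirrors the unpatched mmap_object_dealloc(): sync, then unmap,
        # but only if close() has not already cleared the pointer.
        if self.data is not None:
            self.calls.append("msync")
            self.calls.append("munmap")


def teardown_calls(close_first):
    """Return the calls made over the object's lifetime (hypothetical helper)."""
    m = MmapModel()
    if close_first:
        m.close()
    m.dealloc()
    return m.calls


print(teardown_calls(close_first=True))
print(teardown_calls(close_first=False))
```

The first line printed is the "closed properly" case, the second is the "never closed" case.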

The attached test script shows the _huge_ performance impact of msync(), roughly 14x in the runs below:
when only close() is called (no msync()):
$ ./python /home/cf/test_mmap.py
0.35829615593

when both flush() and close() are called (msync() called):
$ ./python /home/cf/test_mmap.py
4.95999493599

when neither is called, relying on the deallocation (msync() called):
$ ./python /home/cf/test_mmap.py
4.8811671257
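
For anyone reading without the attachment, here is a minimal sketch of what such a timing script could look like. It is not the attached test_mmap.py: the function name, the mode labels, the temporary file, and the default size and loop count are all assumptions, chosen to match the 10 MB mapping and the 10 iterations mentioned in this message. The "dealloc" mode assumes CPython's reference counting frees the object as soon as the name is deleted.

```python
import mmap
import os
import tempfile
import time


def time_teardown(mode, size=10_000_000, loops=10):
    """Return the seconds spent mapping, dirtying and tearing down a file.

    mode is one of "close", "flush_close" or "dealloc" (hypothetical labels).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)  # the file must be at least as large as the mapping
        start = time.perf_counter()
        for _ in range(loops):
            m = mmap.mmap(fd, size)  # shared, writable mapping by default
            m[:] = b"x" * size       # dirty every page
            if mode == "flush_close":
                m.flush()            # msync() on Unix
                m.close()
            elif mode == "close":
                m.close()            # munmap() only
            elif mode == "dealloc":
                del m                # leave the teardown to the deallocator
            else:
                raise ValueError(mode)
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    for mode in ("close", "flush_close", "dealloc"):
        print(mode, time_teardown(mode))
```

Absolute numbers will depend on the interpreter build, the filesystem, and the storage behind the temporary directory, so only the relative difference between modes is meaningful.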

And strace leaves no doubt (called 10 times in a loop):
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb80b1000 <0.000019>
write(1, "4.12167286873\n"..., 14)      = 14 <0.000012>
close(3)                                = 0 <0.000010>
munmap(0xb80b2000, 4096)                = 0 <0.000023>
rt_sigaction(SIGINT, {SIG_DFL}, {0x811d630, [], 0}, 8) = 0 <0.000011>
close(5)                                = 0 <0.004889>
msync(0xb69f9000, 10000000, MS_SYNC)    = 0 <0.584054>
munmap(0xb69f9000, 10000000)            = 0 <0.000433>

See how expensive msync() is: about 0.58 s for the single call traced above, and this is just for a 10 MB file.

So the attached patch (mmap_msync.diff) removes the call to msync() from mmap_object_dealloc(). Since FlushViewOfFile() is only called inside the flush() method, there is nothing to remove for MS Windows.

Here's the result of the same test script with the patch:
when only close() is called (no msync()):
$ ./python /home/cf/test_mmap.py
0.370584011078

when both flush() and close() are called (msync() called):
$ ./python /home/cf/test_mmap.py
4.97467517853

when neither is called, relying on the deallocation (msync() not called):
$ ./python /home/cf/test_mmap.py
0.390102148056

So we only get msync() latency when the user explicitly calls flush().
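
From the caller's side, that means the sync cost becomes opt-in. A minimal sketch of what that looks like, assuming a Python version where mmap objects work as context managers and an existing, non-empty file; update_in_place and its durable flag are hypothetical names, not part of the mmap module:

```python
import mmap


def update_in_place(path, offset, payload, durable=False):
    """Overwrite len(payload) bytes at offset through a shared mapping."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:  # length 0 maps the whole file
            m[offset:offset + len(payload)] = payload
            if durable:
                m.flush()  # pay the msync() cost only when asked to
        # leaving the with block closes the mapping, which only unmaps it
```

With durable=False the write still reaches the file through the shared mapping, but when it hits the disk is left to the kernel.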