New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mmap_object_dealloc calls time-consuming msync(), although close doesn't #46895
Comments
on unix it does call msync however. here is the relevant part from mmapmodule.c: static void
mmap_object_dealloc(mmap_object *m_obj)
{
#ifdef MS_WINDOWS
if (m_obj->data != NULL)
UnmapViewOfFile (m_obj->data);
if (m_obj->map_handle != INVALID_HANDLE_VALUE)
CloseHandle (m_obj->map_handle);
if (m_obj->file_handle != INVALID_HANDLE_VALUE)
CloseHandle (m_obj->file_handle);
if (m_obj->tagname)
PyMem_Free(m_obj->tagname);
#endif /* MS_WINDOWS */
#ifdef UNIX
if (m_obj->fd >= 0)
(void) close(m_obj->fd);
if (m_obj->data!=NULL) {
msync(m_obj->data, m_obj->size, MS_SYNC);
munmap(m_obj->data, m_obj->size);
}
#endif /* UNIX */
Py_TYPE(m_obj)->tp_free((PyObject*)m_obj);
} |
The close method does not call msync or FlushViewOfFile. |
Added the FlushViewOfFile calls, and an msync call to the close method. Not sure how to explicitly test this, if it's possible. Current tests pass on Windows, I'll need to check *NIX when I have the access later today. As for justification, from the UnmapViewOfFile docs[1]: "To minimize the risk of data loss in the event of a power failure or a system crash, applications should explicitly flush modified pages using the FlushViewOfFile function." [1] http://msdn.microsoft.com/en-us/library/aa366882%28VS.85%29.aspx |
tab/space issue, updated the patch |
A test could explicitly close a dirtied mmaped file and then open() it to check that everything was written, no? |
I don't think that calling msync() or FlushViewOfFile() when closing the mmap object or deallocating it is a good idea. os.fsync(fd)
Force write of file with filedescriptor fd to disk. On Unix, this calls the native fsync() function; on Windows, the MS _commit() function. If you’re starting with a Python file object f, first do f.flush(), and then do os.fsync(f.fileno()), to ensure that all internal buffers associated with f are written to disk. Availability: Unix, and Windows starting in 2.2.3. The reason is the same: fsync(), like msync(), is not usually what you want, because of latencies and performance penalties it incurs. So, for performance and consistency with files, I'd suggest to remove calls to msync() and FlushViewOfFile() from close() and dealloc().
The problem is that when you open() your file, you'll read the data from cache. You have no way to read the data directly from disk (well, there may be, but are higly non portable, like O_DIRECT file or raw IO). |
Alright, the current behaviour is quite strange:
we don't call msync() when closing the object, we just unmap() it:
mmap_close_method(mmap_object *self, PyObject *unused)
{
[...]
#ifdef UNIX
if (0 <= self->fd)
(void) close(self->fd);
self->fd = -1;
if (self->data != NULL) {
munmap(self->data, self->size);
self->data = NULL;
}
#endif
[...]
}
But we set self->data to NULL to avoid calling munmap() a second time when deallocating the object:
static void
mmap_object_dealloc(mmap_object *m_obj)
{
[ ... ]
#ifdef UNIX
if (m_obj->fd >= 0)
(void) close(m_obj->fd);
if (m_obj->data!=NULL) {
msync(m_obj->data, m_obj->size, MS_SYNC);
munmap(m_obj->data, m_obj->size);
}
#endif /* UNIX */
[ ...]
} So, if the object has been closed properly before being deallocated, msync() is _not_ called. The attached test script shows the _huge_ performance impact of msync: when both flush() and close() are called (msync() called): when neither is called, relying on the deallocation (msync() called): And a strace leaves no doubt (called 10 times in a loop) : See how expensive msync() is, and this is just for a 10MB file. So the attached patch (mmap_msync.diff) removes the call to msync from mmap_object_dealloc(). Since UnmapViewOfFile() is only called inside flush() method, nothing to remove for MS Windows. Here's the result of the same test script with the patch: when both flush() and close() are called (msync() called): when neither is called, relying on the deallocation (msync() not called): So we only get msync() latency when the user explicitely calls flush(). |
Any thoughts on this? |
It's a pity that flush() is defined like this. Ideally, if mmap claims to mimick ordinary file objects, flush() should be a no-op() and there should be a separate sync() method. On the other hand, your (Charles-François's) patch is already much better than the statu quo. On a sidenote, the mmap object has received a *lot* less attention during the years than the other IO primitives (especially file objects in 3.x). It should probably only be used for specialized cases. |
I notice that there's support in bpo-678250 for making flush() a no-op. |
I think Antoine misinterpreted my message. I do think that flush should continue to msync, except in cases where there really is no underlying file to sync to. It may (or may not) be unfortunate that the method is called flush, but nothing is gained by renaming it. I agree that calling msync on close/dealloc is not really necessary. The Windows version doesn't sync, either. |
Ok, thank you. I committed the patch in r84950. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: