This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: mmap.mmap() should not necessarily clone the file descriptor
Type: enhancement
Stage: patch review
Components: Library (Lib)
Versions: Python 3.10
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: ZackerySpytz, christian.heimes, josh.r, manuels
Priority: normal
Keywords: patch

Created on 2018-08-02 13:22 by manuels, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked
PR 25425 open ZackerySpytz, 2021-04-15 16:45
Messages (3)
msg322953 - Author: Manuel (manuels) Date: 2018-08-02 13:22
mmap.mmap(fileno, length, flags, prot, access, offset) always clones (dup()s) the file descriptor it is given [1].

The cloning of the file descriptor seems to be done to ensure that the file cannot be closed behind mmap's back, but if you are mmap()'ing a lot of memory regions of a file this can cause a 'Too many open files' error.

I suggest adding an option to mmap.mmap() that tells it not to clone the file descriptor. This can cause an issue if the file is closed before accessing the mmapped region, so this should also be pointed out in the documentation.

[1] https://github.com/python/cpython/blob/master/Modules/mmapmodule.c#L1159
msg323069 - Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2018-08-03 20:34
Why would it "cause an issue if the file is closed before accessing the mmapped region"? As shown in your own link, the constructor performs the mmap call immediately after the descriptor is duplicated, with the GIL held; any race condition that could close the file before the mmap occurs could equally well close it before the descriptor is duplicated.

The possible issues aren't tied to accessing the memory (once the mapping has been performed, the file descriptor can be safely closed in general), but rather, to the size and resize methods of mmap objects (the former using the fd to fstat the file, the latter using it to ftruncate the file). As long as you don't use size/resize, nothing else depends on the file descriptor after construction has completed. The size method in particular seems like a strange wart on the API; it returns the total file size, not the size of the mapping (len(mapping) gets the size of the actual mapping).
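To illustrate the size()/len() distinction described above, here is a minimal sketch. The temporary file and the variable names are only for illustration; the file is made two allocation units long and only the first unit is mapped.

```python
import mmap
import tempfile

GRANULARITY = mmap.ALLOCATIONGRANULARITY

with tempfile.TemporaryFile() as f:
    # The file is two allocation units long; only the first one is mapped.
    f.write(b"\0" * (2 * GRANULARITY))
    f.flush()
    with mmap.mmap(f.fileno(), GRANULARITY, access=mmap.ACCESS_READ) as m:
        mapping_len = len(m)  # size of the mapping itself
        file_size = m.size()  # size of the underlying file, queried through the stored fd

print(mapping_len, file_size)
```

Here len(m) reports one allocation unit while m.size() reports the full file, which is why size() needs the descriptor to stay available.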
msg386013 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-01-31 11:01
This issue came up in another discussion. I have given it some thought. mmap.mmap() dups the FD because its close() and __exit__() methods close(2) the fd. The size() and resize() methods use the fd to determine the size of the underlying file or to resize the file.

The easiest way to solve the issue while avoiding footguns is an option to not track the fd at all, e.g. "trackfd" with a default of "True". mmap(..., trackfd=False) would neither dup the fd nor store the fd in its internal struct. In the untracked-fd case, size() and resize() would no longer work. That's totally fine for mappings without PROT_WRITE.
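A hedged sketch of how calling code might use such an option. It assumes the keyword is spelled "trackfd" as proposed here; whether the running interpreter accepts it is not guaranteed, so the hypothetical helper map_readonly() falls back to the default (fd-duplicating) behavior when the keyword is rejected.

```python
import mmap
import os
import tempfile


def map_readonly(fd, length):
    """Map `length` bytes read-only, skipping the fd duplicate when possible.

    Returns (mapping, tracked); `tracked` is True when mmap kept its own fd.
    """
    try:
        # Assumption: the interpreter accepts the proposed `trackfd` keyword.
        return mmap.mmap(fd, length, access=mmap.ACCESS_READ, trackfd=False), False
    except TypeError:
        # Keyword not supported here: fall back to the default behavior.
        return mmap.mmap(fd, length, access=mmap.ACCESS_READ), True


fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"abcdef")
    try:
        m, tracked = map_readonly(fd, 6)
    finally:
        os.close(fd)
    head = m[:3]  # reading the mapping does not need the fd
    m.close()
finally:
    os.unlink(path)
```

Callers that get an untracked mapping should avoid size() and resize() and use len(m) for the mapping length instead.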

It's safe to close the fd after the mmap call; see https://man7.org/linux/man-pages/man2/mmap.2.html

> After the mmap() call has returned, the file descriptor, fd, can
> be closed immediately without invalidating the mapping.
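The quoted guarantee can be checked from Python with a small sketch like the following. The temporary file and variable names are illustrative only. Note that with today's default behavior mmap still holds its own duplicate of the descriptor; the point here is only that the caller's fd can be closed without affecting reads from the mapping.

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"hello mmap")
    try:
        # A length of 0 maps the whole file.
        m = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
    finally:
        os.close(fd)  # the caller's descriptor is no longer needed
    data = m[:5]      # the mapping is still readable after the close
    m.close()
finally:
    os.unlink(path)
```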
History
Date User Action Args
2022-04-11 14:59:04 admin set github: 78502
2021-04-15 16:45:08 ZackerySpytz set keywords: + patch
    nosy: + ZackerySpytz
    pull_requests: + pull_request24159
    stage: patch review
2021-01-31 11:01:53 christian.heimes set messages: + msg386013
2021-01-30 08:04:05 christian.heimes set nosy: + christian.heimes
    versions: + Python 3.10, - Python 2.7, Python 3.4, Python 3.5, Python 3.6, Python 3.7, Python 3.8
2018-08-03 20:34:49 josh.r set nosy: + josh.r
    messages: + msg323069
2018-08-02 13:22:03 manuels create