classification
Title: UNIX mmap unnecessarily dup() file descriptor
Type: resource usage Stage: resolved
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: SilentGhost, eric.araujo, georg.brandl, josiahcarlson, lorenz, neologix, pitrou, schmir
Priority: normal Keywords: patch

Created on 2011-01-12 15:16 by lorenz, last changed 2013-08-03 10:31 by neologix. This issue is now closed.

Files
File name Uploaded Description Edit
no_dup.patch lorenz, 2011-01-12 15:16 Remove the call to cup() in Modules/mmapmodule.c review
mmap.diff SilentGhost, 2011-01-12 18:58 review
no_dup_no_close.py3k.patch lorenz, 2011-01-12 20:21 Remove dup() and close() in Modules/mmapmodule.c, against py3k branch review
Messages (11)
msg126104 - (view) Author: Lorenz Huedepohl (lorenz) Date: 2011-01-12 15:16
The UNIX mmap version in

  Modules/mmapmodule.c 

makes a call to dup() to duplicate the file descriptor that is passed for creating the memory-mapped region.

This way, I frequently hit the limit for the number of open file handles while creating mmap.mmap() instances, despite closing all my opened files after creating a mapping for them.

My application is scientific data (read: "large data" :-) analysis for which mmap() is very well suited together with numpy/scipy - however, the large number of files causes me to hit the resource limit on open file handles)

I propose to remove this dup(), which was apparently introduced in the process of fixing issue #728515, concerning incorrect behavior of the mmap.resize() method on UNIX, as it was feared the user could have closed the file handle already.

I think it should be the responsibility of the user not to close the file in question or either - if it needs to be closed - not to call the resize method later.

I should stress that a call to mmap(2) does not require an open file handle during the time of the mapping but that a duplicate of the file handle was only kept to allow .size() and .resize() to work. See the POSIX Programmer's Manual:

  The mmap() function shall add an extra reference to the file
  associated with the file descriptor fildes which is not removed
  by a subsequent close() on that file descriptor. This reference
  shall be removed when there are no more mappings to the file.

It would be great if this little nicety would translate better to Python!

Regards,
   Lorenz
msg126115 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-01-12 17:33
Thanks for the report and patch.  Can you refresh it against the py3k branch?  Bugs are fixed there and then backported to 3.1 and 2.7.  If possible, please include a test in your patch.
msg126126 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2011-01-12 18:58
version of the lorenz's patch agains py3k branch.
msg126127 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-12 19:09
The problem is that it's a change in semantics. So it definitely can't be backported to the bugfix branches, nor committed into 3.2 which is in feature freeze now.

The question is which behaviour is the most expected by users of the module. I'd say that dup()ing definitely isn't intuitive, and it isn't documented; on the other hand, at least one of the examples seems to assume the original file descriptor is untouched when close()ing the mmap object:

with open("hello.txt", "r+b") as f:
    # memory-map the file, size 0 means whole file
    map = mmap.mmap(f.fileno(), 0)
    # read content via standard file methods
    print(map.readline())  # prints b"Hello Python!\n"
    # read content via slice notation
    print(map[:5])  # prints b"Hello"
    # update content using slice notation;
    # note that new content must have same size
    map[6:] = b" world!\n"
    # ... and read again using standard file methods
    map.seek(0)
    print(map.readline())  # prints b"Hello  world!\n"
    # close the map
    map.close()
msg126134 - (view) Author: Lorenz Huedepohl (lorenz) Date: 2011-01-12 20:21
Thanks SilentGhost, you were much faster than me :)

I agree with Antoine, that the dup() is unexpected: It cost me some time to figure out where these additonal file descriptors originated from. One of the powers of mmap() is, that you do not need one, so it would really be great if this feature could be saved into the python abstraction.

As for the example: I think this could be cured by removing the close(m_obj->fd); statement, again shifing the responsibility for that to the creator of the intial file handle - see my attached patch.
(This time against the py3k patch)

BTW: It is a great experience for a python user to see just how fast you guys react to an issue here! Thanks!
msg138016 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-06-09 17:06
Trying to revive this issue.
I'm +1 on this change: duping the file descriptor is definitely unexpected and has some side effects.
I think we should remove the call to dup and close, it's really the user's responsibility to avoid closing the FD.
If the FD is closed when mmap_resize is called, ftruncate will fail with EBADF, without harm.
The real problem is more if the file has been closed and another one got open, with the same FD.
If we wanted to be paranoid and avoid truncating/re-mapping another file, we could store the struct stat's st_ino field when the mapping is created, and re-check its value in mmap_resize (and if we wanted to be really paranoid we could also store st_dev, because inodes are unique only within a given file system).
msg138100 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-06-10 16:15
Well, I agree on the principle, but it is a change in behaviour and would probably break some user code...
Perhaps by adding some new argument to the mmap constructor? (dup_fd = True)
msg138167 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-06-11 16:40
> Perhaps by adding some new argument to the mmap constructor? (dup_fd = True)

I don't really like the idea of exposing the FD duplication to the
user, because:
- it's an implementation detail
- it doesn't reflect any argument of the POSIX mmap version
- it should False by default

I really think that application code expecting to be able to resize
the mmap after closing the FD is broken, but I'm also a really strong
advocate of backward compatibility...
I don't see any satisfying solution.
msg194168 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-02 09:14
I changed my mind, and think we should keep the current behavior: it guarantees resize() won't break, and doesn't cost much. One extra FD per mmap is not a huge hit. If that's a problem, then the user can raise the RLIMIT_NOFILE.
msg194172 - (view) Author: Lorenz Huedepohl (lorenz) Date: 2013-08-02 09:56
> If that's a problem, then the user can raise the RLIMIT_NOFILE.

This can only be raised (above the hard limit) by a privileged process,
so I would be out of luck there, as I could not convince my sysadmins
to raise this further.

Anyway, I don't really care anymore, I use my own, patched mmap module
since I filed this bug. I don't quite understand why this module made
an attempt to convert a simple mmap() to something more complex, with
hidden state and resize() methods and whatnot :) - but it has been done,
and I understand that you have to maintain backwards compatibility.

Still, it would be nice if there was some way to get the result of a
plain and simple mmap() as a buffer.

Meanwhile, I will just use my own module, so feel free to close this if
you feel like it :)
msg194244 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-03 10:31
> This can only be raised (above the hard limit) by a privileged
> process, so I would be out of luck there, as I could not convince
> my sysadmins to raise this further.

We all know that feeling :-)

> Meanwhile, I will just use my own module, so feel free to close
> this if you feel like it :)

OK, I'll close it then, since I don't see any satisfactory solution, and - until now - you've been the only user to get bitten.

Cheers.
History
Date User Action Args
2013-08-03 10:31:43neologixsetstatus: open -> closed
resolution: wont fix
messages: + msg194244

stage: patch review -> resolved
2013-08-02 09:56:46lorenzsetmessages: + msg194172
2013-08-02 09:14:29neologixsetmessages: + msg194168
2011-06-14 13:21:25facundobatistasetnosy: - facundobatista
2011-06-11 16:40:13neologixsetmessages: + msg138167
2011-06-10 16:15:26pitrousetmessages: + msg138100
2011-06-09 17:06:27neologixsetnosy: + neologix
messages: + msg138016
2011-01-13 14:44:23schmirsetnosy: + schmir
2011-01-12 20:21:58lorenzsetfiles: + no_dup_no_close.py3k.patch
nosy: georg.brandl, facundobatista, josiahcarlson, pitrou, eric.araujo, SilentGhost, lorenz
messages: + msg126134
2011-01-12 19:09:45pitrousetnosy: georg.brandl, facundobatista, josiahcarlson, pitrou, eric.araujo, SilentGhost, lorenz
messages: + msg126127
versions: + Python 3.3, - Python 3.1, Python 2.7, Python 3.2
2011-01-12 18:58:32SilentGhostsetfiles: + mmap.diff
nosy: georg.brandl, facundobatista, josiahcarlson, pitrou, eric.araujo, SilentGhost, lorenz
messages: + msg126126
2011-01-12 17:58:21SilentGhostsetnosy: + SilentGhost
2011-01-12 17:33:46eric.araujosetversions: - Python 2.6, Python 2.5, Python 3.3
nosy: + pitrou, josiahcarlson, eric.araujo, georg.brandl, facundobatista

messages: + msg126115

stage: patch review
2011-01-12 15:16:15lorenzcreate