Title: wrong behavior with fork and mmap
Components: Library (Lib) Versions: Python 2.7
Assigned To: Nosy List: btiplitz, r.david.murray, vstinner
Created on 2013-12-23 21:05 by btiplitz, last changed 2022-04-11 14:57 by admin.

Messages (5)
msg206871 - Author: Brett Tiplitz (btiplitz) Date: 2013-12-23 21:05
When running the example from the mmap library documentation (with a slight modification; I also did not handle all of the 3.3 string-handling changes, as the posted example does not work with 3.x) and looking at the spawned subprocess, the spawned process has all of the mmap'd file descriptors open.

The spawned process is responsible for closing any FDs that are in use. However, since the shared memory segment gets closed and the program has no knowledge of mmap's private FDs, mmap's private FD becomes a leak in the FD table. It seems Python should set the close-on-exec attribute on the dup'd FD that it maintains. Examples of fixing this issue are found on
import mmap
import os

# write a simple example file
with open("hello.txt", "wb") as f:
    f.write(b"Hello Python!\n")

with open("hello.txt", "r+b") as f:
    # list this process's open FDs before mapping the file
    os.system("/bin/ls -l /proc/" + str(os.getpid()) + "/fd")

    # memory-map the file, size 0 means whole file
    mm = mmap.mmap(f.fileno(), 0)
    # list the FDs again: the map holds its own dup of f's FD
    os.system("/bin/ls -l /proc/" + str(os.getpid()) + "/fd")
    # FDs of the spawned ls itself, i.e. what the child inherited
    os.system("/bin/ls -l /proc/self/fd")

    # read content via standard file methods
    print(mm.readline())  # prints the line "Hello Python!"
    # read content via slice notation
    print(mm[:5])  # prints "Hello"
    # update content using slice notation;
    # note that new content must have same size
    mm[6:] = b" world!\n"
    # ... and read again using standard file methods
    mm.seek(0)
    print(mm.readline())  # prints "Hello  world!"
    # close the map
    mm.close()
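The close-on-exec fix suggested above amounts to setting the `FD_CLOEXEC` flag on the duplicated descriptor. This is a minimal, Unix-only sketch assuming Python 3.4+ (for `os.set_inheritable`). The helper names `set_cloexec` and `has_cloexec` are hypothetical, and `os.dup()` on a temporary file stands in for mmap's private descriptor, which Python code cannot reach directly.

```python
import fcntl
import os
import tempfile


def set_cloexec(fd):
    """Hypothetical helper: set FD_CLOEXEC so fd is closed when the process calls exec()."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


def has_cloexec(fd):
    """Hypothetical helper: return True if FD_CLOEXEC is currently set on fd."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


with tempfile.TemporaryFile() as f:
    # Stand-in for mmap's private copy of the FD. On Python 3.4+ os.dup()
    # already returns a non-inheritable FD, so make it inheritable to
    # reproduce the pre-PEP 446 situation described in this report.
    private_fd = os.dup(f.fileno())
    os.set_inheritable(private_fd, True)
    before = has_cloexec(private_fd)  # False: would leak into exec'd children
    set_cloexec(private_fd)
    after = has_cloexec(private_fd)   # True: closed automatically on exec
    os.close(private_fd)

print(before, after)  # False True
```

Doing this inside the library at the moment of the `dup()` is what keeps the descriptor out of every later `exec()`, including ones made through `os.system()`.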
msg206872 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-12-23 21:13
It seems very likely that this is addressed by PEP 446.  Since that is not a behavior change that can be backported, I think this issue should probably be closed as out of date.
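PEP 446 (Python 3.4) makes file descriptors created by Python non-inheritable by default. A minimal sketch that checks this with `os.get_inheritable()`, assuming Python 3.4 or later, with a temporary file standing in for the mmap'd file:

```python
import os
import tempfile

with tempfile.TemporaryFile() as f:
    fd = f.fileno()
    # PEP 446: FDs created by Python are non-inheritable by default,
    # i.e. they are closed when a child process calls exec().
    file_inheritable = os.get_inheritable(fd)
    dup_fd = os.dup(fd)
    dup_inheritable = os.get_inheritable(dup_fd)  # os.dup() follows the same rule
    os.close(dup_fd)

print(file_inheritable, dup_inheritable)  # False False
```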
msg206876 - Author: Brett Tiplitz (btiplitz) Date: 2013-12-23 21:37
Changing the code to ["/bin/ls", "-l", "/proc/self/fd"])
and running this on Python 3.3 does show this as resolved by the broader fix implemented in PEP 446. It does seem bad that the os.system() call keeps this bad behavior, as I know it's widely used.
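To illustrate why `os.system()` still matters, here is a hedged, Linux-oriented sketch assuming Python 3.4+. The child script `CHECK_FD` and the helper `system_child_sees()` are hypothetical, and `os.dup()` on a temporary file stands in for a library-private descriptor. A descriptor left non-inheritable (the PEP 446 default) is closed when the shell execs, while one that something has marked inheritable is still visible to the command run by `os.system()`.

```python
import os
import shlex
import sys
import tempfile

# Hypothetical child program: exit 0 if the FD number in argv[1] is open, 1 otherwise.
CHECK_FD = (
    "import os, sys\n"
    "try:\n"
    "    os.fstat(int(sys.argv[1]))\n"
    "except OSError:\n"
    "    sys.exit(1)\n"
)


def system_child_sees(fd):
    """Run CHECK_FD through os.system() (i.e. via /bin/sh) and report the result."""
    cmd = [sys.executable, "-c", CHECK_FD, str(fd)]
    return os.system(" ".join(shlex.quote(arg) for arg in cmd)) == 0


with tempfile.TemporaryFile() as f:
    private_fd = os.dup(f.fileno())       # non-inheritable on 3.4+ (PEP 446)
    default_leaks = system_child_sees(private_fd)
    os.set_inheritable(private_fd, True)  # like a pre-PEP 446 or third-party dup
    inheritable_leaks = system_child_sees(private_fd)
    os.close(private_fd)

print(default_leaks, inheritable_leaks)  # expected on Linux: False True
```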
msg206877 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-12-23 22:04
This issue is not specific to mmap. Many other functions and libraries may
use private inheritable file descriptors. Python 3.4 does not fix the issue
for third-party libraries.

os.system() must be avoided; use instead. It avoids a
useless shell process and closes all fds by default.
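Assuming the module meant here is `subprocess` (the one the documentation quoted below recommends), its default can be sketched as follows. This assumes POSIX and Python 3.5+ for `subprocess.run()`; `CHECK_FD` is a hypothetical helper script. Without a shell, the child sees none of the parent's extra descriptors, even an inheritable one, and `pass_fds` is the explicit way to hand one over.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child program: exit 0 if the FD number in argv[1] is open, 1 otherwise.
CHECK_FD = (
    "import os, sys\n"
    "try:\n"
    "    os.fstat(int(sys.argv[1]))\n"
    "except OSError:\n"
    "    sys.exit(1)\n"
)

with tempfile.TemporaryFile() as f:
    fd = os.dup(f.fileno())
    os.set_inheritable(fd, True)  # even an inheritable FD is closed by default
    cmd = [sys.executable, "-c", CHECK_FD, str(fd)]

    # No shell is involved, and close_fds=True is the POSIX default since 3.2.
    default_sees_fd = subprocess.run(cmd).returncode == 0
    # pass_fds keeps the listed FDs open in the child.
    passed_sees_fd = subprocess.run(cmd, pass_fds=(fd,)).returncode == 0
    os.close(fd)

print(default_sees_fd, passed_sees_fd)  # expected: False True
```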

Is it a documentation issue?
msg206878 - Author: Brett Tiplitz (btiplitz) Date: 2013-12-23 22:10
The documentation currently says the following (it does not say that os.system() is deprecated or that files have to be closed on exec), so I think some more comments would help. And as mentioned, while a user can close his own FDs, the mmap call creates a special problem, since the user can't cleanly work around the issue, even though it is fixed in the subprocess calls.


    Execute the command (a string) in a subshell. This is implemented by calling the Standard C function system(), and has the same limitations. Changes to sys.stdin, etc. are not reflected in the environment of the executed command.

    On Unix, the return value is the exit status of the process encoded in the format specified for wait(). Note that POSIX does not specify the meaning of the return value of the C system() function, so the return value of the Python function is system-dependent.

    On Windows, the return value is that returned by the system shell after running command, given by the Windows environment variable COMSPEC: on systems (Windows 95, 98 and ME) this is always 0; on cmd.exe systems (Windows NT, 2000 and XP) this is the exit status of the command run; on systems using a non-native shell, consult your shell documentation.

    The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function. See the Replacing Older Functions with the subprocess Module section in the subprocess documentation for some helpful recipes.

    Availability: Unix, Windows.
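The Unix return-value rule quoted above can be seen with a small Unix-only sketch. The command `exit 3` is an arbitrary example; `os.WIFEXITED()` and `os.WEXITSTATUS()` decode the `wait()`-style status into the command's exit code.

```python
import os

# /bin/sh runs "exit 3"; os.system() returns a wait() status, not the exit code.
status = os.system("exit 3")
exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None

print(status, exit_code)  # 768 3 on Linux (the exit code sits in the high byte)
```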
