Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subprocess close_fds behavior should only close open fds #52300

Closed
gpshead opened this issue Mar 4, 2010 · 31 comments
Closed

subprocess close_fds behavior should only close open fds #52300

gpshead opened this issue Mar 4, 2010 · 31 comments
Assignees
Labels
type-bug An unexpected behavior, bug, or error

Comments

@gpshead
Copy link
Member

gpshead commented Mar 4, 2010

BPO 8052
Nosy @loewis, @gpshead, @pitrou, @vstinner, @socketpair
Files
  • subprocess-close-open-fds-gps01.diff
  • subprocess-close-open-fds-gps02.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/gpshead'
    closed_at = <Date 2012-01-22.06:27:53.761>
    created_at = <Date 2010-03-04.03:17:56.815>
    labels = ['type-bug']
    title = 'subprocess close_fds behavior should only close open fds'
    updated_at = <Date 2014-03-19.14:49:11.698>
    user = 'https://github.com/gpshead'

    bugs.python.org fields:

    activity = <Date 2014-03-19.14:49:11.698>
    actor = 'vstinner'
    assignee = 'gregory.p.smith'
    closed = True
    closed_date = <Date 2012-01-22.06:27:53.761>
    closer = 'gregory.p.smith'
    components = []
    creation = <Date 2010-03-04.03:17:56.815>
    creator = 'gregory.p.smith'
    dependencies = []
    files = ['24255', '24256']
    hgrepos = []
    issue_num = 8052
    keywords = ['patch']
    message_count = 31.0
    messages = ['100373', '125278', '125285', '125361', '125388', '125389', '132297', '132303', '132307', '132308', '132318', '132355', '132530', '132532', '132573', '132669', '151284', '151285', '151286', '151292', '151420', '151421', '151746', '151750', '151752', '151761', '151762', '214095', '214096', '214098', '214099']
    nosy_count = 12.0
    nosy_names = ['loewis', 'gregory.p.smith', 'pitrou', 'vstinner', 'ferringb', 'jyasskin', 'nadeem.vawda', 's7v7nislands', 'neologix', 'rosslagerwall', 'socketpair', 'python-dev']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'needs patch'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue8052'
    versions = ['Python 3.1', 'Python 2.7', 'Python 3.2']

    @gpshead
    Copy link
    Member Author

    gpshead commented Mar 4, 2010

    The subprocess module's close_fds=True behavior is suboptimal. It gets the maximum fd number and calls close() on each possible fd in the child process.

    When you're running on large systems configured for 1million+ file descriptor limits per process.... This makes launching a subprocess uber painful.

    I'm in the middle of some posix subprocess fixing, i'll add this to my todo list and fix it up.

    @gpshead gpshead self-assigned this Mar 4, 2010
    @gpshead gpshead added the type-bug An unexpected behavior, bug, or error label Mar 4, 2010
    @pitrou
    Copy link
    Member

    pitrou commented Jan 4, 2011

    Is it still a problem now that there's a C path?
    Furthermore, how do you plan to get a list of open fds?

    @gpshead
    Copy link
    Member Author

    gpshead commented Jan 4, 2011

    its still a problem, even the C path has to call close() a million times in that case. thats a huge number of wasted syscalls. fixing this is blocking on a good way to get the list of open fds.

    I have seen other subprocess code do it using the race condition method of having the parent process get the list from os.listdir('/prod/self/fds') but that isn't guaranteed to get everything due to the race and we I don't believe we have enough posix async signal safe syscalls to do that in between the fork+exec.

    I heard talk of actual system calls to do this type of thing (getting a list of a processes open fds) being added to future Linux kernels but I have not followed up on that to see if any have or if they are what we need.

    @rosslagerwall
    Copy link
    Mannequin

    rosslagerwall mannequin commented Jan 4, 2011

    Does this mean that it's better to call the close() syscall 1000 or 1000 0000 times rather than listing the open fds & closing say a handful?

    On Linux, the listdir function results in an open() syscall, 2 or so getdents() calls and a close() call - could this not be done inbetween the fork & exec, even if the actual python function isn't called directly?

    @nadeemvawda
    Copy link
    Mannequin

    nadeemvawda mannequin commented Jan 5, 2011

    According to POSIX [1], if a multi-threaded program calls fork(), the child process may only use async-signal-safe system calls between fork() and exec*(). readdir() is not required to be async-safe [2], so reading /proc/self/fds in the child process is undefined behaviour. This is a pity, since it would IMO be a much cleaner solution than the current code.

    Of course, procfs isn't standard in any case; would it be necessary to have a fallback for systems without it? Or do all *nix systems that we care about provide it? In the former case, I suppose it might be possible to use the procfs on systems where readdir() is known to be safe, and use the fallback where it isn't. But such special cases would complicate the code rather than simplifying it...

    [1] http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html
    [2] http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html

    @pitrou
    Copy link
    Member

    pitrou commented Jan 5, 2011

    Of course, procfs isn't standard in any case; would it be necessary to
    have a fallback for systems without it? Or do all *nix systems that we
    care about provide it?

    /proc/self doesn't exist under OpenSolaris. You have to use /proc/<pid>.
    And I suppose there can be circumstances where /proc isn't available at
    all (chroot perhaps?). So we would definitely have to provide a
    fallback.

    @rosslagerwall
    Copy link
    Mannequin

    rosslagerwall mannequin commented Mar 27, 2011

    See bpo-11284 (a duplicate) for more discussion about this issue.

    @vstinner
    Copy link
    Member

    Can we use FD_CLOEXEC to archive this goal? Example: open all files with FD_CLOEXEC set and don't close explicitly files on fork. preexec_fn will get access to the files, but the problem is exec(), not preexec_fn.

    I suppose that it will slow down programs not using subprocess, and it may have border effects.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Mar 27, 2011

    If you're suggesting to set FDs CLOEXEC by default, I think it's neither possible nor reasonable:

    • you have to take into account not only files, but also pipes, sockets, etc
    • there's no portable way to e.g. open a file and set it CLOEXEC atomically
    • first and foremost, it' going to break a lot of existing code, for example, pipe + fork, accept + fork, etc
      As for the dedicated syscalls, there's already been some discussion about closefrom and friends, but Gregory did some research and it looked like those are not async-safe - which, if it's really the case, renders those calls mostly useless.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Mar 27, 2011

    Ooops, it's of course not going to break code containing accept + fork or pipe + fork, you obviously also need an execve ;-)
    But the point is that you can't change the semantics of FDs being inheritable across an execve (think about inetd for example).

    @rosslagerwall
    Copy link
    Mannequin

    rosslagerwall mannequin commented Mar 27, 2011

    I don't think setting the cloexec flag is a viable solution, especially since fds can be opened in custom c modules.

    For what its worth, an strace of Java's Process class appears to "cheat" by opening /proc/self/fd inbetween fork & exec.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Mar 27, 2011

    Can we use FD_CLOEXEC to archive this goal?

    I think we should use FD_CLOEXEC in all places where it's reasonable.
    As others have pointed out, we shouldn't set FD_CLOEXEC for file
    descriptors where the application hasn't explicitly requested that.

    @nadeemvawda
    Copy link
    Mannequin

    nadeemvawda mannequin commented Mar 29, 2011

    For what its worth, an strace of Java's Process class appears to "cheat" by
    opening /proc/self/fd inbetween fork & exec.

    Looking at the OpenJDK source confirms this:
    http://hg.openjdk.java.net/jdk6/jdk6/jdk/file/tip/src/solaris/native/java/lang/UNIXProcess_md.c

    I wonder whether the Java people are simply unaware of the potential problem?
    Or perhaps they have checked the Linux and Solaris implementations of readdir()
    and confirmed that it is in fact safe on those platforms. Even if this is the
    case, I would be wary of doing things the same way - there's no guarantee that
    the implementation won't change out from underneath us.

    I like the approach suggested by Gregory in bpo-11284 (msg129912) - if we are
    sure the program is not multithreaded, get the list of fds before forking, and
    use that; otherwise, do it the slow way.

    On Linux, the multithreading check could be implemented by checking the number
    of subdirectories in /proc/<pid>/task/. I don't know if *BSD (or other Unices)
    have a similar way to do this. Some quick Googling seems to indicate that
    FreeBSD, at least, doesn't make this information available through the procfs
    (which is apparently deprecated anyway). Could someone with wider *nix
    knowledge chip in here?

    @vstinner
    Copy link
    Member

    I like the approach suggested by Gregory in bpo-11284 (msg129912) - if we are
    sure the program is not multithreaded, get the list of fds before forking, and
    use that; otherwise, do it the slow way.

    A signal handler can also opens new files :-)

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Mar 30, 2011

    I wonder whether the Java people are simply unaware of the potential problem?
    Or perhaps they have checked the Linux and Solaris implementations of readdir()
    and confirmed that it is in fact safe on those platforms. Even if this is the
    case, I would be wary of doing things the same way - there's no guarantee that
    the implementation won't change out from underneath us.

    The problem is not so much readdir: if you look at the source code
    (http://fxr.googlebit.com/source/lib/libc/gen/readdir.c), it doesn't
    do much apart from locking a mutex private to the related DIR *, so as
    long as you pass it a DIR * not referenced elsewhere (which would be
    the case since it would call opendir between the fork and exec), it
    should be ok. The man page
    (http://pubs.opengroup.org/onlinepubs/007908799/xsh/readdir.html) also
    makes it clear:
    """After a call to fork(), either the parent or child (but not both)
    may continue processing the directory stream using readdir(),
    rewinddir() or seekdir(). If both the parent and child processes use
    these functions, the result is undefined."""

    The problem is more with opendir, which needs to allocate memory for
    the struct dirent before calling getdents syscall.

    I agree with you, we should definitely favor correctness over efficiency.

    As for the other approach, I'm not aware of any portable way to
    determine if a program is multi-threaded. Also, as noted by Victor,
    there might be room for some subtle races (Python-registered signal
    handlers are called synchronously from the main eval loop with the GIL
    held, so I don't think there should be a problem there, but you might
    have a problem with C-extension registered signal handlers).

    Finally, looking at this thread
    http://lists.freebsd.org/pipermail/freebsd-hackers/2007-July/021132.html,
    it seems that some closefrom implementations are definitely not
    async-safe, which is a pity...

    @vstinner
    Copy link
    Member

    test_subprocess.test_leaking_fds_on_error takes more than 5 minutes on x86 FreeBSD 7.2 build slave! This tests creates 1,024 subprocesses, and subprocess has to close 655,000 file descriptors just to create on child process (whereas there are only something like 10 open files...).

    @ferringb
    Copy link
    Mannequin

    ferringb mannequin commented Jan 15, 2012

    In bpo-13788, I've uploaded a patch modifying closerange along the same lines as this discussion; someone w/ appropriate rights should set dependencies as needed.

    Either way, here's a question: does anyone actually know of a unix that does procfs, and has a daft opendir implementation as described below? Aka, are we actually worrying about something relevant, or just hypotheticals?

    Strikes me any scenario where this actually would occur, we're already in hell from broken implementations- meaning we probably don't support them. In the same angle, has anyone asked opengroup (forums, ml, etc), or moreso figured out *where* to ask for the given reasoning here?

    Regardless, if we're dead set on adhering to the standards there (and using re-entrant readdir_r and friends isn't enough to make people happy), a few hacks come to mind:

    1. in the child (child1), split a pipe, fork/exec (child2) an ls -l (or equivalent) of /proc/$PID/fd, feeding it back to child1 which then acts on it.
    2. grab the fd list pre-fork along w/ the link count for /proc/$PID/fd; child re-stats /proc/$PID/fd, if link count is the same, the results should be able to be acted upon. I'm *reasonably* sure there is a retarded FS or two out there that doesn't set link count for a directory correctly, but I've never heard of it for a procfs. Regardless, should be detectable- nlinks would be 0. If it is, and len(fds) != 0, then you know you can't trust the results and have to fallback to brute force close the range. Additionally, we ought to be able to test for this... so... score.

    Addressing: "signal handlers can open files". Yes, they can, and our current implementation doesn't handle that case /anyways/, so it's orthogonal to speeding up closerange.

    Finally, doing some codesearching, here's the rough list of heavy hitters spotted using this:
    *) java (as mentioned)
    *) chrome's seccomp sandbox uses it
    *) glibc's nscd uses it (pretty strong indication this is safe in that case to say the least)
    *) gdm
    *) pulseaudio (limited to linux)
    *) opensolaris actually does this, although from reading the code it sounds as if there is an issue w/ vfork- thus they use getdents directly. Look for spawn_closefrom for details.

    So.. seems a bit linuxy. Could possibly enable it just there (whitelist).

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Jan 15, 2012

    Either way, here's a question: does anyone actually know of a unix that does procfs, and has a daft opendir implementation as described below? Aka, are we actually worrying about something relevant, or just hypotheticals?

    I think it's more theoretical.
    Since dirent have per-struct locks, the only reason why
    opendir/readdir would not be async-safe would be because malloc() is
    not async-safe. Since we already allow running Python code after
    fork(), we implicitely assume that malloc() (and actually most of the
    libc) is async-safe, which is true in practice because malloc() uses
    pthread_atfork to reset its internal locks after fork().
    So IMHO, calling opendir() should be safe (and as noted, many code out
    there already does this).
    The only question is: do other Unix also have /proc/<pid>/fd? e.g.
    FreeBSD, OpenBSD. That's especially important because FreeBSD can have
    a huge RLIMIT_NOFILE by default.

    @ferringb
    Copy link
    Mannequin

    ferringb mannequin commented Jan 15, 2012

    The only question is: do other Unix also have /proc/<pid>/fd? e.g.
    FreeBSD, OpenBSD. That's especially important because FreeBSD can have
    a huge RLIMIT_NOFILE by default.

    Unless the OS gives some way to optimize the process (whether inferring from procfs, or making use of spawn_closefrom), there really isn't anything we can do. O_CLOEXEC is one option, but that's basically the same as the close loop in terms of syscalls- specifically post fork looping over the range and setting it. Beyond that, it's linux specific, so only would be used if the root python was invoked from lacked procfs.

    I'm willing to extend my original patch to handle alternate OS hints as needed; in the same way, the nlinks trick I can implement although I'd be more inclined to just limit my original closerange patch to OSs that have a sane opendir and procfs.

    @rosslagerwall
    Copy link
    Mannequin

    rosslagerwall mannequin commented Jan 15, 2012

    FreeBSD has a /dev/fd as well as a procfs (deprecated AFAIK).
    However, both may not be mounted so a patch would *need* to at least fallback to the current functionality.

    @gpshead
    Copy link
    Member Author

    gpshead commented Jan 17, 2012

    attaching a patch that implements this safely.

    @gpshead
    Copy link
    Member Author

    gpshead commented Jan 17, 2012

    attaching a better formatted one for review.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 21, 2012

    New changeset 61aa484a3e54 by Gregory P. Smith in branch '3.2':
    Fixes issue bpo-8052: The posix subprocess module's close_fds behavior was
    http://hg.python.org/cpython/rev/61aa484a3e54

    New changeset 8879874d66a2 by Gregory P. Smith in branch 'default':
    Fixes issue bpo-8052: The posix subprocess module's close_fds behavior was
    http://hg.python.org/cpython/rev/8879874d66a2

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 21, 2012

    New changeset 780992c9afea by Gregory P. Smith in branch '3.2':
    Add a Misc/NEWS entry for bpo-8052.
    http://hg.python.org/cpython/rev/780992c9afea

    New changeset 1f0a01dc723c by Gregory P. Smith in branch 'default':
    A Misc/NEWS entry for bpo-8052.
    http://hg.python.org/cpython/rev/1f0a01dc723c

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 21, 2012

    New changeset d0acd8169c2a by Gregory P. Smith in branch '3.2':
    Bugfix for issue bpo-8052 fix on *BSD variants.
    http://hg.python.org/cpython/rev/d0acd8169c2a

    New changeset 5be3dadd2eef by Gregory P. Smith in branch '3.2':
    Another issue bpo-8052 bugfix (related to previous commit).
    http://hg.python.org/cpython/rev/5be3dadd2eef

    New changeset e52d81e0c750 by Gregory P. Smith in branch 'default':
    bugfix for bpo-8052 fixes on *BSD platforms.
    http://hg.python.org/cpython/rev/e52d81e0c750

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Jan 22, 2012

    New changeset 754c2eb0a92c by Gregory P. Smith in branch '3.2':
    Fix FreeBSD, NetBSD and OpenBSD behavior of the issue bpo-8052 fix.
    http://hg.python.org/cpython/rev/754c2eb0a92c

    New changeset 7d4658a8de96 by Gregory P. Smith in branch 'default':
    Fix FreeBSD, NetBSD and OpenBSD behavior of the issue bpo-8052 fix.
    http://hg.python.org/cpython/rev/7d4658a8de96

    @gpshead
    Copy link
    Member Author

    gpshead commented Jan 22, 2012

    For FreeBSD, Python 3.2 and 3.3 now check to see if /dev/fd is valid. Be sure and "mount -t fdescfs none /dev/fd" on FreeBSD if you want faster subprocess launching. Run a FreeBSD buildbot? Please do it!

    For Python 3.1 the fix for bpo-13788 would fix this, but I believe 3.1 is in security fix only mode at this point so we're not going to backport that os.closerange change that far.

    @gpshead gpshead closed this as completed Jan 22, 2012
    @socketpair
    Copy link
    Mannequin

    socketpair mannequin commented Mar 19, 2014

    Calling getdents()/readdir64() repeatedly while closing descriptors provides unexpected behaviour. Reading directory while it modified is not safe by default. For example: http://en.it-usenet.org/thread/18514/15719/.

    So, we should re-open directory if we received full array of structures. I don't know if just lseek(dirfd, 0) sufficies.

    Please reopen bug, as Linux behaviour of stable reading /proc/<pid>/fd may be broken in future without any error at python side (!) (typically, second call returns empty list if dir was modified)

    @vstinner
    Copy link
    Member

    "Calling getdents()/readdir64() repeatedly while closing descriptors provides unexpected behaviour."

    Please open a new issue.

    @socketpair
    Copy link
    Mannequin

    socketpair mannequin commented Mar 19, 2014

    Also, it is not said in manual if getdents() may be interrupted by signal. Assuming current code, error is not checked, so some (or all) descriptors will be skipped in case of error.

    @vstinner
    Copy link
    Member

    "Also, it is not said in manual if getdents() may be interrupted by signal. Assuming current code, error is not checked, so some (or all) descriptors will be skipped in case of error."

    This issue is closed, please open a new issue to discuss that.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants