subprocess close_fds behavior should only close open fds #52300

gpshead · 2010-03-04T03:17:57Z

BPO	8052
Nosy	@loewis, @gpshead, @pitrou, @vstinner, @socketpair
Files	subprocess-close-open-fds-gps01.diff subprocess-close-open-fds-gps02.diff

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/gpshead'
closed_at = <Date 2012-01-22.06:27:53.761>
created_at = <Date 2010-03-04.03:17:56.815>
labels = ['type-bug']
title = 'subprocess close_fds behavior should only close open fds'
updated_at = <Date 2014-03-19.14:49:11.698>
user = 'https://github.com/gpshead'

bugs.python.org fields:

activity = <Date 2014-03-19.14:49:11.698>
actor = 'vstinner'
assignee = 'gregory.p.smith'
closed = True
closed_date = <Date 2012-01-22.06:27:53.761>
closer = 'gregory.p.smith'
components = []
creation = <Date 2010-03-04.03:17:56.815>
creator = 'gregory.p.smith'
dependencies = []
files = ['24255', '24256']
hgrepos = []
issue_num = 8052
keywords = ['patch']
message_count = 31.0
messages = ['100373', '125278', '125285', '125361', '125388', '125389', '132297', '132303', '132307', '132308', '132318', '132355', '132530', '132532', '132573', '132669', '151284', '151285', '151286', '151292', '151420', '151421', '151746', '151750', '151752', '151761', '151762', '214095', '214096', '214098', '214099']
nosy_count = 12.0
nosy_names = ['loewis', 'gregory.p.smith', 'pitrou', 'vstinner', 'ferringb', 'jyasskin', 'nadeem.vawda', 's7v7nislands', 'neologix', 'rosslagerwall', 'socketpair', 'python-dev']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'needs patch'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue8052'
versions = ['Python 3.1', 'Python 2.7', 'Python 3.2']

gpshead · 2010-03-04T03:17:56Z

The subprocess module's close_fds=True behavior is suboptimal. It gets the maximum fd number and calls close() on each possible fd in the child process.

When you're running on large systems configured for 1million+ file descriptor limits per process.... This makes launching a subprocess uber painful.

I'm in the middle of some posix subprocess fixing, i'll add this to my todo list and fix it up.

pitrou · 2011-01-04T01:48:04Z

Is it still a problem now that there's a C path?
Furthermore, how do you plan to get a list of open fds?

gpshead · 2011-01-04T02:39:01Z

its still a problem, even the C path has to call close() a million times in that case. thats a huge number of wasted syscalls. fixing this is blocking on a good way to get the list of open fds.

I have seen other subprocess code do it using the race condition method of having the parent process get the list from os.listdir('/prod/self/fds') but that isn't guaranteed to get everything due to the race and we I don't believe we have enough posix async signal safe syscalls to do that in between the fork+exec.

I heard talk of actual system calls to do this type of thing (getting a list of a processes open fds) being added to future Linux kernels but I have not followed up on that to see if any have or if they are what we need.

rosslagerwall · 2011-01-04T20:06:16Z

Does this mean that it's better to call the close() syscall 1000 or 1000 0000 times rather than listing the open fds & closing say a handful?

On Linux, the listdir function results in an open() syscall, 2 or so getdents() calls and a close() call - could this not be done inbetween the fork & exec, even if the actual python function isn't called directly?

nadeemvawda · 2011-01-05T00:54:33Z

According to POSIX [1], if a multi-threaded program calls fork(), the child process may only use async-signal-safe system calls between fork() and exec*(). readdir() is not required to be async-safe [2], so reading /proc/self/fds in the child process is undefined behaviour. This is a pity, since it would IMO be a much cleaner solution than the current code.

Of course, procfs isn't standard in any case; would it be necessary to have a fallback for systems without it? Or do all *nix systems that we care about provide it? In the former case, I suppose it might be possible to use the procfs on systems where readdir() is known to be safe, and use the fallback where it isn't. But such special cases would complicate the code rather than simplifying it...

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html

pitrou · 2011-01-05T01:01:23Z

Of course, procfs isn't standard in any case; would it be necessary to
have a fallback for systems without it? Or do all *nix systems that we
care about provide it?

/proc/self doesn't exist under OpenSolaris. You have to use /proc/<pid>.
And I suppose there can be circumstances where /proc isn't available at
all (chroot perhaps?). So we would definitely have to provide a
fallback.

rosslagerwall · 2011-03-27T09:11:55Z

See bpo-11284 (a duplicate) for more discussion about this issue.

vstinner · 2011-03-27T10:14:18Z

Can we use FD_CLOEXEC to archive this goal? Example: open all files with FD_CLOEXEC set and don't close explicitly files on fork. preexec_fn will get access to the files, but the problem is exec(), not preexec_fn.

I suppose that it will slow down programs not using subprocess, and it may have border effects.

neologix · 2011-03-27T11:00:59Z

If you're suggesting to set FDs CLOEXEC by default, I think it's neither possible nor reasonable:

you have to take into account not only files, but also pipes, sockets, etc
there's no portable way to e.g. open a file and set it CLOEXEC atomically
first and foremost, it' going to break a lot of existing code, for example, pipe + fork, accept + fork, etc
As for the dedicated syscalls, there's already been some discussion about closefrom and friends, but Gregory did some research and it looked like those are not async-safe - which, if it's really the case, renders those calls mostly useless.

neologix · 2011-03-27T11:06:15Z

Ooops, it's of course not going to break code containing accept + fork or pipe + fork, you obviously also need an execve ;-)
But the point is that you can't change the semantics of FDs being inheritable across an execve (think about inetd for example).

rosslagerwall · 2011-03-27T12:25:43Z

I don't think setting the cloexec flag is a viable solution, especially since fds can be opened in custom c modules.

For what its worth, an strace of Java's Process class appears to "cheat" by opening /proc/self/fd inbetween fork & exec.

loewis · 2011-03-27T18:39:36Z

Can we use FD_CLOEXEC to archive this goal?

I think we should use FD_CLOEXEC in all places where it's reasonable.
As others have pointed out, we shouldn't set FD_CLOEXEC for file
descriptors where the application hasn't explicitly requested that.

nadeemvawda · 2011-03-29T21:28:16Z

For what its worth, an strace of Java's Process class appears to "cheat" by
opening /proc/self/fd inbetween fork & exec.

Looking at the OpenJDK source confirms this:
http://hg.openjdk.java.net/jdk6/jdk6/jdk/file/tip/src/solaris/native/java/lang/UNIXProcess_md.c

I wonder whether the Java people are simply unaware of the potential problem?
Or perhaps they have checked the Linux and Solaris implementations of readdir()
and confirmed that it is in fact safe on those platforms. Even if this is the
case, I would be wary of doing things the same way - there's no guarantee that
the implementation won't change out from underneath us.

I like the approach suggested by Gregory in bpo-11284 (msg129912) - if we are
sure the program is not multithreaded, get the list of fds before forking, and
use that; otherwise, do it the slow way.

On Linux, the multithreading check could be implemented by checking the number
of subdirectories in /proc/<pid>/task/. I don't know if *BSD (or other Unices)
have a similar way to do this. Some quick Googling seems to indicate that
FreeBSD, at least, doesn't make this information available through the procfs
(which is apparently deprecated anyway). Could someone with wider *nix
knowledge chip in here?

vstinner · 2011-03-29T21:32:33Z

I like the approach suggested by Gregory in bpo-11284 (msg129912) - if we are
sure the program is not multithreaded, get the list of fds before forking, and
use that; otherwise, do it the slow way.

A signal handler can also opens new files :-)

neologix · 2011-03-30T07:36:47Z

I wonder whether the Java people are simply unaware of the potential problem?
Or perhaps they have checked the Linux and Solaris implementations of readdir()
and confirmed that it is in fact safe on those platforms. Even if this is the
case, I would be wary of doing things the same way - there's no guarantee that
the implementation won't change out from underneath us.

The problem is not so much readdir: if you look at the source code
(http://fxr.googlebit.com/source/lib/libc/gen/readdir.c), it doesn't
do much apart from locking a mutex private to the related DIR *, so as
long as you pass it a DIR * not referenced elsewhere (which would be
the case since it would call opendir between the fork and exec), it
should be ok. The man page
(http://pubs.opengroup.org/onlinepubs/007908799/xsh/readdir.html) also
makes it clear:
"""After a call to fork(), either the parent or child (but not both)
may continue processing the directory stream using readdir(),
rewinddir() or seekdir(). If both the parent and child processes use
these functions, the result is undefined."""

The problem is more with opendir, which needs to allocate memory for
the struct dirent before calling getdents syscall.

I agree with you, we should definitely favor correctness over efficiency.

As for the other approach, I'm not aware of any portable way to
determine if a program is multi-threaded. Also, as noted by Victor,
there might be room for some subtle races (Python-registered signal
handlers are called synchronously from the main eval loop with the GIL
held, so I don't think there should be a problem there, but you might
have a problem with C-extension registered signal handlers).

Finally, looking at this thread
http://lists.freebsd.org/pipermail/freebsd-hackers/2007-July/021132.html,
it seems that some closefrom implementations are definitely not
async-safe, which is a pity...

vstinner · 2011-03-31T13:33:39Z

test_subprocess.test_leaking_fds_on_error takes more than 5 minutes on x86 FreeBSD 7.2 build slave! This tests creates 1,024 subprocesses, and subprocess has to close 655,000 file descriptors just to create on child process (whereas there are only something like 10 open files...).

ferringb · 2012-01-15T10:17:24Z

In bpo-13788, I've uploaded a patch modifying closerange along the same lines as this discussion; someone w/ appropriate rights should set dependencies as needed.

Either way, here's a question: does anyone actually know of a unix that does procfs, and has a daft opendir implementation as described below? Aka, are we actually worrying about something relevant, or just hypotheticals?

Strikes me any scenario where this actually would occur, we're already in hell from broken implementations- meaning we probably don't support them. In the same angle, has anyone asked opengroup (forums, ml, etc), or moreso figured out *where* to ask for the given reasoning here?

Regardless, if we're dead set on adhering to the standards there (and using re-entrant readdir_r and friends isn't enough to make people happy), a few hacks come to mind:

in the child (child1), split a pipe, fork/exec (child2) an ls -l (or equivalent) of /proc/$PID/fd, feeding it back to child1 which then acts on it.
grab the fd list pre-fork along w/ the link count for /proc/$PID/fd; child re-stats /proc/$PID/fd, if link count is the same, the results should be able to be acted upon. I'm *reasonably* sure there is a retarded FS or two out there that doesn't set link count for a directory correctly, but I've never heard of it for a procfs. Regardless, should be detectable- nlinks would be 0. If it is, and len(fds) != 0, then you know you can't trust the results and have to fallback to brute force close the range. Additionally, we ought to be able to test for this... so... score.

Addressing: "signal handlers can open files". Yes, they can, and our current implementation doesn't handle that case /anyways/, so it's orthogonal to speeding up closerange.

Finally, doing some codesearching, here's the rough list of heavy hitters spotted using this:
*) java (as mentioned)
*) chrome's seccomp sandbox uses it
*) glibc's nscd uses it (pretty strong indication this is safe in that case to say the least)
*) gdm
*) pulseaudio (limited to linux)
*) opensolaris actually does this, although from reading the code it sounds as if there is an issue w/ vfork- thus they use getdents directly. Look for spawn_closefrom for details.

So.. seems a bit linuxy. Could possibly enable it just there (whitelist).

neologix · 2012-01-15T11:09:51Z

Either way, here's a question: does anyone actually know of a unix that does procfs, and has a daft opendir implementation as described below? Aka, are we actually worrying about something relevant, or just hypotheticals?

I think it's more theoretical.
Since dirent have per-struct locks, the only reason why
opendir/readdir would not be async-safe would be because malloc() is
not async-safe. Since we already allow running Python code after
fork(), we implicitely assume that malloc() (and actually most of the
libc) is async-safe, which is true in practice because malloc() uses
pthread_atfork to reset its internal locks after fork().
So IMHO, calling opendir() should be safe (and as noted, many code out
there already does this).
The only question is: do other Unix also have /proc/<pid>/fd? e.g.
FreeBSD, OpenBSD. That's especially important because FreeBSD can have
a huge RLIMIT_NOFILE by default.

ferringb · 2012-01-15T12:02:29Z

The only question is: do other Unix also have /proc/<pid>/fd? e.g.
FreeBSD, OpenBSD. That's especially important because FreeBSD can have
a huge RLIMIT_NOFILE by default.

Unless the OS gives some way to optimize the process (whether inferring from procfs, or making use of spawn_closefrom), there really isn't anything we can do. O_CLOEXEC is one option, but that's basically the same as the close loop in terms of syscalls- specifically post fork looping over the range and setting it. Beyond that, it's linux specific, so only would be used if the root python was invoked from lacked procfs.

I'm willing to extend my original patch to handle alternate OS hints as needed; in the same way, the nlinks trick I can implement although I'd be more inclined to just limit my original closerange patch to OSs that have a sane opendir and procfs.

rosslagerwall · 2012-01-15T16:46:25Z

FreeBSD has a /dev/fd as well as a procfs (deprecated AFAIK).
However, both may not be mounted so a patch would *need* to at least fallback to the current functionality.

gpshead · 2012-01-17T02:02:34Z

attaching a patch that implements this safely.

gpshead · 2012-01-17T02:08:57Z

attaching a better formatted one for review.

python-dev · 2012-01-21T22:39:42Z

New changeset 61aa484a3e54 by Gregory P. Smith in branch '3.2':
Fixes issue bpo-8052: The posix subprocess module's close_fds behavior was
http://hg.python.org/cpython/rev/61aa484a3e54

New changeset 8879874d66a2 by Gregory P. Smith in branch 'default':
Fixes issue bpo-8052: The posix subprocess module's close_fds behavior was
http://hg.python.org/cpython/rev/8879874d66a2

python-dev · 2012-01-21T22:51:05Z

New changeset 780992c9afea by Gregory P. Smith in branch '3.2':
Add a Misc/NEWS entry for bpo-8052.
http://hg.python.org/cpython/rev/780992c9afea

New changeset 1f0a01dc723c by Gregory P. Smith in branch 'default':
A Misc/NEWS entry for bpo-8052.
http://hg.python.org/cpython/rev/1f0a01dc723c

python-dev · 2012-01-21T23:21:21Z

New changeset d0acd8169c2a by Gregory P. Smith in branch '3.2':
Bugfix for issue bpo-8052 fix on *BSD variants.
http://hg.python.org/cpython/rev/d0acd8169c2a

New changeset 5be3dadd2eef by Gregory P. Smith in branch '3.2':
Another issue bpo-8052 bugfix (related to previous commit).
http://hg.python.org/cpython/rev/5be3dadd2eef

New changeset e52d81e0c750 by Gregory P. Smith in branch 'default':
bugfix for bpo-8052 fixes on *BSD platforms.
http://hg.python.org/cpython/rev/e52d81e0c750

python-dev · 2012-01-22T05:06:34Z

New changeset 754c2eb0a92c by Gregory P. Smith in branch '3.2':
Fix FreeBSD, NetBSD and OpenBSD behavior of the issue bpo-8052 fix.
http://hg.python.org/cpython/rev/754c2eb0a92c

New changeset 7d4658a8de96 by Gregory P. Smith in branch 'default':
Fix FreeBSD, NetBSD and OpenBSD behavior of the issue bpo-8052 fix.
http://hg.python.org/cpython/rev/7d4658a8de96

gpshead · 2012-01-22T06:27:53Z

For FreeBSD, Python 3.2 and 3.3 now check to see if /dev/fd is valid. Be sure and "mount -t fdescfs none /dev/fd" on FreeBSD if you want faster subprocess launching. Run a FreeBSD buildbot? Please do it!

For Python 3.1 the fix for bpo-13788 would fix this, but I believe 3.1 is in security fix only mode at this point so we're not going to backport that os.closerange change that far.

socketpair · 2014-03-19T14:22:56Z

Calling getdents()/readdir64() repeatedly while closing descriptors provides unexpected behaviour. Reading directory while it modified is not safe by default. For example: http://en.it-usenet.org/thread/18514/15719/.

So, we should re-open directory if we received full array of structures. I don't know if just lseek(dirfd, 0) sufficies.

Please reopen bug, as Linux behaviour of stable reading /proc/<pid>/fd may be broken in future without any error at python side (!) (typically, second call returns empty list if dir was modified)

vstinner · 2014-03-19T14:30:24Z

"Calling getdents()/readdir64() repeatedly while closing descriptors provides unexpected behaviour."

Please open a new issue.

socketpair · 2014-03-19T14:48:05Z

Also, it is not said in manual if getdents() may be interrupted by signal. Assuming current code, error is not checked, so some (or all) descriptors will be skipped in case of error.

vstinner · 2014-03-19T14:49:12Z

"Also, it is not said in manual if getdents() may be interrupted by signal. Assuming current code, error is not checked, so some (or all) descriptors will be skipped in case of error."

This issue is closed, please open a new issue to discuss that.

gpshead self-assigned this Mar 4, 2010

gpshead added the type-bug An unexpected behavior, bug, or error label Mar 4, 2010

gpshead closed this as completed Jan 22, 2012

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

subprocess close_fds behavior should only close open fds #52300

subprocess close_fds behavior should only close open fds #52300

gpshead commented Mar 4, 2010

gpshead commented Mar 4, 2010

pitrou commented Jan 4, 2011

gpshead commented Jan 4, 2011

rosslagerwall mannequin commented Jan 4, 2011

nadeemvawda mannequin commented Jan 5, 2011

pitrou commented Jan 5, 2011

rosslagerwall mannequin commented Mar 27, 2011

vstinner commented Mar 27, 2011

neologix mannequin commented Mar 27, 2011

neologix mannequin commented Mar 27, 2011

rosslagerwall mannequin commented Mar 27, 2011

loewis mannequin commented Mar 27, 2011

nadeemvawda mannequin commented Mar 29, 2011

vstinner commented Mar 29, 2011

neologix mannequin commented Mar 30, 2011

vstinner commented Mar 31, 2011

ferringb mannequin commented Jan 15, 2012

neologix mannequin commented Jan 15, 2012

ferringb mannequin commented Jan 15, 2012

rosslagerwall mannequin commented Jan 15, 2012

gpshead commented Jan 17, 2012

gpshead commented Jan 17, 2012

python-dev mannequin commented Jan 21, 2012

python-dev mannequin commented Jan 21, 2012

python-dev mannequin commented Jan 21, 2012

python-dev mannequin commented Jan 22, 2012

gpshead commented Jan 22, 2012

socketpair mannequin commented Mar 19, 2014

vstinner commented Mar 19, 2014

socketpair mannequin commented Mar 19, 2014

vstinner commented Mar 19, 2014

subprocess close_fds behavior should only close open fds #52300

subprocess close_fds behavior should only close open fds #52300

Comments

gpshead commented Mar 4, 2010

gpshead commented Mar 4, 2010

pitrou commented Jan 4, 2011

gpshead commented Jan 4, 2011

rosslagerwall mannequin commented Jan 4, 2011

nadeemvawda mannequin commented Jan 5, 2011

pitrou commented Jan 5, 2011

rosslagerwall mannequin commented Mar 27, 2011

vstinner commented Mar 27, 2011

neologix mannequin commented Mar 27, 2011

neologix mannequin commented Mar 27, 2011

rosslagerwall mannequin commented Mar 27, 2011

loewis mannequin commented Mar 27, 2011

nadeemvawda mannequin commented Mar 29, 2011

vstinner commented Mar 29, 2011

neologix mannequin commented Mar 30, 2011

vstinner commented Mar 31, 2011

ferringb mannequin commented Jan 15, 2012

neologix mannequin commented Jan 15, 2012

ferringb mannequin commented Jan 15, 2012

rosslagerwall mannequin commented Jan 15, 2012

gpshead commented Jan 17, 2012

gpshead commented Jan 17, 2012

python-dev mannequin commented Jan 21, 2012

python-dev mannequin commented Jan 21, 2012

python-dev mannequin commented Jan 21, 2012

python-dev mannequin commented Jan 22, 2012

gpshead commented Jan 22, 2012

socketpair mannequin commented Mar 19, 2014

vstinner commented Mar 19, 2014

socketpair mannequin commented Mar 19, 2014

vstinner commented Mar 19, 2014