classification
Title: subprocess close_fds behavior should only close open fds
Type: behavior Stage: needs patch
Components: Versions: Python 3.1, Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: ferringb, gregory.p.smith, haypo, jyasskin, loewis, mmarkk, nadeem.vawda, neologix, pitrou, python-dev, rosslagerwall, s7v7nislands
Priority: normal Keywords: patch

Created on 2010-03-04 03:17 by gregory.p.smith, last changed 2014-03-19 14:49 by haypo. This issue is now closed.

Files
File name Uploaded Description Edit
subprocess-close-open-fds-gps01.diff gregory.p.smith, 2012-01-17 02:02
subprocess-close-open-fds-gps02.diff gregory.p.smith, 2012-01-17 02:08
Messages (31)
msg100373 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2010-03-04 03:17
The subprocess module's close_fds=True behavior is suboptimal.  It gets the maximum fd number and calls close() on each possible fd in the child process.

When you're running on large systems configured for 1million+ file descriptor limits per process.... This makes launching a subprocess uber painful.

I'm in the middle of some posix subprocess fixing, i'll add this to my todo list and fix it up.
msg125278 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-04 01:48
Is it still a problem now that there's a C path?
Furthermore, how do you plan to get a list of open fds?
msg125285 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2011-01-04 02:39
its still a problem, even the C path has to call close() a million times in that case.  thats a huge number of wasted syscalls.  fixing this is blocking on a good way to get the list of open fds.

I have seen other subprocess code do it using the race condition method of having the parent process get the list from os.listdir('/prod/self/fds') but that isn't guaranteed to get everything due to the race and we I don't believe we have enough posix async signal safe syscalls to do that in between the fork+exec.

I heard talk of actual system calls to do this type of thing (getting a list of a processes open fds) being added to future Linux kernels but I have not followed up on that to see if any have or if they are what we need.
msg125361 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2011-01-04 20:06
Does this mean that it's better to call the close() syscall 1000 or 1000 0000 times rather than listing the open fds & closing say a handful?

On Linux, the listdir function results in an open() syscall, 2 or so getdents() calls and a close() call - could this not be done inbetween the fork & exec, even if the actual python function isn't called directly?
msg125388 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-01-05 00:54
According to POSIX [1], if a multi-threaded program calls fork(), the child process may only use async-signal-safe system calls between fork() and exec*(). readdir() is not required to be async-safe [2], so reading /proc/self/fds in the child process is undefined behaviour. This is a pity, since it would IMO be a much cleaner solution than the current code.

Of course, procfs isn't standard in any case; would it be necessary to have a fallback for systems without it? Or do all *nix systems that we care about provide it? In the former case, I suppose it might be possible to use the procfs on systems where readdir() is known to be safe, and use the fallback where it isn't. But such special cases would complicate the code rather than simplifying it...

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html
[2] http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html
msg125389 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-05 01:01
> Of course, procfs isn't standard in any case; would it be necessary to
> have a fallback for systems without it? Or do all *nix systems that we
> care about provide it?

/proc/self doesn't exist under OpenSolaris. You have to use /proc/<pid>.
And I suppose there can be circumstances where /proc isn't available at
all (chroot perhaps?). So we would definitely have to provide a
fallback.
msg132297 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2011-03-27 09:11
See #11284 (a duplicate) for more discussion about this issue.
msg132303 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-03-27 10:14
Can we use FD_CLOEXEC to archive this goal? Example: open all files with FD_CLOEXEC set and don't close explicitly files on fork. preexec_fn will get access to the files, but the problem is exec(), not preexec_fn.

I suppose that it will slow down programs not using subprocess, and it may have border effects.
msg132307 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-03-27 11:00
If you're suggesting to set FDs CLOEXEC by default, I think it's neither possible nor reasonable:
- you have to take into account not only files, but also pipes, sockets, etc
- there's no portable way to e.g. open a file and set it CLOEXEC atomically
- first and foremost, it' going to break a lot of existing code, for example, pipe + fork, accept + fork, etc
As for the dedicated syscalls, there's already been some discussion about closefrom and friends, but Gregory did some research and it looked like those are not async-safe - which, if it's really the case, renders those calls mostly useless.
msg132308 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-03-27 11:06
Ooops, it's of course not going to break code containing accept + fork or pipe + fork, you obviously also need an execve ;-)
But the point is that you can't change the semantics of FDs being inheritable across an execve (think about inetd for example).
msg132318 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2011-03-27 12:25
I don't think setting the cloexec flag is a viable solution, especially since fds can be opened in custom c modules.

For what its worth, an strace of Java's Process class appears to "cheat" by opening /proc/self/fd inbetween fork & exec.
msg132355 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-03-27 18:39
> Can we use FD_CLOEXEC to archive this goal? 

I think we should use FD_CLOEXEC in all places where it's reasonable.
As others have pointed out, we shouldn't set FD_CLOEXEC for file
descriptors where the application hasn't explicitly requested that.
msg132530 - (view) Author: Nadeem Vawda (nadeem.vawda) * (Python committer) Date: 2011-03-29 21:28
> For what its worth, an strace of Java's Process class appears to "cheat" by
> opening /proc/self/fd inbetween fork & exec.

Looking at the OpenJDK source confirms this:
http://hg.openjdk.java.net/jdk6/jdk6/jdk/file/tip/src/solaris/native/java/lang/UNIXProcess_md.c

I wonder whether the Java people are simply unaware of the potential problem?
Or perhaps they have checked the Linux and Solaris implementations of readdir()
and confirmed that it is in fact safe on those platforms. Even if this is the
case, I would be wary of doing things the same way - there's no guarantee that
the implementation won't change out from underneath us.

I like the approach suggested by Gregory in issue11284 (msg129912) - if we are
sure the program is not multithreaded, get the list of fds before forking, and
use that; otherwise, do it the slow way.

On Linux, the multithreading check could be implemented by checking the number
of subdirectories in /proc/<pid>/task/. I don't know if *BSD (or other Unices)
have a similar way to do this. Some quick Googling seems to indicate that
FreeBSD, at least, doesn't make this information available through the procfs
(which is apparently deprecated anyway). Could someone with wider *nix
knowledge chip in here?
msg132532 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-03-29 21:32
> I like the approach suggested by Gregory in issue11284 (msg129912) - if we are
> sure the program is not multithreaded, get the list of fds before forking, and
> use that; otherwise, do it the slow way.

A signal handler can also opens new files :-)
msg132573 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-03-30 07:36
> I wonder whether the Java people are simply unaware of the potential problem?
> Or perhaps they have checked the Linux and Solaris implementations of readdir()
> and confirmed that it is in fact safe on those platforms. Even if this is the
> case, I would be wary of doing things the same way - there's no guarantee that
> the implementation won't change out from underneath us.

The problem is not so much readdir: if you look at the source code
(http://fxr.googlebit.com/source/lib/libc/gen/readdir.c), it doesn't
do much apart from locking a mutex private to the related DIR *, so as
long as you pass it a DIR * not referenced elsewhere (which would be
the case since it would call opendir between the fork and exec), it
should be ok. The man page
(http://pubs.opengroup.org/onlinepubs/007908799/xsh/readdir.html) also
makes it clear:
"""After a call to fork(), either the parent or child (but not both)
may continue processing the directory stream using readdir(),
rewinddir() or seekdir().  If both the parent and child processes use
these functions, the result is undefined."""

The problem is more with opendir, which needs to allocate memory for
the struct dirent before calling getdents syscall.

I agree with you, we should definitely favor correctness over efficiency.

As for the other approach, I'm not aware of any portable way to
determine if a program is multi-threaded. Also, as noted by Victor,
there might be room for some subtle races (Python-registered signal
handlers are called synchronously from the main eval loop with the GIL
held, so I don't think there should be a problem there, but you might
have a problem with C-extension registered signal handlers).

Finally, looking at this thread
http://lists.freebsd.org/pipermail/freebsd-hackers/2007-July/021132.html,
it seems that some closefrom implementations are definitely not
async-safe, which is a pity...
msg132669 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-03-31 13:33
test_subprocess.test_leaking_fds_on_error takes more than 5 minutes on x86 FreeBSD 7.2 build slave! This tests creates 1,024 subprocesses, and subprocess has to close 655,000 file descriptors just to create on child process (whereas there are only something like 10 open files...).
msg151284 - (view) Author: Ferringb (ferringb) * Date: 2012-01-15 10:17
In #13788, I've uploaded a patch modifying closerange along the same lines as this discussion; someone w/ appropriate rights should set dependencies as needed.

Either way, here's a question: does anyone actually know of a unix that does procfs, and has a daft opendir implementation as described below?  Aka, are we actually worrying about something relevant, or just hypotheticals?

Strikes me any scenario where this actually would occur, we're already in hell from broken implementations- meaning we probably don't support them.  In the same angle, has anyone asked opengroup (forums, ml, etc), or moreso figured out *where* to ask for the given reasoning here?

Regardless, if we're dead set on adhering to the standards there (and using re-entrant readdir_r and friends isn't enough to make people happy), a few hacks come to mind:

1) in the child (child1), split a pipe, fork/exec (child2) an ls -l (or equivalent) of /proc/$PID/fd, feeding it back to child1 which then acts on it.
2) grab the fd list pre-fork along w/ the link count for /proc/$PID/fd; child re-stats /proc/$PID/fd, if link count is the same, the results should be able to be acted upon.  I'm *reasonably* sure there is a retarded FS or two out there that doesn't set link count for a directory correctly, but I've never heard of it for a procfs.  Regardless, should be detectable- nlinks would be 0.  If it is, and len(fds) != 0, then you know you can't trust the results and have to fallback to brute force close the range.  Additionally, we ought to be able to test for this... so... score.

Addressing: "signal handlers can open files".  Yes, they can, and our current implementation doesn't handle that case /anyways/, so it's orthogonal to speeding up closerange.

Finally, doing some codesearching, here's the rough list of heavy hitters spotted using this:
*) java (as mentioned)
*) chrome's seccomp sandbox uses it
*) glibc's nscd uses it (pretty strong indication this is safe in that case to say the least)
*) gdm
*) pulseaudio (limited to linux)
*) opensolaris actually does this, although from reading the code it sounds as if there is an issue w/ vfork- thus they use getdents directly.  Look for spawn_closefrom for details.

So.. seems a bit linuxy.  Could possibly enable it just there (whitelist).
msg151285 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-01-15 11:09
> Either way, here's a question: does anyone actually know of a unix that does procfs, and has a daft opendir implementation as described below?  Aka, are we actually worrying about something relevant, or just hypotheticals?

I think it's more theoretical.
Since dirent have per-struct locks, the only reason why
opendir/readdir would not be async-safe would be because malloc() is
not async-safe. Since we already allow running Python code after
fork(), we implicitely assume that malloc() (and actually most of the
libc) is async-safe, which is true in practice because malloc() uses
pthread_atfork to reset its internal locks after fork().
So IMHO, calling opendir() should be safe (and as noted, many code out
there already does this).
The only question is: do other Unix also have /proc/<pid>/fd? e.g.
FreeBSD, OpenBSD. That's especially important because FreeBSD can have
a huge RLIMIT_NOFILE by default.
msg151286 - (view) Author: Ferringb (ferringb) * Date: 2012-01-15 12:02
>The only question is: do other Unix also have /proc/<pid>/fd? e.g.
>FreeBSD, OpenBSD. That's especially important because FreeBSD can have
>a huge RLIMIT_NOFILE by default.

Unless the OS gives some way to optimize the process (whether inferring from procfs, or making use of spawn_closefrom), there really isn't anything we can do.  O_CLOEXEC is one option, but that's basically the same as the close loop in terms of syscalls- specifically post fork looping over the range and setting it.  Beyond that, it's linux specific, so only would be used if the root python was invoked from lacked procfs.

I'm willing to extend my original patch to handle alternate OS hints as needed; in the same way, the nlinks trick I can implement although I'd be more inclined to just limit my original closerange patch to OSs that have a sane opendir and procfs.
msg151292 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2012-01-15 16:46
FreeBSD has a /dev/fd as well as a procfs (deprecated AFAIK).
However, both may not be mounted so a patch would *need* to at least fallback to the current functionality.
msg151420 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-01-17 02:02
attaching a patch that implements this safely.
msg151421 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-01-17 02:08
attaching a better formatted one for review.
msg151746 - (view) Author: Roundup Robot (python-dev) Date: 2012-01-21 22:39
New changeset 61aa484a3e54 by Gregory P. Smith in branch '3.2':
Fixes issue #8052: The posix subprocess module's close_fds behavior was
http://hg.python.org/cpython/rev/61aa484a3e54

New changeset 8879874d66a2 by Gregory P. Smith in branch 'default':
Fixes issue #8052: The posix subprocess module's close_fds behavior was
http://hg.python.org/cpython/rev/8879874d66a2
msg151750 - (view) Author: Roundup Robot (python-dev) Date: 2012-01-21 22:51
New changeset 780992c9afea by Gregory P. Smith in branch '3.2':
Add a Misc/NEWS entry for issue 8052.
http://hg.python.org/cpython/rev/780992c9afea

New changeset 1f0a01dc723c by Gregory P. Smith in branch 'default':
A Misc/NEWS entry for issue 8052.
http://hg.python.org/cpython/rev/1f0a01dc723c
msg151752 - (view) Author: Roundup Robot (python-dev) Date: 2012-01-21 23:21
New changeset d0acd8169c2a by Gregory P. Smith in branch '3.2':
Bugfix for issue #8052 fix on *BSD variants.
http://hg.python.org/cpython/rev/d0acd8169c2a

New changeset 5be3dadd2eef by Gregory P. Smith in branch '3.2':
Another issue #8052 bugfix (related to previous commit).
http://hg.python.org/cpython/rev/5be3dadd2eef

New changeset e52d81e0c750 by Gregory P. Smith in branch 'default':
bugfix for issue 8052 fixes on *BSD platforms.
http://hg.python.org/cpython/rev/e52d81e0c750
msg151761 - (view) Author: Roundup Robot (python-dev) Date: 2012-01-22 05:06
New changeset 754c2eb0a92c by Gregory P. Smith in branch '3.2':
Fix FreeBSD, NetBSD and OpenBSD behavior of the issue #8052 fix.
http://hg.python.org/cpython/rev/754c2eb0a92c

New changeset 7d4658a8de96 by Gregory P. Smith in branch 'default':
Fix FreeBSD, NetBSD and OpenBSD behavior of the issue #8052 fix.
http://hg.python.org/cpython/rev/7d4658a8de96
msg151762 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-01-22 06:27
For FreeBSD, Python 3.2 and 3.3 now check to see if /dev/fd is valid.  Be sure and "mount -t fdescfs none /dev/fd" on FreeBSD if you want faster subprocess launching.  Run a FreeBSD buildbot?  Please do it!

For Python 3.1 the fix for #13788 would fix this, but I believe 3.1 is in security fix only mode at this point so we're not going to backport that os.closerange change that far.
msg214095 - (view) Author: Марк Коренберг (mmarkk) * Date: 2014-03-19 14:22
Calling getdents()/readdir64() repeatedly while closing descriptors provides unexpected behaviour. Reading directory while it modified is not safe by default. For example: http://en.it-usenet.org/thread/18514/15719/. 

So, we should re-open directory if we received full array of structures. I don't know if just lseek(dirfd, 0) sufficies.

Please reopen bug, as Linux behaviour of stable reading /proc/<pid>/fd may be broken in future without any error at python side (!) (typically, second call returns empty list if dir was modified)
msg214096 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-03-19 14:30
"Calling getdents()/readdir64() repeatedly while closing descriptors provides unexpected behaviour."

Please open a new issue.
msg214098 - (view) Author: Марк Коренберг (mmarkk) * Date: 2014-03-19 14:48
Also, it is not said in manual if getdents() may be interrupted by signal. Assuming current code, error is not checked, so some (or all) descriptors will be skipped in case of error.
msg214099 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-03-19 14:49
"Also, it is not said in manual if getdents() may be interrupted by signal. Assuming current code, error is not checked, so some (or all) descriptors will be skipped in case of error."

This issue is closed, please open a new issue to discuss that.
History
Date User Action Args
2014-03-19 14:49:11hayposetmessages: + msg214099
2014-03-19 14:48:05mmarkksetmessages: + msg214098
2014-03-19 14:30:23hayposetmessages: + msg214096
2014-03-19 14:22:56mmarkksetnosy: + mmarkk
messages: + msg214095
2012-01-22 06:27:53gregory.p.smithsetstatus: open -> closed
resolution: fixed
messages: + msg151762
2012-01-22 05:06:34python-devsetmessages: + msg151761
2012-01-21 23:21:20python-devsetmessages: + msg151752
2012-01-21 22:51:05python-devsetmessages: + msg151750
2012-01-21 22:39:42python-devsetnosy: + python-dev
messages: + msg151746
2012-01-17 02:08:57gregory.p.smithsetfiles: + subprocess-close-open-fds-gps02.diff

messages: + msg151421
2012-01-17 02:02:38gregory.p.smithsetfiles: + subprocess-close-open-fds-gps01.diff
keywords: + patch
messages: + msg151420
2012-01-15 16:46:25rosslagerwallsetmessages: + msg151292
2012-01-15 15:25:49gregory.p.smithunlinkissue13788 superseder
2012-01-15 12:02:29ferringbsetmessages: + msg151286
2012-01-15 11:09:51neologixsetmessages: + msg151285
2012-01-15 10:17:26ferringbsetnosy: + ferringb
messages: + msg151284
2012-01-15 09:15:30neologixlinkissue13788 superseder
2011-03-31 13:33:39hayposetmessages: + msg132669
2011-03-30 07:36:47neologixsetmessages: + msg132573
2011-03-29 21:32:32hayposetmessages: + msg132532
2011-03-29 21:28:16nadeem.vawdasetmessages: + msg132530
2011-03-27 18:39:36loewissetmessages: + msg132355
2011-03-27 12:25:43rosslagerwallsetmessages: + msg132318
2011-03-27 11:06:15neologixsetmessages: + msg132308
2011-03-27 11:00:58neologixsetmessages: + msg132307
2011-03-27 10:14:17hayposetmessages: + msg132303
2011-03-27 09:11:55rosslagerwallsetnosy: + loewis, haypo, s7v7nislands, neologix
messages: + msg132297
2011-01-05 01:01:23pitrousetnosy: gregory.p.smith, pitrou, jyasskin, nadeem.vawda, rosslagerwall
messages: + msg125389
2011-01-05 00:54:33nadeem.vawdasetnosy: + nadeem.vawda
messages: + msg125388
2011-01-04 20:06:15rosslagerwallsetnosy: + rosslagerwall
messages: + msg125361
2011-01-04 02:39:01gregory.p.smithsetnosy: gregory.p.smith, pitrou, jyasskin
messages: + msg125285
2011-01-04 01:48:04pitrousetnosy: + pitrou
messages: + msg125278
2010-08-05 03:35:11BreamoreBoysetstage: needs patch
versions: + Python 3.1
2010-03-04 03:17:56gregory.p.smithcreate