Issue8052
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2010-03-04 03:17 by gregory.p.smith, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
subprocess-close-open-fds-gps01.diff | gregory.p.smith, 2012-01-17 02:02 | |||
subprocess-close-open-fds-gps02.diff | gregory.p.smith, 2012-01-17 02:08 |
Messages (31) | |||
---|---|---|---|
msg100373 - (view) | Author: Gregory P. Smith (gregory.p.smith) * | Date: 2010-03-04 03:17 | |
The subprocess module's close_fds=True behavior is suboptimal. It gets the maximum fd number and calls close() on each possible fd in the child process. When you're running on large systems configured for 1million+ file descriptor limits per process.... This makes launching a subprocess uber painful. I'm in the middle of some posix subprocess fixing, i'll add this to my todo list and fix it up. |
|||
msg125278 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2011-01-04 01:48 | |
Is it still a problem now that there's a C path? Furthermore, how do you plan to get a list of open fds? |
|||
msg125285 - (view) | Author: Gregory P. Smith (gregory.p.smith) * | Date: 2011-01-04 02:39 | |
its still a problem, even the C path has to call close() a million times in that case. thats a huge number of wasted syscalls. fixing this is blocking on a good way to get the list of open fds. I have seen other subprocess code do it using the race condition method of having the parent process get the list from os.listdir('/prod/self/fds') but that isn't guaranteed to get everything due to the race and we I don't believe we have enough posix async signal safe syscalls to do that in between the fork+exec. I heard talk of actual system calls to do this type of thing (getting a list of a processes open fds) being added to future Linux kernels but I have not followed up on that to see if any have or if they are what we need. |
|||
msg125361 - (view) | Author: Ross Lagerwall (rosslagerwall) | Date: 2011-01-04 20:06 | |
Does this mean that it's better to call the close() syscall 1000 or 1000 0000 times rather than listing the open fds & closing say a handful? On Linux, the listdir function results in an open() syscall, 2 or so getdents() calls and a close() call - could this not be done inbetween the fork & exec, even if the actual python function isn't called directly? |
|||
msg125388 - (view) | Author: Nadeem Vawda (nadeem.vawda) * | Date: 2011-01-05 00:54 | |
According to POSIX [1], if a multi-threaded program calls fork(), the child process may only use async-signal-safe system calls between fork() and exec*(). readdir() is not required to be async-safe [2], so reading /proc/self/fds in the child process is undefined behaviour. This is a pity, since it would IMO be a much cleaner solution than the current code. Of course, procfs isn't standard in any case; would it be necessary to have a fallback for systems without it? Or do all *nix systems that we care about provide it? In the former case, I suppose it might be possible to use the procfs on systems where readdir() is known to be safe, and use the fallback where it isn't. But such special cases would complicate the code rather than simplifying it... [1] http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html [2] http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html |
|||
msg125389 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2011-01-05 01:01 | |
> Of course, procfs isn't standard in any case; would it be necessary to > have a fallback for systems without it? Or do all *nix systems that we > care about provide it? /proc/self doesn't exist under OpenSolaris. You have to use /proc/<pid>. And I suppose there can be circumstances where /proc isn't available at all (chroot perhaps?). So we would definitely have to provide a fallback. |
|||
msg132297 - (view) | Author: Ross Lagerwall (rosslagerwall) | Date: 2011-03-27 09:11 | |
See #11284 (a duplicate) for more discussion about this issue. |
|||
msg132303 - (view) | Author: STINNER Victor (vstinner) * | Date: 2011-03-27 10:14 | |
Can we use FD_CLOEXEC to archive this goal? Example: open all files with FD_CLOEXEC set and don't close explicitly files on fork. preexec_fn will get access to the files, but the problem is exec(), not preexec_fn. I suppose that it will slow down programs not using subprocess, and it may have border effects. |
|||
msg132307 - (view) | Author: Charles-François Natali (neologix) * | Date: 2011-03-27 11:00 | |
If you're suggesting to set FDs CLOEXEC by default, I think it's neither possible nor reasonable: - you have to take into account not only files, but also pipes, sockets, etc - there's no portable way to e.g. open a file and set it CLOEXEC atomically - first and foremost, it' going to break a lot of existing code, for example, pipe + fork, accept + fork, etc As for the dedicated syscalls, there's already been some discussion about closefrom and friends, but Gregory did some research and it looked like those are not async-safe - which, if it's really the case, renders those calls mostly useless. |
|||
msg132308 - (view) | Author: Charles-François Natali (neologix) * | Date: 2011-03-27 11:06 | |
Ooops, it's of course not going to break code containing accept + fork or pipe + fork, you obviously also need an execve ;-) But the point is that you can't change the semantics of FDs being inheritable across an execve (think about inetd for example). |
|||
msg132318 - (view) | Author: Ross Lagerwall (rosslagerwall) | Date: 2011-03-27 12:25 | |
I don't think setting the cloexec flag is a viable solution, especially since fds can be opened in custom c modules. For what its worth, an strace of Java's Process class appears to "cheat" by opening /proc/self/fd inbetween fork & exec. |
|||
msg132355 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2011-03-27 18:39 | |
> Can we use FD_CLOEXEC to archive this goal? I think we should use FD_CLOEXEC in all places where it's reasonable. As others have pointed out, we shouldn't set FD_CLOEXEC for file descriptors where the application hasn't explicitly requested that. |
|||
msg132530 - (view) | Author: Nadeem Vawda (nadeem.vawda) * | Date: 2011-03-29 21:28 | |
> For what its worth, an strace of Java's Process class appears to "cheat" by > opening /proc/self/fd inbetween fork & exec. Looking at the OpenJDK source confirms this: http://hg.openjdk.java.net/jdk6/jdk6/jdk/file/tip/src/solaris/native/java/lang/UNIXProcess_md.c I wonder whether the Java people are simply unaware of the potential problem? Or perhaps they have checked the Linux and Solaris implementations of readdir() and confirmed that it is in fact safe on those platforms. Even if this is the case, I would be wary of doing things the same way - there's no guarantee that the implementation won't change out from underneath us. I like the approach suggested by Gregory in issue11284 (msg129912) - if we are sure the program is not multithreaded, get the list of fds before forking, and use that; otherwise, do it the slow way. On Linux, the multithreading check could be implemented by checking the number of subdirectories in /proc/<pid>/task/. I don't know if *BSD (or other Unices) have a similar way to do this. Some quick Googling seems to indicate that FreeBSD, at least, doesn't make this information available through the procfs (which is apparently deprecated anyway). Could someone with wider *nix knowledge chip in here? |
|||
msg132532 - (view) | Author: STINNER Victor (vstinner) * | Date: 2011-03-29 21:32 | |
> I like the approach suggested by Gregory in issue11284 (msg129912) - if we are > sure the program is not multithreaded, get the list of fds before forking, and > use that; otherwise, do it the slow way. A signal handler can also opens new files :-) |
|||
msg132573 - (view) | Author: Charles-François Natali (neologix) * | Date: 2011-03-30 07:36 | |
> I wonder whether the Java people are simply unaware of the potential problem? > Or perhaps they have checked the Linux and Solaris implementations of readdir() > and confirmed that it is in fact safe on those platforms. Even if this is the > case, I would be wary of doing things the same way - there's no guarantee that > the implementation won't change out from underneath us. The problem is not so much readdir: if you look at the source code (http://fxr.googlebit.com/source/lib/libc/gen/readdir.c), it doesn't do much apart from locking a mutex private to the related DIR *, so as long as you pass it a DIR * not referenced elsewhere (which would be the case since it would call opendir between the fork and exec), it should be ok. The man page (http://pubs.opengroup.org/onlinepubs/007908799/xsh/readdir.html) also makes it clear: """After a call to fork(), either the parent or child (but not both) may continue processing the directory stream using readdir(), rewinddir() or seekdir(). If both the parent and child processes use these functions, the result is undefined.""" The problem is more with opendir, which needs to allocate memory for the struct dirent before calling getdents syscall. I agree with you, we should definitely favor correctness over efficiency. As for the other approach, I'm not aware of any portable way to determine if a program is multi-threaded. Also, as noted by Victor, there might be room for some subtle races (Python-registered signal handlers are called synchronously from the main eval loop with the GIL held, so I don't think there should be a problem there, but you might have a problem with C-extension registered signal handlers). Finally, looking at this thread http://lists.freebsd.org/pipermail/freebsd-hackers/2007-July/021132.html, it seems that some closefrom implementations are definitely not async-safe, which is a pity... |
|||
msg132669 - (view) | Author: STINNER Victor (vstinner) * | Date: 2011-03-31 13:33 | |
test_subprocess.test_leaking_fds_on_error takes more than 5 minutes on x86 FreeBSD 7.2 build slave! This tests creates 1,024 subprocesses, and subprocess has to close 655,000 file descriptors just to create on child process (whereas there are only something like 10 open files...). |
|||
msg151284 - (view) | Author: Ferringb (ferringb) * | Date: 2012-01-15 10:17 | |
In #13788, I've uploaded a patch modifying closerange along the same lines as this discussion; someone w/ appropriate rights should set dependencies as needed. Either way, here's a question: does anyone actually know of a unix that does procfs, and has a daft opendir implementation as described below? Aka, are we actually worrying about something relevant, or just hypotheticals? Strikes me any scenario where this actually would occur, we're already in hell from broken implementations- meaning we probably don't support them. In the same angle, has anyone asked opengroup (forums, ml, etc), or moreso figured out *where* to ask for the given reasoning here? Regardless, if we're dead set on adhering to the standards there (and using re-entrant readdir_r and friends isn't enough to make people happy), a few hacks come to mind: 1) in the child (child1), split a pipe, fork/exec (child2) an ls -l (or equivalent) of /proc/$PID/fd, feeding it back to child1 which then acts on it. 2) grab the fd list pre-fork along w/ the link count for /proc/$PID/fd; child re-stats /proc/$PID/fd, if link count is the same, the results should be able to be acted upon. I'm *reasonably* sure there is a retarded FS or two out there that doesn't set link count for a directory correctly, but I've never heard of it for a procfs. Regardless, should be detectable- nlinks would be 0. If it is, and len(fds) != 0, then you know you can't trust the results and have to fallback to brute force close the range. Additionally, we ought to be able to test for this... so... score. Addressing: "signal handlers can open files". Yes, they can, and our current implementation doesn't handle that case /anyways/, so it's orthogonal to speeding up closerange. Finally, doing some codesearching, here's the rough list of heavy hitters spotted using this: *) java (as mentioned) *) chrome's seccomp sandbox uses it *) glibc's nscd uses it (pretty strong indication this is safe in that case to say the least) *) gdm *) pulseaudio (limited to linux) *) opensolaris actually does this, although from reading the code it sounds as if there is an issue w/ vfork- thus they use getdents directly. Look for spawn_closefrom for details. So.. seems a bit linuxy. Could possibly enable it just there (whitelist). |
|||
msg151285 - (view) | Author: Charles-François Natali (neologix) * | Date: 2012-01-15 11:09 | |
> Either way, here's a question: does anyone actually know of a unix that does procfs, and has a daft opendir implementation as described below? Aka, are we actually worrying about something relevant, or just hypotheticals? I think it's more theoretical. Since dirent have per-struct locks, the only reason why opendir/readdir would not be async-safe would be because malloc() is not async-safe. Since we already allow running Python code after fork(), we implicitely assume that malloc() (and actually most of the libc) is async-safe, which is true in practice because malloc() uses pthread_atfork to reset its internal locks after fork(). So IMHO, calling opendir() should be safe (and as noted, many code out there already does this). The only question is: do other Unix also have /proc/<pid>/fd? e.g. FreeBSD, OpenBSD. That's especially important because FreeBSD can have a huge RLIMIT_NOFILE by default. |
|||
msg151286 - (view) | Author: Ferringb (ferringb) * | Date: 2012-01-15 12:02 | |
>The only question is: do other Unix also have /proc/<pid>/fd? e.g. >FreeBSD, OpenBSD. That's especially important because FreeBSD can have >a huge RLIMIT_NOFILE by default. Unless the OS gives some way to optimize the process (whether inferring from procfs, or making use of spawn_closefrom), there really isn't anything we can do. O_CLOEXEC is one option, but that's basically the same as the close loop in terms of syscalls- specifically post fork looping over the range and setting it. Beyond that, it's linux specific, so only would be used if the root python was invoked from lacked procfs. I'm willing to extend my original patch to handle alternate OS hints as needed; in the same way, the nlinks trick I can implement although I'd be more inclined to just limit my original closerange patch to OSs that have a sane opendir and procfs. |
|||
msg151292 - (view) | Author: Ross Lagerwall (rosslagerwall) | Date: 2012-01-15 16:46 | |
FreeBSD has a /dev/fd as well as a procfs (deprecated AFAIK). However, both may not be mounted so a patch would *need* to at least fallback to the current functionality. |
|||
msg151420 - (view) | Author: Gregory P. Smith (gregory.p.smith) * | Date: 2012-01-17 02:02 | |
attaching a patch that implements this safely. |
|||
msg151421 - (view) | Author: Gregory P. Smith (gregory.p.smith) * | Date: 2012-01-17 02:08 | |
attaching a better formatted one for review. |
|||
msg151746 - (view) | Author: Roundup Robot (python-dev) | Date: 2012-01-21 22:39 | |
New changeset 61aa484a3e54 by Gregory P. Smith in branch '3.2': Fixes issue #8052: The posix subprocess module's close_fds behavior was http://hg.python.org/cpython/rev/61aa484a3e54 New changeset 8879874d66a2 by Gregory P. Smith in branch 'default': Fixes issue #8052: The posix subprocess module's close_fds behavior was http://hg.python.org/cpython/rev/8879874d66a2 |
|||
msg151750 - (view) | Author: Roundup Robot (python-dev) | Date: 2012-01-21 22:51 | |
New changeset 780992c9afea by Gregory P. Smith in branch '3.2': Add a Misc/NEWS entry for issue 8052. http://hg.python.org/cpython/rev/780992c9afea New changeset 1f0a01dc723c by Gregory P. Smith in branch 'default': A Misc/NEWS entry for issue 8052. http://hg.python.org/cpython/rev/1f0a01dc723c |
|||
msg151752 - (view) | Author: Roundup Robot (python-dev) | Date: 2012-01-21 23:21 | |
New changeset d0acd8169c2a by Gregory P. Smith in branch '3.2': Bugfix for issue #8052 fix on *BSD variants. http://hg.python.org/cpython/rev/d0acd8169c2a New changeset 5be3dadd2eef by Gregory P. Smith in branch '3.2': Another issue #8052 bugfix (related to previous commit). http://hg.python.org/cpython/rev/5be3dadd2eef New changeset e52d81e0c750 by Gregory P. Smith in branch 'default': bugfix for issue 8052 fixes on *BSD platforms. http://hg.python.org/cpython/rev/e52d81e0c750 |
|||
msg151761 - (view) | Author: Roundup Robot (python-dev) | Date: 2012-01-22 05:06 | |
New changeset 754c2eb0a92c by Gregory P. Smith in branch '3.2': Fix FreeBSD, NetBSD and OpenBSD behavior of the issue #8052 fix. http://hg.python.org/cpython/rev/754c2eb0a92c New changeset 7d4658a8de96 by Gregory P. Smith in branch 'default': Fix FreeBSD, NetBSD and OpenBSD behavior of the issue #8052 fix. http://hg.python.org/cpython/rev/7d4658a8de96 |
|||
msg151762 - (view) | Author: Gregory P. Smith (gregory.p.smith) * | Date: 2012-01-22 06:27 | |
For FreeBSD, Python 3.2 and 3.3 now check to see if /dev/fd is valid. Be sure and "mount -t fdescfs none /dev/fd" on FreeBSD if you want faster subprocess launching. Run a FreeBSD buildbot? Please do it! For Python 3.1 the fix for #13788 would fix this, but I believe 3.1 is in security fix only mode at this point so we're not going to backport that os.closerange change that far. |
|||
msg214095 - (view) | Author: Марк Коренберг (socketpair) * | Date: 2014-03-19 14:22 | |
Calling getdents()/readdir64() repeatedly while closing descriptors provides unexpected behaviour. Reading directory while it modified is not safe by default. For example: http://en.it-usenet.org/thread/18514/15719/. So, we should re-open directory if we received full array of structures. I don't know if just lseek(dirfd, 0) sufficies. Please reopen bug, as Linux behaviour of stable reading /proc/<pid>/fd may be broken in future without any error at python side (!) (typically, second call returns empty list if dir was modified) |
|||
msg214096 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-03-19 14:30 | |
"Calling getdents()/readdir64() repeatedly while closing descriptors provides unexpected behaviour." Please open a new issue. |
|||
msg214098 - (view) | Author: Марк Коренберг (socketpair) * | Date: 2014-03-19 14:48 | |
Also, it is not said in manual if getdents() may be interrupted by signal. Assuming current code, error is not checked, so some (or all) descriptors will be skipped in case of error. |
|||
msg214099 - (view) | Author: STINNER Victor (vstinner) * | Date: 2014-03-19 14:49 | |
"Also, it is not said in manual if getdents() may be interrupted by signal. Assuming current code, error is not checked, so some (or all) descriptors will be skipped in case of error." This issue is closed, please open a new issue to discuss that. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:58 | admin | set | github: 52300 |
2014-03-19 14:49:11 | vstinner | set | messages: + msg214099 |
2014-03-19 14:48:05 | socketpair | set | messages: + msg214098 |
2014-03-19 14:30:23 | vstinner | set | messages: + msg214096 |
2014-03-19 14:22:56 | socketpair | set | nosy:
+ socketpair messages: + msg214095 |
2012-01-22 06:27:53 | gregory.p.smith | set | status: open -> closed resolution: fixed messages: + msg151762 |
2012-01-22 05:06:34 | python-dev | set | messages: + msg151761 |
2012-01-21 23:21:20 | python-dev | set | messages: + msg151752 |
2012-01-21 22:51:05 | python-dev | set | messages: + msg151750 |
2012-01-21 22:39:42 | python-dev | set | nosy:
+ python-dev messages: + msg151746 |
2012-01-17 02:08:57 | gregory.p.smith | set | files:
+ subprocess-close-open-fds-gps02.diff messages: + msg151421 |
2012-01-17 02:02:38 | gregory.p.smith | set | files:
+ subprocess-close-open-fds-gps01.diff keywords: + patch messages: + msg151420 |
2012-01-15 16:46:25 | rosslagerwall | set | messages: + msg151292 |
2012-01-15 15:25:49 | gregory.p.smith | unlink | issue13788 superseder |
2012-01-15 12:02:29 | ferringb | set | messages: + msg151286 |
2012-01-15 11:09:51 | neologix | set | messages: + msg151285 |
2012-01-15 10:17:26 | ferringb | set | nosy:
+ ferringb messages: + msg151284 |
2012-01-15 09:15:30 | neologix | link | issue13788 superseder |
2011-03-31 13:33:39 | vstinner | set | messages: + msg132669 |
2011-03-30 07:36:47 | neologix | set | messages: + msg132573 |
2011-03-29 21:32:32 | vstinner | set | messages: + msg132532 |
2011-03-29 21:28:16 | nadeem.vawda | set | messages: + msg132530 |
2011-03-27 18:39:36 | loewis | set | messages: + msg132355 |
2011-03-27 12:25:43 | rosslagerwall | set | messages: + msg132318 |
2011-03-27 11:06:15 | neologix | set | messages: + msg132308 |
2011-03-27 11:00:58 | neologix | set | messages: + msg132307 |
2011-03-27 10:14:17 | vstinner | set | messages: + msg132303 |
2011-03-27 09:11:55 | rosslagerwall | set | nosy:
+ loewis, vstinner, s7v7nislands, neologix messages: + msg132297 |
2011-01-05 01:01:23 | pitrou | set | nosy:
gregory.p.smith, pitrou, jyasskin, nadeem.vawda, rosslagerwall messages: + msg125389 |
2011-01-05 00:54:33 | nadeem.vawda | set | nosy:
+ nadeem.vawda messages: + msg125388 |
2011-01-04 20:06:15 | rosslagerwall | set | nosy:
+ rosslagerwall messages: + msg125361 |
2011-01-04 02:39:01 | gregory.p.smith | set | nosy:
gregory.p.smith, pitrou, jyasskin messages: + msg125285 |
2011-01-04 01:48:04 | pitrou | set | nosy:
+ pitrou messages: + msg125278 |
2010-08-05 03:35:11 | BreamoreBoy | set | stage: needs patch versions: + Python 3.1 |
2010-03-04 03:17:56 | gregory.p.smith | create |