subprocess close_fds behavior should only close open fds #52300
Comments
The subprocess module's close_fds=True behavior is suboptimal: it gets the maximum fd number and calls close() on each possible fd in the child process. When you're running on large systems configured for 1,000,000+ file descriptor limits per process, this makes launching a subprocess extremely painful. I'm in the middle of some POSIX subprocess fixing; I'll add this to my todo list and fix it up. |
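As a rough sketch of the behaviour being described (a hypothetical simplification, not the actual CPython implementation), the old strategy amounts to attempting a close() on every descriptor number up to the limit:

```python
import os

def brute_force_close(start_fd, max_fd):
    """Attempt close() on every fd in [start_fd, max_fd), open or not --
    roughly what close_fds=True did. Returns the number of close()
    attempts, i.e. the syscall cost of the loop."""
    attempts = 0
    for fd in range(start_fd, max_fd):
        attempts += 1
        try:
            os.close(fd)
        except OSError:
            pass  # EBADF: this fd was never open
    return attempts
```

With RLIMIT_NOFILE raised to 1,000,000, the loop makes a million close() syscalls even if only a dozen fds are actually open.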
Is it still a problem now that there's a C path? |
It's still a problem; even the C path has to call close() a million times in that case. That's a huge number of wasted syscalls. Fixing this is blocked on a good way to get the list of open fds. I have seen other subprocess code do it using the racy method of having the parent process get the list from os.listdir('/proc/self/fd'), but that isn't guaranteed to get everything due to the race, and I don't believe we have enough POSIX async-signal-safe syscalls to do it in between the fork and exec. I heard talk of actual system calls for this sort of thing (getting a list of a process's open fds) being added to future Linux kernels, but I have not followed up to see if any have been, or whether they are what we need. |
Does this mean that it's better to call the close() syscall 1,000 or 1,000,000 times rather than listing the open fds and closing, say, a handful? On Linux, the listdir function results in an open() syscall, two or so getdents() calls, and a close() call - could this not be done in between the fork and exec, even if the actual Python function isn't called directly? |
According to POSIX [1], if a multi-threaded program calls fork(), the child process may only use async-signal-safe system calls between fork() and exec*(). readdir() is not required to be async-signal-safe [2], so reading /proc/self/fd in the child process is undefined behaviour. This is a pity, since it would IMO be a much cleaner solution than the current code. Of course, procfs isn't standard in any case; would it be necessary to have a fallback for systems without it, or do all *nix systems that we care about provide it? In the former case, I suppose it might be possible to use procfs on systems where readdir() is known to be safe, and use the fallback where it isn't. But such special cases would complicate the code rather than simplifying it... [1] http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html |
/proc/self doesn't exist under OpenSolaris. You have to use /proc/<pid>. |
See bpo-11284 (a duplicate) for more discussion about this issue. |
Can we use FD_CLOEXEC to achieve this goal? Example: open all files with FD_CLOEXEC set and don't explicitly close files on fork. preexec_fn will get access to the files, but the problem is exec(), not preexec_fn. I suppose that it would slow down programs not using subprocess, and it may have side effects. |
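Setting the flag by hand looks like this (a minimal sketch of the FD_CLOEXEC mechanism itself, not a proposed patch):

```python
import fcntl
import os

def set_cloexec(fd, enable=True):
    """Set or clear FD_CLOEXEC so the fd is (or is not) closed
    automatically across a successful exec*()."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    if enable:
        flags |= fcntl.FD_CLOEXEC
    else:
        flags &= ~fcntl.FD_CLOEXEC
    fcntl.fcntl(fd, fcntl.F_SETFD, flags)
```

The kernel then drops such descriptors at exec() time with no per-fd close loop in userspace - which is exactly why it is attractive here, and exactly why fds opened outside Python's control (e.g. in C extension modules) are the sticking point.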
If you're suggesting to set FDs CLOEXEC by default, I think it's neither possible nor reasonable. |
Oops - it's of course not going to break code containing accept + fork or pipe + fork; you obviously also need an execve ;-) |
I don't think setting the close-on-exec flag is a viable solution, especially since fds can be opened in custom C modules. For what it's worth, an strace of Java's Process class appears to "cheat" by opening /proc/self/fd in between fork and exec. |
I think we should use FD_CLOEXEC in all places where it's reasonable. |
Looking at the OpenJDK source confirms this; I wonder whether the Java people are simply unaware of the potential problem? I like the approach suggested by Gregory in bpo-11284 (msg129912). On Linux, the multithreading check could be implemented by checking the number of threads the process has. |
A signal handler can also open new files :-) |
The problem is not so much readdir: if you look at the source code, it is essentially a thin wrapper around the underlying syscall. The problem is more with opendir, which needs to allocate memory for the DIR structure, and malloc() is not async-signal-safe. I agree with you, we should definitely favor correctness over efficiency. As for the other approach, I'm not aware of any portable way to get the list of open file descriptors. |
test_subprocess.test_leaking_fds_on_error takes more than 5 minutes on the x86 FreeBSD 7.2 build slave! This test creates 1,024 subprocesses, and subprocess has to close 655,000 file descriptors just to create one child process (whereas there are only something like 10 open files...). |
In bpo-13788, I've uploaded a patch modifying closerange along the same lines as this discussion; someone w/ appropriate rights should set dependencies as needed. Either way, here's a question: does anyone actually know of a unix that has procfs and a daft opendir implementation as described above? I.e., are we actually worrying about something relevant, or just hypotheticals? Strikes me that in any scenario where this actually would occur, we're already in hell from broken implementations - meaning we probably don't support them. In the same vein, has anyone asked opengroup (forums, ml, etc.), or better, figured out *where* to ask for the given reasoning here? Regardless, if we're dead set on adhering to the standards there (and using the re-entrant readdir_r and friends isn't enough to make people happy), a few hacks come to mind.
Addressing "signal handlers can open files": yes, they can, and our current implementation doesn't handle that case /anyways/, so it's orthogonal to speeding up closerange. Finally, doing some code searching, the rough list of heavy hitters spotted using this seems a bit Linux-y. Could possibly enable it just there (whitelist). |
I think it's more theoretical. |
Unless the OS gives some way to optimize the process (whether inferring from procfs, or making use of spawn_closefrom), there really isn't anything we can do. O_CLOEXEC is one option, but that's basically the same as the close loop in terms of syscalls - specifically, post-fork, looping over the range and setting it. Beyond that, it's Linux-specific, so it would only be used if the environment Python was invoked from lacked procfs. I'm willing to extend my original patch to handle alternate OS hints as needed; in the same way, I can implement the nlinks trick, although I'd be more inclined to just limit my original closerange patch to OSes that have a sane opendir and procfs. |
FreeBSD has a /dev/fd as well as a procfs (deprecated AFAIK). |
attaching a patch that implements this safely. |
attaching a better formatted one for review. |
New changeset 61aa484a3e54 by Gregory P. Smith in branch '3.2':
New changeset 8879874d66a2 by Gregory P. Smith in branch 'default': |
New changeset 780992c9afea by Gregory P. Smith in branch '3.2':
New changeset 1f0a01dc723c by Gregory P. Smith in branch 'default': |
New changeset d0acd8169c2a by Gregory P. Smith in branch '3.2':
New changeset 5be3dadd2eef by Gregory P. Smith in branch '3.2':
New changeset e52d81e0c750 by Gregory P. Smith in branch 'default': |
New changeset 754c2eb0a92c by Gregory P. Smith in branch '3.2':
New changeset 7d4658a8de96 by Gregory P. Smith in branch 'default': |
For FreeBSD, Python 3.2 and 3.3 now check whether /dev/fd is valid. Be sure to "mount -t fdescfs none /dev/fd" on FreeBSD if you want faster subprocess launching. Run a FreeBSD buildbot? Please do! For Python 3.1, the fix for bpo-13788 would address this, but I believe 3.1 is in security-fix-only mode at this point, so we're not going to backport that os.closerange change that far. |
Calling getdents()/readdir64() repeatedly while closing descriptors provides unexpected behaviour: reading a directory while it is being modified is not safe by default. For example: http://en.it-usenet.org/thread/18514/15719/. So, we should re-open the directory if we received a full array of structures; I don't know if just lseek(dirfd, 0) suffices. Please reopen the bug, as the Linux behaviour of stably reading /proc/<pid>/fd may break in the future without any error on the Python side (!) (typically, a second call returns an empty list if the dir was modified) |
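One way to sidestep the read-while-modifying hazard is to snapshot the whole directory before closing anything, so getdents() never runs on a directory our own close() calls are mutating (a sketch, not the patched code; the transient fd listdir uses shows up in the snapshot and simply fails to close again):

```python
import os

def close_open_fds_except(keep):
    """Snapshot /proc/self/fd in full, then close everything not in
    `keep`. Closing from the snapshot avoids re-reading a directory
    while our own close() calls are modifying it."""
    snapshot = [int(name) for name in os.listdir("/proc/self/fd")]
    for fd in snapshot:
        if fd in keep:
            continue
        try:
            os.close(fd)
        except OSError:
            pass  # e.g. the transient fd listdir itself used
```

This trades the incremental read-close-read loop for one complete directory scan up front, at the cost of a (bounded) list allocation, which is fine in the parent but not between fork() and exec().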
"Calling getdents()/readdir64() repeatedly while closing descriptors provides unexpected behaviour." Please open a new issue. |
Also, it is not said in manual if getdents() may be interrupted by signal. Assuming current code, error is not checked, so some (or all) descriptors will be skipped in case of error. |
"Also, it is not said in manual if getdents() may be interrupted by signal. Assuming current code, error is not checked, so some (or all) descriptors will be skipped in case of error." This issue is closed, please open a new issue to discuss that. |