classification
Title: os.closerange optimization
Type: performance Stage:
Components: Extension Modules Versions: Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: christian.heimes, ferringb, georg.brandl, gregory.p.smith, haypo, ronaldoussoren, rosslagerwall
Priority: normal Keywords: patch

Created on 2012-01-15 04:29 by ferringb, last changed 2013-11-23 19:31 by gregory.p.smith.

Files
File name Uploaded Description
closerange-optimization.patch ferringb, 2012-01-15 04:43 For non-Windows systems, make use of /proc/${PID}/fd if it is available
Messages (9)
msg151273 - (view) Author: Ferringb (ferringb) * Date: 2012-01-15 04:29
The current implementation of closerange is essentially a brute-force invocation of close for every integer in the range.

While this works, it is rather noisy when stracing, and for most invocations it makes close to a thousand more close calls than needed.

As such, it should be aware of /proc/${PID}/fd and use that to identify just what is actually open, and close only that.
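
For illustration only, here is a rough Python sketch of the idea (the attached patch itself is C and is not reproduced here). The function name closerange_procfs is hypothetical, and the sketch assumes a Linux-style procfs where each entry name in /proc/${PID}/fd is a descriptor number. When the directory cannot be read it falls back to the existing brute-force os.closerange.

```python
import os


def closerange_procfs(fd_low, fd_high):
    """Close open descriptors in [fd_low, fd_high), ignoring errors.

    Hypothetical helper: reads /proc/<pid>/fd to find which descriptors
    are actually open instead of calling close() on every integer.
    """
    try:
        names = os.listdir(f"/proc/{os.getpid()}/fd")
    except OSError:
        # procfs is missing or unreadable: use the brute-force loop.
        os.closerange(fd_low, fd_high)
        return

    for name in names:
        try:
            fd = int(name)
        except ValueError:
            continue  # not a descriptor entry
        if fd_low <= fd < fd_high:
            try:
                os.close(fd)
            except OSError:
                # Already closed, e.g. the descriptor listdir() used
                # to read the directory. os.closerange ignores errors too.
                pass
```

With a typical process that has only a handful of descriptors open, this issues one close per open descriptor in the range plus the few calls needed to read the directory.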
msg151274 - (view) Author: Ferringb (ferringb) * Date: 2012-01-15 04:43
Fixed tabs/spaces...
msg151275 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2012-01-15 04:48
Thanks for the patch.

However, as far as I understand, this cannot be used for the subprocess implementation due to the limitations on what can be called after a fork() and before an exec().

Take a look at #8052 for some more discussion of this.
msg151287 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-01-15 12:38
fwiw, s/MSDOS_WINDOWS/MS_WINDOWS/.
msg151289 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-01-15 15:25
Reopening.  Comments added to the code review.

This issue is independent of the subprocess module issue in #8052.  _posixsubprocess.c has its own fd closing loop.

 http://hg.python.org/cpython/file/050c07b31192/Modules/_posixsubprocess.c#l118
msg192738 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2013-07-09 10:53
Two small technical comments:

1) I'd add a configure or compile-time check to determine if the procfs
   interface might be available. I don't like probing for features that
   you know are not available.

2) Mac OS X has similar functionality using /dev/fd instead of
   /proc/${PID}/fd (and other BSD systems might have this as well)
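
A Python analogue of these two points might look like the sketch below. It is only an illustration: the function names and the platform-to-directory mapping are assumptions, not part of the patch, and whether /dev/fd lists only the descriptors that are really open varies by system and would need to be verified. The platform check plays the role of the compile-time check, so that no probing happens on systems known to lack the interface.

```python
import os
import sys


def fd_directory_candidates(platform=sys.platform):
    """Return fd-listing directories worth probing on this platform.

    Hypothetical mapping, for illustration only.
    """
    if platform.startswith("linux"):
        return ["/proc/self/fd"]
    if platform == "darwin" or "bsd" in platform:
        return ["/dev/fd"]
    # Known not to have such an interface (e.g. Windows): probe nothing.
    return []


def pick_fd_directory(platform=sys.platform):
    """Return the first candidate directory that exists, or None."""
    for path in fd_directory_candidates(platform):
        if os.path.isdir(path):
            return path
    return None
```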
msg192739 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-07-09 11:30
In case someone is wondering whether the approach really reduces the number of syscalls: yes, it does. readdir() doesn't make a syscall for each entry. On Linux it uses the getdents() syscall internally to fill a buffer of directory entry structs. http://man7.org/linux/man-pages/man2/getdents.2.html

On my system os.listdir() does four syscalls:

$ strace python -c "import os; os.listdir('/home/heimes')"

openat(AT_FDCWD, "/home/heimes", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
getdents(3, /* 381 entries */, 32768)   = 12880
getdents(3, /* 0 entries */, 32768)     = 0
close(3)

On Linux you can also use /proc/self/fd instead of /proc/YOURPID/fd.
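
As a small sketch of that equivalence (Linux only, assuming procfs is mounted; the helper name open_fds is made up for this example): both paths should report the same set of descriptors. One caveat is that reading the directory itself needs a descriptor, which shows up in the listing and is already closed by the time the names are returned, so the sketch filters stale entries with fstat.

```python
import os


def open_fds(path):
    """Return the set of descriptors listed in `path` that are still open."""
    fds = set()
    for name in os.listdir(path):
        if not name.isdigit():
            continue
        fd = int(name)
        try:
            os.fstat(fd)
        except OSError:
            # Stale entry: the descriptor listdir() used for the directory.
            continue
        fds.add(fd)
    return fds


if __name__ == "__main__" and os.path.isdir("/proc/self/fd"):
    via_self = open_fds("/proc/self/fd")
    via_pid = open_fds("/proc/%d/fd" % os.getpid())
    print(via_self == via_pid)
```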

Other operating systems have different APIs to get a list of open FDs. AFAIK /dev/fd is static on FreeBSD and Mac OS X:

FreeBSD:
  http://www.manualpages.de/FreeBSD/FreeBSD-7.4-RELEASE/man3/kinfo_getfile.3.html

Darwin / Mac OS X:
  proc_pidinfo()
msg192760 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2013-07-09 16:03
_posixsubprocess already uses the Linux getdents64 syscall when available (though for different reasons: readdir is not safe in that context). http://hg.python.org/cpython/file/3f3cbfd52f94/Modules/_posixsubprocess.c#l227

Probing for procfs at configure time could be problematic. It is a virtual filesystem, and it is entirely possible for a system to choose not to mount it. It might be reasonable to assume that it "might be present" only if the system had it mounted at compile time, but a configure flag to override that might be desirable for some systems (not the Linux systems I usually deal with).

If we're jumping through all of these hoops for closerange, I'd love to see an API exposed in the os module to return a list of open fds. It is an abstraction nobody should have to write for themselves.
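
To make the request concrete, such an API could behave roughly like the sketch below. The name list_open_fds is hypothetical (no such function exists in the os module), and the sketch makes two assumptions: /proc/self/fd is used when readable, and otherwise every descriptor number up to SC_OPEN_MAX is tested with fstat. /dev/fd is deliberately left out because it may be static on some systems.

```python
import os


def list_open_fds():
    """Return a sorted list of descriptors currently open in this process.

    Hypothetical API sketch, not an existing os module function.
    """
    try:
        names = os.listdir("/proc/self/fd")
    except OSError:
        # No procfs: test every possible descriptor number instead.
        # This may be slow when the descriptor limit is large.
        try:
            limit = os.sysconf("SC_OPEN_MAX")
        except (ValueError, OSError):
            limit = -1
        if limit <= 0:
            limit = 256  # arbitrary fallback when the limit is unknown
        candidates = range(limit)
    else:
        candidates = [int(n) for n in names if n.isdigit()]

    result = []
    for fd in candidates:
        try:
            os.fstat(fd)
        except OSError:
            continue  # not open (or the stale fd used by listdir)
        result.append(fd)
    return sorted(result)
```

The fstat check runs on both paths so that the descriptor used to read the directory is not reported as open.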
msg192776 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-07-09 20:16
FreeBSD and other OSes provide closefrom(). Why not expose this function, which is probably implemented as a single syscall?
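
A rough way to experiment with this from Python, without any change to the os module, is to look the symbol up through ctypes. This is only a sketch under assumptions: the wrapper name close_from is made up, it assumes a POSIX system where ctypes.CDLL(None) gives access to the C library, and it falls back to os.closerange when the C library has no closefrom().

```python
import ctypes
import os


def close_from(lowfd):
    """Close every descriptor >= lowfd, preferring libc's closefrom()."""
    try:
        closefrom = ctypes.CDLL(None).closefrom
    except (OSError, AttributeError):
        closefrom = None  # symbol not available in this C library

    if closefrom is not None:
        closefrom.argtypes = [ctypes.c_int]
        closefrom.restype = None
        closefrom(lowfd)
        return

    # Fallback: close the whole range up to the descriptor limit.
    try:
        maxfd = os.sysconf("SC_OPEN_MAX")
    except (ValueError, OSError):
        maxfd = -1
    if maxfd <= 0:
        maxfd = 256  # arbitrary fallback when the limit is unknown
    os.closerange(lowfd, maxfd)
```

Calling close_from(3) in a freshly forked child would leave only stdin, stdout and stderr open.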
History
Date User Action Args
2013-11-23 19:31:36 gregory.p.smith set assignee: gregory.p.smith ->
2013-07-09 20:16:54 haypo set nosy: + haypo
    messages: + msg192776
2013-07-09 16:03:57 gregory.p.smith set messages: + msg192760
2013-07-09 11:31:00 christian.heimes set messages: + msg192739
2013-07-09 10:53:29 ronaldoussoren set nosy: + ronaldoussoren
    messages: + msg192738
2013-07-08 17:17:15 christian.heimes set nosy: + christian.heimes
    stage: resolved ->
    versions: + Python 3.4
2012-01-15 15:25:49 gregory.p.smith set status: closed -> open
    superseder: subprocess close_fds behavior should only close open fds ->
    assignee: gregory.p.smith
    nosy: + gregory.p.smith
    messages: + msg151289
    resolution: duplicate ->
2012-01-15 12:38:01 georg.brandl set nosy: + georg.brandl
    messages: + msg151287
2012-01-15 09:15:30 neologix set status: open -> closed
    resolution: duplicate
    superseder: subprocess close_fds behavior should only close open fds
    stage: resolved
2012-01-15 04:48:53 rosslagerwall set nosy: + rosslagerwall
    messages: + msg151275
2012-01-15 04:43:56 ferringb set files: - closerange-optimization.patch
2012-01-15 04:43:26 ferringb set files: + closerange-optimization.patch
    messages: + msg151274
2012-01-15 04:29:24 ferringb create