classification
Title: FreeBSD: Optimize subprocess.Popen(close_fds=True) using closefrom()
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: koobs, vstinner
Priority: normal Keywords:

Created on 2019-09-09 07:25 by vstinner, last changed 2020-02-10 23:41 by vstinner.

Messages (13)
msg351351 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-09 07:25
The default value of subprocess.Popen "close_fds" parameter changed to True in Python 3. Compared to Python 2, close_fds=True can make Popen 10x slower: see bpo-37790.

A single close(fd) syscall is cheap. But when MAXFDS (maximum file descriptor number) is high, the loop calling close(fd) on each file descriptor can take several milliseconds.

On FreeBSD, the _posixsubprocess could use closefrom() to reduce the number of syscalls.

On Linux, _posixsubprocess lists the content of /proc/self/fd/ to only close open file descriptor, after fork() and before exec().
msg351353 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-09 07:28
A workaround is to use close_fds=False which is only safe if all code used in Python implements the PEP 446.

This PEP mentions the FreeBSD issue, mentioning bpo-11284 with MAXFD=655,000:

"The operation can be slow if MAXFD is large. For example, on a FreeBSD buildbot with MAXFD=655,000, the operation took 300 ms: see issue #11284: slow close file descriptors."

Note: the subprocess module is now able to use posix_spawn() syscall, but it doesn't use it if close_fds=True, so it's outside the scope of this issue.
msg351355 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-09 07:31
See also bpo-13788: "os.closerange optimization".
msg351356 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-09 07:41
> On Linux, _posixsubprocess lists the content of /proc/self/fd/ to only close open file descriptor, after fork() and before exec().

This code is specific to Linux because it uses the SYS_getdents64 syscall.

FreeBSD has a similar concept using /dev/fd "file-descriptor file system". See fdescfs manual page:
https://www.freebsd.org/cgi/man.cgi?query=fdescfs&sektion=5

I'm not sure how it is supposed to work. When I open a file in Python, I don't see its file descriptor in /dev/fd:

vstinner@freebsd$ python3
Python 3.6.9 (default, Aug  8 2019, 01:16:42) 
>>> import os
>>> os.listdir("/dev/fd")
['0', '1', '2']
>>> f=open("/etc/motd")
>>> os.listdir("/dev/fd")
['0', '1', '2']
>>> f.fileno()
3

>>> os.set_inheritable(f.fileno(), True)
>>> os.listdir("/dev/fd")
['0', '1', '2']

>>> import sys, subprocess
>>> rc=subprocess.call([sys.executable, "-c", "import os; print(os.listdir('/dev/fd')); print(os.fstat(3))"], close_fds=False)
['0', '1', '2']
os.stat_result(st_mode=33188, st_ino=321134, st_dev=83, st_nlink=1, st_uid=0, st_gid=0, st_size=899, st_atime=1568014491, st_mtime=1566390614, st_ctime=1566390614)


The file descriptor 3 is not listed in /dev/fd/. It is inherited: fstat(3) in a child process doesn't fail and it's not listed in /dev/fd/ in the child process.
msg351357 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-09 08:16
Oh, _posixsubprocess uses /dev/fd/ on FreeBSD, but only if it's mounted file system (expected to be fdescfs filesystem):

#if defined(__FreeBSD__) || (defined(__APPLE__) && defined(__MACH__))
# define FD_DIR "/dev/fd"
#else
# define FD_DIR "/proc/self/fd"
#endif

#if defined(__FreeBSD__)
/* When /dev/fd isn't mounted it is often a static directory populated
 * with 0 1 2 or entries for 0 .. 63 on FreeBSD, NetBSD and OpenBSD.
 * NetBSD and OpenBSD have a /proc fs available (though not necessarily
 * mounted) and do not have fdescfs for /dev/fd.  MacOS X has a devfs
 * that properly supports /dev/fd.
 */
static int
_is_fdescfs_mounted_on_dev_fd(void)
{
    struct stat dev_stat;
    struct stat dev_fd_stat;
    if (stat("/dev", &dev_stat) != 0)
        return 0;
    if (stat(FD_DIR, &dev_fd_stat) != 0)
        return 0;
    if (dev_stat.st_dev == dev_fd_stat.st_dev)
        return 0;  /* / == /dev == /dev/fd means it is static. #fail */
    return 1;
}
#endif


On my FreeBSD 12 VM, MAXFD is around 230k which is quite large.

vstinner@freebsd$ uname -a
FreeBSD freebsd 12.0-RELEASE-p10 FreeBSD 12.0-RELEASE-p10 GENERIC  amd64

vstinner@freebsd$ ./python 
Python 3.9.0a0 (heads/master:4db25d5c39, Sep  9 2019, 07:52:01) 
>>> import os; os.sysconf("SC_OPEN_MAX")
229284

It's easy to measure the overhead of 230k close() syscalls:

vstinner@freebsd$ python3 -m pyperf timeit -s 'import subprocess; args=["/usr/bin/true"]' 'subprocess.Popen(args, close_fds=False).wait()' 
.....................
Mean +- std dev: 1.22 ms +- 0.12 ms
vstinner@freebsd$ python3 -m pyperf timeit -s 'import subprocess; args=["/usr/bin/true"]' 'subprocess.Popen(args, close_fds=True).wait()' 
.....................
Mean +- std dev: 53.3 ms +- 1.4 ms

The overhead is 52.08 ms per Popen().wait() call (with close_fds=True).



If I mount manually the fdescfs filesystem, suddenly, subprocess is efficient again:

vstinner@freebsd$ sudo mount -t fdescfs  /dev/fd 
vstinner@freebsd$ python3 -m pyperf timeit -s 'import subprocess; args=["/usr/bin/true"]' 'subprocess.Popen(args, close_fds=True).wait()' 
.....................
Mean +- std dev: 1.20 ms +- 0.06 ms

close_fds=True becomes as efficient as close_fds=False.
msg351358 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-09 08:33
> FreeBSD has a similar concept using /dev/fd "file-descriptor file system". (...) I'm not sure how it is supposed to work.

Sadly, on my FreeBSD VM, it seems like /dev/fd/ is not mounted with fdescfs by default, but as a regular directory with 3 hardcoded files 0, 1 and 2 which are character devices.

I had to mount fdescfs filesystem manually at /dev/fd/ :-(

$ cat /etc/fstab 
# Custom /etc/fstab for FreeBSD VM images
/dev/gpt/rootfs   /       ufs     rw      1       1
/dev/gpt/swapfs  none    swap    sw      0       0

Maybe it's an issue with "FreeBSD VM images" that I chose.

--

The FreeBSD CURRENT buildbot worker mounts /dev/fd:

CURRENT-amd64% cat /etc/fstab
# Device	Mountpoint	FStype	Options	Dump	Pass#
/dev/da0p2	none		swap	sw	0	0
/dev/da0p3	/		ufs	rw	1	1
fdescfs		/dev/fd		fdescfs	rw	0	0

CURRENT-amd64% mount|grep fd
fdescfs on /dev/fd (fdescfs)
msg351367 - (view) Author: Kubilay Kocak (koobs) (Python triager) Date: 2019-09-09 09:02
We're tracking this in our downstream bug: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221700

There's a patch there you may base something upstream on

Open question:

closefrom(2) test in ./configure && ifdef __FreeBSD__ ?
msg351369 - (view) Author: Kubilay Kocak (koobs) (Python triager) Date: 2019-09-09 09:07
@Victor I mounted fdescfs on the buildbot workers to make the tests run faster. You're correct that it's not enabled by default.

If we could look at the initial support being as simple as possible, we can look to backport this to 2.7 as well
msg351383 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-09 09:40
> @Victor I mounted fdescfs on the buildbot workers to make the tests run faster. You're correct that it's not enabled by default.

Would it be possible to modify FreeBSD to enable it by default? Or is there a reason to not enable it by default?

> We're tracking this in our downstream bug: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221700 There's a patch there you may base something upstream on

Which patch? https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221700#c3 or something else?


> Open question: closefrom(2) test in ./configure && ifdef __FreeBSD__ ?

No need to make closefrom test conditional depending on the platform. We have plently of tests which are platform specific but run on all platforms.

It's common that a function is first available on a single platform, but then available on more platforms.

It seems like closefrom() is available on:

* FreeBSD
* NetBSD: https://netbsd.gw.com/cgi-bin/man-cgi?closefrom++NetBSD-current
* OpenBSD: https://man.openbsd.org/closefrom.2
* Solaris: https://docs.oracle.com/cd/E23823_01/html/816-5168/closefrom-3c.html
msg351457 - (view) Author: Kubilay Kocak (koobs) (Python triager) Date: 2019-09-09 13:26
> Would it be possible to modify FreeBSD to enable it by default? Or is there a reason to not enable it by default?

That's very unlikely to happen. I believe it was added as an opt-in feature for some linux compatibility situations. The correct solution is to use closefrom(2), as it is the optimal current solution, and in our case, safe to use.

> Which patch? https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221700#c3 or something else?

Yep, that one.

> It seems like closefrom() is available on:

Per #8052 there was some concerns about closefrom(2) not being async-signal safe. [1] 

I can't speak to other implementations, but this is what I requested clarity from our FreeBSD developers on, and confirmed that our implementation is indeed safe. That is the reason why I thought scoping closefrom(2) to __FreeBSD__ may be warranted, just like like the linux specific bits in https://hg.python.org/cpython/rev/61aa484a3e54 from #8052

But I'll leave the decision as to how its implemented (configure checkls, ifdef'd or not) in your capable hands.

Summary: FreeBSD's closefrom(2) is safe to anywhere in Python where it needs to close a range of fd's, including the subprocess module.

[1] https://bugs.python.org/issue8052#msg132307
msg351461 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-09 13:36
> That's very unlikely to happen. I believe it was added as an opt-in feature for some linux compatibility situations. The correct solution is to use closefrom(2), as it is the optimal current solution, and in our case, safe to use.

If you have like 50k open file descriptors and want to pass fd 50000 to a child process using Popen(pass_fds={50000}), you will have to call close() 49 997 times: close(3)..close(49999), then closefrom(50001).

For the pass_fds case, being able to enumerate open file descriptors (ex: fdescfs) is more efficient than closefrom().

Well, the best would be a system call "close all file descriptors except of <list of fds>", but it's not available yet. Or maybe at least having a system like close_range(a, b).

In May 2019, the close_range() syscall was in the Linux kernel !
https://lwn.net/Articles/789023/
... but it seems like the feature was not accepted yet.

I'm disappointed that posix_spawn() doesn't implement a feature like Python subprocess close_fds=True.
msg351465 - (view) Author: Kubilay Kocak (koobs) (Python triager) Date: 2019-09-09 13:42
On "close_except", close_range and posix_spawn(close_fds=True), I'll talk to my interested FreeBSD colleagues about potential inroads in that regard.

In the meantime, we're good on closefrom(2) anywhere it can be used in Python
msg361750 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-10 23:41
FreeBSD change has been merged:
"lang/python{27,35,36,37,38}: Add closefrom(2) support"
https://svnweb.freebsd.org/ports?view=revision&revision=518640
History
Date User Action Args
2020-02-10 23:41:37vstinnersetmessages: + msg361750
2019-09-09 13:42:44koobssetmessages: + msg351465
2019-09-09 13:36:56vstinnersetmessages: + msg351461
2019-09-09 13:26:37koobssetmessages: + msg351457
2019-09-09 09:40:43vstinnersetmessages: + msg351383
2019-09-09 09:07:35koobssetmessages: + msg351369
2019-09-09 09:02:21koobssetmessages: + msg351367
2019-09-09 08:33:37vstinnersetmessages: + msg351358
2019-09-09 08:16:13vstinnersetmessages: + msg351357
2019-09-09 07:41:13vstinnersetmessages: + msg351356
2019-09-09 07:31:58vstinnersetmessages: + msg351355
2019-09-09 07:29:00vstinnersetnosy: + koobs
2019-09-09 07:28:51vstinnersetmessages: + msg351353
2019-09-09 07:25:36vstinnercreate