classification
Title: Concurrently closing files and iterating over the open files directory is not well specified
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.10, Python 3.9
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: bup, gregory.p.smith, iritkatriel, neologix, pitrou, serhiy.storchaka, sstewartgallus, vstinner
Priority: normal Keywords: patch

Created on 2014-06-01 17:22 by sstewartgallus, last changed 2021-09-20 08:29 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
cloexec.patch sstewartgallus, 2014-06-02 00:56 The patch review
fixed-setcloexec.patch sstewartgallus, 2014-06-12 19:55 The fixed up patch review
Messages (17)
msg219510 - (view) Author: Steven Stewart-Gallus (sstewartgallus) * Date: 2014-06-01 17:22
Hello,

I noticed some possible bad behaviour while working on Python issue
21618 (see http://bugs.python.org/issue21618). Python has the
following code in _posixsubmodules.c for closing extra files before
spawning a process:

static void
_close_open_fd_range_safe(int start_fd, int end_fd, PyObject* py_fds_to_keep)
{
    int fd_dir_fd;
    if (start_fd >= end_fd)
        return;

    fd_dir_fd = _Py_open(FD_DIR, O_RDONLY);
    if (fd_dir_fd == -1) {
        /* No way to get a list of open fds. */
        _close_fds_by_brute_force(start_fd, end_fd, py_fds_to_keep);
        return;
    } else {
        char buffer[sizeof(struct linux_dirent64)];
        int bytes;
        while ((bytes = syscall(SYS_getdents64, fd_dir_fd,
                                (struct linux_dirent64 *)buffer,
                                sizeof(buffer))) > 0) {
            struct linux_dirent64 *entry;
            int offset;
            for (offset = 0; offset < bytes; offset += entry->d_reclen) {
                int fd;
                entry = (struct linux_dirent64 *)(buffer + offset);
                if ((fd = _pos_int_from_ascii(entry->d_name)) < 0)
                    continue;  /* Not a number. */
                if (fd != fd_dir_fd && fd >= start_fd && fd < end_fd &&
                    !_is_fd_in_sorted_fd_sequence(fd, py_fds_to_keep)) {
                    while (close(fd) < 0 && errno == EINTR);
                }
            }
        }
        close(fd_dir_fd);
    }
}

In the code FD_DIR is "/proc/self/fd" on Linux. I'm not sure this code
is correct. This seems as if it would have the same problems as
iterating over a list and modifying it at the same time.

I can think of a few solutions but they all have problems.

One could allocate a list of open files once and then iterate through
that list and close the files but this is not signal safe so this
solution is incorrect. One possible workaround is to use mmap to
allocate the memory. This is a direct system call and I don't see a
reason for it not to be signal safe but I'm not sure. One neat hack
would be too mmap the memory ahead of time and then rely on lazy
paging to allocate the memory lazily. Another solution is to search
for the largest open file and then iterate over and close all the
possible file descriptors in between. So far most possible solutions
just seem really hacky to me though.

I feel the best solution is to let the OS close the files and to set
the O_CLOEXEC flag on the files that need to be closed.
msg219541 - (view) Author: Steven Stewart-Gallus (sstewartgallus) * Date: 2014-06-02 00:50
Okay, I've made a simple proof of concept patch.
msg219542 - (view) Author: Steven Stewart-Gallus (sstewartgallus) * Date: 2014-06-02 00:56
Gah, I had trouble figuring out mecurial.

Close files after reading open files and not concurrently

This commit removes the old behaviour of closing files concurrently
with reading the system's list of open files and instead closes the
files after the list has been read.

Because no memory can be allocated in that specific context the
approach of setting the CLOEXEC flag and letting the following exec
close the files has been used.

For consistency, the brute force approach to closing the files has
also been modified to set the CLOEXEC flag.
msg219579 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-06-02 12:36
I have no opinon on close() vs setting CLOEXEC flag, but you should use _Py_set_inheritable(fd, 0, NULL). _Py_set_inheritable() is able to set the CLOEXEC flag in a single syscall, instead of 2 syscalls. _close_fds_by_brute_force() is already very slow when the maximum file descriptor is large:
http://legacy.python.org/dev/peps/pep-0446/#closing-all-open-file-descriptors
msg219582 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-06-02 12:47
Since Python 3.4, all file descriptors directly created by Python are marked as non-inheritable. It should now be safe to use subprocess with close_fds=False, but the default value was not changed because third party modules (especially code written in C) may not respect the PEP 446.

If you are interested to change the default value, you should audit the major Python modules on PyPI to check if they were adapted to respect the PEP 446 (mark all file descriptors as non-inheritable).
msg219651 - (view) Author: Steven Stewart-Gallus (sstewartgallus) * Date: 2014-06-02 23:42
It occurred to me that the current patch I have is wrong and that using _Py_set_inheritable is wrong because EBADF can occur in the brute force version which in the case of _Py_set_inheritable raises an error which I am not sure is asynch signal safe. I could test ahead of time but that is a bit hacky. The most principled approach would be to extract either a common set_cloexec or set_inheritable function. If set_inheritable is done then I would have to report either a Windows error or an errno error which would be messy. I'm not sure where the best place to put the set_cloexec function would be.
msg219773 - (view) Author: Steven Stewart-Gallus (sstewartgallus) * Date: 2014-06-04 22:03
Okay, so the Python directory seems to be where wrappers over low level or missing functionality is placed. So I guess I should make a _Py_setcloexec function in a Python/setcloexec.c and a definition in Include/setcloexec.h?
msg219908 - (view) Author: Steven Stewart-Gallus (sstewartgallus) * Date: 2014-06-07 02:43
Okay, now I'm confused. How would I conditionally compile and use the setcloexec object and header on POSIX platforms and not on Windows?
msg219922 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-06-07 08:14
The _posixsubprocess module is not compiled on Windows.
msg220376 - (view) Author: Steven Stewart-Gallus (sstewartgallus) * Date: 2014-06-12 19:53
Okay, I made a patch that I hoped dealt with all the criticisms and that fixed up a problem I noted myself.
msg382586 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2020-12-06 02:10
I can't find _posixsubmodules.c or _close_open_fd_range_safe in the codebase. Is this issue out of date?
msg382603 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-12-06 19:02
It is Modules/_posixsubprocess.c.

We still do not have a reproducer, so I am not sure that the code has this problem.
msg382618 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-12-07 03:11
The Linux kernel code is not sufficiently easy to follow to understand if it has this issue or if it pre-creates the dirent structures for all fds at opendir time for /proc/self/fd or if it is iterating through the list of fds in sorted order so an older closed fd will not interfere with its internal iteration.

Regardless, I've yet to knowingly witness a problem from this come up in practice.  knowingly and yet being key words. :)

But I like the general theme of your patch to set CLOEXEC on all of the fd's rather than explicitly call close(fd) in the directory reading loop.
msg382724 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-08 10:27
Linux 5.9 added close_range() syscall that we can use.

Linux 5.10 may get a new CLOSE_RANGE_CLOEXEC flag for close_range().
https://lwn.net/Articles/837816/

We may use close_range() rather than iterating on /proc/self/fd/ in user space. It may be faster and safer.
msg402073 - (view) Author: Dan Snider (bup) * Date: 2021-09-17 18:53
On Android, if os.close_range closes the file descriptor of the scandir iterator, the interpreter immediately crashes eg:

>>> import os
>>> os.scandir()
<posix.ScandirIterator object at 0x7082d6ef10>
>>> os.closerange(3,9999)
fdsan: attempted to close file descriptor 3, expected to be unowned, actually owned by DIR* 0x7263390290
Aborted
$

Sorry if this isn't an appropriate place to add this, but the title matched the concept I think and I don't know enough to judge whether this is something that needs to be fixed.
msg402208 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-20 07:57
Dan Snider: "On Android, if os.close_range closes the file descriptor of the scandir iterator, the interpreter immediately crashes"

I don't see anything wrong with Python. It's not different than calling os.close(3).

If it hurts, don't do that :-)
msg402212 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-09-20 08:29
I close the issue as "not a bug". If you disagree, please comment the issue :-)

Steven Stewart-Gallus:
> In the code FD_DIR is "/proc/self/fd" on Linux. I'm not sure this code
> is correct. This seems as if it would have the same problems as
> iterating over a list and modifying it at the same time.

I don't understand the initial issue. What makes you think that the current code is not reliable? Python is using this code for years, and so far, nobody reported any crash on this code.

You didn't report a crash, you only wrote that you suspect that there is a bug. Can you prove that the current Python code is wrong? Did you look at the implementation of readdir() in the Linux glibc for example?

About close() vs marking file descriptor non inheritable, I'm not convinced that marking file descriptors is faster or more reliable. On many operating systems, _Py_set_inheritable() requires 2 syscalls (get flags, set flags), whereas close() is a single syscall: calling close() is faster.

---

Python now has a _Py_closerange() function which supports:

* Linux closerange() syscall, calling the glibc closerange() function
* FreeBSD closefrom() function
* Solaris fdwalk() function

On Linux, _posixsubprocess still iterates /proc/self/pid/ if this directory can be opened, even if the closerange() function is available. I'm not sure which option is the safest or the fastest. 

By the way, Steven Stewart-Gallus did mention his operating system.
History
Date User Action Args
2021-09-20 08:29:22vstinnersetstatus: open -> closed
resolution: not a bug
messages: + msg402212

stage: resolved
2021-09-20 07:57:57vstinnersetmessages: + msg402208
2021-09-17 18:53:06bupsetnosy: + bup
messages: + msg402073
2020-12-08 10:27:00vstinnersetmessages: + msg382724
2020-12-07 03:11:12gregory.p.smithsetversions: + Python 3.9, Python 3.10, - Python 3.5
type: behavior
messages: + msg382618

assignee: gregory.p.smith
components: + Library (Lib)
resolution: out of date -> (no value)
2020-12-06 19:02:04serhiy.storchakasetstatus: pending -> open
nosy: + serhiy.storchaka
messages: + msg382603

2020-12-06 02:10:02iritkatrielsetstatus: open -> pending

nosy: + iritkatriel
messages: + msg382586

resolution: out of date
2014-06-12 19:55:11sstewartgallussetfiles: + fixed-setcloexec.patch
2014-06-12 19:54:57sstewartgallussetfiles: - fixed-setcloexec.patch
2014-06-12 19:53:35sstewartgallussetfiles: + fixed-setcloexec.patch

messages: + msg220376
2014-06-07 08:14:51vstinnersetmessages: + msg219922
2014-06-07 02:43:53sstewartgallussetmessages: + msg219908
2014-06-04 22:03:18sstewartgallussetmessages: + msg219773
2014-06-02 23:42:07sstewartgallussetmessages: + msg219651
2014-06-02 12:47:13vstinnersetmessages: + msg219582
2014-06-02 12:36:02vstinnersetnosy: + vstinner
messages: + msg219579
2014-06-02 12:27:41vstinnersetnosy: + pitrou, neologix
2014-06-02 11:22:55berker.peksagsetnosy: + gregory.p.smith

versions: + Python 3.5
2014-06-02 00:56:58sstewartgallussetfiles: + cloexec.patch
keywords: + patch
messages: + msg219542
2014-06-02 00:50:03sstewartgallussetmessages: + msg219541
2014-06-01 17:22:33sstewartgalluscreate