Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use vfork() in subprocess on Linux #80004

Closed
izbyshev mannequin opened this issue Jan 25, 2019 · 30 comments
Closed

Use vfork() in subprocess on Linux #80004

izbyshev mannequin opened this issue Jan 25, 2019 · 30 comments
Assignees
Labels
3.10 only security fixes extension-modules C modules in the Modules dir type-feature A feature request or enhancement

Comments

@izbyshev
Copy link
Mannequin

izbyshev mannequin commented Jan 25, 2019

BPO 35823
Nosy @gpshead, @ronaldoussoren, @socketpair, @koobs, @izbyshev, @pablogsal, @Jongy
PRs
  • bpo-35823: subprocess: Use vfork() instead of fork() on Linux when safe #11671
  • bpo-35823: subprocess: Use vfork() instead of fork() on Linux when safe #11671
  • bpo-35823: subprocess: Use vfork() instead of fork() on Linux when safe #11671
  • bpo-35537: Document posix_spawn() change in subprocess #11668
  • bpo-35537: Document posix_spawn() change in subprocess #11668
  • bpo-35823: subprocess: Fix handling of pthread_sigmask() errors #22944
  • bpo-35823: Allow setsid() after vfork() on Linux. #22945
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/izbyshev'
    closed_at = <Date 2020-10-24.02:20:26.839>
    created_at = <Date 2019-01-25.02:03:25.984>
    labels = ['extension-modules', 'type-feature', '3.10']
    title = 'Use vfork() in subprocess on Linux'
    updated_at = <Date 2022-04-08.10:51:23.997>
    user = 'https://github.com/izbyshev'

    bugs.python.org fields:

    activity = <Date 2022-04-08.10:51:23.997>
    actor = 'socketpair'
    assignee = 'izbyshev'
    closed = True
    closed_date = <Date 2020-10-24.02:20:26.839>
    closer = 'gregory.p.smith'
    components = ['Extension Modules']
    creation = <Date 2019-01-25.02:03:25.984>
    creator = 'izbyshev'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 35823
    keywords = ['patch', 'patch', 'patch']
    message_count = 30.0
    messages = ['334336', '334345', '334353', '334363', '334402', '334404', '334407', '334534', '334559', '372426', '372429', '372444', '378664', '378703', '379503', '379504', '379507', '379511', '379527', '379529', '379530', '379531', '379535', '379536', '379537', '379538', '379568', '416901', '416970', '416975']
    nosy_count = 7.0
    nosy_names = ['gregory.p.smith', 'ronaldoussoren', 'socketpair', 'koobs', 'izbyshev', 'pablogsal', 'Yonatan Goldschmidt']
    pr_nums = ['11671', '11671', '11671', '11668', '11668', '22944', '22945']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue35823'
    versions = ['Python 3.10']

    @izbyshev
    Copy link
    Mannequin Author

    izbyshev mannequin commented Jan 25, 2019

    This issue is to propose a (complementary) alternative to the usage of posix_spawn() in subprocess (see bpo-35537).

    As mentioned by Victor Stinner in msg332236, posix_spawn() has the potential of being faster and safer than fork()/exec() approach. However, some of the currently available implementations of posix_spawn() have technical problems (this mostly summarizes discussions in bpo-35537):

    • In glibc < 2.24 on Linux, posix_spawn() doesn't report errors to the parent properly, breaking existing subprocess behavior.

    • In glibc >= 2.25 on Linux, posix_spawn() doesn't report errors to the parent in certain environments, such as QEMU user-mode emulation and Windows subsystem for Linux.

    • In FreeBSD, as of this writing, posix_spawn() doesn't block signals in the child process, so a signal handler executed between vfork() and execve() may change memory shared with the parent [1].

    Regardless of implementation, posix_spawn() is also unsuitable for some subprocess use cases:

    • posix_spawnp() can't be used directly to implement file searching logic of subprocess because of different semantics, requiring workarounds.

    • posix_spawn() has no standard way to specify the current working directory for the child.

    • posix_spawn() has no way to close all file descriptors > 2 in the child, which is the *default* mode of operation of subprocess.Popen().

    May be even more importantly, fundamentally, posix_spawn() will always be less flexible than fork()/exec() approach. Any additions will have to go through POSIX standardization or be unportable. Even if approved, a change will take years to get to actual users because of the requirement to update the C library, which may be more than a decade behind in enterprise Linux distros. This is in contrast to having an addition implemented in CPython. For example, a setrlimit() action for posix_spawn() is currently rejected in POSIX[2], despite being trivial to add.

    I'm interested in avoiding posix_spawn() problems on Linux while still delivering comparable performance and safety. To that end I've studied implementations of posix_spawn() in glibc[3] and musl[4], which use vfork()/execve()-like approach, and investigated challenges of using vfork() safely on Linux (e.g. [5]) -- all of that for the purpose of using vfork()/exec() instead of fork()/exec() or posix_spawn() in subprocess where possible.

    The unique property of vfork() is that the child shares the address space (including heap and stack) as well as thread-local storage with the parent, which means that the child must be very careful not to surprise the parent by changing the shared resources under its feet. The parent is suspended until the child performs execve(), _exit() or dies in any other way.

    The most safe way to use vfork() is if one has access to the C library internals and can do the the following:

    1. Disable thread cancellation before vfork() to ensure that the parent thread is not suddenly cancelled by another thread with pthread_cancel() while being in the middle of child creation.

    2. Block all signals before vfork(). This ensures that no signal handlers are run in the child. But the signal mask is preserved by execve(), so the child must restore the original signal mask. To do that safely, it must reset dispositions of all non-ignored signals to the default, ensuring that no signal handlers are executed in the window between restoring the mask and execve().

    Note that libc-internal signals should be blocked too, in particular, to avoid "setxid problem"[5].

    1. Use a separate stack for the child via clone(CLONE_VM|CLONE_VFORK), which has exactly the same semantics as vfork(), but allows the caller to provide a separate stack. This way potential compiler bugs arising from the fact that vfork() returns twice to the same stack frame are avoided.

    2. Call only async-signal-safe functions in the child.

    In an application, only (1) and (4) can be done easily.

    One can't disable internal libc signals for (2) without using syscall(), which requires knowledge of the kernel ABI for the particular architecture.

    clone(CLONE_VM) can't be used at least before glibc 2.24 because it corrupts the glibc pid/tid cache in the parent process[6,7]. (As may be guessed, this problem was solved by glibc developers when they implemented posix_spawn() via clone()). Even now, the overall message seems to be that clone() is a low-level function not intended to be used by applications.

    Even with the above, I still think that in context of subprocess/CPython the sufficient vfork()-safety requirements are provided by the following.

    Despite being easy, (1) seems to be not necessary: CPython never uses pthread_cancel() internally, so Python code can't do that. A non-Python thread in an embedding app could try, but cancellation, in my knowledge, is not supported by CPython in any case (there is no way for an app to cleanup after the cancelled thread), so subprocess has no reason to care.

    For (2), we don't have to worry about the internal signal used for thread cancellation because of the above. The only other internal signal is used for setxid syncronization[5]. The "setxid problem" is mitigated in Python because the spawning thread holds GIL, so Python code can't call os.setuid() concurrently. Again, a non-Python thread could, but I argue that an application that spawns a child and calls setuid() in non-synchronized manner is not worth supporting: a child will have "random" privileges depending on who wins the race, so this is hardly a good security practice. Even if such apps are considered worthy to support, we may limit vfork()/exec() path only to the non-embedded use case.

    For (3), with production-quality compilers, using vfork() should be OK. Both GCC and Clang recognize it and handle in a special way (similar to setjmp(), which also has "returning twice" semantics). The supporting evidence is that Java has been using vfork() for ages, Go has migrated to vfork(), and, coincidentally, dotnet is doing it right now[8].

    (4) is already done in _posixsubprocess on Linux.

    I've implemented a simple proof-of-concept that uses vfork() in subprocess on Linux by default in all cases except if preexec_fn is not None. It passes all tests on OpenSUSE (Linux 4.15, glibc 2.27) and Ubuntu 14.04 (Linux 4.4, glibc 2.19), but triggers spurious GCC warnings, probably due to a long-standing GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21161

    I've also run a variant of subprocess_bench.py (by Victor Stinner from bpo-35537) with close_fds=False and restore_signals=False removed on OpenSUSE:

    $ env/bin/python -m perf compare_to fork.json vfork.json
    Mean +- std dev: [fork] 154 ms +- 18 ms -> [vfork] 1.23 ms +- 0.04 ms: 125.52x faster (-99%)

    Compared to posix_spawn, the results on the same machine are similar:

    $ env/bin/python -m perf compare_to posix_spawn.json vfork.json
    Mean +- std dev: [posix_spawn] 1.24 ms +- 0.04 ms -> [vfork] 1.22 ms +- 0.05 ms: 1.02x faster (-2%)

    Note that my implementation should work even for QEMU user-mode (and probably WSL) because it doesn't rely on address space sharing.

    Things to do:

    • Decide whether pthread_setcancelstate() should be used. I'd be grateful for opinions from Python threading experts.

    • Decide whether "setxid problem"[5] is important enough to worry about.

    • Deal with GCC warnings.

    • Test in user-mode QEMU and WSL.

    [1] https://svnweb.freebsd.org/base/head/lib/libc/gen/posix_spawn.c?view=markup&pathrev=326193
    [2] http://austingroupbugs.net/view.php?id=603
    [3] https://sourceware.org/git/?p=glibc.git;a=history;f=sysdeps/unix/sysv/linux/spawni.c;h=353bcf5b333457d191320e358d35775a2e9b319b;hb=HEAD
    [4] http://git.musl-libc.org/cgit/musl/log/src/process/posix_spawn.c
    [5] https://ewontfix.com/7
    [6] https://sourceware.org/bugzilla/show_bug.cgi?id=10311
    [7] https://sourceware.org/bugzilla/show_bug.cgi?id=18862
    [8] dotnet/corefx#33289

    @izbyshev izbyshev mannequin added 3.8 only security fixes extension-modules C modules in the Modules dir type-feature A feature request or enhancement labels Jan 25, 2019
    @ronaldoussoren
    Copy link
    Contributor

    Issue bpo-34663 contains some earlier discussion about vfork, in the context of supporting a specific posix_spawn flag on Linux.

    W.r.t. closing all file descriptors > 2: posix_spawn_file_actions_addclose can do this when using posix_spawn. That would have a performance cost, you'd basically have to resort to closing all possible file descriptors and cannot use the smarter logic used in _posixsubprocess. However, the smarter closing code in _posixsubprocess is not safe w.r.t. vfork according to the comment above _close_open_fds_maybe_unsafe: that function uses some functions that aren't async-safe and one of those calls malloc.

    @izbyshev
    Copy link
    Mannequin Author

    izbyshev mannequin commented Jan 25, 2019

    W.r.t. closing all file descriptors > 2: posix_spawn_file_actions_addclose can do this when using posix_spawn. That would have a performance cost, you'd basically have to resort to closing all possible file descriptors and cannot use the smarter logic used in _posixsubprocess.

    This is costly to the point of people reporting bugs: bpo-35757. If that really causes 0.1s delay like the reporter said, it would defeat the purpose of using posix_spawn() in the first place.

    However, the smarter closing code in _posixsubprocess is not safe w.r.t. vfork according to the comment above _close_open_fds_maybe_unsafe: that function uses some functions that aren't async-safe and one of those calls malloc.

    No, it's not so on Linux:

    #else /* NOT (defined(__linux__) && defined(HAVE_SYS_SYSCALL_H)) */

    Moreover, as I already argued in msg332235, using malloc() after vfork() should be *safer* than after fork() for a simple reason: all memory-based locks will still work as though the child is merely a thread in the parent process. This is true even for things like futex(FUTEX_WAKE_PRIVATE), despite that this operation is mistakenly documented as "process-private" in man pages. It's actually more like "private to tasks sharing the same address space".

    This is in contrast with fork(): if it's called while another thread is holding some mutex in malloc(), nobody would unlock it in the child unless the C library has precautions against that.

    @ronaldoussoren
    Copy link
    Contributor

    BTW. I hadn't noticed _close_open_fds_safe, that should be safe when using vfork().

    Finally: I have no strong opinion on this patch, your explanation looks fine and I'm not up-to-speed w.r.t. implementation details of vfork and the like to feel comfortable about giving a proper review without spending a lot more time on it.

    @izbyshev
    Copy link
    Mannequin Author

    izbyshev mannequin commented Jan 26, 2019

    I've checked subprocess.Popen() error reporting in QEMU user-mode and WSL and confirm that it works both with my patch (vfork/exec) and the traditional fork/exec, but doesn't work with glibc's posix_spawn.

    The first command below uses posix_spawn() internally because close_fds=False. Under QEMU posix_spawn() from glibc doesn't report errors because it relies on address-space sharing of clone(CLONE_VM|CLONE_VFORK), which is not emulated by QEMU. The second command uses vfork()/exec() in _posixsubprocess, but lack of address-space sharing doesn't matter because the error data is transferred via a pipe.

    $ qemu-x86_64 --version
    qemu-x86_64 version 2.11.1
    Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
    $ ldd --version
    ldd (GNU libc) 2.27
    Copyright (C) 2018 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    Written by Roland McGrath and Ulrich Drepper.
    $ qemu-x86_64 ./python3 -c 'import subprocess; subprocess.call("/xxx", close_fds=False)'
    $ qemu-x86_64 ./python3 -c 'import subprocess; subprocess.call("/xxx")'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/izbyshev/install/python-3.8a/lib/python3.8/subprocess.py", line 324, in call
        with Popen(*popenargs, **kwargs) as p:
      File "/home/izbyshev/install/python-3.8a/lib/python3.8/subprocess.py", line 830, in __init__
        self._execute_child(args, executable, preexec_fn, close_fds,
      File "/home/izbyshev/install/python-3.8a/lib/python3.8/subprocess.py", line 1648, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    FileNotFoundError: [Errno 2] No such file or directory: 'xxx'

    For WSL, I've tested Ubuntu 18.04 (glibc 2.27) on Windows 10 1709 (Fall Creators Update) and 1803.

    In 1709, the result is the same as for QEMU (i.e. the first command silently succeeds). In 1803, both commands raise the exception because address-space sharing was fixed for clone(CLONE_VM|CLONE_VFORK) and vfork() in WSL: microsoft/WSL#1878

    I've also run all subprocess tests under QEMU user-mode (by making sys.executable to point to a wrapper that runs python under QEMU) for the following code bases:

    • current master (fork/exec or posix_spawn can be used)
    • classic (fork/exec always)
    • my patch + master (vfork/exec or posix_spawn)
    • vfork/exec always
      All tests pass in all four flavors, which indicates that we don't have a test for error reporting with close_fds=False :)

    Out of curiosity I also did the same on WSL in 1803. Only 3 groups of tests fail, all because of WSL bugs:

    • It's not possible to send signals to zombies, e.g.:
      ======================================================================
      ERROR: test_terminate_dead (test.test_subprocess.POSIXProcessTestCase)
      ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/home/test/cpython/Lib/test/test_subprocess.py", line 1972, in test_terminate_dead
        self._kill_dead_process('terminate')
      File "/home/test/cpython/Lib/test/test_subprocess.py", line 1941, in _kill_dead_process
        getattr(p, method)(*args)
      File "/home/test/cpython/Lib/subprocess.py", line 1877, in terminate
        self.send_signal(signal.SIGTERM)
      File "/home/test/cpython/Lib/subprocess.py", line 1872, in send_signal
        os.kill(self.pid, sig)
    ProcessLookupError: [Errno 3] No such process

    • getdents64 syscall doesn't work properly (doesn't return the rest of entries on the second call, so close_fds=True doesn't fully work with large number of open descriptors).

    ======================================================================
    FAIL: test_close_fds_when_max_fd_is_lowered (test.test_subprocess.POSIXProcessTestCase)
    Confirm that bpo-21618 is fixed (may fail under valgrind).
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/test/cpython/Lib/test/test_subprocess.py", line 2467, in test_close_fds_when_max_fd_is_lowered
        self.assertFalse(remaining_fds & opened_fds,
    AssertionError: {34, 35, 36, 37, 38, 39, 40, 41, 42} is not false : Some fds were left open.

    • Signal masks in /proc/self/status are not correct (always zero)

    ======================================================================
    FAIL: test_restore_signals (test.test_subprocess.POSIXProcessTestCase)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/home/test/cpython/Lib/test/test_subprocess.py", line 1643, in test_restore_signals
        self.assertNotEqual(default_sig_ign_mask, restored_sig_ign_mask,
    AssertionError: b'SigIgn:\t0000000000000000' == b'SigIgn:\t0000000000000000' : restore_signals=True should've unblocked SIGPIPE and friends.

    None of the above is related to fork/vfork/posix_spawn -- the set of failing tests is the same in all four flavors.

    @gpshead
    Copy link
    Member

    gpshead commented Jan 26, 2019

    Thanks for your _extremely detailed_ analysii of the (often sad) state of posix_spawn() on platforms in the world today.

    My first reaction to this was "but then we'll be owning our own custom posix_spawn-like implementation as if we'll do better at it than every individual libc variant."

    After reading this through and looking at your PR... I now consider that a good thing. =) We clearly can do better. vfork() is pretty simple and allows us to keep our semantics; providing benefits to existing users at no cost.

    The plethora of libc bugs surrounding posix_spawn() seem likely to persist within various environments in the world for years to come. No sense in us waiting for that to settle.

    As for your PR... a configure check for vfork, a news entry, and whatever other documentation updates seem appropriate.

    With this in place we may want to make the _use_posix_spawn() logic in subprocess.py stricter? That could be its own followup PR.

    @gpshead
    Copy link
    Member

    gpshead commented Jan 26, 2019

    This is a scary issue. But I think a reasonable approach could be to never use vfork when running as whatever we choose to define a "privileged user" to be.

    getuid() or geteuid() return 0? don't use vfork.

    the concept of "privileged user" can obviously mean a lot more than that and likely goes beyond what we should attempt to ascertain ourselves.

    How about also providing a disable-only global setting so that someone writing code they consider to have elevated privileges can prevent its use entirely. subprocess.disable_use_of_vfork() and subprocess.is_vfork_enabled() calls perhaps (just setting/reading a static int vfork_disallowed = 0 flag within _posixsubprocess.c).

    If we did that, on systems where posix_spawn() _might_ be implemented using vfork() we'd want to avoid using it based on is_vfork_enabled().

    True setuid vs vfork attack security would suggest code needs to opt-in to vfork() or posix_spawn() rather than opt-out. Which would destroy the benefit for most users (who won't bother) for the sake of an issue that just is not what most code ever does (setuid/setgid/etc. calls are very uncommon for most software).

    I think documenting "HEY, if you are running as with elevated privileges, here's a reason why you might want to disable vfork, and how to do it." should be enough. Hopefully not famous last words.

    @izbyshev
    Copy link
    Mannequin Author

    izbyshev mannequin commented Jan 29, 2019

    Thank you for the review and your thoughts, Gregory.

    With this in place we may want to make the _use_posix_spawn() logic in subprocess.py stricter? That could be its own followup PR.

    Yes, I think we can always use vfork() on Linux unless we find some issues. (Victor Stinner doesn't seem to be against: #11668 (comment)). Though C libraries are in a better position to provide safe vfork/exec, by using our implementation we will:

    • avoid breaking QEMU user-mode (and preserve existing subprocess behavior)
    • automatically support fast process creation on musl (which has a good posix_spawn, but doesn't have easy means (macros) on detect that we're on musl and thus avoid unacceptable posix_spawn).
    • avoid having different code paths (and thus potential surprises) depending on arguments of subprocess.Popen()

    This is a scary issue. But I think a reasonable approach could be to never use vfork when running as whatever we choose to define a "privileged user" to be.

    getuid() or geteuid() return 0? don't use vfork.

    I thought about something like this, but initially dismissed it because I (mistakenly) decided that an application may concurrently switch between arbitrary uids back and forth, so that checking getuid() becomes useless (if we're already talking about an exotic app that calls setuid() concurrently with spawning children, why stop there?). But now I realize that an app may use *arbitrary* uids only if one of its real, effective or saved uids is zero (that is, if we don't take capabilities into account), so at least we could call getresuid() to get all 3 uids in a race-free way and check whether the app is *definitely* privileged...

    the concept of "privileged user" can obviously mean a lot more than that and likely goes beyond what we should attempt to ascertain ourselves.

    ...but you're making a good point here. So I'm not sure that we want such checks, but if we do, we'd probably need to allow users to disable them -- what if some heavy privileged daemon wants a faster subprocess?

    How about also providing a disable-only global setting so that someone writing code they consider to have elevated privileges can prevent its use entirely. subprocess.disable_use_of_vfork() and subprocess.is_vfork_enabled() calls perhaps (just setting/reading a static int vfork_disallowed = 0 flag within _posixsubprocess.c).

    I think it's reasonable. I'll look into it when I'm done with current issues.

    True setuid vs vfork attack security would suggest code needs to opt-in to vfork() or posix_spawn() rather than opt-out. Which would destroy the benefit for most users (who won't bother) for the sake of an issue that just is not what most code ever does (setuid/setgid/etc. calls are very uncommon for most software).

    Agree, especially since we're not talking about "just" using setuid() but rather about using setuid() *in a multithreaded process in a racy manner while spawning children*. Honestly, I'm not sure such apps can blame Python if they get a security hole :)

    @izbyshev
    Copy link
    Mannequin Author

    izbyshev mannequin commented Jan 30, 2019

    I've been struggling with fixing spurious -Wclobbered GCC warnings. Originally, I've got the following:

    /scratch2/izbyshev/cpython/Modules/_posixsubprocess.c: In functionsubprocess_fork_exec’:
    /scratch2/izbyshev/cpython/Modules/_posixsubprocess.c:612:15: warning: variablegc_modulemight be clobbered bylongjmporvfork’ [-Wclobbered]
         PyObject *gc_module = NULL;
                   ^~~~~~~~~
    /scratch2/izbyshev/cpython/Modules/_posixsubprocess.c:616:15: warning: variablepreexec_fn_args_tuplemight be clobbered bylongjmporvfork’ [-Wclobbered]
         PyObject *preexec_fn_args_tuple = NULL;
                   ^~~~~~~~~~~~~~~~~~~~~
    /scratch2/izbyshev/cpython/Modules/_posixsubprocess.c:621:17: warning: variablecwdmight be clobbered bylongjmporvfork’ [-Wclobbered]
         const char *cwd;
                     ^~~
    /scratch2/izbyshev/cpython/Modules/_posixsubprocess.c:623:9: warning: variableneed_to_reenable_gcmight be clobbered bylongjmporvfork’ [-Wclobbered]
         int need_to_reenable_gc = 0;
             ^~~~~~~~~~~~~~~~~~~
    /scratch2/izbyshev/cpython/Modules/_posixsubprocess.c:624:38: warning: variableargvmight be clobbered bylongjmporvfork’ [-Wclobbered]
         char *const *exec_array, *const *argv = NULL, *const *envp = NULL;
                                          ^~~~
    /scratch2/izbyshev/cpython/Modules/_posixsubprocess.c:624:59: warning: variableenvpmight be clobbered bylongjmporvfork’ [-Wclobbered]
         char *const *exec_array, *const *argv = NULL, *const *envp = NULL;
                                                               ^~~~
    /scratch2/izbyshev/cpython/Modules/_posixsubprocess.c:626:9: warning: variableneed_after_forkmight be clobbered bylongjmporvfork’ [-Wclobbered]
         int need_after_fork = 0;
             ^~~~~~~~~~~~~~~
    /scratch2/izbyshev/cpython/Modules/_posixsubprocess.c:627:9: warning: variablesaved_errnomight be clobbered bylongjmporvfork’ [-Wclobbered]
         int saved_errno = 0;
             ^~~~~~~~~~~

    I've checked that all warnings are spurious: all flagged variables are either not modified in the child or modified and used only by the child. A simple way to suppress the warnings would be "volatile", but I don't want to spray "volatile" over the huge declaration block of subprocess_fork_exec().

    Another way is to move vfork() to a separate function and ensure that this function does as little as possible with its local variables. I've implemented two versions of this approach, both are ugly in some sense. I've pushed the first into the PR branch and the second into a separate branch https://github.com/izbyshev/cpython/tree/single-do-fork-exec.

    Yet another way would be to simply disable this diagnostic for _posixsubprocess (e.g. via #pragma GCC diagnostic), but I'm not sure we want to do that -- may be it'll be fixed in the future or a real defect will be introduced into our code.

    @gpshead gpshead added 3.10 only security fixes and removed 3.8 only security fixes labels Jun 8, 2020
    @Jongy
    Copy link
    Mannequin

    Jongy mannequin commented Jun 26, 2020

    With bpo-35537 merged, do we have any intention to get on with this? From what I can tell, it provides roughly the same performance benefit - but works also with close_fds=True, which is the default of Popen, so IMO it's a worthy addition.

    I tested a rebase on current master, test_subprocess passes on my Linux box and there are no more -Wclobbered warnings after Alexey's latest change of the do_fork_exec() helper.

    @izbyshev
    Copy link
    Mannequin Author

    izbyshev mannequin commented Jun 26, 2020

    I'd really like to get this merged eventually because vfork()-based solution is fundamentally more generic than posix_spawn(). Apart from having no issue with close_fds=True, it will also continue to allow subprocess to add any process context tweaks without waiting for availability of corresponding features in posix_spawn().

    If there is no objection from Gregory or other devs, I can pick up from where I left in two weeks.

    @gpshead
    Copy link
    Member

    gpshead commented Jun 26, 2020

    No objections, it would be great to see this finished up and land.

    I've encountered a minority of users who are using a wrapped vfork-based C/C++ library for process spawning as fork+exec isn't fast enough for them.

    @izbyshev
    Copy link
    Mannequin Author

    izbyshev mannequin commented Oct 15, 2020

    Well, much later than promised, but I'm picking it up. Since in the meantime support for setting uid/gid/groups was merged, and I'm aware about potential issues with calling corresponding C library functions in a vfork()-child, I asked a question on musl mailing list: https://www.openwall.com/lists/musl/2020/10/12/1

    So, it seems we'll need to fallback to fork() if set*id() is needed, which is in line with our previous discussion about avoidance of vfork() in privileged processes anyway.

    I'm also discussing -Wclobbered warnings with a GCC developer. I wouldn't like to restructure code just to avoid GCC false positives, so currently I'm leaning towards disabling this warning entirely for subprocess_fork_exec() and documenting that arbitrary stores to local variables between vfork() and child_exec() are not allowed due to stack sharing, but we'll see if a better solution emerges.

    @izbyshev izbyshev mannequin self-assigned this Oct 15, 2020
    @izbyshev
    Copy link
    Mannequin Author

    izbyshev mannequin commented Oct 16, 2020

    I've updated my PR.

    • After a discussion with Alexander Monakov (a GCC developer), moved vfork() into a small function to isolate it from both subprocess_fork_exec() and child_exec(). This appears to be the best strategy to avoid -Wclobbered false positives as well as potential compiler bugs.

    • Got rid of VFORK_USABLE checks in function parameter lists. Now child_sigmask parameter is always present, but is NULL if vfork() is not available or usable. The type is void * to avoid hard dependence on sigset_t, which other CPython source files appear to avoid.

    • Disabled vfork() in case setuid()/setgid()/setgroups() needed.

    • Added more comments explaining vfork()-related stuff.

    • Added commit message and NEWS entry.

    Potential improvements:

    • To avoid repeating long parameter lists in several functions, we can move them to a struct. The downside is that we'd need to convert child_exec() to use that struct all over the place. I don't have a strong preference here.

    • Are any documentation changes needed? We don't change any interfaces, so I'm not sure.

    Potential future directions:

    • If desired, a vfork() opt-out may be implemented. But it'd need to disable posix_spawn() on Linux as well, since it might be implemented via vfork(), and we have no good way to check that.

    @gpshead
    Copy link
    Member

    gpshead commented Oct 24, 2020

    New changeset 976da90 by Alexey Izbyshev in branch 'master':
    bpo-35823: subprocess: Use vfork() instead of fork() on Linux when safe (GH-11671)
    976da90

    @gpshead
    Copy link
    Member

    gpshead commented Oct 24, 2020

    now waiting to see how happy all of the buildbots are...

    We currently have a __linux__ check in the code deciding VFORK_USABLE.

    From what I can tell, vfork probably also works on macOS (darwin).

    Lets let this run for a bit on Linux and it can be a separate issue to open vfork usage up to other platforms.

    @gpshead
    Copy link
    Member

    gpshead commented Oct 24, 2020

    • To avoid repeating long parameter lists in several functions, we can move them to a struct. The downside is that we'd need to convert child_exec() to use that struct all over the place. I don't have a strong preference here.

    Agreed that the long parameter lists are annoying. I don't like shoving that much on the stack and by adding an additional call with so many params we've increase stack usage. I consider this refactoring work that can be done on its own outside of this issue though.

    • Are any documentation changes needed? We don't change any interfaces, so I'm not sure.

    I don't think so. I looked things over and I think all that is needed is the existing Misc/NEWS entry.

    Potential future directions:

    • If desired, a vfork() opt-out may be implemented. But it'd need to disable posix_spawn() on Linux as well, since it might be implemented via vfork(), and we have no good way to check that.

    We have such an opt-out capabilty for posix_spawn via subprocess._USE_POSIX_SPAWN https://github.com/python/cpython/blob/master/Lib/subprocess.py#L651
    along with code in there that determines the default value based on the detected runtime environment.

    I'm not sure we'll need that for vfork() as it seems to pretty much be a low level raw system call wrapper rather than something to be implemented via libc that can have its own bugs. If we do wind up wanting one, I'd structure it the same way.

    @gpshead
    Copy link
    Member

    gpshead commented Oct 24, 2020

    Thank you for taking this on! I'm calling it fixed for now as the buildbots are looking happy with it. If issues with it arise we can address them.

    @gpshead gpshead closed this as completed Oct 24, 2020
    @ronaldoussoren
    Copy link
    Contributor

    From what I can tell, vfork probably also works on macOS (darwin).

    Lets let this run for a bit on Linux and it can be a separate issue to
    open vfork usage up to other platforms.

    I'd prefer to not use vfork on macOS. For one I don't particularly trust that vfork would work reliably when using higher level APIs, but more importantly posix_spawn on macOS has some options that are hard to achieve otherwise and might be useful to expose in subprocess. An example of this is selecting the CPU architecture to use for the launched process when using fat binaries and a system that supports CPU emulation, such as the intel emulation on the upcoming Apple Silicon systems.

    @izbyshev
    Copy link
    Mannequin Author

    izbyshev mannequin commented Oct 24, 2020

    Thank you for taking this on! I'm calling it fixed for now as the buildbots are looking happy with it. If issues with it arise we can address them.

    Thank you for reviewing and merging!

    Using POSIX_CALL for pthread_sigmask() is incorrect, however, since it *returns* the error instead of setting errno. I've opened a PR with a fix and additional handling of the first pthread_sigmask() call in the parent (which can be done easily).

    I've also noticed a memory leak why doing the above: cwd_obj2 is not released on the error path. It probably slipped with patches adding user/groups support. I'll file a separate issue.

    Would you also clarify why you disallowed vfork() in case setsid() is needed? Have you discovered any potential issues with setsid()? Note that setsid() doesn't suffer from thread sync issues like setuid()/setgid()/etc.

    @gpshead
    Copy link
    Member

    gpshead commented Oct 24, 2020

    New changeset 473db47 by Alexey Izbyshev in branch 'master':
    bpo-35823: subprocess: Fix handling of pthread_sigmask() errors (GH-22944)
    473db47

    @gpshead
    Copy link
    Member

    gpshead commented Oct 24, 2020

    regarding excluding the setsid() case: I was being conservative as I couldn't find a reference of what was and wasn't allowed after vfork.

    I found one thing suggesting that on macOS setsid() was not safe after vfork(). But that appeared to be a Darwin-ism. I expect that is not true on Linux as it should just be a syscall updating a couple of fields in the process info. Confirming, in glibc is appears to be a shim for the setsid syscall (based on not finding any code implementing anything special for it) and in uclibc (much easier to read) it is clearly just a setsid syscall shim.

    I'll make a PR to undo the setsid restriction given we're Linux only.

    @gpshead
    Copy link
    Member

    gpshead commented Oct 24, 2020

    New changeset be3c3a0 by Gregory P. Smith in branch 'master':
    bpo-35823: Allow setsid() after vfork() on Linux. (GH-22945)
    be3c3a0

    @izbyshev
    Copy link
    Mannequin Author

    izbyshev mannequin commented Oct 24, 2020

    regarding excluding the setsid() case: I was being conservative as I couldn't find a reference of what was and wasn't allowed after vfork.

    Yes, there is no list of functions allowed after vfork(), except for the conservative POSIX.1 list consisting only of _exit() and execve(), so we can only take async-signal-safe functions as a first approximation and work from there. Thankfully, on Linux, C libraries don't do anything fancy in most cases. But, for example, it appears that on FreeBSD we wouldn't be able to use sigprocmask()/sigaction()[1]. BTW, commit[2] and the linked review are an interesting reading for anybody who would like to support posix_spawn() and/or vfork() in subprocess on FreeBSD.

    Confirming, in glibc is appears to be a shim for the setsid syscall (based on not finding any code implementing anything special for it) and in uclibc (much easier to read) it is clearly just a setsid syscall shim.

    I also recommend musl[3] when simplicity (and correctness) is required :)

    [1] https://svnweb.freebsd.org/base/head/lib/libc/gen/posix_spawn.c?view=markup&pathrev=362111#l126
    [2] https://svnweb.freebsd.org/base?view=revision&revision=352712
    [3] https://git.musl-libc.org/cgit/musl/tree/src/unistd/setsid.c?id=a5aff1972c9e3981566414b09a28e331ccd2be5d

    @izbyshev
    Copy link
    Mannequin Author

    izbyshev mannequin commented Oct 24, 2020

    @ronaldoussoren

    I'd prefer to not use vfork on macOS. For one I don't particularly trust that vfork would work reliably when using higher level APIs, but more importantly posix_spawn on macOS has some options that are hard to achieve otherwise and might be useful to expose in subprocess.

    I can't comment on vfork() usability on macOS myself, but given the number of issues/considerations described here, I expect that significant research would be needed to check that.

    Regarding your second point about extra posix_spawn() options, I actually don't see why it would be an argument against vfork(). Even on Linux, subprocess prefers posix_spawn() to vfork()/fork() when possible, so vfork() support doesn't hinder posix_spawn().

    @gpshead
    Copy link
    Member

    gpshead commented Oct 24, 2020

    Nice links. LOL, yes, musl source was going to be my next stop if the larger libc sources proved impossible for a mere mortal to reason about. :)

    regarding macOS, agreed. If someone needs vfork() to work there, I believe it could be made to.

    Options like selecting the architecture of the child process could be higher level options to the subprocess API. Hiding the platform specific details of how that happens and deciding which underlying approach to use based on the flags.

    multi-arch systems are a thing. It is conceivable that we may even see non-esoteric multi-arch hardware at some point.

    @gpshead
    Copy link
    Member

    gpshead commented Oct 25, 2020

    Performance improvement measured on a 1.4Ghz quad aarch64 A57 (nvidia jetson nano):

    #define VFORK_USABLE 1
     test_subprocess: 36.5 seconds

    #undef VFORK_USABLE
    test_subprocess: 45 seconds

    Nice. I really didn't expect a 20% speedup on its testsuite alone!

    Lets dive into that with a microbenchmark:

    $ ./build-novfork/python ten_seconds_of_truth.py
    Launching /bin/true for 10 seconds in a loop.
    Launched 2713 subprocesses in 10.00194378796732 seconds.
    271.247275281014 subprocesses/second.
    Increased our mapped pages by 778240KiB.
    Launching /bin/true for 10 seconds in a loop.
    Launched 212 subprocesses in 10.006392606999725 seconds.
    21.186456331095847 subprocesses/second.
    
    $ ./build/python ten_seconds_of_truth.py
    Launching /bin/true for 10 seconds in a loop.
    Launched 3310 subprocesses in 10.001623224001378 seconds.
    330.94628000551285 subprocesses/second.
    Increased our mapped pages by 778240KiB.
    Launching /bin/true for 10 seconds in a loop.
    Launched 3312 subprocesses in 10.001519071985967 seconds.
    331.1496959773679 subprocesses/second.

    Demonstrating perfectly the benefit of vfork(). The more mapped pages, the slower regular fork() becomes. With vfork() the size of the process's memory map does not matter.

    ten_seconds_of_truth.py:

    from time import monotonic as now
    import subprocess
    
    def benchmark_for_ten():
        print('Launching /bin/true for 10 seconds in a loop.')
    
        count = 0
        start = now()
        while now() - start < 10.:
            subprocess.run(['/bin/true'])
            count += 1
        end = now()
        duration = end-start
    
        print(f'Launched {count} subprocesses in {duration} seconds.')
        print(f'{count/duration} subprocesses/second.')
    
    benchmark_for_ten()
    
    map_a_bunch_of_pages = '4agkahglahaa^#\0ag3\3'*1024*1024*40
    
    print(f'Increased our mapped pages by {len(map_a_bunch_of_pages)//1024}KiB.')
    
    benchmark_for_ten()

    @socketpair
    Copy link
    Mannequin

    socketpair mannequin commented Apr 6, 2022

    See bpo-47245.

    https://github.com/bminor/glibc/blob/master/sysdeps/unix/sysv/linux/spawni.c#L309

    In short - do not use vfork(). Use clone(CLONE_VM | CLONE_VFORK). And build separate stack.

    Current implementation is heavily broken.

    Another guy has failed: https://bugzilla.kernel.org/show_bug.cgi?id=215813.

    Please comment calling vfork() as soon as possible in order to fallback to fork() implementation. Until you (or me?) write good vfork children code.

    @izbyshev
    Copy link
    Mannequin Author

    izbyshev mannequin commented Apr 8, 2022

    The preceding comment is wrong, see discussion in bpo-47245 and https://bugzilla.kernel.org/show_bug.cgi?id=215813#c14 for explanation of why that bug report is irrelevant for CPython.

    @socketpair
    Copy link
    Mannequin

    socketpair mannequin commented Apr 8, 2022

    Yes, you are almost right. Error-path is not so clear (it is discussed in another issue), but in general, yes, my previous comment is wrong. So, finally, there are no bugs around the stack at all.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    bdraco added a commit to bdraco/home-assistant that referenced this issue Feb 12, 2023
    By default python 3.10 will use the fork() which has to
    copy all the memory of the parent process (in our case
    this can be huge since Home Assistant core can use
    hundreds of megabytes of RAM). By using posix_spawn
    this is avoided.
    
    In python 3.11 vfork will also be available
    python/cpython#80004 (comment)
    python/cpython#11671 but we won't
    always be able to use it and posix_spawn is considered safer
    https://bugzilla.kernel.org/show_bug.cgi?id=215813#c14
    
    The subprocess library doesn't know about musl though
    even though it supports posix_spawn https://git.musl-libc.org/cgit/musl/log/src/process/posix_spawn.c
    so we have to teach it since it only has checks for glibc
    https://github.com/python/cpython/blob/1b736838e6ae1b4ef42cdd27c2708face908f92c/Lib/subprocess.py#L745
    
    The constant is documented as being able to be flipped here:
    https://docs.python.org/3/library/subprocess.html#disabling-use-of-vfork-or-posix-spawn
    bdraco added a commit to home-assistant/core that referenced this issue Feb 12, 2023
    By default python 3.10 will use the fork() which has to
    copy memory of the parent process (in our case
    this can be huge since Home Assistant core can use
    hundreds of megabytes of RAM). By using posix_spawn
    this is avoided and subprocess creation does not
    get discernibly slow the larger the Home Assistant
    python process grows.
    
    In python 3.11 vfork will also be available
    python/cpython#80004 (comment)
    python/cpython#11671 but we won't
    always be able to use it and posix_spawn is considered safer
    https://bugzilla.kernel.org/show_bug.cgi?id=215813#c14
    
    The subprocess library doesn't know about musl though
    even though it supports posix_spawn https://git.musl-libc.org/cgit/musl/log/src/process/posix_spawn.c
    so we have to teach it since it only has checks for glibc
    https://github.com/python/cpython/blob/1b736838e6ae1b4ef42cdd27c2708face908f92c/Lib/subprocess.py#L745
    
    The constant is documented as being able to be flipped here:
    https://docs.python.org/3/library/subprocess.html#disabling-use-of-vfork-or-posix-spawn
    balloob pushed a commit to home-assistant/core that referenced this issue Feb 13, 2023
    )
    
    * use posix spawn on alpine
    
    * Avoid subprocess memory copy when c library supports posix_spawn
    
    By default python 3.10 will use the fork() which has to
    copy all the memory of the parent process (in our case
    this can be huge since Home Assistant core can use
    hundreds of megabytes of RAM). By using posix_spawn
    this is avoided.
    
    In python 3.11 vfork will also be available
    python/cpython#80004 (comment)
    python/cpython#11671 but we won't
    always be able to use it and posix_spawn is considered safer
    https://bugzilla.kernel.org/show_bug.cgi?id=215813#c14
    
    The subprocess library doesn't know about musl though
    even though it supports posix_spawn https://git.musl-libc.org/cgit/musl/log/src/process/posix_spawn.c
    so we have to teach it since it only has checks for glibc
    https://github.com/python/cpython/blob/1b736838e6ae1b4ef42cdd27c2708face908f92c/Lib/subprocess.py#L745
    
    The constant is documented as being able to be flipped here:
    https://docs.python.org/3/library/subprocess.html#disabling-use-of-vfork-or-posix-spawn
    
    * Avoid subprocess memory copy when c library supports posix_spawn
    
    By default python 3.10 will use the fork() which has to
    copy memory of the parent process (in our case
    this can be huge since Home Assistant core can use
    hundreds of megabytes of RAM). By using posix_spawn
    this is avoided and subprocess creation does not
    get discernibly slow the larger the Home Assistant
    python process grows.
    
    In python 3.11 vfork will also be available
    python/cpython#80004 (comment)
    python/cpython#11671 but we won't
    always be able to use it and posix_spawn is considered safer
    https://bugzilla.kernel.org/show_bug.cgi?id=215813#c14
    
    The subprocess library doesn't know about musl though
    even though it supports posix_spawn https://git.musl-libc.org/cgit/musl/log/src/process/posix_spawn.c
    so we have to teach it since it only has checks for glibc
    https://github.com/python/cpython/blob/1b736838e6ae1b4ef42cdd27c2708face908f92c/Lib/subprocess.py#L745
    
    The constant is documented as being able to be flipped here:
    https://docs.python.org/3/library/subprocess.html#disabling-use-of-vfork-or-posix-spawn
    
    * missed some
    
    * adjust more tests
    
    * coverage
    AlePerla pushed a commit to AlePerla/homeassistant_core that referenced this issue Feb 17, 2023
    …e-assistant#87958)
    
    * use posix spawn on alpine
    
    * Avoid subprocess memory copy when c library supports posix_spawn
    
    By default python 3.10 will use the fork() which has to
    copy all the memory of the parent process (in our case
    this can be huge since Home Assistant core can use
    hundreds of megabytes of RAM). By using posix_spawn
    this is avoided.
    
    In python 3.11 vfork will also be available
    python/cpython#80004 (comment)
    python/cpython#11671 but we won't
    always be able to use it and posix_spawn is considered safer
    https://bugzilla.kernel.org/show_bug.cgi?id=215813#c14
    
    The subprocess library doesn't know about musl though
    even though it supports posix_spawn https://git.musl-libc.org/cgit/musl/log/src/process/posix_spawn.c
    so we have to teach it since it only has checks for glibc
    https://github.com/python/cpython/blob/1b736838e6ae1b4ef42cdd27c2708face908f92c/Lib/subprocess.py#L745
    
    The constant is documented as being able to be flipped here:
    https://docs.python.org/3/library/subprocess.html#disabling-use-of-vfork-or-posix-spawn
    
    * Avoid subprocess memory copy when c library supports posix_spawn
    
    By default python 3.10 will use the fork() which has to
    copy memory of the parent process (in our case
    this can be huge since Home Assistant core can use
    hundreds of megabytes of RAM). By using posix_spawn
    this is avoided and subprocess creation does not
    get discernibly slow the larger the Home Assistant
    python process grows.
    
    In python 3.11 vfork will also be available
    python/cpython#80004 (comment)
    python/cpython#11671 but we won't
    always be able to use it and posix_spawn is considered safer
    https://bugzilla.kernel.org/show_bug.cgi?id=215813#c14
    
    The subprocess library doesn't know about musl though
    even though it supports posix_spawn https://git.musl-libc.org/cgit/musl/log/src/process/posix_spawn.c
    so we have to teach it since it only has checks for glibc
    https://github.com/python/cpython/blob/1b736838e6ae1b4ef42cdd27c2708face908f92c/Lib/subprocess.py#L745
    
    The constant is documented as being able to be flipped here:
    https://docs.python.org/3/library/subprocess.html#disabling-use-of-vfork-or-posix-spawn
    
    * missed some
    
    * adjust more tests
    
    * coverage
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes extension-modules C modules in the Modules dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants