Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

potential undefined behavior with subprocess using vfork() on Linux? #91401

Closed
socketpair mannequin opened this issue Apr 6, 2022 · 18 comments
Closed

potential undefined behavior with subprocess using vfork() on Linux? #91401

socketpair mannequin opened this issue Apr 6, 2022 · 18 comments
Assignees
Labels
3.10 only security fixes 3.11 only security fixes extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error

Comments

@socketpair
Copy link
Mannequin

socketpair mannequin commented Apr 6, 2022

BPO 47245
Nosy @gpshead, @socketpair, @izbyshev

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = 'https://github.com/gpshead'
closed_at = <Date 2022-04-08.13:33:50.215>
created_at = <Date 2022-04-06.20:24:52.654>
labels = ['extension-modules', 'type-bug', '3.10', '3.11']
title = 'potential undefined behavior with subprocess using vfork() on Linux?'
updated_at = <Date 2022-04-08.13:33:50.214>
user = 'https://github.com/socketpair'

bugs.python.org fields:

activity = <Date 2022-04-08.13:33:50.214>
actor = 'socketpair'
assignee = 'gregory.p.smith'
closed = True
closed_date = <Date 2022-04-08.13:33:50.215>
closer = 'socketpair'
components = ['Extension Modules']
creation = <Date 2022-04-06.20:24:52.654>
creator = 'socketpair'
dependencies = []
files = []
hgrepos = []
issue_num = 47245
keywords = ['3.10regression']
message_count = 13.0
messages = ['416897', '416898', '416899', '416900', '416904', '416905', '416909', '416917', '416946', '416964', '416974', '416976', '416984']
nosy_count = 3.0
nosy_names = ['gregory.p.smith', 'socketpair', 'izbyshev']
pr_nums = []
priority = 'normal'
resolution = 'works for me'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue47245'
versions = ['Python 3.10', 'Python 3.11']

@socketpair
Copy link
Mannequin Author

socketpair mannequin commented Apr 6, 2022

Using vfork in bpo-35823 is VERY tricky. Please comment out vfork() usage for now. Yes, we can (should) use vfork(), but we have to rewrite the child code.

https://bugzilla.kernel.org/show_bug.cgi?id=215813

I would say it's URGENT.

@socketpair socketpair mannequin added 3.10 only security fixes 3.11 only security fixes extension-modules C modules in the Modules dir labels Apr 6, 2022
@socketpair
Copy link
Mannequin Author

socketpair mannequin commented Apr 6, 2022

Rewriting in a way that guarantee no stack (and heap) usage. Because stack is shared between child and parent. It seems there is no crossplatform way. Happily we can use some code like I wrote by link in the first message. OR, stick to posix_nspawn which is cross-platform.

@socketpair
Copy link
Mannequin Author

socketpair mannequin commented Apr 6, 2022

child_exec(exec_array, argv, envp, cwd,

child_exec(exec_array, argv, envp, cwd,
           p2cread, p2cwrite, c2pread, c2pwrite,
           errread, errwrite, errpipe_read, errpipe_write,
           close_fds, restore_signals, call_setsid,
           call_setgid, gid, call_setgroups, groups_size, groups,
           call_setuid, uid, child_umask, child_sigmask,
           py_fds_to_keep, preexec_fn, preexec_fn_args_tuple);

----------------

Py_NO_INLINE static void
child_exec(char *const exec_array[],
char *const argv[],
char *const envp[],
const char *cwd,
int p2cread, int p2cwrite,
int c2pread, int c2pwrite,

-------------

So, calling child_exec pushes to stack GUARANTEED. This is the bug. In fact everything works, but is too weak and break at any moment.

Again. Please comment calling vfork() for now. And re-implement child part later.

@socketpair
Copy link
Mannequin Author

socketpair mannequin commented Apr 6, 2022

Solution:

https://github.com/bminor/glibc/blob/master/sysdeps/unix/sysv/linux/spawni.c#L309

In short - do not use vfork(). Use clone(CLONE_VM | CLONE_VFORK). and do something with stack.

@gpshead
Copy link
Member

gpshead commented Apr 6, 2022

Can you provide a reproducable way to demonstrate evidence of a problem in CPython's use of the Linux libc vfork() implementation? A test case that causes a CPython parent or child process on Linux when built with HAVE_VFORK failing to function properly would help prioritize this because in practice nobody has reported problems in 3.10.

(we've deployed the subprocess vfork code into thousands production Python programs at work in the past year w/o issues being reported - though we have a constrained environment with use on only a couple of libc versions and limited set of kernels on a few very common architectures)

General thinking (possible dated and incorrect - against what https://man7.org/linux/man-pages/man2/vfork.2.html wording claims with its "or calls any other function" text):

Pushing additional data onto the stack in the child process _should_ not a problem. That by definition lands in previously unused pre-allocated stack space. If that page faults, that could map a new page into the process state shared by both the paused parent and running child. But this page mapping should be fine - the child exec that resumes the parent means the parent is the only one who sees it.

When the parent process resumes, sure, that data will be in that memory on the unallocated portion of stack, but critically the *stack pointer* in the parent process (a register) never changes. As far as I understand things, registers are not shared between vfork()ed processes. So the parent only sees some temporary data having been written to the unused region of the stack by the since-replaced by exec() child process. No big deal.

**Untrue wishful thinking**: If a new stack were needed on a given platform for use in the vfork()ed child, I'd like it to be the job of libc to take care of that for us. glibc sources do no such thing (every vfork supporting architecture has a vfork.S code that appears to make the syscall and jump back to the caller without stack manipulation).

@gpshead gpshead changed the title Subprocess with vfork() is broken potential undefined behavior with subprocess using vfork() on Linux Apr 6, 2022
@gpshead gpshead self-assigned this Apr 6, 2022
@gpshead gpshead added the type-bug An unexpected behavior, bug, or error label Apr 6, 2022
@gpshead
Copy link
Member

gpshead commented Apr 6, 2022

Immediate action item: Add a way for people to disable vfork at runtime by setting a flag in the subprocess module, just in case.

This can be backported to 3.10 - It'd provide an escape hatch for anyone without a need to rebuild Python to disable use of vfork() without resorting to LD_PRELOAD hacks.

This is not an urgent issue unless actual practical problems are being observed in the field.

@gpshead
Copy link
Member

gpshead commented Apr 6, 2022

Our current assumptions around the use of vfork() are very much glibc specific.

Another useful reference for reasoning, comments, and history is https://github.com/golang/go/blob/master/src/syscall/exec_linux.go#L146 forkAndExecInChild1

@izbyshev
Copy link
Mannequin

izbyshev mannequin commented Apr 7, 2022

In short: both this bug report and [1] are invalid.

The reason why doing syscall(SYS_vfork) is illegal is explained by Florian Weimer in [2]:

The syscall function in glibc does not protect the on-stack return address against overwriting, so it can't be used to call SYS_vfork on x86.

This is off-topic here because CPython calls vfork() via libc, but I'll still expand the Florian's comment. Suppose one wants to write my_vfork() wrapper over vfork syscall. When vfork syscall returns the first time, my_vfork() has to return to its caller. This necessarily involves knowing the return address. On some architectures this return address is stored on the stack by the caller (e.g. x86). The problem is that any data in my_vfork() stack frame can be overwritten by its caller once it returns the first time. Then, when vfork syscall returns the second time, my_vfork() could be unable to return to its caller because the data it fetches from its (now invalid) stack frame is garbage. This is precisely what happens when one implements my_vfork() as syscall(SYS_vfork). To avoid this, the most common strategy is to store the return address into a register that's guaranteed to be preserved around syscall by the OS ABI. For example, the x86-64 musl implementation [3] stores the return address in rdx (which is preserved around syscall) and then restores it after syscall (both on the first and the second return of the syscall).

Now back to CPython. The real problem with stack sharing between the child and the parent could be due to compiler bugs, e.g. if a variable stored on the stack is modified in the child branch of "if (vfork())", but the compiler assumes that some other variable sharing the stack slot with the first one is *not* modified in the parent branch (i.e. after vfork() returns the second time). But all production compilers handle vfork() (and setjmp(), which has a similar issue) in a special way to avoid this, and GCC has __attribute__((returns_twice)) that a programmer could use for custom functions behaving in this way (my_vfork() would have to use this attribute).

Regarding a separate stack for the child and clone(CLONE_VM|CLONE_VFORK), I considered this in bpo-35823, but this has its own problems. The most important one is that CPython would be in business of choosing the correct stack size for the child's stack, but CPython is *not* in control of all the code executing in the child because it calls into libc. In practice, people use various LD_PRELOAD-based software that wraps various libc functions (e.g. Scratchbox 2 build environment for Sailfish OS is an LD_PRELOAD-based sandbox), so even common-sense assumptions like "execve() in libc can't use a lot of stack!" might turn out to be wrong. CPython *could* work around that by using direct syscalls for everything in the child, or by using some "large" size that "should be enough for everybody", but I wouldn't want to see that unless we have a real problem with vfork(). Note that vfork()-based posix_spawn() implementations in C libraries do *not* have this problem because they fully control all code in the child (e.g. they would use a non-interposable execve() symbol or a direct syscall).

Immediate action item: Add a way for people to disable vfork at runtime by setting a flag in the subprocess module, just in case.

I don't think any action is needed at all, and I think this issue should be closed.

Our current assumptions around the use of vfork() are very much glibc specific.

Could you clarify what glibc-specific assumptions you mean? In bpo-35823 I tried to use as little assumptions as possible.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=215813
[2] https://bugzilla.kernel.org/show_bug.cgi?id=215813#c2
[3] https://git.musl-libc.org/cgit/musl/tree/src/process/x86_64/vfork.s?id=ced75472d7e3d73d5b057e36ccbc7b7fcba95104

@gpshead
Copy link
Member

gpshead commented Apr 7, 2022

Thanks! I agree with you that this is probably not an actual problem on Linux.

_I did look at the various glibc architecture vfork.s implementations: Cute tricks used on some where they need to avoid a stack modifying traditional return from vfork()._

As for glibc specifics, I'm mostly thinking of the calls we do in the child.

According to the "Standard Description (POSIX.1)" calls to anything other than _exit() or exec*() are not allowed. But the longer "Linux Description" in that vfork(2) man page does not say that. Which implies merely by omission that calls to other things are okay so long as you understand everything they do to the process heap/stack/state. (I wish it were explicit about that)

Some of the calls we do from our child_exec() code... many are likely "just" syscall shims and thus fine - but that is technically up to libc.

A few others are Py functions that go elsewhere in CPython and while they may be fine for practical reasons today with dangerous bits on conditional branches that technically should not be possible to hit given the state by the time we're at this point in _posixsubprocess, pose a future risk - anyone touching implementations of those is likely unaware of vfork'ed child limitations that must be met.

For example if one of the potential code paths that trigger an indirect Py_FatalError() is hit... that fatal exit code is definitely not post-vfork-child safe. The pre-exec child dying via that could screw up the vfork parent process's state.

@gpshead gpshead changed the title potential undefined behavior with subprocess using vfork() on Linux potential undefined behavior with subprocess using vfork() on Linux? Apr 7, 2022
@izbyshev
Copy link
Mannequin

izbyshev mannequin commented Apr 8, 2022

As for glibc specifics, I'm mostly thinking of the calls we do in the child.

According to the "Standard Description (POSIX.1)" calls to anything other than _exit() or exec*() are not allowed. But the longer "Linux Description" in that vfork(2) man page does not say that. Which implies merely by omission that calls to other things are okay so long as you understand everything they do to the process heap/stack/state. (I wish it were explicit about that)

If we're talking about the kernel side of things, sure, we rely on Linux being "sane" here, though I suppose on *BSDs the situation is similar.

Some of the calls we do from our child_exec() code... many are likely "just" syscall shims and thus fine - but that is technically up to libc.

Yes, but I wouldn't say that "being just syscall shims" is specific for glibc. It's just a "natural" property that just about any libc is likely to possess. (Yeah, I know, those are vague words, but in my experience "glibc-specific" is usually applied to some functionality/bug present in glibc and absent in other libcs, and I don't think we rely on something like that).

Of course, there are also LD_PRELOAD things that could be called instead of libc, but good news here is that we don't create new constrains for them (CPython is not the only software that uses vfork()), and they're on their own otherwise.

A few others are Py functions that go elsewhere in CPython and while they may be fine for practical reasons today with dangerous bits on conditional branches that technically should not be possible to hit given the state by the time we're at this point in _posixsubprocess, pose a future risk - anyone touching implementations of those is likely unaware of vfork'ed child limitations that must be met.

We already have async-signal-safety requirement for all such code because of fork(). Requirements of vfork() are a bit more strict, but at least the set of functions we have to watch for dangerous changes is the same. And I suspect that most practical violations of vfork()-safety also violate async-signal-safety.

For example if one of the potential code paths that trigger an indirect Py_FatalError() is hit... that fatal exit code is definitely not post-vfork-child safe. The pre-exec child dying via that could screw up the vfork parent process's state.

Yeah, and it can break the fork parent too, at least because it uses exit() (not _exit()), so stdio buffers will be flushed twice, in the child and in the parent.

@socketpair
Copy link
Mannequin Author

socketpair mannequin commented Apr 8, 2022

So, finally:

  1. Regarding vfork() and stack - everything is nice. No bugs because libc has nasty hacks for stack restoration.

  2. Having the ability to turn off vfork using environment variables is NICE. At least, one can easily compare the performance.

  3. We have to fix error-path in order not to change heap state (contents and allocations), possibly do not touch locks. During vfork() child execution - the only parent THREAD (not the process) is blocked. For example, it's not allowed to touch GIL. Child process may die unexpectedly and leave GIL locked. Is it possible to rewrite children path for vfork() case without any Py* calls ? As an idea, we can prepare all low-level things (all the pointers to strings and plain values) before vfork(), so child code will use only that data.

@izbyshev
Copy link
Mannequin

izbyshev mannequin commented Apr 8, 2022

  1. We have to fix error-path in order not to change heap state (contents and allocations), possibly do not touch locks. During vfork() child execution - the only parent THREAD (not the process) is blocked. For example, it's not allowed to touch GIL. Child process may die unexpectedly and leave GIL locked. Is it possible to rewrite children path for vfork() case without any Py* calls ? As an idea, we can prepare all low-level things (all the pointers to strings and plain values) before vfork(), so child code will use only that data.

What specifically do you propose to fix? There is no problem with GIL if the child dies because the GIL is locked and unlocked only by the parent and the child never touches it. Similarly, only Py_* calls known to be safe are used. As for "pointers to strings", it's not clear to me what you mean, but if you mean allocations, they are already done before (v)fork(), since the child code is required to be async-signal-safe even if plain fork() is used.

@socketpair
Copy link
Mannequin Author

socketpair mannequin commented Apr 8, 2022

I have studied assembler output of _posixsubprocess.o compilation. Yes, everything seems safe. So, I'm closing the bug.

@socketpair socketpair mannequin closed this as completed Apr 8, 2022
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
gpshead added a commit to gpshead/cpython that referenced this issue Apr 13, 2022
Just in case there is ever an issue with _posixsubprocess's use of
vfork() due to the complexity of using it properly and potential
directions that Linux platforms where it defaults to on could take, this
adds a failsafe so that users can disable its use entirely by setting
a global flag.

No known reason to disable it exists. But it'd be a shame to encounter
one and not be able to use CPython without patching and rebuilding it.

See the linked issue for some discussion on reasoning.
@gpshead
Copy link
Member

gpshead commented Apr 13, 2022

I'm reopening this to track adding a failsafe so people have a big-red-button style way to disable Python's use of vfork without recompiling just in case of future problems. That arguably should've been there from the start in 3.10. Though we still have no known need to use it, by the time you do it needs to already exist.

I'll do a manual more conservative backport of the disable_vfork_reason button to 3.10 for 3.10.5.

@gpshead gpshead reopened this Apr 13, 2022
@vstinner
Copy link
Member

I wish posix_spawn() could be used in more cases on Linux: see #86904.

gpshead added a commit that referenced this issue Apr 25, 2022
Just in case there is ever an issue with _posixsubprocess's use of
vfork() due to the complexity of using it properly and potential
directions that Linux platforms where it defaults to on could take, this
adds a failsafe so that users can disable its use entirely by setting
a global flag.

No known reason to disable it exists. But it'd be a shame to encounter
one and not be able to use CPython without patching and rebuilding it.

See the linked issue for some discussion on reasoning.

Also documents the existing way to disable posix_spawn.
@izbyshev
Copy link
Contributor

izbyshev commented May 1, 2022

I wish posix_spawn() could be used in more cases on Linux: see #86904.

On the contrary, I wish posix_spawn() completely disabled on Linux. Using different backends depending on Popen() arguments only adds possibility for hard-to-understand behavioral differences. And posix_spawn() in glibc is still buggy (#91307 (comment)).

@MojoVampire
Copy link
Contributor

@gpshead: It looks like you added the big-red-button; can this be closed?

@gpshead
Copy link
Member

gpshead commented May 1, 2022

On the contrary, I wish posix_spawn() completely disabled on Linux. Using different backends depending on Popen() arguments only adds possibility for hard-to-understand behavioral differences. And posix_spawn() in glibc is still buggy (#91307 (comment)).

Hah, joy, but not surprised, that at least proves that failsafes knobs to let modern things be disabled have uses.

The 3.10 PR being merged should close this issue out.

gpshead added a commit that referenced this issue May 1, 2022
…1932)

This does not alter the `_posixsubprocess.fork_exec()` private API to
avoid issues for anyone relying on that (bad idea) or for anyone who's
`subprocess.py` and `_posixsubprocess.so` upgrades may not become
visible to existing Python 3.10 processes at the same time.

Backports the concept of cd5726f.

Provides a fail-safe way to disable vfork for #91401.

I didn't backport the documentation as I don't actually expect this to be used and `.. versionadded: 3.10.5` always looks weird in docs. It's being done more to have a fail-safe in place for people just in case.
@gpshead gpshead closed this as completed May 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.10 only security fixes 3.11 only security fixes extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

4 participants