Classification
Title: subprocess.Popen.send_signal() should poll the process
Type: Stage: resolved
Components: Library (Lib) Versions: Python 3.9
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: SpecLad, giampaolo.rodola, njs, oconnor663, vstinner
Priority: normal Keywords: patch

Created on 2019-10-29 11:18 by vstinner, last changed 2020-12-03 17:05 by oconnor663. This issue is now closed.

Pull Requests
URL Status Linked
PR 16984 merged vstinner, 2019-10-29 11:37
Messages (25)
msg355645 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-29 11:18
subprocess.Popen.send_signal() doesn't check whether the process has exited since the last call to the poll() method. If the process exits just before os.kill() is called, the signal can be sent to the wrong process if the process identifier (pid) is recycled.

The attached PR simply calls poll() once to narrow the time window in which this race condition can occur, but it doesn't fully eliminate the race condition.
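The idea can be sketched as follows (a hypothetical subclass for illustration; the actual patch modifies Popen.send_signal() itself, and differs in details):

```python
import os
import signal
import subprocess
import sys

class PollingPopen(subprocess.Popen):
    """Sketch of the PR 16984 idea: poll() before signalling.

    Polling narrows (but does not close) the window in which the pid
    could be reaped and recycled between the check and os.kill().
    """

    def send_signal(self, sig):
        self.poll()  # reap the child if it has already exited
        if self.returncode is None:
            os.kill(self.pid, sig)
        # If returncode is set, the child is gone: skip os.kill()
        # rather than risk signalling a recycled pid.

p = PollingPopen([sys.executable, "-c", "pass"])
p.wait()                       # child exits and is reaped
p.send_signal(signal.SIGTERM)  # safe no-op instead of a stray kill
print(p.returncode)  # → 0
```

The window remains open between the poll() and the os.kill(), which is why this is only a mitigation.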

--

See also the new "pidfd" API which only landed very recently in the Linux kernel to prevent this issue:

* https://lwn.net/Articles/773459/
  "Toward race-free process signaling" (this article describes the issue)
* https://lwn.net/Articles/789023/
  "New system calls: pidfd_open() and close_range()"
* https://kernel-recipes.org/en/2019/talks/pidfds-process-file-descriptors-on-linux/
  "pidfds: Process file descriptors on Linux" by Christian Brauner

Illumos, OpenBSD, NetBSD and FreeBSD have similar concepts.

I don't propose to use pidfd here; I just want to highlight that this is a real issue and that kernels are evolving to provide more reliable defenses against the kill(pid, sig) race condition ;-)
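For illustration, a hedged sketch of the pidfd approach using the os.pidfd_open() and signal.pidfd_send_signal() wrappers that later landed in Python 3.9 (Linux 5.3+ only; pidfd_kill is a hypothetical helper name):

```python
import os
import signal
import subprocess
import sys

def pidfd_kill(proc, sig):
    """Signal a child through a pidfd (Linux >= 5.3, Python >= 3.9).

    The file descriptor pins the kernel's process record, so the
    signal can never be delivered to a recycled pid.
    """
    if not hasattr(os, "pidfd_open"):
        raise NotImplementedError("pidfd requires Linux 5.3+ / Python 3.9+")
    fd = os.pidfd_open(proc.pid)  # ProcessLookupError if already reaped
    try:
        signal.pidfd_send_signal(fd, sig)
    finally:
        os.close(fd)

p = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
try:
    pidfd_kill(p, signal.SIGTERM)
except (NotImplementedError, OSError):
    p.kill()  # fallback on platforms/kernels without pidfd
p.wait()
```

The OSError fallback covers kernels where the wrapper exists but the syscall does not (ENOSYS) or is blocked by seccomp.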
msg355647 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-29 11:41
I noticed this issue by reading subprocess code while working on bpo-38207 ("subprocess: Popen.kill() + Popen.communicate() is blocking until all processes using the pipe close the pipe") and bpo-38502 ("regrtest: use process groups").
msg355706 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-30 11:26
> subprocess.Popen.send_signal() doesn't check whether the process has exited since the last call to the poll() method. If the process exits just before os.kill() is called, the signal can be sent to the wrong process if the process identifier (pid) is recycled.

I'm not sure under which conditions this case happens.

My understanding is that the pid is released (and so can be reused) as soon as the child process completes *and* waitpid() has been called.

Two cases:

* os.waitpid(pid, 0) can be called directly ("raw call")
* Popen API used, but called from a different thread: thread A calls send_signal(), thread B calls waitpid()

I'm mostly concerned about the multithreading case. I noticed the race condition while working on regrtest, which uses 2 threads: one thread which is responsible for the process (calls .communicate(), .wait(), etc.), and one "main" thread which sends a signal to every child process to interrupt them.

My first implementation called .wait() after calling .send_signal() in the main thread. But it seems like Popen is not "fully" thread safe: the "waitpid lock" only protects the os.waitpid() call; it doesn't protect the self.returncode attribute.

I modified regrtest to only call .waitpid() in the thread responsible for the process, to avoid such a race condition. The new design seems more reliable. The main thread only calls .send_signal(): it doesn't call .wait() (which calls os.waitpid()) anymore.
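The regrtest design described above can be sketched as a minimal hypothetical example: the owner thread is the only caller of wait() (and hence of os.waitpid()), while the main thread restricts itself to send_signal():

```python
import signal
import subprocess
import sys
import threading
import time

p = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])

def owner():
    # The only place that reaps the child; no other thread
    # ever calls wait()/waitpid().
    p.wait()

t = threading.Thread(target=owner)
t.start()
time.sleep(0.5)                # give the child time to start
p.send_signal(signal.SIGTERM)  # the main thread never reaps
t.join(timeout=30)
print(p.returncode is not None)  # → True
```

Because only one thread reaps, self.returncode is only ever written from the owner thread, which sidesteps the unprotected-attribute problem.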
msg355856 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2019-11-02 01:50
-1 about the PR solution to suppress ProcessLookupError in case the process is gone. In psutil I solved the "pid reused problem" by using the process creation time to identify a process uniquely (on start).
A decorator can be used to protect the sensitive methods interacting with the PID/handle (communicate(), send_signal(), terminate(), kill()) and raise an exception (maybe ProcessLookupError("PID has been reused")?).
msg355917 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-11-04 01:27
> -1 about the PR solution to suppress ProcessLookupError in case the process is gone.

Did someone propose a pull request to fix this issue by ignoring ProcessLookupError?

> In psutil I solved the “pid reused problem” by using process creation time to identify a process uniquely (on start). 

Well, that requires writing platform-specific code.

I would prefer to only use platform-specific code when the platform provides something like pidfd: I understand that Linux, FreeBSD, OpenBSD, NetBSD, and Illumos are concerned. I mean reusing an existing solution, rather than writing our own "workaround" (solution?).
msg355918 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-11-04 01:28
> I would prefer to only use platform-specific code when the platform provides something like pidfd: I understand that Linux, FreeBSD, OpenBSD, NetBSD, and Illumos are concerned. I mean reusing an existing solution, rather than writing our own "workaround" (solution?).

By the way, I suggest sticking to the initial problem in this issue. If someone is interested in exploring the pidfd idea, I recommend opening a different issue ;-)
msg355928 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2019-11-04 07:55
> Did someone propose a pull request to fix this issue by ignoring ProcessLookupError?

I misread your PR, sorry. I thought that was the effect.
msg356073 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2019-11-05 23:07
It sounds like there's actually nothing to do here? (Except maybe eventually switch to pidfd or similar, but Victor says he wants to use a different issue for that.) Can this be closed?
msg356074 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-11-05 23:10
> It sounds like there's actually nothing to do here? (Except maybe eventually switch to pidfd or similar, but Victor says he wants to use a different issue for that.) Can this be closed?

I opened this issue to propose PR 16984. Did you notice the PR?

In short, I propose to call poll() before calling os.kill() in send_signal().
msg356075 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2019-11-05 23:12
But then deeper in the thread it sounded like you concluded that this didn't actually help anything, which is what I would expect :-).
msg356076 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-11-05 23:19
> But then deeper in the thread it sounded like you concluded that this didn't actually help anything, which is what I would expect :-).

The PR contains a test which demonstrates one race condition. The test fails without the fix.
msg356077 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-11-05 23:19
> (Except maybe eventually switch to pidfd or similar, but Victor says he wants to use a different issue for that.)

pidfd is not available on all platforms (ex: Windows).
msg356084 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2019-11-06 00:35
You can't solve a time-of-check-to-time-of-use race by adding another check. I guess your patch might narrow the race window slightly, but it was a tiny window before and it's a tiny window after, so I don't really get it.

The test doesn't demonstrate a race condition. What it demonstrates is increased robustness against user code that corrupts Popen's internal state by reaping one of its processes behind its back.

That kind of robustness might be a good motivation on its own! (I actually just did a bunch of work to make trio more robust against user code closing fds behind its back, which is a similar kind of thing.) But it's a very different motivation than what you say in this bug.
msg356085 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2019-11-06 00:43
Hmm, you know, on further thought, it is actually possible to close this race for real, without pidfd.

What you do is split 'wait' into two parts: first it waits for the process to become reapable without actually reaping it. On Linux you can do this with waitid+WNOWAIT. On kqueue platforms you can do it with kqueue.

Then, you take a lock, and use it to atomically call waitpid/waitid to reap the process + update self.returncode to record that you've done so.

In send_signal, we use the same lock to atomically check whether the process has been reaped, and then send the signal if it hasn't.

That doesn't fix your test, but it does fix the race condition: as long as users only interact with the process through the Popen API, it guarantees that send_signal will never signal the wrong process.
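The two-step wait can be sketched with os.waitid() where it exists (POSIX only; the WNOWAIT step leaves the child as a zombie, so its pid cannot be recycled yet):

```python
import os
import subprocess
import sys

p = subprocess.Popen([sys.executable, "-c", "pass"])
if hasattr(os, "waitid"):
    # Step 1: block until the child exits WITHOUT reaping it.
    os.waitid(os.P_PID, p.pid, os.WEXITED | os.WNOWAIT)
    # The pid still belongs to our zombie here, so signalling it
    # would be safe. Step 2: reap for real (this is the step that
    # must be serialized with send_signal()).
    p.wait()
else:
    p.wait()  # Windows has no waitid()
print(p.returncode)  # → 0
```

Between the two steps the kernel keeps the zombie entry alive, which is exactly the property the locking scheme relies on.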
msg356571 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2019-11-14 03:19
To further elaborate on the creation time solution, the idea in pseudo-code is the following:

class Popen:

    def __init__(self, ...):
        self._execute_child(...)
        try:
            self._ctime = get_create_time(self.pid)
        except ProcessLookupError:
            self._ctime = None

    def send_signal(self, sig):
        if self._ctime is not None:
            if self._ctime != get_create_time(self.pid):
                raise ProcessLookupError("PID has been reused")
        os.kill(self.pid, sig)

Technically there is still a race condition between _execute_child() and get_create_time(), but:

1) the time between the two calls is supposed to be extremely short
2) an OS should not reuse a PID that fast, precisely in order to minimize the chances of such a race condition occurring
3) as a general rule, on most systems PIDs are supposed to be assigned sequentially and wrap around. E.g. on Linux:

~$ for i in $(seq 20); do ps; done | grep ps
3199 pts/0    00:00:00 ps
3200 pts/0    00:00:00 ps
3201 pts/0    00:00:00 ps
3202 pts/0    00:00:00 ps
3203 pts/0    00:00:00 ps
3204 pts/0    00:00:00 ps
3205 pts/0    00:00:00 ps
3207 pts/0    00:00:00 ps
3208 pts/0    00:00:00 ps
3209 pts/0    00:00:00 ps
3210 pts/0    00:00:00 ps
3213 pts/0    00:00:00 ps
3214 pts/0    00:00:00 ps
3215 pts/0    00:00:00 ps
3216 pts/0    00:00:00 ps
3217 pts/0    00:00:00 ps
3218 pts/0    00:00:00 ps
3219 pts/0    00:00:00 ps
3221 pts/0    00:00:00 ps
3223 pts/0    00:00:00 ps

As for Windows, the termination is based on the process handle instead of the PID, so I assume that means Windows is safe in this regard. There was a prior discussion about this actually:
https://bugs.python.org/issue36067#msg336243
Getting process creation time is relatively easy, even though platform-dependent (and should be done in C).
pidfd API would be nicer to use, but it seems it's not portable and relatively recent.
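A hedged, Linux-only sketch of such a get_create_time() (hypothetical helper; psutil implements the real cross-platform version, in C). It reads the starttime field (field 22) from /proc/<pid>/stat:

```python
import os

def get_create_time(pid):
    """Process start time in clock ticks since boot, or None where
    /proc is unavailable (macOS, Windows, ...). Linux-only sketch."""
    if not os.path.isdir("/proc"):
        return None
    try:
        with open("/proc/%d/stat" % pid, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        raise ProcessLookupError(pid)
    # The comm field (2) may itself contain spaces and parens,
    # so split only after the last closing paren.
    fields = data.rpartition(b")")[2].split()
    return int(fields[19])  # starttime is field 22 of the full line

ctime = get_create_time(os.getpid())
print(ctime is None or ctime > 0)  # → True
```

Comparing this value at send_signal() time against the value captured at spawn time detects a recycled pid, as in the pseudo-code above.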
msg356593 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-11-14 11:08
> To further elaborate on the creation time solution, the idea in pseudo-code is the following: (...)
> Technically there is still a race condition between _execute_child() and get_create_time(), but: (...)

Even if the approach seems to be different, PR 16984 seems to have the same advantages and drawbacks.

PR 16984 relies on the fact that the Linux kernel doesn't reuse a pid until its parent calls os.waitpid(). Even if the child process has completed, the pid remains assigned (and the process status is "zombie").

PR 16984 has the advantage that it reuses existing code which has been battle tested: we are sure that the code is portable and works well.

Whereas get_create_time() will likely be platform-specific and will add a little maintenance burden.

I know that PR 16984 is a partial solution (there is still a race condition if waitpid() is called directly, without using the Popen API), but I don't see a strong reason to prefer get_create_time().
msg358137 - (view) Author: Роман Донченко (SpecLad) * Date: 2019-12-09 20:11
> What you do is split 'wait' into two parts: first it waits for me process to become reapable without actually reaping it. On Linux you can do this with waitid+WNOWAIT. On kqueue platforms you can do it with kqueue.

> Then, you take a lock, and use it to atomically call waitpid/waitid to reap the process + update self.returncode to record that you've done so.

> In send_signal, we use the same lock to atomically check whether the process has been reaped, and then send the signal if it hasn't.

It's a good idea, but it would break the scenario where two threads call wait() concurrently. It would create this race condition:

1. Thread A reaps the process.
2. Thread B thinks the process is still running, so it calls waitid+WNOHANG on a stale PID, with unpredictable results.
3. Thread A sets self.returncode.

What is needed here is a reader-writer lock. subprocess.wait would work like this (pseudocode):

with lock taken for reading:
    os.waitid(..., WNOHANG)
with lock taken for writing:
    os.waitid(...)
    self.returncode = ...

Whereas subprocess.send_signal would work like this:

with lock taken for reading:
    os.kill(...)

Unfortunately, it doesn't seem like Python has reader-writer locks in the library...
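For reference, a minimal reader-writer lock can be built from two stdlib locks (the classic counter-based construction; no writer priority, no reentrancy — a sketch, not production code):

```python
import threading

class RWLock:
    """Minimal reader-writer lock: readers share, writers exclude."""

    def __init__(self):
        self._readers = 0
        self._readers_lock = threading.Lock()  # guards the counter
        self._writer_lock = threading.Lock()   # held while anyone writes

    def acquire_read(self):
        with self._readers_lock:
            self._readers += 1
            if self._readers == 1:
                self._writer_lock.acquire()  # first reader blocks writers

    def release_read(self):
        with self._readers_lock:
            self._readers -= 1
            if self._readers == 0:
                self._writer_lock.release()  # last reader admits writers

    def acquire_write(self):
        self._writer_lock.acquire()

    def release_write(self):
        self._writer_lock.release()
```

In the scheme above, send_signal() would take the read side around os.kill(), and the reaping waitid() would take the write side.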
msg358142 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2019-12-09 21:18
>  Thread B thinks the process is still running, so it calls waitid+WNOHANG on a stale PID, with unpredictable results.

I'm pretty sure you mean WNOWAIT, right? The actual reaping step (which might use WNOHANG) is already protected by a lock, but there's a danger that a thread might try to perform the blocking-until-exit step (which uses WNOWAIT) on a stale pid.


Good catch!

I think this can be fixed by using a second lock around all of .wait(). But this is a bit tricky because we also need to account for concurrent .poll() calls. So something like:

def wait(self):
    # Don't take the big lock unless the process is actually still running
    # This isn't just an optimization; it's critical for poll() correctness
    if self.poll() is not None:
        return self.returncode

    with self._blocking_wait_lock:
        with self._returncode_lock:
            revalidate pid licenses

        block_until_child_exits()

        with self._returncode_lock:
            reap the child and mark it as reaped

def poll(self):
    # if someone is already blocked waiting for the child, then it definitely
    # hasn't exited yet, so we don't need to call waitpid ourselves.
    # This isn't just an optimization; it's critical for correctness.
    if not self._blocking_wait_lock.acquire(blocking=False):
        return None
    try:
        with self._returncode_lock:
            do the actual poll + returncode updating
    finally:
        self._blocking_wait_lock.release()

Notes:

If there's already a wait() running, and someone calls poll(), then we have to make sure the poll() can't reap the process out from under the wait(). To fix that, poll() skips the waitpid call when wait() is already running.

But, this could introduce its own race condition: if a process has exited but we haven't noticed yet, and then someone calls wait() and poll() simultaneously, then the wait() might take the lock, then poll() notices the lock is taken, and concludes that the process can't have exited yet. Of course wait() will immediately reap the process and drop the lock, but by this point poll() has already returned the wrong information.

The problem is that poll() needs to be able to treat "the blocking wait lock is taken" as implying "the process hasn't exited yet". So to make that implication true, we add a check at the top of wait().

Of course if a process exits while wait() is running, and someone calls poll() in that brief interval between it exiting and wait() noticing, then poll() could again fail to report it exiting. But I think this is fine: it's ok if poll() is a fraction of a second out of date; that race condition is inherent in its API. The problem would be if it fails to notice a process that exited a while ago and is just sitting around waiting to be reaped.

... Ugh but there is still one more race condition here. I think this fixes all the cases involving send_signal, wait, and poll interactions with each other, BUT we broke the case of two threads calling poll() at the same time. One thread will take the blocking_wait_lock, then the other will take this as evidence that someone is blocked in wait() and exit early, which isn't appropriate.

I think we can patch up this last race condition by adding yet one more lock: a simple

with self._poll_lock:

around the whole body of poll(), so that only one thread can call it at a time.
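This locking scheme can be exercised without a real child process. In the simulation below the "child" is a threading.Event standing in for the waitid(WNOWAIT) step, and all names (FakePopen, _blocking_wait_lock, etc.) are hypothetical:

```python
import threading

class FakePopen:
    """Simulation of the three-lock scheme; the "child" is an Event."""

    def __init__(self):
        self._exited = threading.Event()  # stands in for the real child
        self._exit_code = None
        self.returncode = None
        self._blocking_wait_lock = threading.Lock()
        self._returncode_lock = threading.Lock()
        self._poll_lock = threading.Lock()

    def _child_exit(self, code):
        # Test hook: the "child" terminates with this exit code.
        self._exit_code = code
        self._exited.set()

    def wait(self):
        if self.poll() is not None:        # critical for poll() correctness
            return self.returncode
        with self._blocking_wait_lock:
            with self._returncode_lock:
                if self.returncode is not None:
                    return self.returncode  # reaped while we raced in
            self._exited.wait()            # waitid(..., WNOWAIT) stand-in
            with self._returncode_lock:    # reap under the lock
                self.returncode = self._exit_code
        return self.returncode

    def poll(self):
        with self._poll_lock:              # serialize concurrent poll()ers
            if not self._blocking_wait_lock.acquire(blocking=False):
                return None                # someone is blocked in wait()
            try:
                with self._returncode_lock:
                    if self.returncode is None and self._exited.is_set():
                        self.returncode = self._exit_code  # reap
                    return self.returncode
            finally:
                self._blocking_wait_lock.release()

    def send_signal(self, sig):
        with self._returncode_lock:        # atomic check-then-signal
            if self.returncode is None:
                pass                       # os.kill(self.pid, sig) goes here

p = FakePopen()
t = threading.Thread(target=p.wait)
t.start()
p._child_exit(0)
t.join()
print(p.returncode)  # → 0
```

The structure mirrors the pseudocode above: poll() treats "blocking wait lock taken" as "process still running", and the check at the top of wait() makes that implication true.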
msg358146 - (view) Author: Роман Донченко (SpecLad) * Date: 2019-12-09 21:46
> I'm pretty sure you mean WNOWAIT, right?

Right.

> revalidate pid licenses

What does this mean?

> I think we can patch up this last race condition by adding yet one more lock

Wouldn't it be sufficient to add

if self.returncode is not None:
    return self.returncode

at the top of poll()?

(And in fact, you don't even need to add it, since an equivalent of this check is already there.)

I think this makes calling poll() from wait() unnecessary too.
msg358147 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2019-12-09 22:04
>             revalidate pid licenses

It means autocorrect mangled the text... I don't remember what word that's supposed to be, but basically I meant "revalidate the pid hasn't been reaped"

> Wouldn't it be sufficient to add
> 
> if self.returncode is not None:
>     return self.returncode
> 
> at the top of poll()?

No, the race condition is:

1. Process exits
[some time passes]
2. Thread 1 and Thread 2 both call poll() simultaneously
3. Thread 1 and Thread 2 both see that the process hasn't been reaped
4. Thread 1 attempts to take the blocking_wait_lock, to make sure no-one is blocked in wait(). It succeeds.
5. Thread 2 attempts to take the blocking_wait_lock, and it fails (because Thread 1 already has it). Thread 2's poll() returns saying that the process hasn't exited yet (which is wrong!)
6. Thread 1 calls waitpid(..., WNOHANG) and reaps the process, then releases the lock.

So adding a check at the top (= step 3) doesn't help. The problem is in steps 4/5.
msg360064 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-01-15 16:38
New changeset e85a305503bf83c5a8ffb3a988dfe7b67461cbee by Victor Stinner in branch 'master':
bpo-38630: Fix subprocess.Popen.send_signal() race condition (GH-16984)
https://github.com/python/cpython/commit/e85a305503bf83c5a8ffb3a988dfe7b67461cbee
msg360065 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-01-15 16:53
While there was no strong agreement on my change, Nathaniel and Giampaolo said that merging it should do no harm :-) So I merged my change.

It has been said multiple times: my change doesn't fully close the race condition, it only reduces the risk that it happens. I rephrased my change to mention that there is still room for the race condition and that polling only *reduces* the risk; it doesn't fully fix the race condition.

--

Changing the locking can help. But I'm not sure that it covers *all* cases. For example, what if another function calls os.waitpid() directly for whatever reason and the process has completed? The pid recycling issue can also happen in this case, no?

Linux pidfd really fixes the issue. If someone wants to enhance subprocess to use the new os.pidfd_open(), I suggest opening a separate issue.

Another approach is to emulate pidfd in userspace. Giampaolo explained that he compares the process creation time for that (msg356571). I'm not excited about working around the OS issue this way. It requires writing platform-specific get_create_time() code. I would prefer to reuse what the OS provides when available: Linux pidfd.

--

This issue title is quite explicit on purpose: "subprocess.Popen.send_signal() should poll the process". I now close this issue: send_signal() does now poll :-)

Because there is no *strong* consensus on my change, I decided to not backport it to stable branches 3.7 and 3.8.

Thanks everyone for this very interesting discussion!
msg382382 - (view) Author: Jack O'Connor (oconnor663) * Date: 2020-12-03 06:12
This change caused a crash in the Duct library in Python 3.9. Duct uses the waitid(WNOWAIT) strategy that Nathaniel Smith has described in this thread. With this change, the child process is reaped by kill() in a spot we didn't expect, and a subsequent call to os.waitid() raises a ChildProcessError. This race condition only triggers if the child happens to exit before kill() is called.

I just pushed https://github.com/oconnor663/duct.py/commit/5dfae70cc9481051c5e53da0c48d9efa8ff71507 to work around this, which I'll release shortly as Duct version 0.6.4.

Broadly speaking, this change could break any program that uses Popen.kill() together with os.waitpid() or os.waitid(). Checking Popen.returncode before calling the os functions is a good workaround for the single-threaded case, but it doesn't fix the multithreaded case. Duct is going to avoid calling Popen.kill() entirely. There are some longer comments about race conditions in that commit I linked.

I don't feel strongly one way or the other about keeping this new behavior, but we should probably document clearly in Popen.send_signal() and all of its callers that these functions might reap the child.
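The single-threaded returncode guard mentioned above can be sketched like this (POSIX branch only; it does not help the multithreaded case):

```python
import os
import subprocess
import sys

p = subprocess.Popen([sys.executable, "-c", "pass"])
p.kill()  # on 3.9+ send_signal() polls, so this may reap the child

if os.name == "posix":
    # Guard direct os.waitpid() calls: if Popen already reaped the
    # child, waitpid() raises ChildProcessError, or could even reap
    # an unrelated process that recycled the pid.
    if p.returncode is None:
        _, status = os.waitpid(p.pid, 0)
        p.returncode = os.waitstatus_to_exitcode(status)
else:
    p.wait()  # Windows: waitpid() doesn't apply to Popen pids
print(p.returncode is not None)  # → True
```

In a multithreaded program the check and the waitpid() can still interleave with another thread's reap, which is why Duct dropped Popen.kill() entirely.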
msg382397 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-12-03 09:42
On Linux, a pidfd file descriptor can be created with os.pidfd_open() (added to Python 3.9). It would avoid even more race conditions. The best would be to request a pidfd file descriptor directly when we spawn the process which is possible on recent Linux kernel versions.

> we should probably document it clearly in Popen.send_signal() and all of its callers that these functions might reap the child.

Feel free to propose a PR to document the behavior.

I'm not sure what you mean by send_signal() "reaping" the child process.

See also bpo-40550.

Anyway, this issue and bpo-40550 are closed. I suggest to open a new issue to enhance the documentation.
msg382430 - (view) Author: Jack O'Connor (oconnor663) * Date: 2020-12-03 17:05
Filed separately as https://bugs.python.org/issue42558.