classification
Title: Popen.wait() hangs when called from a signal handler when os.waitpid() does not
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.9, Python 3.8, Python 3.7, Python 3.6, Python 3.5
process
Status: open Resolution: remind
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: chris.jerdonek, gregory.p.smith, vapier
Priority: normal Keywords: 3.4regression, 3.5regression

Created on 2015-12-27 10:58 by chris.jerdonek, last changed 2020-02-25 06:06 by vapier.

Messages (11)
msg257071 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2015-12-27 10:58
I came across a situation where Popen.wait() hangs and os.waitpid() doesn't hang.  It seems like a bug, but I don't know for sure.  This is with Python 3.5.1.

To reproduce, save the following to demo.py:

    import os, signal, subprocess

    class State:
        process = None

    def handle_signal(signalnum, frame):
        print("Handling: {0}".format(signalnum))
        p = State.process
        # The following line hangs:
        p.wait()
        # However, this line does not hang:
        # os.waitpid(p.pid, 1)
        print("Done waiting")

    signal.signal(signal.SIGINT, handle_signal)
    p = subprocess.Popen("while true; do echo sleeping; sleep 2; done",
                         shell=True)
    State.process = p
    p.wait()

Then run the following, hit Control-C, and it will hang:

    $ python demo.py 
    sleeping
    sleeping
    ^CHandling: 2

It will still hang if you insert "p.terminate()" right before p.wait().

However, calling "os.waitpid(p.pid, ...)" instead of p.wait() exits immediately and does not hang.  The behavior also occurs without shell=True.
msg257072 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2015-12-27 11:37
More information:

No hang: 2.7.11, 3.3.6, 3.4.0

Hang: 3.4.1 3.4.3, 3.5.0, 3.5.1

Is this a regression or a bug fix?  Maybe issue #21291 is related which was fixed in 3.4.1.  Also, this is on Mac OS X 10.11 for me.
msg257087 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2015-12-27 19:04
This appears due to Popen._waitpid_lock.  It is not being released when the signal fires and the while loop containing the with self._waitpid_lock repeats.

This also reproduces when using subprocess32.
msg257089 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2015-12-27 19:16
fundamentally: this shouldn't work anyways.

You are calling wait() from a signal handler.
That is a blocking operation.
You cannot do that from a signal handler.

os.waitpid(p.pid, 1) is os.waitpid(p.pid, os.WNOHANG) which is a non-blocking operation so it works.
msg257091 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2015-12-27 19:53
Thanks for looking into this so quickly.

> os.waitpid(p.pid, 1) is os.waitpid(p.pid, os.WNOHANG) which is a non-blocking operation so it works.

For the record, it also works with "os.waitpid(p.pid, 0)."  I should have written 0 the first time.  Does your point still hold?
msg257105 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2015-12-28 04:14
I think it still applies.  os.waitpid even in blocking mode could hang at the whim of the OS.  you aren't guaranteed that your child process has died and the OS is ready to return a status code.
msg362583 - (view) Author: Mike Frysinger (vapier) Date: 2020-02-24 10:29
> fundamentally: this shouldn't work anyways.
>
> You are calling wait() from a signal handler.
> That is a blocking operation.
> You cannot do that from a signal handler.

what definition/spec are you referring to here ?  is this a Python limitation ?  i don't see any mention in the Python documentation on the subject:
https://docs.python.org/3/library/signal.html

maybe you meant this as an OS limitation ?  it's certainly not a POSIX limitation as it's quite clear on the subject:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html
> The following table defines a set of functions that shall be async-signal-safe. Therefore, applications can call them, without restriction, from signal-catching functions.
> wait()
> waitpid()

in general, the idea that a signal handler "cannot block" is weird.  i guess you're saying that anything that involves the OS is forbidden ?  there's no guarantee something as simple as a file open() or close() won't block, or really any syscall at all.

i bisected a similar testcase from the OP down ... python 3.4.0 works, but 3.4.1 fails.  looks like it's due to this change:

https://github.com/python/cpython/commit/d65ba51e245ffdd155bc1e7b8884fc943048111f
Author: Gregory P. Smith <greg@krypto.org>
Date:   Wed Apr 23 00:27:17 2014 -0700

    subprocess's Popen.wait() is now thread safe so that multiple threads
    may be calling wait() or poll() on a Popen instance at the same time
    without losing the Popen.returncode value.  Fixes issue #21291.
msg362607 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-02-24 23:22
yes, Popen's use of a lock since that change means that Popen.wait() cannot be called in an asynchronous context where its own execution could be blocking the execution of code that would unlock the lock.

at this point we should probably just document that.
msg362608 - (view) Author: Mike Frysinger (vapier) Date: 2020-02-25 00:13
to be clear, there is no Python or OS restriction that you're aware of for your earlier statement ?  you just want to make it into a new Popen restriction that didn't previously exist ?

we came across this bug as we upgraded our existing Python 2.7 codebase to Python 3.6.  so even if it's "been this way for a while" (which is to say, since the 3.4.1 release), at least some people are going to be coming across this for the first time as they migrate.

if the popen APIs aren't going to handle this correctly (check threading.active_count?), can we at least get a knob to disable it or class methods we can stub out ?  we never use threads, so having popen add support for a runtime mode we never utilize is kind of annoying.

for now i'm defining a custom subprocess.Popen class to break the lock, but it's kind of terrible to have to access _waitpid_lock outside of the subprocess module.
msg362609 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-02-25 00:30
Reopening so that spending time on deciding this in the future at least stays on my radar.

Your workaround sounds like the reasonable, if understandably not pretty approach for now.
msg362625 - (view) Author: Mike Frysinger (vapier) Date: 2020-02-25 06:06
if threading.active_count() returns 1, then you know there's one thread, and you're it, so it's not racey, and a lock ins't needed.

thinking a bit more, what if the code just use a recursive lock ?  that would restore the single threaded status quo without overly complicating the code.

thinking a bit more, i think you're right that this is inherently racy, but not because using the APIs in async context isn't permissible.  if a signal is delivered after Popen.wait() finishes the OS waitpid call, but before it releases the lock, then the child state is not recoverable from the signal handler.

in our case, we're using the signal handler to kill the process, and we want to make sure it's exited, so being able to get the exact exit status is not important to us, just making sure we can detect when the process is gone.
History
Date User Action Args
2020-02-25 06:06:45vapiersetmessages: + msg362625
2020-02-25 00:31:01gregory.p.smithsetkeywords: + 3.4regression, 3.5regression
2020-02-25 00:30:46gregory.p.smithsetstatus: closed -> open
resolution: wont fix -> remind
messages: + msg362609

versions: + Python 3.6, Python 3.7, Python 3.8, Python 3.9
2020-02-25 00:13:37vapiersetmessages: + msg362608
2020-02-24 23:22:02gregory.p.smithsetmessages: + msg362607
2020-02-24 10:29:36vapiersetnosy: + vapier
messages: + msg362583
2015-12-28 04:14:45gregory.p.smithsetmessages: + msg257105
2015-12-27 19:55:23chris.jerdoneksettitle: Popen.wait() hangs when called from a signal handler when os.waitpid(pid, os.WNOHANG) does not -> Popen.wait() hangs when called from a signal handler when os.waitpid() does not
2015-12-27 19:53:44chris.jerdoneksetmessages: + msg257091
2015-12-27 19:16:58gregory.p.smithsetstatus: open -> closed
resolution: wont fix
title: Popen.wait() hangs with SIGINT when os.waitpid() does not -> Popen.wait() hangs when called from a signal handler when os.waitpid(pid, os.WNOHANG) does not
2015-12-27 19:16:04gregory.p.smithsetmessages: + msg257089
2015-12-27 19:04:33gregory.p.smithsetassignee: gregory.p.smith
2015-12-27 19:04:25gregory.p.smithsetmessages: + msg257087
2015-12-27 11:37:45chris.jerdoneksetnosy: + gregory.p.smith
messages: + msg257072
2015-12-27 10:58:02chris.jerdonekcreate