msg257071 - (view) |
Author: Chris Jerdonek (chris.jerdonek) * |
Date: 2015-12-27 10:58 |
I came across a situation where Popen.wait() hangs and os.waitpid() doesn't hang. It seems like a bug, but I don't know for sure. This is with Python 3.5.1.
To reproduce, save the following to demo.py:
import os, signal, subprocess
class State:
process = None
def handle_signal(signalnum, frame):
print("Handling: {0}".format(signalnum))
p = State.process
# The following line hangs:
p.wait()
# However, this line does not hang:
# os.waitpid(p.pid, 1)
print("Done waiting")
signal.signal(signal.SIGINT, handle_signal)
p = subprocess.Popen("while true; do echo sleeping; sleep 2; done",
shell=True)
State.process = p
p.wait()
Then run the following, hit Control-C, and it will hang:
$ python demo.py
sleeping
sleeping
^CHandling: 2
It will still hang if you insert "p.terminate()" right before p.wait().
However, calling "os.waitpid(p.pid, ...)" instead of p.wait() exits immediately and does not hang. The behavior also occurs without shell=True.
|
msg257072 - (view) |
Author: Chris Jerdonek (chris.jerdonek) * |
Date: 2015-12-27 11:37 |
More information:
No hang: 2.7.11, 3.3.6, 3.4.0
Hang: 3.4.1 3.4.3, 3.5.0, 3.5.1
Is this a regression or a bug fix? Maybe issue #21291 is related which was fixed in 3.4.1. Also, this is on Mac OS X 10.11 for me.
|
msg257087 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2015-12-27 19:04 |
This appears due to Popen._waitpid_lock. It is not being released when the signal fires and the while loop containing the with self._waitpid_lock repeats.
This also reproduces when using subprocess32.
|
msg257089 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2015-12-27 19:16 |
fundamentally: this shouldn't work anyways.
You are calling wait() from a signal handler.
That is a blocking operation.
You cannot do that from a signal handler.
os.waitpid(p.pid, 1) is os.waitpid(p.pid, os.WNOHANG) which is a non-blocking operation so it works.
|
msg257091 - (view) |
Author: Chris Jerdonek (chris.jerdonek) * |
Date: 2015-12-27 19:53 |
Thanks for looking into this so quickly.
> os.waitpid(p.pid, 1) is os.waitpid(p.pid, os.WNOHANG) which is a non-blocking operation so it works.
For the record, it also works with "os.waitpid(p.pid, 0)." I should have written 0 the first time. Does your point still hold?
|
msg257105 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2015-12-28 04:14 |
I think it still applies. os.waitpid even in blocking mode could hang at the whim of the OS. you aren't guaranteed that your child process has died and the OS is ready to return a status code.
|
msg362583 - (view) |
Author: Mike Frysinger (vapier) |
Date: 2020-02-24 10:29 |
> fundamentally: this shouldn't work anyways.
>
> You are calling wait() from a signal handler.
> That is a blocking operation.
> You cannot do that from a signal handler.
what definition/spec are you referring to here ? is this a Python limitation ? i don't see any mention in the Python documentation on the subject:
https://docs.python.org/3/library/signal.html
maybe you meant this as an OS limitation ? it's certainly not a POSIX limitation as it's quite clear on the subject:
https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html
> The following table defines a set of functions that shall be async-signal-safe. Therefore, applications can call them, without restriction, from signal-catching functions.
> wait()
> waitpid()
in general, the idea that a signal handler "cannot block" is weird. i guess you're saying that anything that involves the OS is forbidden ? there's no guarantee something as simple as a file open() or close() won't block, or really any syscall at all.
i bisected a similar testcase from the OP down ... python 3.4.0 works, but 3.4.1 fails. looks like it's due to this change:
https://github.com/python/cpython/commit/d65ba51e245ffdd155bc1e7b8884fc943048111f
Author: Gregory P. Smith <greg@krypto.org>
Date: Wed Apr 23 00:27:17 2014 -0700
subprocess's Popen.wait() is now thread safe so that multiple threads
may be calling wait() or poll() on a Popen instance at the same time
without losing the Popen.returncode value. Fixes issue #21291.
|
msg362607 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2020-02-24 23:22 |
yes, Popen's use of a lock since that change means that Popen.wait() cannot be called in an asynchronous context where its own execution could be blocking the execution of code that would unlock the lock.
at this point we should probably just document that.
|
msg362608 - (view) |
Author: Mike Frysinger (vapier) |
Date: 2020-02-25 00:13 |
to be clear, there is no Python or OS restriction that you're aware of for your earlier statement ? you just want to make it into a new Popen restriction that didn't previously exist ?
we came across this bug as we upgraded our existing Python 2.7 codebase to Python 3.6. so even if it's "been this way for a while" (which is to say, since the 3.4.1 release), at least some people are going to be coming across this for the first time as they migrate.
if the popen APIs aren't going to handle this correctly (check threading.active_count?), can we at least get a knob to disable it or class methods we can stub out ? we never use threads, so having popen add support for a runtime mode we never utilize is kind of annoying.
for now i'm defining a custom subprocess.Popen class to break the lock, but it's kind of terrible to have to access _waitpid_lock outside of the subprocess module.
|
msg362609 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2020-02-25 00:30 |
Reopening so that spending time on deciding this in the future at least stays on my radar.
Your workaround sounds like the reasonable, if understandably not pretty approach for now.
|
msg362625 - (view) |
Author: Mike Frysinger (vapier) |
Date: 2020-02-25 06:06 |
if threading.active_count() returns 1, then you know there's one thread, and you're it, so it's not racey, and a lock ins't needed.
thinking a bit more, what if the code just use a recursive lock ? that would restore the single threaded status quo without overly complicating the code.
thinking a bit more, i think you're right that this is inherently racy, but not because using the APIs in async context isn't permissible. if a signal is delivered after Popen.wait() finishes the OS waitpid call, but before it releases the lock, then the child state is not recoverable from the signal handler.
in our case, we're using the signal handler to kill the process, and we want to make sure it's exited, so being able to get the exact exit status is not important to us, just making sure we can detect when the process is gone.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:25 | admin | set | github: 70148 |
2020-02-25 06:06:45 | vapier | set | messages:
+ msg362625 |
2020-02-25 00:31:01 | gregory.p.smith | set | keywords:
+ 3.4regression, 3.5regression |
2020-02-25 00:30:46 | gregory.p.smith | set | status: closed -> open resolution: wont fix -> remind messages:
+ msg362609
versions:
+ Python 3.6, Python 3.7, Python 3.8, Python 3.9 |
2020-02-25 00:13:37 | vapier | set | messages:
+ msg362608 |
2020-02-24 23:22:02 | gregory.p.smith | set | messages:
+ msg362607 |
2020-02-24 10:29:36 | vapier | set | nosy:
+ vapier messages:
+ msg362583
|
2015-12-28 04:14:45 | gregory.p.smith | set | messages:
+ msg257105 |
2015-12-27 19:55:23 | chris.jerdonek | set | title: Popen.wait() hangs when called from a signal handler when os.waitpid(pid, os.WNOHANG) does not -> Popen.wait() hangs when called from a signal handler when os.waitpid() does not |
2015-12-27 19:53:44 | chris.jerdonek | set | messages:
+ msg257091 |
2015-12-27 19:16:58 | gregory.p.smith | set | status: open -> closed resolution: wont fix title: Popen.wait() hangs with SIGINT when os.waitpid() does not -> Popen.wait() hangs when called from a signal handler when os.waitpid(pid, os.WNOHANG) does not |
2015-12-27 19:16:04 | gregory.p.smith | set | messages:
+ msg257089 |
2015-12-27 19:04:33 | gregory.p.smith | set | assignee: gregory.p.smith |
2015-12-27 19:04:25 | gregory.p.smith | set | messages:
+ msg257087 |
2015-12-27 11:37:45 | chris.jerdonek | set | nosy:
+ gregory.p.smith messages:
+ msg257072
|
2015-12-27 10:58:02 | chris.jerdonek | create | |