msg256973 - (view) |
Author: Mike Pomraning (Mike Pomraning) |
Date: 2015-12-25 01:15 |
Python 3.3 introduces timeout support in subprocess.call, implemented by sending a SIGKILL if the Popen.wait is interrupted by a TimeoutExpired exception.
However, the "except" clause is too broad, and will, for instance, trigger on a KeyboardInterrupt. For practical purposes, this means that sending a Ctrl-C to a python program before 3.3 sent a SIGINT to both the parent and subprocess.call()d child, whereas under 3.3+ sends a SIGINT _and_ a SIGKILL to the child. The child will not be able to clean up appropriately.
For a real world example of this, see http://stackoverflow.com/q/34458583/132382
The fix is, I think, simply changing the clause to "except TimeoutExpired". At least, that works for me. See attached patch.
|
msg257118 - (view) |
Author: SilentGhost (SilentGhost) *  |
Date: 2015-12-28 16:57 |
The code was introduced to solve issue 12494, so I'm adding Victor to weigh in.
|
msg257163 - (view) |
Author: Mike Pomraning (Mike Pomraning) |
Date: 2015-12-29 04:37 |
If I understand correctly, the _try_wait mechanics (or 3.5's syscall behavior) already handle EINTR the way we way: ignore it and try wait()ing again.
So, this patch would kill only on a timeout, and never on another error like Ctrl-C, a UserDefinedTimeoutException from a signal handler, etc.
That's probably the lesser of two evils, the other being a SIGKILL against an arbitrary child process. Better to document that a non-timeout-parameter interruption to subprocess.call will separate the parent from its child, than to hard kill arbitrary programs when a polite SIGINT was intended.
|
msg257283 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2016-01-01 09:01 |
Doesn’t this scenario apply equally to run(), or check_output() in 3.4?
I suspect the explicit p.wait() call is not needed either. The context manager should already be calling wait().
One possible problem that I can think of: if you set a timeout, then interrupt the call with KeyboardInterrupt or similar, the context manager will now wait without without a timeout. Demo:
# Hit Ctrl+C before the 3 s timeout, and it will delay 10 s
call('trap "" INT && sleep 10', shell=True, timeout=3)
|
msg257289 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-01-01 12:56 |
The issue explained in 12494 is that the caller doesn't have access to the
subprocess.Popen object. I disagree to not kill the process when an
exception is raised, even KeyboardInterrupt.
I also disagree to say that we kill an "arbitrary" process. IMHO it's part
of the API that the process is killed with SIGKILL on error.
Maybe we need to flag to send SIGTERM on exception and then wait N wait
until the child exited, or send SIGKILL after the timeout. Maybe it's
overkill and such API should be developed in third party modules.
Anyway, not sending any signal on exction is not a good idea. We must read
the exit status to avoid zombi process. It's not a matter of sending a
signal but of reading the exit status.
|
msg257293 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2016-01-01 20:21 |
The reported problem is when no timeout is given. Perhaps it would be sufficient to:
1. Kill the child if any exception happens when a timeout is enforced
2. Never kill the child when there is no timeout
If a timeout is specified, the current code is good enough, but if no timeout is specified, maybe we should just do the equivalent of:
with Popen(...) as p:
return p.wait()
# If interrupted, the context manager will wait again. If the interruption is due to a terminal-wide SIGINT, the child may also have been interrupted.
For comparison, the Posix system() function is supposed to ignore SIGINT and SIGQUIT (i.e. signals from the terminal). See the Gnu implementation: <https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/posix/system.c>.
|
msg262871 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2016-04-04 21:03 |
Even if we can’t agree on any behaviour change, I think it might be worth documenting how these functions behave on exceptions (interrupts) other than TimeoutExpired. Currently all I can find is “If the timeout expires, the child process will be killed and waited for.” I think this could be expanded to also say what happens if the parent is interrupted by a signal such as KeyboardInterrupt:
* Current behaviour: Immediately kill child (i.e. timeout expiry is not special)
* Previous behaviour: Return without waiting for child, which will become a zombie
* Mike’s proposal: Wait indefinitely for child without killing it, which could defeat the purpose of the timeout, especially if the child ignores or does not receive the same signal as the parent
|
msg263260 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-04-12 16:07 |
Again, the problem is that the exception exits from the function which owns the last reference to the Popen object. If the Popen is left alive when you exit the function, you create a zombi process, you can leave open pipes, etc.
It's unclear to me if CTRL+c sends SIGTERM to all processes or only a few of them (only the parent process?). Even if SIGTERM is sent to all processes, a child process can decide to completly ignore it, or can block in a deadlock or whatever.
Giving a few seconds to the child process to wait until it ends is not easy because it's hard to choose an arbitrary timeout. If the timeout is too low, you kill the child process (SIGKILL) before it flushes files. If the timeout is too long, the parent process is blocked too long when the child process is really blocked.
I suggest to keep the current behaviour by default.
If you really want to give time to the child process, I suggest to add a *new* optional parameter . For example ctrlc_timeout=5.0 to send SIGTERM and then wait 5 seconds.
I don't know if the parent process must always send a SIGTERM to the child process or not. One signal or two don't have the same behaviour. It's possible to explicitly send a SIGTERM to only one process using the kill command.
|
msg263279 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2016-04-12 22:11 |
I don’t know how it works on Windows, but on Unix in most cases the parent and child will share a controlling terminal. Pressing Ctrl+C in the terminal will broadcast SIGINT to all processes, parent and child. That is probably why os.system() ignores SIGINT.
I doubt the usefulness of building in extra timeouts to send SIGTERM and SIGKILL. If the user really cares that much, they can probably design their own timeout mechanism. That is why I suggested above to treat the non-timeout mode differently.
|
msg263302 - (view) |
Author: Mike Pomraning (Mike Pomraning) |
Date: 2016-04-13 03:26 |
Yes, standard UNIX terminal behavior is to map Ctrl-C to a SIGINT sent to the foreground process group, so that every member of a pipeline (e.g.) or hidden helper children processes can be terminated by the interactive user and have the chance to clean up.
Handling a child process behind a convenience interface, like system() or subprocess.call(), is inherently a bit tricky when things go wrong.
My expectation for .call() would be that it behave something like os.system() (or the C library system() for that matter) and _not_ be interrupted by conventional signals. That the EINTR be "swallowed" and the p.wait() resumed, as _try_wait() does already. That way a user timeout= does what we want, but a Ctrl-C has familiar semantics.
Yes, it will be possible for a coder to contrive to throw some other exception during the wait() ... in that case we should close the pipes but _not_ kill and reap the child. There will be zombies. Zombies are better than SIGKILLing a 3rd-party process that perhaps needs a graceful shutdown.
|
msg263308 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2016-04-13 05:51 |
When no timeout is specified, these are the options as I see them:
1. SIGKILL child immediately on the first KeyboardInterrupt (Victor’s behaviour since 3.3)
2. Give up and leave a zombie after the first KeyboardInterrupt (pre-3.3 behaviour)
3. Wait again after first KeyboardInterrupt, and leave a zombie after the second one (Mike’s patch)
4. Ignore SIGINT so that by default no KeyboardInterrupt will happen, like C’s system()
5. Start a timeout after the first KeyboardInterrupt (Victor’s suggestion)
Here is my proposal, taking into account Victor’s desire to never leave a zombie, and Mike’s desire to let the child handle SIGINT in its own time: After the first KeyboardInterrupt or other exception, wait() a second time, and only use SIGKILL if the second wait() is interrupted. It’s a bit complicated, but I think this would solve everyone’s concerns raised so far:
def call(*popenargs, timeout=None, **kwargs):
p = Popen(*popenargs, **kwargs)
try:
if timeout is None:
try:
return p.wait()
except:
p.wait() # Let the child handle SIGINT
raise
else:
return p.wait(timeout=timeout)
except:
p.kill() # Last resort to avoid leaving a zombie
p.wait()
raise
|
msg263420 - (view) |
Author: Mike Pomraning (Mike Pomraning) |
Date: 2016-04-14 16:07 |
#2 and #4 are the only predictable and palatable options, I think. Ignore the patch that started this issue.
|
msg263551 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2016-04-16 12:48 |
I don’t think Victor likes #2 because of the zombie. I would be interested in #4 (one of the documented purposes of subprocess is to replace os.system), but it might need careful consideration and discussion. Ignoring signals is such a significant change I think it would have to be a new feature for 3.6+ only.
Mike/Victor, what do you think of my proposal (call it #6) about waiting a second time before resorting to SIGKILL? Posting a patch which implements this.
|
msg263743 - (view) |
Author: Mike Pomraning (Mike Pomraning) |
Date: 2016-04-19 14:53 |
Re: #2, I'd rather have a zombie than a hard kill on a child whose code I perhaps don't control. Zombies are a fact of life (er, a fact of undeath?) in UNIX process management, and are the historical and IMHO expected behavior.
|
msg305619 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2017-11-06 04:59 |
https://github.com/python/cpython/pull/4283 adds a secondary timeout, which defaults to 1 s when there is no main timeout. But this seems complicated and arbitrary. As I understand, the main use case discussed here was waiting without a timeout for a child that exits soon after the interrupt. But are there any practical use cases or demand for:
* Limiting the wait time after the interrupt (despite no timeout before the interrupt)?
* Customizing this additional timeout?
|
msg305641 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2017-11-06 13:10 |
I changed the issue title to "Add a new optional cleanup_timeout parameter to subprocess.call()" to make it more positive and update it to the currently proposed change ;-)
|
msg309115 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2017-12-28 04:08 |
you'll notice I added an alternate PR. I don't like the complication of adding yet another knob (cleanup_timeout) to subprocesses already giant API surface. It will rarely be used.
My PR tries to take a practical approach: Just wait a little while (arbitrary value of little chosen in the code) for the child after receiving SIGINT before reraising the exception and triggering a .kill() matching existing behavior.
The one controversial thing in my PR (which could be undone, it is independent of the other changes) is that I also modify the context manager __exit__ behavior to not do an infinite wait() upon KeyboardInterrupt. This means context managers of Popen _will_ complete potentially leaving a dangling process around (which our existing __del__ will pick up and put in the internal subprocess._active list). Relatively harmless, but a change none-the-less.
I went that far to try and better match the Python 2.7 and 3.2 behavior: On SIGINT our process sees the KeyboardInterrupt "right away" (not quite as instantaneously here given the wait timeouts, but close enough for interactive command line tool ^C happiness).
It seems like an improvement all around and is IMNSHO less complicated.
|
msg311230 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2018-01-30 05:27 |
New changeset f4d644f36ffb6cb11b34bfcf533c14cfaebf709a by Gregory P. Smith in branch 'master':
bpo-25942: make subprocess more graceful on ^C (GH-5026)
https://github.com/python/cpython/commit/f4d644f36ffb6cb11b34bfcf533c14cfaebf709a
|
msg311234 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2018-01-30 05:31 |
I went with my change to give the child process a small amount of time to cleanup by default. Not perfect, but this should be more similar to the Python <=3.2 behavior. Lets see if any issues crop up during the 3.7 betas.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:25 | admin | set | github: 70130 |
2018-01-30 15:16:26 | stuarteberg | set | nosy:
+ stuarteberg
|
2018-01-30 05:31:57 | gregory.p.smith | set | status: open -> closed title: Add a new optional cleanup_timeout parameter to subprocess.call() -> Do not immediately SIGKILL subprocess child processes upon ^C messages:
+ msg311234
resolution: fixed stage: patch review -> commit review |
2018-01-30 05:27:41 | gregory.p.smith | set | messages:
+ msg311230 |
2017-12-28 04:08:03 | gregory.p.smith | set | messages:
+ msg309115 |
2017-12-28 03:41:34 | gregory.p.smith | set | pull_requests:
+ pull_request4914 |
2017-11-06 13:10:16 | vstinner | set | type: behavior -> enhancement title: subprocess.call SIGKILLs too liberally -> Add a new optional cleanup_timeout parameter to subprocess.call() messages:
+ msg305641 versions:
- Python 3.6 |
2017-11-06 04:59:43 | martin.panter | set | messages:
+ msg305619 |
2017-11-05 03:57:22 | gregory.p.smith | set | versions:
+ Python 3.7, - Python 3.5 |
2017-11-05 03:56:49 | gregory.p.smith | set | assignee: gregory.p.smith
nosy:
+ gregory.p.smith |
2017-11-04 23:24:24 | stuarteberg | set | pull_requests:
+ pull_request4245 |
2016-04-19 14:53:21 | Mike Pomraning | set | messages:
+ msg263743 |
2016-04-16 12:48:57 | martin.panter | set | files:
+ second-wait.patch
messages:
+ msg263551 |
2016-04-14 16:07:26 | Mike Pomraning | set | messages:
+ msg263420 |
2016-04-13 05:51:07 | martin.panter | set | messages:
+ msg263308 |
2016-04-13 03:26:41 | Mike Pomraning | set | messages:
+ msg263302 |
2016-04-12 22:11:02 | martin.panter | set | messages:
+ msg263279 |
2016-04-12 16:07:54 | vstinner | set | messages:
+ msg263260 |
2016-04-04 21:03:23 | martin.panter | set | messages:
+ msg262871 |
2016-04-04 15:11:52 | rpcope1 | set | nosy:
+ rpcope1
|
2016-01-01 20:21:41 | martin.panter | set | messages:
+ msg257293 |
2016-01-01 12:56:10 | vstinner | set | messages:
+ msg257289 |
2016-01-01 09:01:29 | martin.panter | set | nosy:
+ martin.panter
messages:
+ msg257283 stage: patch review |
2015-12-29 04:37:17 | Mike Pomraning | set | messages:
+ msg257163 |
2015-12-28 16:57:48 | SilentGhost | set | nosy:
+ SilentGhost, vstinner
messages:
+ msg257118 versions:
+ Python 3.5, Python 3.6, - Python 3.3 |
2015-12-25 01:15:08 | Mike Pomraning | create | |