classification
Title: Add a new optional cleanup_timeout parameter to subprocess.call()
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: Mike Pomraning, SilentGhost, gregory.p.smith, martin.panter, rpcope1, vstinner
Priority: normal Keywords: patch

Created on 2015-12-25 01:15 by Mike Pomraning, last changed 2017-11-06 13:10 by vstinner.

Files
File name Uploaded Description Edit
subprocess-call-py344-kill-only-on-timeout.patch Mike Pomraning, 2015-12-25 01:15 Patch to restrict subprocess.call SIGKILLs to timeouts only
second-wait.patch martin.panter, 2016-04-16 12:48 review
Pull Requests
URL Status Linked Edit
PR 4283 open stuarteberg, 2017-11-04 23:24
Messages (16)
msg256973 - (view) Author: Mike Pomraning (Mike Pomraning) Date: 2015-12-25 01:15
Python 3.3 introduces timeout support in subprocess.call, implemented by sending a SIGKILL if the Popen.wait is interrupted by a TimeoutExpired exception.

However, the "except" clause is too broad, and will, for instance, trigger on a KeyboardInterrupt.  For practical purposes, this means that sending a Ctrl-C to a python program before 3.3 sent a SIGINT to both the parent and subprocess.call()d child, whereas under 3.3+ sends a SIGINT _and_ a SIGKILL to the child.  The child will not be able to clean up appropriately.

For a real world example of this, see http://stackoverflow.com/q/34458583/132382

The fix is, I think, simply changing the clause to "except TimeoutExpired".  At least, that works for me.  See attached patch.
msg257118 - (view) Author: SilentGhost (SilentGhost) * Date: 2015-12-28 16:57
The code was introduced to solve issue 12494, so I'm adding Victor to weigh in.
msg257163 - (view) Author: Mike Pomraning (Mike Pomraning) Date: 2015-12-29 04:37
If I understand correctly, the _try_wait mechanics (or 3.5's syscall behavior) already handle EINTR the way we way:  ignore it and try wait()ing again.

So, this patch would kill only on a timeout, and never on another error like Ctrl-C, a UserDefinedTimeoutException from a signal handler, etc.

That's probably the lesser of two evils, the other being a SIGKILL against an arbitrary child process.  Better to document that a non-timeout-parameter interruption to subprocess.call will separate the parent from its child, than to hard kill arbitrary programs when a polite SIGINT was intended.
msg257283 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-01 09:01
Doesn’t this scenario apply equally to run(), or check_output() in 3.4?

I suspect the explicit p.wait() call is not needed either. The context manager should already be calling wait().

One possible problem that I can think of: if you set a timeout, then interrupt the call with KeyboardInterrupt or similar, the context manager will now wait without without a timeout. Demo:

# Hit Ctrl+C before the 3 s timeout, and it will delay 10 s
call('trap "" INT && sleep 10', shell=True, timeout=3)
msg257289 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-01 12:56
The issue explained in 12494 is that the caller doesn't have access to the
subprocess.Popen object. I disagree to not kill the process when an
exception is raised, even KeyboardInterrupt.

I also disagree to say that we kill an "arbitrary" process. IMHO it's part
of the API that the process is killed with SIGKILL on error.

Maybe we need to flag to send SIGTERM on exception and then wait N wait
until the child exited, or send SIGKILL after the timeout. Maybe it's
overkill and such API should be developed in third party modules.

Anyway, not sending any signal on exction is not a good idea. We must read
the exit status to avoid zombi process. It's not a matter of sending a
signal but of reading the exit status.
msg257293 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-01 20:21
The reported problem is when no timeout is given. Perhaps it would be sufficient to:

1. Kill the child if any exception happens when a timeout is enforced
2. Never kill the child when there is no timeout

If a timeout is specified, the current code is good enough, but if no timeout is specified, maybe we should just do the equivalent of:

with Popen(...) as p:
    return p.wait()
    # If interrupted, the context manager will wait again. If the interruption is due to a terminal-wide SIGINT, the child may also have been interrupted.

For comparison, the Posix system() function is supposed to ignore SIGINT and SIGQUIT (i.e. signals from the terminal). See the Gnu implementation: <https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/posix/system.c>.
msg262871 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-04 21:03
Even if we can’t agree on any behaviour change, I think it might be worth documenting how these functions behave on exceptions (interrupts) other than TimeoutExpired. Currently all I can find is “If the timeout expires, the child process will be killed and waited for.” I think this could be expanded to also say what happens if the parent is interrupted by a signal such as KeyboardInterrupt:

* Current behaviour: Immediately kill child (i.e. timeout expiry is not special)

* Previous behaviour: Return without waiting for child, which will become a zombie

* Mike’s proposal: Wait indefinitely for child without killing it, which could defeat the purpose of the timeout, especially if the child ignores or does not receive the same signal as the parent
msg263260 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-04-12 16:07
Again, the problem is that the exception exits from the function which owns the last reference to the Popen object. If the Popen is left alive when you exit the function, you create a zombi process, you can leave open pipes, etc.

It's unclear to me if CTRL+c sends SIGTERM to all processes or only a few of them (only the parent process?). Even if SIGTERM is sent to all processes, a child process can decide to completly ignore it, or can block in a deadlock or whatever.

Giving a few seconds to the child process to wait until it ends is not easy because it's hard to choose an arbitrary timeout. If the timeout is too low, you kill the child process (SIGKILL) before it flushes files. If the timeout is too long, the parent process is blocked too long when the child process is really blocked.

I suggest to keep the current behaviour by default.

If you really want to give time to the child process, I suggest to add a *new* optional parameter . For example ctrlc_timeout=5.0 to send SIGTERM and then wait 5 seconds.

I don't know if the parent process must always send a SIGTERM to the child process or not. One signal or two don't have the same behaviour. It's possible to explicitly send a SIGTERM to only one process using the kill command.
msg263279 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-12 22:11
I don’t know how it works on Windows, but on Unix in most cases the parent and child will share a controlling terminal. Pressing Ctrl+C in the terminal will broadcast SIGINT to all processes, parent and child. That is probably why os.system() ignores SIGINT.

I doubt the usefulness of building in extra timeouts to send SIGTERM and SIGKILL. If the user really cares that much, they can probably design their own timeout mechanism. That is why I suggested above to treat the non-timeout mode differently.
msg263302 - (view) Author: Mike Pomraning (Mike Pomraning) Date: 2016-04-13 03:26
Yes, standard UNIX terminal behavior is to map Ctrl-C to a SIGINT sent to the foreground process group, so that every member of a pipeline (e.g.) or hidden helper children processes can be terminated by the interactive user and have the chance to clean up.

Handling a child process behind a convenience interface, like system() or subprocess.call(), is inherently a bit tricky when things go wrong.

My expectation for .call() would be that it behave something like os.system() (or the C library system() for that matter) and _not_ be interrupted by conventional signals.  That the EINTR be "swallowed" and the p.wait() resumed, as _try_wait() does already.  That way a user timeout= does what we want, but a Ctrl-C has familiar semantics.

Yes, it will be possible for a coder to contrive to throw some other exception during the wait() ... in that case we should close the pipes but _not_ kill and reap the child.  There will be zombies.  Zombies are better than SIGKILLing a 3rd-party process that perhaps needs a graceful shutdown.
msg263308 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-13 05:51
When no timeout is specified, these are the options as I see them:

1. SIGKILL child immediately on the first KeyboardInterrupt (Victor’s behaviour since 3.3)

2. Give up and leave a zombie after the first KeyboardInterrupt (pre-3.3 behaviour)

3. Wait again after first KeyboardInterrupt, and leave a zombie after the second one (Mike’s patch)

4. Ignore SIGINT so that by default no KeyboardInterrupt will happen, like C’s system()

5. Start a timeout after the first KeyboardInterrupt (Victor’s suggestion)

Here is my proposal, taking into account Victor’s desire to never leave a zombie, and Mike’s desire to let the child handle SIGINT in its own time: After the first KeyboardInterrupt or other exception, wait() a second time, and only use SIGKILL if the second wait() is interrupted. It’s a bit complicated, but I think this would solve everyone’s concerns raised so far:

def call(*popenargs, timeout=None, **kwargs):
    p = Popen(*popenargs, **kwargs)
    try:
        if timeout is None:
            try:
                return p.wait()
            except:
                p.wait()  # Let the child handle SIGINT
                raise
        else:
            return p.wait(timeout=timeout)
    except:
        p.kill()  # Last resort to avoid leaving a zombie
        p.wait()
        raise
msg263420 - (view) Author: Mike Pomraning (Mike Pomraning) Date: 2016-04-14 16:07
#2 and #4 are the only predictable and palatable options, I think.  Ignore the patch that started this issue.
msg263551 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-04-16 12:48
I don’t think Victor likes #2 because of the zombie. I would be interested in #4 (one of the documented purposes of subprocess is to replace os.system), but it might need careful consideration and discussion. Ignoring signals is such a significant change I think it would have to be a new feature for 3.6+ only.

Mike/Victor, what do you think of my proposal (call it #6) about waiting a second time before resorting to SIGKILL? Posting a patch which implements this.
msg263743 - (view) Author: Mike Pomraning (Mike Pomraning) Date: 2016-04-19 14:53
Re: #2, I'd rather have a zombie than a hard kill on a child whose code I perhaps don't control.  Zombies are a fact of life (er, a fact of undeath?) in UNIX process management, and are the historical and IMHO expected behavior.
msg305619 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-11-06 04:59
https://github.com/python/cpython/pull/4283 adds a secondary timeout, which defaults to 1 s when there is no main timeout. But this seems complicated and arbitrary. As I understand, the main use case discussed here was waiting without a timeout for a child that exits soon after the interrupt. But are there any practical use cases or demand for:

* Limiting the wait time after the interrupt (despite no timeout before the interrupt)?
* Customizing this additional timeout?
msg305641 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-11-06 13:10
I changed the issue title to "Add a new optional cleanup_timeout parameter to subprocess.call()" to make it more positive and update it to the currently proposed change ;-)
History
Date User Action Args
2017-11-06 13:10:16vstinnersettype: behavior -> enhancement
title: subprocess.call SIGKILLs too liberally -> Add a new optional cleanup_timeout parameter to subprocess.call()
messages: + msg305641
versions: - Python 3.6
2017-11-06 04:59:43martin.pantersetmessages: + msg305619
2017-11-05 03:57:22gregory.p.smithsetversions: + Python 3.7, - Python 3.5
2017-11-05 03:56:49gregory.p.smithsetassignee: gregory.p.smith

nosy: + gregory.p.smith
2017-11-04 23:24:24stuartebergsetpull_requests: + pull_request4245
2016-04-19 14:53:21Mike Pomraningsetmessages: + msg263743
2016-04-16 12:48:57martin.pantersetfiles: + second-wait.patch

messages: + msg263551
2016-04-14 16:07:26Mike Pomraningsetmessages: + msg263420
2016-04-13 05:51:07martin.pantersetmessages: + msg263308
2016-04-13 03:26:41Mike Pomraningsetmessages: + msg263302
2016-04-12 22:11:02martin.pantersetmessages: + msg263279
2016-04-12 16:07:54vstinnersetmessages: + msg263260
2016-04-04 21:03:23martin.pantersetmessages: + msg262871
2016-04-04 15:11:52rpcope1setnosy: + rpcope1
2016-01-01 20:21:41martin.pantersetmessages: + msg257293
2016-01-01 12:56:10vstinnersetmessages: + msg257289
2016-01-01 09:01:29martin.pantersetnosy: + martin.panter

messages: + msg257283
stage: patch review
2015-12-29 04:37:17Mike Pomraningsetmessages: + msg257163
2015-12-28 16:57:48SilentGhostsetnosy: + SilentGhost, vstinner

messages: + msg257118
versions: + Python 3.5, Python 3.6, - Python 3.3
2015-12-25 01:15:08Mike Pomraningcreate