Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proc communicate not exiting on python subprocess timeout using PIPES #75628

Open
LeonardoFrancalanci mannequin opened this issue Sep 13, 2017 · 17 comments
Open

proc communicate not exiting on python subprocess timeout using PIPES #75628

LeonardoFrancalanci mannequin opened this issue Sep 13, 2017 · 17 comments
Labels
3.8 only security fixes 3.9 only security fixes 3.10 only security fixes docs Documentation in the Doc dir stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@LeonardoFrancalanci
Copy link
Mannequin

LeonardoFrancalanci mannequin commented Sep 13, 2017

BPO 31447
Nosy @pfmoore, @tjguk, @bitdancer, @vadmium, @zware, @eryksun, @zooba

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2017-09-13.10:11:27.791>
labels = ['type-bug', '3.8', '3.9', '3.10', 'library', 'docs']
title = 'proc communicate not exiting on python subprocess timeout using PIPES'
updated_at = <Date 2021-03-08.19:51:26.184>
user = 'https://bugs.python.org/LeonardoFrancalanci'

bugs.python.org fields:

activity = <Date 2021-03-08.19:51:26.184>
actor = 'vstinner'
assignee = 'docs@python'
closed = False
closed_date = None
closer = None
components = ['Documentation', 'Library (Lib)']
creation = <Date 2017-09-13.10:11:27.791>
creator = 'Leonardo Francalanci'
dependencies = []
files = []
hgrepos = []
issue_num = 31447
keywords = []
message_count = 17.0
messages = ['302051', '302059', '302062', '302067', '302068', '302071', '302121', '302130', '302146', '302155', '302240', '302254', '302281', '302450', '302493', '361566', '387860']
nosy_count = 9.0
nosy_names = ['paul.moore', 'tim.golden', 'r.david.murray', 'docs@python', 'martin.panter', 'zach.ware', 'eryksun', 'steve.dower', 'Leonardo Francalanci']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue31447'
versions = ['Python 3.8', 'Python 3.9', 'Python 3.10']

@LeonardoFrancalanci
Copy link
Mannequin Author

LeonardoFrancalanci mannequin commented Sep 13, 2017

the script below (a python process is called, which calls a waitfor cmd with a timeout of 4 seconds) is supposed to end after 4 seconds. But instead proc.communicate stops after the 20 seconds timeout.

Everything works 100% ok if I remove the stdin/stderr/stdout parameters...

if __name__ == "__main__":
  #start a python process which will wait for 4 seconds and then exit (waitfor is set to 200):
  proc_str = ["C:\\Program Files (x86)\\Anaconda3\\Python.exe",
                        "-c", "import 
  subprocess;subprocess.run('cmd /S /C waitfor g /t 200', shell=False, timeout=4)"]
  proc = subprocess.Popen(proc_str,
                            stdin=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            shell=False,
                            universal_newlines=True)
  #this should exit in 4 seconds (when the called process exits), but instead exits after 20 seconds:
  (proc_out, proc_err) = proc.communicate(timeout=20)

@LeonardoFrancalanci LeonardoFrancalanci mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Sep 13, 2017
@eryksun
Copy link
Contributor

eryksun commented Sep 13, 2017

I don't see a bug here. The inner run() kills cmd.exe after 4 seconds. But the waitfor.exe process waits for 200 seconds and has handles for the pipes. The OS won't close out the pipes as long as there are potential writers, so the outer communicate() waits the full 20 seconds.

@LeonardoFrancalanci
Copy link
Mannequin Author

LeonardoFrancalanci mannequin commented Sep 13, 2017

The called "C:\\Program Files (x86)\\Anaconda3\\Python.exe" process exits after 4 seconds. The reason why it ends shouldn't matter, right? I expect that a call to communicate should exit as soon as the called process is not running anymore. I don't hold a pipe with "waitfor", I hold it with python.exe.

@vadmium
Copy link
Member

vadmium commented Sep 13, 2017

I’m not familiar with Windows commands or processes enough, but going by Erik’s comment, this sounds similar to bpo-26534 and bpo-30154. Doesn’t your “waitfor” command inherit the same parent process’s stdout etc pipes?

@LeonardoFrancalanci
Copy link
Mannequin Author

LeonardoFrancalanci mannequin commented Sep 13, 2017

thank you for your replies!

I used waitfor because it's the simplest and shortest way to have reproducible code. The issue I'm having is obviously not with waitfor, but with another exe, but this doesn't change the fact that I need a way to exit as soon as the called process exits or the timeout ends. I don't care what the called process does. I know that the process exited, and the docs for proc.communicate say "Wait for process to terminate". Well, the process terminates, but "communicate" doesn't exit. It doesn't say "communicate will hang as long as the pipes are open".
Plus: the "python.exe" process doesn't actually timeout!!! It just exits (with exception, but that's not important), and I would like to get the error from stderr! While I can't get it because "communicate" raises an exception...

@LeonardoFrancalanci
Copy link
Mannequin Author

LeonardoFrancalanci mannequin commented Sep 13, 2017

(forgot: bpo-26534 is about shell=True, I use shell=False, right?)

@vadmium
Copy link
Member

vadmium commented Sep 13, 2017

.
Isn’t your use of “cmd” similar enough to shell=True? I.e. isn’t that a “cmd” parent process spawning a “waitfor” child? If your 4 s “subprocess.run” call times out, does it kill the “waitfor” process, or leave it running?

Could the “waitfor” process write to stdout or stderr, and should that be captured by your 20 s “communicate” call?

@eryksun
Copy link
Contributor

eryksun commented Sep 13, 2017

the process terminates, but "communicate" doesn't exit. It doesn't
say "communicate will hang as long as the pipes are open".

I think the problem with leaked handles does warrant a note. It can be a difficult problem to diagnose otherwise.

If communicate() raises subprocess.TimeoutExpired, it can be called again after the caller takes whatever action is necessary to resolve the problem, such as killing a process group as proposed by bpo-5115.

the "python.exe" process doesn't actually timeout!!! It just exits
(with exception, but that's not important), and I would like to get
the error from stderr! While I can't get it because "communicate"
raises an exception...

subprocess.run() in the child does time out and raise subprocess.TimeoutExpired. For the parent to read stdout and stderr to EOF, all pipe handles with write access have to close. Calling communicate() again will succeed once waitfor.exe either exits or gets terminated.

Since python.exe and waitfor.exe are console processes, and waitfor.exe inherits Python's console, you can send it a Ctrl+Break event to make it exit. Make sure to create a new process group in the Popen() call by passing the argument creationflags=subprocess.CREATE_NEW_PROCESS_GROUP. Then to deal with a timeout, call proc.send_signal(signal.CTRL_BREAK_EVENT) and retry communicate().

(Yes, it's wrong that send_signal sends Ctrl+Break to a process group instead of explicitly requiring a group argument to enable this, as proposed in bpo-5115. I guess whoever implemented os.kill for Windows didn't understand that GenerateConsoleCtrlEvent should have been used to implement os.killpg instead.)

I don't care what the called process does.

If you have source control of the child process and don't care about the output of grandchild processes, redirect their standard I/O to the NUL device. Also, make the child's standard I/O files non-inheritable, or if you're using C/C++, call CreateProcess with a PROC_THREAD_ATTRIBUTE_HANDLE_LIST to control handle inheritance (this should be available in Python 3.7, pending bpo-19764).

@eryksun eryksun added the 3.7 (EOL) end of life label Sep 13, 2017
@LeonardoFrancalanci
Copy link
Mannequin Author

LeonardoFrancalanci mannequin commented Sep 14, 2017

I have a workaround, and I guess this means there's a bug in the current implementation of stdout/stderr=subprocess.PIPE; if I open my own pipes instead of using subprocess.PIPE everything seems to work (provided I close the pipe before reading from it):

(errPipeR, errPipeW) = os.pipe();
(outPipeR, outPipeW) = os.pipe();

proc = subprocess.Popen(proc_str,
                                    stdin=subprocess.PIPE,
                                    stderr=errPipeW,
                                    stdout=outPipeW,shell=False,
                                    universal_newlines=True)

proc.communicate(timeout=20)
os.close(outPipeW)
os.close(errPipeW)

Now I can read from the "R" pipes with 0 problems (getting the error and output streams), and proc.communicate exits as soon as the called process exits (4 seconds).

So, since I can use my own pipes without any problems, I don't see why the regular "subprocess.PIPE" shouldn't be working...

@eryksun
Copy link
Contributor

eryksun commented Sep 14, 2017

Now I can read from the "R" pipes with 0 problems (getting the error
and output streams)

The pipe handles are inherited by waitfor.exe, just as before. You'll find that reading to EOF will block just as it does when using subprocess.PIPE with communicate(). For example:

    stderr = open(errPipeR, 'r')
    err = stderr.read()

The above stderr.read() will return when waitfor.exe either exits or gets terminated.

and proc.communicate exits as soon as the called process exits
(4 seconds).

proc.stdout and proc.stderr are None in this case, so all communicate() can do is write to proc.stdin, if there's input text, and then wait() for the process to exit.

@LeonardoFrancalanci
Copy link
Mannequin Author

LeonardoFrancalanci mannequin commented Sep 15, 2017

I'm really sorry, you are 100% correct: it blocks on the pipe (my tests killed the process tree before reading from the pipes).
Still, I think there should be a way to actually read the output also in this case... works for me when I kill the whole process stack.

@LeonardoFrancalanci
Copy link
Mannequin Author

LeonardoFrancalanci mannequin commented Sep 15, 2017

can I at least change the call to:

subprocess.run('cmd /S /C waitfor g /t 200', shell=False, timeout=4)

in any way to avoid the problem?
I tried with stdin=subprocess.DEVNULL,stdout=subprocess.DEVNULL,stderr=subprocess.DEVNULL; also with close_fds=True, but nothing changes...

@eryksun
Copy link
Contributor

eryksun commented Sep 15, 2017

I tried with stdin=subprocess.DEVNULL, stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL

As I said previously, you also need to make the current standard handles non-inheritable. Pending bpo-19764, in 3.7 you'll be able to override stdin, stdout, and stderr with the default close_fds=True. Currently overriding them implicitly sets close_fds=False, in which case you need to manually ensure that the current standard handles (i.e. the pipe handles) can't be inherited. For example:

    os.set_inheritable(0, False)
    os.set_inheritable(1, False)
    os.set_inheritable(2, False)

@LeonardoFrancalanci
Copy link
Mannequin Author

LeonardoFrancalanci mannequin commented Sep 18, 2017

That works!
But when I said "also with close_fds=True", I meant that I tried WITHOUT overriding stdin, stdout, and stderr AND setting close_fds=True, but it didn't work. What worked was not overriding stdin/out/err and adding

os.set_inheritable(0, False)
os.set_inheritable(1, False)
os.set_inheritable(2, False)

before the call (no need to set close_fds)

@eryksun
Copy link
Contributor

eryksun commented Sep 18, 2017

when I said "also with close_fds=True", I meant that I tried
WITHOUT overriding stdin, stdout, and stderr AND setting
close_fds=True, but it didn't work.

Console applications (e.g. python.exe) duplicate their standard handles into a child console process when the bInheritHandles argument of CreateProcess is false, i.e. when subprocess.Popen is called with the default close_fds=True. We know it's duplication instead of inheritance because the actual handle values change and also because it doesn't matter whether or not the standard handles are flagged as inheritable in the creating process. In the typical case this ensures that console processes have valid standard handles.

However, if you make the standard handles non-inheritable in the parent and also inherit handles when creating the child, then the parent's standard handles will be neither inherited nor duplicated. The child will still inherit the standard handle *values* from the parent (i.e. copied from the parent's PEB ProcessParameters). These handle values will be invalid. Either the handle won't be defined, or it will reference an arbitrary, unrelated kernel object.

What worked was not overriding stdin/out/err and adding

os.set_inheritable(0, False)
os.set_inheritable(1, False)
os.set_inheritable(2, False)

before the call (no need to set close_fds)

In this case cmd.exe gets duplicated standard handles, but you've flagged these handles as non-inheritable. When cmd.exe calls CreateProcess to execute waitfor.exe, it uses bInheritHandles=TRUE. Consequently, its non-inheritable standard handles are neither inherited by nor duplicated to waitfor.exe. Let's attach a debugger to the waitfor.exe process and examine the standard handle values in its PEB ProcessParameters:

0:003> ?? @$peb->ProcessParameters->StandardInput
void * 0x00000000`00000008
0:003> !handle 8
Handle 8
  Type          Event

0:003> ?? @$peb->ProcessParameters->StandardOutput
void * 0x00000000`0000000c
0:003> !handle c
Handle c
  Type          WaitCompletionPacket

0:003> ?? @$peb->ProcessParameters->StandardError
void * 0x00000000`00000010
0:003> !handle 10
Handle 10
  Type          IoCompletion

waitfor.exe doesn't care that its standard handle values reference non-File objects, i.e. an Event, a WaitCompletionPacket, and an IoCompletion port. However, what if by coincidence one of its standard handle values is actually a valid File handle that's inherited for some other reason? That would be a right mess. That's why I suggested overriding the standard handles to the NUL device. For example:

    import subprocess

    script = r'''
    import os
    import subprocess

    os.set_inheritable(0, False)
    os.set_inheritable(1, False)
    os.set_inheritable(2, False)

    cmdline = 'cmd.exe /s /c waitfor.exe signal /t 200'
    subprocess.run(cmdline, timeout=4, stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    '''

    args = ['python.exe', '-c', script]
    proc = subprocess.Popen(args, encoding='ansi', stdin=subprocess.PIPE,
                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
proc_out, proc_err = proc.communicate(timeout=20)
print('proc_out:', proc_out)
print('proc_err:', proc_err)

@vstinner
Copy link
Member

vstinner commented Feb 7, 2020

See also bpo-38207: "subprocess: Popen.kill() + Popen.communicate() is blocking until all processes using the pipe close the pipe".

@eryksun eryksun added stdlib Python modules in the Lib dir OS-windows 3.8 only security fixes 3.9 only security fixes 3.10 only security fixes and removed interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.7 (EOL) end of life labels Feb 27, 2021
@eryksun
Copy link
Contributor

eryksun commented Mar 1, 2021

I'm changing this to a documentation issue and removing the Windows component. The documentation doesn't make it clear that communicate() may block indefinitely (in both POSIX and Windows) even after the process has terminated. As currently implemented, this claim and the example need to be corrected.

Depending on one's needs, the Popen() instance can be used as context manager to ensure that communication is finalized (e.g. in Windows, the synchronous reads in worker threads need to be canceled) and that the pipes are closed -- presuming you don't need to read more data after the timeout. If bpo-43346 is resolved as suggested, then the following will work without blocking indefinitely in the second communicate() call:

    proc = subprocess.Popen(...)
    try:
        out, err = proc.communicate(timeout=15)
    except subprocess.TimeoutExpired:
        with proc:
            proc.kill()
        out, err = proc.communicate()

@eryksun eryksun added docs Documentation in the Doc dir and removed OS-windows labels Mar 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.8 only security fixes 3.9 only security fixes 3.10 only security fixes docs Documentation in the Doc dir stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

3 participants