This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: subprocess: Popen.kill() + Popen.communicate() is blocking until all processes using the pipe close the pipe
Type: Stage: resolved
Components: Library (Lib) Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: jkloth, miss-islington, vstinner
Priority: normal Keywords: patch

Created on 2019-09-17 23:30 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
communicate_close.py vstinner, 2019-09-18 14:16
Pull Requests
URL Status Linked Edit
PR 16659 merged vstinner, 2019-10-08 14:44
PR 16676 merged miss-islington, 2019-10-09 13:33
PR 16677 merged miss-islington, 2019-10-09 13:34
Messages (10)
msg352673 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-17 23:30
On Windows, the communicate() method of subprocess.Popen is implemented with threads calling:

def _readerthread(self, fh, buffer):
    buffer.append(fh.read())
    fh.close()

where fh is one the Popen pipes: popen.stdout or popen.stderr.

For stdout=PIPE and/or stderr=PIPE, Popen._get_handles() creates a pipe with _winapi.CreatePipe(None, 0).

The fh.read() calls only completes when the write end of the pipe is closed: when all open handles of the write end of the pipe are closed.

When using Popen(stdout=PIPE), Python uses the pipe as stdout:

                startupinfo.dwFlags |= _winapi.STARTF_USESTDHANDLES
                startupinfo.hStdInput = p2cread
                startupinfo.hStdOutput = c2pwrite
                startupinfo.hStdError = errwrite

So far so good, it works well when Python spawns a single process.

--

Problems arise when the child process itself spawns new processes. In that case, the stdout pipe is inherited by other processes.

Popen.communicate() will block until all processes close the pipe.

This behavior is surprising for the following pattern:
------------------
try:
    stdout, stderr = popen.communicate(timeout=5.0)
except subprocess.TimeoutExpired:
    popen.kill()
    stdout, stderr = popen.communicate()
------------------

I would expect that the second communicate() call completes immediately since we just killed the process: Windows knows that child process is dead, Python knows that Popen object has been kill, but fh.read() continues to block (until all processes closed the pipe).

I would expect communicate() to complete immediately.

The problem is that fh.read() only returns once all data is returned at once. If fh.read() call is cancelled somehow, communicate() would just return empty data: we loose all data.

--

Eryk Sun proposed two solutions:
https://bugs.python.org/issue37531#msg350246

"For Windows, subprocess could have a _read_all(file) method that special cases a pipe. The read loop for a pipe would check whether the child has exited. Then call _winapi.PeekNamedPipe on the handle (from get_osfhandle), and do a raw read of the available bytes. If the child has exited or PeekNamedPipe fails (EPIPE), then break, join the partial reads, decode and translate newlines if it's text mode, and return."

and:
https://bugs.python.org/issue37531#msg350329

"Alternatively, instead of special casing the file type and spinning on PeekNamedPipe, the workaround could be based on a multiple-object wait that includes the child process handle. In this case, communicate() would always call _communicate in Windows, regardless of the timeout or number of pipes -- because simplistically calling either self.stdout.read() or self.stderr.read() could hang. 

The implementation would be moderately complicated. If we stop waiting on the reader threads because the process exited, we can give the threads a short time to finish reading and close the files -- maybe 250 ms is enough. But if they haven't exited at this time, we can't simply raise a TimeoutExpired exception if the call hasn't actually timed out.  To finish the _communicate call, we would have to cancel the pending reads and join() the threads.

We can force a reader thread to exit by canceling the read via WINAPI CancelIoEx. However, _readerthread has to be modified to support this. It could be implemented as a loop that calls _winapi.ReadFile to read the output in chunks that get appended to the buffer list. The read loop would break for either ERROR_BROKEN_PIPE or ERROR_OPERATION_ABORTED (canceled). 

The final step in _communicate would be to concatenate the partial reads. If it's text mode, it would also have to decode the bytes and translate newlines. The POSIX implementation of _communicate has to do this, so we already have a _translate_newlines method for this case.

Note that _winapi.WaitForMultipleObjects is interruptible in the main thread via Ctrl+C, which is a bonus improvement since Thread.join() can't be interrupted in Windows."

--

This issue has been discussed in bpo-37531 about regrtest:

https://bugs.python.org/issue37531#msg350181

It impacts the Python test suite when a test uses multiprocessing (for example) which spawns new processes.
msg352678 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-17 23:40
See also bpo-37424 was has been fixed: "subprocess.run timeout does not function if shell=True and capture_output=True".
msg352729 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-09-18 14:16
Hum, I see a similar behavior on Linux. Try attached communicate_close.py (I added newlines for readability):
---
vstinner@apu$ python3 communicate_close.py
process 1: pid 13089
process 1: process 2 spawned (pid 13090)
process 1: stdout pipe os.stat_result(st_mode=4480, st_ino=3107861, st_dev=12, st_nlink=1, st_uid=1000, st_gid=1000, st_size=0, st_atime=1568816000, st_mtime=1568816000, st_ctime=1568816000)
process 1: communicate with process 2 (pid 13090) ...

process 2: pid 13090
process 2: stdout os.stat_result(st_mode=4480, st_ino=3107861, st_dev=12, st_nlink=1, st_uid=1000, st_gid=1000, st_size=0, st_atime=1568816000, st_mtime=1568816000, st_ctime=1568816000)

process 3: pid 13092
process 3: stdout os.stat_result(st_mode=4480, st_ino=3107861, st_dev=12, st_nlink=1, st_uid=1000, st_gid=1000, st_size=0, st_atime=1568816000, st_mtime=1568816000, st_ctime=1568816000)

process 1 (thread): kill process 2 (pid 13090)
process 1 (thread): close process 2 stdout pipe (fd 3)

process 3: exit
process 1: communicate with process 2 (pid 13090) ... done in 5.1 sec
---

In process 1, communicate(<process 2>) does not complete immediately when the process 1 thread kills process 2 and closes process 2 stdout pipe.

communicate() only completes once process 3 completes.

The stdout file of process 2 and process 3 are the same file.
msg354222 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-08 16:45
New changeset 0ec618af98ac250a91ee9c91f8569e6df6772758 by Victor Stinner in branch 'master':
bpo-37531: regrtest ignores output on timeout (GH-16659)
https://github.com/python/cpython/commit/0ec618af98ac250a91ee9c91f8569e6df6772758
msg354269 - (view) Author: miss-islington (miss-islington) Date: 2019-10-09 13:52
New changeset 598bfa4d457d61431142ad99ecbb9bd10cf314e6 by Miss Islington (bot) in branch '3.7':
bpo-37531: regrtest ignores output on timeout (GH-16659)
https://github.com/python/cpython/commit/598bfa4d457d61431142ad99ecbb9bd10cf314e6
msg354271 - (view) Author: miss-islington (miss-islington) Date: 2019-10-09 14:02
New changeset 359a1975cbca488ccd5ea13bd7268d7e88664078 by Miss Islington (bot) in branch '3.8':
bpo-37531: regrtest ignores output on timeout (GH-16659)
https://github.com/python/cpython/commit/359a1975cbca488ccd5ea13bd7268d7e88664078
msg354278 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-09 15:06
regrtest now workarounds this issue using timeouts on subprocess.Popen.wait() and threading.Thread.join() operations: see bpo-37531.
msg354820 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-16 23:06
See also bpo-31447: "proc communicate not exiting on python subprocess timeout using PIPES".
msg354823 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-16 23:43
On Windows, the threads reading pipes continue to run even after the direct child process exited.

Problem: at Python exit (main process), Python hangs on calling threading._shutdown() which waits until all Python threads complete.

One enhancement would be to spawn subprocess.Popen.communicate() internal threads as daemon threads. So threading._shutdown() would not wait for them at Python exit.
msg361565 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-07 10:48
My use case was specific to regrtest. I solved the issue differently in regrtest. I close the issue.
History
Date User Action Args
2022-04-11 14:59:20adminsetgithub: 82388
2020-02-07 10:48:47vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg361565

stage: patch review -> resolved
2019-10-16 23:43:43vstinnersetmessages: + msg354823
2019-10-16 23:06:20vstinnersetmessages: + msg354820
2019-10-16 23:06:04vstinnersettitle: subprocess: On Windows, Popen.kill() + Popen.communicate() is blocking until all processes using the pipe close the pipe -> subprocess: Popen.kill() + Popen.communicate() is blocking until all processes using the pipe close the pipe
2019-10-09 15:06:57vstinnersetmessages: + msg354278
2019-10-09 14:02:18miss-islingtonsetmessages: + msg354271
2019-10-09 13:52:38miss-islingtonsetnosy: + miss-islington
messages: + msg354269
2019-10-09 13:34:02miss-islingtonsetpull_requests: + pull_request16260
2019-10-09 13:33:55miss-islingtonsetpull_requests: + pull_request16258
2019-10-08 16:45:53vstinnersetmessages: + msg354222
2019-10-08 14:44:40vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request16243
2019-09-18 14:16:39vstinnersetfiles: + communicate_close.py

messages: + msg352729
2019-09-17 23:44:04jklothsetnosy: + jkloth
2019-09-17 23:40:39vstinnersetmessages: + msg352678
2019-09-17 23:30:48vstinnercreate