classification
Title: subprocess: Calling Popen.communicate() after Popen.stdout.read() returns an empty string
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Frost Ming, g.starck@gmail.com, gregory.p.smith, vstinner
Priority: normal Keywords:

Created on 2020-07-27 08:26 by Frost Ming, last changed 2020-08-04 01:55 by gregory.p.smith.

Messages (4)
msg374366 - Author: Frost Ming (Frost Ming) Date: 2020-07-27 08:26
The following snippet behaves differently between Windows and POSIX.

import subprocess

p = subprocess.Popen("ls -l", shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

print(p.stdout.read(1))   # read 1 byte
print(p.communicate())    # on POSIX, stdout comes back empty instead of the remaining output

It works fine on Windows and on Python 2.x, where communicate() returns the remaining output, so my best guess is that this is the expected behavior.

The reason is that Popen.stdout is a BufferedReader. A call to read() pulls all available output into the reader's internal buffer, even when only one byte is requested. However, communicate() and the lower-level _communicate() read with os.read() directly on the file descriptor, which bypasses that buffer. os.read() therefore returns empty output, which is treated as end of file, and the file object is closed with the buffered data still unread.
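The read-ahead described above can be reproduced without subprocess. The following is a minimal sketch, assuming a plain os.pipe() stands in for the child's stdout pipe; the function name demo_buffered_readahead and the payload are illustrative only and are not part of subprocess.

```python
import os


def demo_buffered_readahead(payload=b"hello world"):
    """Show that a 1-byte buffered read drains the pipe into the buffer."""
    r_fd, w_fd = os.pipe()
    os.write(w_fd, payload)
    os.close(w_fd)  # writer is done, so the pipe reports EOF once it is empty

    reader = open(r_fd, "rb")  # default buffering gives a BufferedReader
    first = reader.read(1)  # fills the buffer, then hands back one byte

    # Reading the descriptor directly bypasses the buffer. The pipe itself
    # is already empty, so this looks like EOF.
    raw_rest = os.read(r_fd, 4096)

    # The "missing" bytes are still available through the buffered object.
    buffered_rest = reader.read()
    reader.close()
    return first, raw_rest, buffered_rest


if __name__ == "__main__":
    print(demo_buffered_readahead())
```

The empty result from os.read() here corresponds to the empty output that communicate() returns in the report.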

This is my first bug report, so pardon me if I got anything wrong.
msg374436 - Author: Grégory Starck (g.starck@gmail.com) * Date: 2020-07-27 23:12
This also affects 3.6.
msg374769 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-08-03 23:30
Calling proc.communicate() after proc.stdout.read() doesn't seem to be supported. What is your use case? Why not just call communicate()? Why not use stdout directly and nothing else?
msg374783 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-08-04 01:55
A workaround should be to pass bufsize=0.

There might be performance consequences.  That depends on your read patterns and child process.
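The bufsize=0 workaround might look like the following. This is a minimal sketch, assuming a small Python child process in place of "ls -l" so the output is predictable; the helper name read_one_then_communicate is hypothetical.

```python
import subprocess
import sys


def read_one_then_communicate():
    """Read one byte by hand, then let communicate() collect the rest."""
    # Assumption: a tiny Python child replaces "ls -l" from the report.
    child = [sys.executable, "-c", "import sys; sys.stdout.write('hello world')"]
    p = subprocess.Popen(
        child,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        bufsize=0,  # unbuffered pipes: read(1) takes exactly one byte from the pipe
    )
    first = p.stdout.read(1)
    out, err = p.communicate()  # nothing is hidden in a buffer, so the rest arrives here
    return first, out


if __name__ == "__main__":
    print(read_one_then_communicate())
```

With bufsize=0 every read() on the pipe object is a system call, which is where the performance cost mentioned above would come from.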

If this is to be supported and fixed: the selectors used in Popen._communicate on the POSIX side presumably don't bother to look at the buffered IO objects' buffers.  https://github.com/python/cpython/blob/master/Lib/subprocess.py#L1959

Manually consuming data from the stdout and stderr buffers, if any, before entering that loop is probably a fix.

Higher up the chain, should the selectors module (https://docs.python.org/3/library/selectors.html) be enhanced to support emptying the buffer on buffered IO objects?  That sounds complicated; probably even infeasible in text mode.  In general it is understood that poll/select type APIs are meant to be used on unbuffered raw binary file objects.
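The mismatch between selectors and buffered objects can be illustrated on POSIX. This is a minimal sketch, assuming an os.pipe() in place of the child's pipe; it is POSIX-only because selectors on Windows accept only sockets, and the function name selector_misses_buffered_data is hypothetical.

```python
import os
import selectors


def selector_misses_buffered_data(payload=b"hello world"):
    """Show that a selector reports no data while the buffer still holds some."""
    r_fd, w_fd = os.pipe()
    reader = open(r_fd, "rb")  # BufferedReader around the read end
    sel = selectors.DefaultSelector()
    try:
        os.write(w_fd, payload)
        first = reader.read(1)  # moves the whole payload into the buffer

        # The selector only looks at the file descriptor. The pipe is now
        # empty and the writer is still open, so nothing is reported ready.
        sel.register(reader, selectors.EVENT_READ)
        events = sel.select(timeout=0)

        # read1() returns bytes that are already buffered without touching
        # the raw stream, which is the kind of draining suggested above.
        buffered = reader.read1(4096)
        return first, events, buffered
    finally:
        sel.close()
        reader.close()
        os.close(w_fd)


if __name__ == "__main__":
    print(selector_misses_buffered_data())
```

Note that read1() falls back to a raw read when the buffer is empty, so it is not by itself a non-blocking drain; a real fix would need to handle that case.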
History
Date | User | Action | Args
2020-08-04 01:55:39 | gregory.p.smith | set | messages: + msg374783
2020-08-03 23:30:07 | vstinner | set | title: BufferedReader causes Popen.communicate losing the remaining output. -> subprocess: Calling Popen.communicate() after Popen.stdout.read() returns an empty string; nosy: + gregory.p.smith; messages: + msg374769; versions: - Python 3.6, Python 3.7, Python 3.8, Python 3.9; components: - 2to3 (2.x to 3.x conversion tool), IO
2020-07-27 23:12:07 | g.starck@gmail.com | set | nosy: + g.starck@gmail.com; messages: + msg374436; versions: + Python 3.6
2020-07-27 16:41:48 | brett.cannon | set | nosy: - brett.cannon
2020-07-27 08:27:52 | Frost Ming | set | type: behavior
2020-07-27 08:26:25 | Frost Ming | create |