This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Default buffering for input and output pipes in subprocess module
Type: behavior Stage: resolved
Components: Documentation Versions: Python 3.3, Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, georg.brandl, lunixbochs, martin.panter, python-dev
Priority: normal Keywords:

Created on 2013-11-16 05:22 by martin.panter, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (6)
msg203010 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2013-11-16 05:22
Currently the documentation for the “bufsize” parameter in the “subprocess” module says:

"""
Changed in version 3.2.4,: 3.3.1

bufsize now defaults to -1 to enable buffering by default to match the behavior that most code expects. In 3.2.0 through 3.2.3 and 3.3.0 it incorrectly defaulted to 0 which was unbuffered and allowed short reads. This was unintentional and did not match the behavior of Python 2 as most code expected.
"""

First of all, the formatting is a bit screwy. There’s a colon in the wrong place, so it’s not obvious that the “changed in version” heading applies to the following paragraph.

The main issue is that I got the impression the default of 0 was a regression, and that Python 3.1 and Python 2 defaulted to -1. However, as far as I can tell, the default was actually 0 in both 3.1 and 2.

The change to -1 was for Issue 17488, which seems to be focussed on the behaviour of reading from a subprocess’s output pipe. In Python 2, file.read() blocks to read as much as possible, even when buffering is disabled. In Python 3, you end up with either a FileIO or a BufferedIOBase object, and they have different read() behaviours.
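A minimal sketch of the two read() behaviours, assuming a POSIX system and using os.pipe() in place of a real child process: opening the read end with buffering=0 gives a raw FileIO (what Popen hands back for bufsize=0), while default buffering gives a BufferedReader (what the 3.3.1+ default of bufsize=-1 gives). The helper names and the slow writer thread are illustrative only.

```python
import os
import threading
import time


def slow_writer(fd: int) -> None:
    """Write two chunks with a pause between them, then close the fd (EOF)."""
    os.write(fd, b"hello")
    time.sleep(0.1)
    os.write(fd, b" world")
    os.close(fd)


def raw_read() -> bytes:
    """read(100) on an unbuffered pipe, like Popen(..., bufsize=0)."""
    r, w = os.pipe()
    os.write(w, b"hello")
    with open(r, "rb", buffering=0) as pipe:  # a raw io.FileIO
        data = pipe.read(100)  # one read(2) call: returns the 5 bytes available
    os.close(w)
    return data


def buffered_read() -> bytes:
    """read(100) on a buffered pipe, like the default bufsize=-1."""
    r, w = os.pipe()
    writer = threading.Thread(target=slow_writer, args=(w,))
    writer.start()
    with open(r, "rb") as pipe:  # an io.BufferedReader
        data = pipe.read(100)  # keeps reading until 100 bytes or EOF
    writer.join()
    return data


print(raw_read())       # b'hello'
print(buffered_read())  # b'hello world'
```

The raw reader returns a short result as soon as any data is available; the buffered reader waits across both writes until EOF, which is closer to Python 2's file.read().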

Perhaps the documentation should say something like

"""
The “bufsize” argument now defaults to -1 to enable buffering. In 3.2.3, 3.3.0, and earlier, it defaulted to 0 which was unbuffered and allowed short reads.
"""

I would take out the “most code expects buffering” bits. Maybe most code expects the greedy read behaviour from output pipes, but I would say most code writing to an input pipe expects unbuffered behaviour. The big issue with buffering for me is that BufferedWriter.close() may raise a broken pipe error.
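To show where the broken pipe turns up, here is a small sketch assuming a POSIX system where Python ignores SIGPIPE (its default), with os.pipe() standing in for the child's stdin and an already-closed read end standing in for a child that exited without reading. where_broken_pipe_surfaces() is a hypothetical helper for illustration.

```python
import os


def where_broken_pipe_surfaces(buffering: int) -> str:
    """Report which call raises BrokenPipeError once the reader has gone.

    buffering=0 mimics Popen(..., bufsize=0); buffering=-1 mimics the
    3.3.1+ default.
    """
    r, w = os.pipe()
    os.close(r)  # the "child" has exited without reading anything
    pipe = open(w, "wb", buffering=buffering)
    try:
        pipe.write(b"data")  # raw FileIO: the write(2) call fails right here
    except BrokenPipeError:
        pipe.close()
        return "write"
    try:
        pipe.close()  # BufferedWriter: the bytes were only buffered, so the
                      # failure only appears when close() flushes them
    except BrokenPipeError:
        return "close"
    return "none"


print(where_broken_pipe_surfaces(0))   # write
print(where_broken_pipe_surfaces(-1))  # close
```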

If you want to mention Python 2, maybe say that Python 2 did not use buffering by default, but that file.read() always had blocking read behaviour, which can be emulated by using buffered reading in Python 3.
msg204305 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-11-25 07:52
New changeset 0f0dc0276a7c by Georg Brandl in branch '3.3':
Closes #19622: clarify message about bufsize changes in 3.2.4 and 3.3.1.
http://hg.python.org/cpython/rev/0f0dc0276a7c
msg204453 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2013-11-26 05:53
The updated text in “subprocess.rst” is better, but now the whole paragraph seems to fail to render at http://docs.python.org/dev/library/subprocess#subprocess.Popen. I have no idea about the syntax, but maybe the blank line separating “versionchanged” from its paragraph shouldn’t be there?
msg204457 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2013-11-26 07:25
Thanks, should be fixed now.
msg228106 - Author: Ryan (lunixbochs) Date: 2014-10-01 18:33
This is not fixed. The documentation may be more correct now, but the behavior still does not match Python 2 as purported.

The change to the default bufsize in 3.3.1 is incorrect, at least as tested in 3.4.0 and 3.4.1.

Here is a test for systems with cat available.

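    import subprocess
    proc = subprocess.Popen('cat', stdin=subprocess.PIPE)
    # With bufsize=0 (Python 2, 3.0 to 3.3.0) cat receives the line at once.
    # With the 3.3.1+ default of -1, the bytes wait in proc.stdin's buffer
    # until proc.stdin.flush() or proc.stdin.close() is called.
    proc.stdin.write('test\n'.encode('utf8'))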

This test succeeds in Python 2.x and Python 3.0 to 3.3.0, but fails on 3.4.x. This is a regression: the documentation justifies the change by saying the old default "did not match the behavior of Python 2 as most code expected", yet the current behavior definitely does not match Python 2.
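For comparison, a sketch of two ways to get the Python 2 style result on 3.3.1 and later: pass bufsize=0 explicitly, or keep the default and call flush() after writing. It assumes a small Python child (run via sys.executable) as a portable stand-in for cat, plus a stdout pipe so the echo can be checked; echo_line() and CHILD are hypothetical names.

```python
import subprocess
import sys

# Portable stand-in for `cat`: copy one line from stdin to stdout, then exit.
CHILD = ("import sys; "
         "sys.stdout.buffer.write(sys.stdin.buffer.readline()); "
         "sys.stdout.buffer.flush()")


def echo_line(bufsize: int, flush: bool) -> bytes:
    """Send one line to the child and return what it echoes back.

    Do not call this with bufsize=-1 and flush=False: the line would stay in
    the parent's buffer while both processes wait on each other (deadlock).
    """
    with subprocess.Popen([sys.executable, "-c", CHILD],
                          stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                          bufsize=bufsize) as proc:
        proc.stdin.write(b"test\n")
        if flush:
            proc.stdin.flush()  # hand the buffered bytes to the child now
        return proc.stdout.readline()


print(echo_line(0, flush=False))  # b'test\n' (unbuffered: written at once)
print(echo_line(-1, flush=True))  # b'test\n' (buffered, flushed explicitly)
```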
msg228149 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-10-02 02:07
I agree that it is misleading to say it matches Python 2 behaviour, as I said in my original post. Do you think I should reopen this and get that bit removed from the documentation?

I don’t see an easy way to make the behaviour consistent in all cases. My understanding is that Python 2 always defaulted to unbuffered pipe readers and writers, but they were always “greedy”, meaning the read() and write() methods only succeed after transferring all the data. This is basically the API for BufferedIOBase.read() and BufferedIOBase.write(). In Python 3, unbuffered pipe readers and writers implement the RawIOBase API and may succeed before all the data is transferred.
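The RawIOBase write() contract can be observed directly. This sketch assumes a POSIX system and makes the write end non-blocking so that a short write shows up deterministically once the pipe fills; on a blocking pipe a large write usually waits until everything is written, so short writes are rarer there than short reads. raw_write_count() is a hypothetical helper.

```python
import os


def raw_write_count(size: int = 4 * 1024 * 1024) -> tuple[int, int]:
    """Return (bytes requested, bytes accepted) for a single raw write() call."""
    r, w = os.pipe()
    os.set_blocking(w, False)  # let write(2) stop early instead of waiting for a reader
    with open(w, "wb", buffering=0) as raw:  # io.FileIO follows the RawIOBase API
        accepted = raw.write(b"x" * size)    # may be fewer than size bytes
    os.close(r)
    return size, accepted


requested, accepted = raw_write_count()
print(requested, accepted)  # accepted is limited by the pipe's capacity
```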

If you really wanted to change the behaviour to be consistent with Python 2, you could make Popen always return BufferedIOBase pipes (unless universal_newlines=True). But BufferedReader and BufferedWriter don’t seem to accept buffer_size=0. For a pipe reader, maybe BufferedReader(buffer_size=1) would work, but you would either have to hack the BufferedWriter class, or implement a new GreedyWriter class. However I suspect this would introduce deadlocks in programs that access more than one pipe at once (probably also a problem in Python 2).
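A rough sketch of the options mentioned above. BufferedWriter really does reject buffer_size=0, and BufferedReader accepts buffer_size=1; GreedyWriter is hypothetical (nothing like it exists in the io module) and only handles blocking streams. The round trip writes before reading in one thread, so it assumes a payload smaller than the pipe's capacity.

```python
import io
import os


class GreedyWriter:
    """Hypothetical unbuffered writer with Python 2 style write(): it only
    returns once every byte has been passed to the raw stream."""

    def __init__(self, raw: io.RawIOBase) -> None:
        self.raw = raw

    def write(self, data: bytes) -> int:
        view = memoryview(data)
        done = 0
        while done < len(view):
            n = self.raw.write(view[done:])  # raw writes may be partial
            if n is None:  # non-blocking stream is full; not handled here
                raise BlockingIOError("raw stream would block")
            done += n
        return done

    def close(self) -> None:
        self.raw.close()


def rejects_zero_buffer() -> bool:
    """BufferedWriter refuses buffer_size=0, so it cannot act as unbuffered."""
    try:
        io.BufferedWriter(io.BytesIO(), buffer_size=0)
    except ValueError:
        return True
    return False


def round_trip(payload: bytes) -> bytes:
    """Write with GreedyWriter, read back with BufferedReader(buffer_size=1)."""
    r, w = os.pipe()
    writer = GreedyWriter(io.FileIO(w, "w"))
    writer.write(payload)  # returns only after all bytes reached the pipe
    writer.close()         # EOF for the reader
    with io.BufferedReader(io.FileIO(r, "r"), buffer_size=1) as reader:
        return reader.read(len(payload))  # greedy: loops until len(payload) or EOF


print(rejects_zero_buffer())                    # True
print(len(round_trip(bytes(range(256)) * 4)))   # 1024
```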

Perhaps the best thing is to document the problems and then explicitly pass in whatever “bufsize” value is appropriate for your usage.
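As a quick check of what each explicit choice gives you, the sketch below reports which io classes Popen uses for the pipes, assuming only a Python child (via sys.executable) that exits immediately. bufsize=0 yields raw FileIO objects with RawIOBase short-read and short-write semantics, while the default yields BufferedWriter and BufferedReader. pipe_types() is a hypothetical helper.

```python
import io
import subprocess
import sys


def pipe_types(bufsize: int) -> tuple[type, type]:
    """Return the classes Popen uses for the child's stdin and stdout pipes."""
    with subprocess.Popen(
        [sys.executable, "-c", "pass"],  # a child that exits straight away
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        bufsize=bufsize,
    ) as proc:
        return type(proc.stdin), type(proc.stdout)


print(pipe_types(0) == (io.FileIO, io.FileIO))                  # True
print(pipe_types(-1) == (io.BufferedWriter, io.BufferedReader))  # True
```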
History
Date User Action Args
2022-04-11 14:57:53 admin set github: 63821
2014-10-02 02:07:55 martin.panter set messages: + msg228149
2014-10-01 18:33:49 lunixbochs set nosy: + lunixbochs; messages: + msg228106; versions: - Python 3.2
2013-11-26 07:25:48 georg.brandl set nosy: + georg.brandl; messages: + msg204457
2013-11-26 05:53:23 martin.panter set messages: + msg204453
2013-11-25 07:52:31 python-dev set status: open -> closed; nosy: + python-dev; messages: + msg204305; resolution: fixed; stage: resolved
2013-11-16 05:22:14 martin.panter create