classification
Title: POpen bufsize=0 ignored with universal_newlines=True
Type: behavior Stage: patch review
Components: IO Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: eryksun, gregory.p.smith, pitrou, yann
Priority: normal Keywords: patch

Created on 2020-07-06 19:54 by yann, last changed 2022-04-11 14:59 by admin.

Files
File name Uploaded Description
testproc-unbuffered.py yann, 2020-07-06 19:54
Pull Requests
URL Status Linked
PR 25859 open yann, 2021-05-03 20:35
Messages (6)
msg373165 - (view) Author: Yann Dirson (yann) * Date: 2020-07-06 19:54
On a Popen object created with bufsize=0, stdout.readline() performs a buffered read under Python 3, whereas under 2.7 it read one character at a time. See the attached example.

As a result, a poll loop that includes the stdout object changes behaviour when stdout is ready for reading and more than one line of data is available. In both cases poll() notifies us that data is available on the fd, we call stdout.readline(), and we go back to our polling loop. Then:

* with Python 2, poll() returns immediately and stdout.readline() returns the next line

* with Python 3, poll() now blocks

Running the attached example under strace reveals the underlying difference:

 write(4, "go\n", 3)                     = 3
 poll([{fd=5, events=POLLIN|POLLERR|POLLHUP}], 1, -1) = 1 ([{fd=5, revents=POLLIN}])
-read(5, "x", 1)                         = 1
-read(5, "x", 1)                         = 1
-read(5, "x", 1)                         = 1
-read(5, "x", 1)                         = 1
-read(5, "x", 1)                         = 1
-read(5, "x", 1)                         = 1
-read(5, "x", 1)                         = 1
-read(5, "x", 1)                         = 1
-read(5, "x", 1)                         = 1
-read(5, "x", 1)                         = 1
-read(5, "x", 1)                         = 1
-read(5, "x", 1)                         = 1
-read(5, "\n", 1)                        = 1
-fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0x2), ...}) = 0
+read(5, "xxxxxxxxxxxx\nyyyyyyyyyyyyyyy\naaa"..., 8192) = 74
 write(1, ">xxxxxxxxxxxx\n", 14)         = 14


We can see a buffered read, which explains the behaviour difference.

Changing to bufsize=1 makes no difference in the strace output.

This is especially troubling, as the first note in https://docs.python.org/3/library/io.html#class-hierarchy mentions that even in buffered mode there is an unoptimized readline() implementation.
msg392714 - (view) Author: Yann Dirson (yann) * Date: 2021-05-02 18:46
With the upcoming 3.10 phasing out 2.7 compatibility, I have to find a solution to this, so I'm back digging here.

Even .read(1) on a subprocess pipe causes an underlying buffered read, so a workaround based on a loop of 1-byte reads has to use os.read() on the file descriptor, even though using it on file-like objects is discouraged.
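
For illustration, a minimal sketch of that os.read() workaround. It is not the attached test script: the child command and the helper names `read_line_unbuffered` and `collect_lines` are made up for this example, and it opens the pipe in binary mode, so no universal-newlines translation is applied. It assumes a platform where select.poll() is available.

```python
import os
import select
import subprocess
import sys


def read_line_unbuffered(fd):
    """Read one line from fd using 1-byte os.read() calls.

    Nothing past the newline is consumed, so poll() on fd stays accurate.
    Returns b"" at end of file.
    """
    chunks = []
    while True:
        byte = os.read(fd, 1)
        if not byte:  # EOF: the writer closed its end of the pipe
            break
        chunks.append(byte)
        if byte == b"\n":
            break
    return b"".join(chunks)


def collect_lines(timeout_ms=5000):
    """Run a hypothetical child that emits three lines in a single write,
    and read them back one line per poll() wakeup."""
    child_code = (
        "import sys; "
        "sys.stdout.write('one\\ntwo\\nthree\\n'); "
        "sys.stdout.flush()"
    )
    proc = subprocess.Popen(
        [sys.executable, "-c", child_code],
        stdout=subprocess.PIPE,
        bufsize=0,  # binary mode: proc.stdout is a raw, unbuffered object
    )
    fd = proc.stdout.fileno()
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    lines = []
    while True:
        if not poller.poll(timeout_ms):
            break  # timed out without any event
        line = read_line_unbuffered(fd)
        if not line:
            break  # EOF
        lines.append(line.decode("ascii"))
    proc.stdout.close()
    proc.wait()
    return lines


if __name__ == "__main__":
    print(collect_lines())
```

Because each call stops right after the newline, the remaining lines stay in the kernel pipe buffer and the next poll() returns immediately, which is the 2.7-style behaviour described above. The cost is one system call per byte.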

It looks like one of the following would be needed, depending on the expected semantics of `Popen`'s `bufsize` parameter:

* use the provided bufsize for the underlying buffering
* provide a dummy pipe fd through fileno(), feeding it data as long as a read() call leaves data in the underlying buffer (indeed a simple conditional 1-byte read or write to the pipe before returning to caller should provide the correct semantics)
msg392795 - (view) Author: Yann Dirson (yann) * Date: 2021-05-03 10:51
Relevant commits include this one from v3.1.4:

 commit 877766dee8e60c7971ed0cabba89fbe981c2ab1b
 Author: Antoine Pitrou <solipsis@pitrou.net>
 Date:   Sat Mar 19 17:00:37 2011 +0100

    Issue #11459: A `bufsize` value of 0 in subprocess.Popen() really creates
    unbuffered pipes, such that select() works properly on them.


I can't use that commit without cherry-picking this one from v3.2.2, though:

 commit e96ec6810184f5daacb2d47ab8801365c99bb206
 Author: Antoine Pitrou <solipsis@pitrou.net>
 Date:   Sat Jul 23 21:46:35 2011 +0200

    Issue #12591: Allow io.TextIOWrapper to work with raw IO objects (without
    a read1() method), and add an undocumented *write_through* parameter to
    mandate unbuffered writes.


And my test script still shows the same behaviour, with select.poll() or select.select().

The fact that my stdout object has no read1() and needs the above patch looks like a good lead for further investigation?
msg392840 - (view) Author: Yann Dirson (yann) * Date: 2021-05-03 20:39
> The fact that my stdout object has no read1() and needs the above patch looks like a good lead for further investigation?

That's linked to universal_newlines: the bug only shows when that flag is set.

Testcases provided in https://github.com/python/cpython/pull/25859
msg409016 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2021-12-22 09:45
Hmm, sorry for not responding earlier.

Buffering is necessary for implementing the universal_newlines behaviour (I don't know how we could do otherwise?). This has the unavoidable side effect that the Python buffered file object is not in sync with the underlying file descriptor, so that using `p.stdout` in a `select` call will give you inaccurate information.
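
The desynchronisation described above can be reproduced without a subprocess. The following is only a sketch under assumptions: it uses a plain os.pipe() wrapped the same way a text-mode unbuffered pipe would be (a raw io.FileIO inside an io.TextIOWrapper), the function name `demo_desync` is invented for the example, and it relies on select.select() accepting pipe file descriptors, which is not the case on Windows.

```python
import io
import os
import select


def demo_desync():
    """Show that select() on the fd disagrees with the text wrapper's state."""
    read_fd, write_fd = os.pipe()
    text = None
    try:
        # Two complete lines arrive in the pipe before anything is read.
        os.write(write_fd, b"first\nsecond\n")

        raw = io.FileIO(read_fd, "r", closefd=False)    # unbuffered raw object
        text = io.TextIOWrapper(raw, encoding="ascii")  # universal newlines

        first = text.readline()

        # The wrapper's single chunked read drained the pipe, so the fd
        # is no longer reported as readable...
        readable, _, _ = select.select([read_fd], [], [], 0)

        # ...although the next line is served from the wrapper's own buffer.
        second = text.readline()
        return first, bool(readable), second
    finally:
        if text is not None:
            text.close()  # closefd=False: the fd itself stays open
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(demo_desync())
```

Here select() reports nothing to read after the first readline(), yet the second readline() still returns a full line from the Python-level buffer.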

So it seems like this is perhaps a documentation issue. What do you think?
msg409031 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-12-22 13:01
> Buffering is necessary for implementing the universal_newlines 

Why is that? I can see that it requires newline state tracking, and the allowance to make two read(fd, &c, 1) system calls for a single read(1) method call, in case a "\n" has to be ignored.

testproc-unbuffered.py runs to completion in 3.11 if the following statement that changes the text wrapper's chunk size is added right after creating the Popen() instance:

    if sys.version_info[0] > 2:
        process.stdout._CHUNK_SIZE = 1

The initial chunk size for a text wrapper is hard-coded as 8192 bytes. For some reason the constructor has no parameter for it.
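
A small sketch of the effect of the chunk size, using an os.pipe() instead of a real subprocess. The helper name `readline_with_chunk_size` is invented for this example, `_CHUNK_SIZE` is a private attribute that may change between versions, and select.select() on a pipe fd is assumed to work (it does not on Windows).

```python
import io
import os
import select


def readline_with_chunk_size(chunk_size):
    """Read one line through a text wrapper, then report whether the
    underlying fd still has unread data according to select()."""
    read_fd, write_fd = os.pipe()
    text = None
    try:
        os.write(write_fd, b"first\nsecond\n")

        raw = io.FileIO(read_fd, "r", closefd=False)
        text = io.TextIOWrapper(raw, encoding="ascii")
        text._CHUNK_SIZE = chunk_size  # private attribute, not a public API

        first = text.readline()
        readable, _, _ = select.select([read_fd], [], [], 0)
        return first, bool(readable)
    finally:
        if text is not None:
            text.close()  # closefd=False: the fd itself stays open
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(readline_with_chunk_size(8192))  # default chunk size
    print(readline_with_chunk_size(1))     # one byte per underlying read
```

With the default of 8192 the first readline() consumes both lines from the pipe; with a chunk size of 1 the second line should remain in the pipe, so select() keeps reporting the fd as readable.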
History
Date User Action Args
2022-04-11 14:59:33 admin set github: 85394
2021-12-22 13:01:55 eryksun set nosy: + eryksun; messages: + msg409031
2021-12-22 09:45:17 pitrou set messages: + msg409016
2021-12-22 02:52:16 gregory.p.smith set nosy: + gregory.p.smith
2021-12-12 00:01:55 iritkatriel set versions: + Python 3.10, Python 3.11, - Python 3.5, Python 3.6, Python 3.7, Python 3.8
2021-10-22 18:10:02 iritkatriel set nosy: + pitrou
2021-05-03 20:39:03 yann set messages: + msg392840; title: Undocumented behaviour change of POpen.stdout.readine with bufsize=0 or =1 -> POpen bufsize=0 ignored with universal_newlines=True
2021-05-03 20:35:52 yann set keywords: + patch; stage: patch review; pull_requests: + pull_request24542
2021-05-03 10:51:05 yann set messages: + msg392795
2021-05-02 18:46:13 yann set messages: + msg392714; versions: + Python 3.9
2020-07-06 19:54:02 yann create