classification
Title: python --help: -u is misdocumented as binary mode
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: berker.peksag, eryksun, gdr@garethrees.org, ncoghlan, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2016-11-09 10:26 by arigo, last changed 2017-10-13 13:23 by berker.peksag. This issue is now closed.

Files
File name Uploaded Description
issue28647.patch gdr@garethrees.org, 2016-11-12 18:36 review
Pull Requests
URL Status Linked
PR 1655 closed berker.peksag, 2017-05-18 19:49
PR 1796 closed matrixise, 2017-05-24 20:57
PR 1797 closed matrixise, 2017-05-24 21:17
PR 3954 merged berker.peksag, 2017-10-11 13:44
PR 3961 merged berker.peksag, 2017-10-12 05:58
Messages (32)
msg280390 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2016-11-09 10:26
``python3.5 --help`` gives this information:

    -u : unbuffered binary stdout and stderr, stdin always buffered

However, stdout and stderr are actually always opened in text mode, and print() always expects a string and never a bytes object.  This usage of "binary" in the --help is in contradiction with the usage of "binary" in the description of files (e.g. ``help(open)``).
msg280391 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-09 10:30
Right. Moreover, "unbuffered" is wrong. It's line buffered: sys.stdout.buffer is directly a io.FileIO object, but TextIOWrapper only calls buffer.write() when the message contains a newline character. It's not (currently) possible to have a fully unbuffered stdout.
msg280392 - (view) Author: Eryk Sun (eryksun) * Date: 2016-11-09 11:04
> It's not (currently) possible to have a fully unbuffered stdout.

Why doesn't create_stdio also pass `write_through = Py_True` when Py_UnbufferedStdioFlag is set? This would immediately pass writes through to the FileIO object, even without containing a newline (i.e. it sets text_needflush in _io_TextIOWrapper_write_impl).
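A minimal sketch of the difference being discussed, using only the public `io` module. `RecordingRaw` and `writes_seen` are hypothetical names invented for this illustration; `RecordingRaw` stands in for the unbuffered FileIO object and simply records every write() it receives. It assumes CPython's C implementation of TextIOWrapper.

```python
import io


class RecordingRaw(io.RawIOBase):
    """Hypothetical stand-in for an unbuffered FileIO: records each write()."""

    def __init__(self):
        super().__init__()
        self.writes = []

    def writable(self):
        return True

    def write(self, b):
        self.writes.append(bytes(b))
        return len(b)


def writes_seen(**wrapper_options):
    """Write "a" and then "b\\n"; return the raw writes seen after each step."""
    raw = RecordingRaw()
    text = io.TextIOWrapper(raw, encoding="ascii", newline="\n", **wrapper_options)
    text.write("a")
    after_first = list(raw.writes)
    text.write("b\n")
    return after_first, list(raw.writes)


# line_buffering=True: nothing reaches the raw stream until a newline is written.
line_buffered = writes_seen(line_buffering=True)

# write_through=True: every write() is handed to the raw stream immediately.
write_through = writes_seen(write_through=True)
```

With `line_buffering=True` the first write stays inside the TextIOWrapper until the newline arrives; with `write_through=True` the raw stream sees `b"a"` right away.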
msg280393 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-09 12:47
I don't recall; I would have to search old issues for the rationale. But
the explanation is probably performance: reducing the number of syscalls.
msg280394 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-09 13:18
The write_through argument was added only in 3.3. I think it is a good idea to pass `write_through = Py_True` when Py_UnbufferedStdioFlag is set. Using -u means that performance is not important (that is why this is not the default behavior).
msg280405 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-09 14:58
Would it make sense to have two modes, line buffered and unbuffered,
as the C function setvbuf() provides for stdout?
msg280665 - (view) Author: Gareth Rees (gdr@garethrees.org) * Date: 2016-11-12 18:21
The output of "python3.5 --help" says:

    -u : unbuffered binary stdout and stderr, stdin always buffered;
         also PYTHONUNBUFFERED=x
         see man page for details on internal buffering relating to '-u'

If you look at the man page as instructed then you'll see a clearer
explanation:

    -u   Force  the  binary  I/O  layers  of  stdout  and  stderr  to  be
         unbuffered.  stdin is always buffered.  The text I/O layer  will
         still be line-buffered.

For example, if you try this:

    python3.5 -uc 'import sys,time;w=sys.stdout.buffer.write;w(b"a");time.sleep(1);w(b"b");'

then you'll see that the binary output is indeed unbuffered as
documented.

The output of --help is trying to abbreviate this explanation, but I
think it's abbreviated too much. The explanation from the man page
seems clear to me, and is only a little longer, so I suggest changing
the --help output to match the man page.
msg280666 - (view) Author: Gareth Rees (gdr@garethrees.org) * Date: 2016-11-12 18:36
Here's a patch that copies the text for the -u option from the man page to the --help output.
msg289649 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-03-15 08:54
I just ran into this discrepancy working on the test cases for PEP 538 - +1 for Gareth's suggested approach of just aligning the `--help` output with the man page.
msg293961 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-19 16:05
Issue30404 makes stdout and stderr truly unbuffered when run with -u.
msg304136 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2017-10-11 13:49
The pull request for issue 30404 has been merged, so we only need the documentation patch for the 3.6 branch (unfortunately 3.5 is now in security-fix-only mode). I've opened PR 3954.
msg304139 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2017-10-11 14:10
New changeset 5f908005ce16b06d5af7b413264009c4b062f33c by Berker Peksag in branch '3.6':
bpo-28647: Update -u documentation (GH-3954)
https://github.com/python/cpython/commit/5f908005ce16b06d5af7b413264009c4b062f33c
msg304140 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2017-10-11 14:10
Thank you for the patch, Gareth.
msg304141 - (view) Author: Gareth Rees (gdr@garethrees.org) * Date: 2017-10-11 14:29
You're welcome.
msg304147 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-11 15:08
I just checked the master branch:

-u     : unbuffered binary stdout and stderr, stdin always buffered;
         also PYTHONUNBUFFERED=x
         see man page for details on internal buffering relating to '-u'

The doc is wrong. stdout and stderr are fully unbuffered since Serhiy changed them: commit 77732be801c18013cfbc86e27fcc50194ca22c8e, bpo-30404.
msg304206 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2017-10-12 06:00
Good catch, I thought it was already fixed in master after bpo-30404. I've opened PR 3961.
msg304207 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-12 06:18
Don't forget to update Misc/python.man.
msg304211 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-12 07:43
The -u option doesn't affect the buffering of stdin. Is it worth documenting this explicitly, or does it just add noise? Maybe remove it from the help and man page but add it to the RST documentation?
msg304216 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2017-10-12 07:55
I think it's better to be explicit and mention 'stdin' in this case. Plus, "stdin is always buffered" is quite short, so the potential for creating noise is minimal IMO.

(FYI, I asked this question on GitHub: https://github.com/python/cpython/pull/3961#issuecomment-336035358)
msg304228 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-12 11:05
In Python 2 there is internal buffering in xreadlines(), readlines() and file-object iterators. You need to enter many lines before the program gets the first of them, and -u doesn't help.

But in Python 3 the program gets the input as soon as it becomes available. Reading does not block if input is available. There are internal buffers, but they affect only performance, not behavior. If you can edit a line before pressing Enter, this is because your terminal buffers a line before sending it to the program. I think it is more correct to say that stdin is always unbuffered in Python 3.
msg304231 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-12 13:15
Serhiy: "(...) I think it is more correct to say that stdin is always unbuffered in Python 3."

I disagree. Technically, sys.stdin.read(1) reads up to 1024 bytes from the file descriptor 0. For me, "unbuffered read" means that read(1) reads a single byte.

Expected behaviour of a fully unbuffered stdin:

assert sys.stdin.read(1) == 'a'
assert os.read(0, 1) == b'b'

The program should not fail with an assertion error nor block if you write 'ab' characters into stdin.
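The point can be illustrated without touching the real stdin. The sketch below is an illustration only: `RecordingRaw` is a hypothetical in-memory raw stream invented here to play the role of file descriptor 0, and it assumes CPython's default buffered stack (a TextIOWrapper over a BufferedReader), which is how stdin is opened. It records how many bytes each low-level read asks for and what is left afterwards.

```python
import io


class RecordingRaw(io.RawIOBase):
    """Hypothetical raw stream: serves fixed data and records requested sizes."""

    def __init__(self, data):
        super().__init__()
        self._data = bytearray(data)
        self.requested_sizes = []

    def readable(self):
        return True

    def readinto(self, b):
        # Remember how many bytes the layer above asked for in this call.
        self.requested_sizes.append(len(b))
        n = min(len(b), len(self._data))
        b[:n] = self._data[:n]
        del self._data[:n]
        return n

    def remaining(self):
        return bytes(self._data)


raw = RecordingRaw(b"ab")
text = io.TextIOWrapper(io.BufferedReader(raw), encoding="ascii")

first_char = text.read(1)     # the text layer returns only "a" ...
left_in_raw = raw.remaining() # ... but "b" has already been pulled from the raw stream
```

Because the buffered layers asked the raw stream for far more than one byte, the `b` is no longer available to a direct low-level read, which is why the second assertion in the message above could not succeed on a buffered stdin.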
msg304237 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-12 13:45
Serhiy Storchaka:
https://github.com/python/cpython/pull/3961#issuecomment-336136160

"I suggest to continue the discussion on the tracker."

Ok, let's continue here.

"We are fixing the outdated documentation inherited from Python 2. Before keeping a statement, we should consider what it means in the context of Python 2 and what it means in the context of Python 3."

stdin buffering is a complex thing.

When running the UNIX command "producer | consumer", many users are confused by the buffering on the *producer* side.

When running a program in a TTY, the TTY does line buffering for you: you cannot immediately get a single character (without changing the default TTY configuration).

I don't think that we need to say too much. I just suggest saying "stdin is always buffered". That's all.

See my previous messages for my definition of "buffered" versus "unbuffered" read.

Note: Today I learned about the UNIX "stdbuf" command, useful for configuring the stdin, stdout and stderr buffering of C applications using <stdio.h>.
msg304240 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-12 14:17
stdin is mentioned in the documentation of the -u option only because of the weird internal buffering in Python 2, which a user might expect -u to disable. The documentation states which methods use internal buffering and how to get rid of it. No other buffering is mentioned.

This is no longer relevant in Python 3. I think there is no need to mention stdin in the context of the -u option at all. -u doesn't affect stdin buffering, whatever that would mean. Period.

Alternatively you can include a lecture about different kinds of buffering and how -u doesn't affect them.
msg304242 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2017-10-12 14:38
> -u doesn't affect stdin buffering, whatever that would mean.

I think we need to document the behavior of stdin somewhere, because the current sys.stdin documentation states:

> When interactive, standard streams are line-buffered. Otherwise, they
> are block-buffered like regular text files. You can override this value
> with the -u command-line option.

https://docs.python.org/3/library/sys.html#sys.stdin
msg304251 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-12 15:09
Interesting comment in create_stdio() of Python/pylifecycle.c:
---
    /* stdin is always opened in buffered mode, first because it shouldn't
       make a difference in common use cases, second because TextIOWrapper
       depends on the presence of a read1() method which only exists on
       buffered streams.
    */
    if (Py_UnbufferedStdioFlag && write_mode)
        buffering = 0;
    else
        buffering = -1;
---

stdin is always buffered ;-)

I created bpo-31775: "Support unbuffered TextIOWrapper".
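One way to observe the effect of that branch from the outside is to ask a child interpreter started with -u which objects sit behind sys.stdin and sys.stdout. This is a sketch, not part of the patch: `stdio_layer_types` and `CHILD_CODE` are hypothetical names, and the result assumes CPython 3.7 or later with stdin and stdout redirected rather than attached to a console.

```python
import subprocess
import sys

# Code run by the child: print the type names of the binary layers.
CHILD_CODE = (
    "import sys;"
    "print(type(sys.stdin.buffer).__name__, type(sys.stdout.buffer).__name__)"
)


def stdio_layer_types(*interpreter_flags):
    """Run a child interpreter and report the types behind stdin and stdout."""
    result = subprocess.run(
        [sys.executable, *interpreter_flags, "-c", CHILD_CODE],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    stdin_type, stdout_type = result.stdout.split()
    return stdin_type, stdout_type


with_u = stdio_layer_types("-u")
```

Under those assumptions, stdin should still report a buffered reader while stdout reports the raw FileIO object directly, matching the `buffering = 0` branch that only applies in write mode.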
msg304255 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-12 15:26
Serhiy: "I think there is no need to mention stdin in the context of the -u option at all. -u doesn't affect stdin buffering, whatever that would mean. Period."

Hum, I propose to mention stdin in -u documentation as: "The option has no effect on stdin." What do you think?
msg304256 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-12 15:27
> I think we need to document the behavior of stdin somewhere, because the current sys.stdin documentation states:
>
>> When interactive, standard streams are line-buffered. Otherwise, they
>> are block-buffered like regular text files. You can override this value
>> with the -u command-line option.

The last sentence is wrong and should be removed from sys.stdin documentation, no?
msg304259 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-12 15:40
> Hum, I propose to mention stdin in -u documentation as: "The option has no effect on stdin." What do you think?

This LGTM too. More precisely, it has no effect on stdin buffering. It has an effect on the line_buffering attribute, but this attribute has no effect on reading.

> The last sentence is wrong and should be removed from sys.stdin documentation, no?

Or correct it, making it related only to stdout and stderr.
msg304331 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2017-10-13 12:16
New changeset 7f580970836b0f6bc9c5db868d95bea81a3e1558 by Berker Peksag in branch 'master':
bpo-28647: Update -u documentation after bpo-30404 (GH-3961)
https://github.com/python/cpython/commit/7f580970836b0f6bc9c5db868d95bea81a3e1558
msg304333 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2017-10-13 12:17
Thank you for the reviews, Serhiy and Victor.
msg304334 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-13 13:01
Thanks Berker for this nice documentation enhancement! It was required.

Do we need to update Python 3.6 documentation using the commit 5f908005ce16b06d5af7b413264009c4b062f33c, or are we good? (sorry, I didn't check)
msg304336 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2017-10-13 13:23
Modules/main.c and Python.man are the same in the 3.6 branch. We could backport the change in Doc/library/sys.rst from 7f580970836b0f6bc9c5db868d95bea81a3e1558, but I haven't done it yet since it needs to be manually backported.
History
Date | User | Action | Args
2017-10-13 13:23:37 | berker.peksag | set | messages: + msg304336
2017-10-13 13:01:33 | vstinner | set | messages: + msg304334
2017-10-13 12:17:18 | berker.peksag | set | status: open -> closed; resolution: fixed; messages: + msg304333; stage: patch review -> resolved
2017-10-13 12:16:36 | berker.peksag | set | messages: + msg304331
2017-10-12 15:40:39 | serhiy.storchaka | set | messages: + msg304259
2017-10-12 15:27:40 | vstinner | set | messages: + msg304256
2017-10-12 15:26:34 | vstinner | set | messages: + msg304255
2017-10-12 15:09:53 | vstinner | set | messages: + msg304251
2017-10-12 14:38:09 | berker.peksag | set | messages: + msg304242
2017-10-12 14:17:18 | serhiy.storchaka | set | messages: + msg304240
2017-10-12 13:45:45 | vstinner | set | messages: + msg304237
2017-10-12 13:15:16 | vstinner | set | messages: + msg304231
2017-10-12 11:05:33 | serhiy.storchaka | set | messages: + msg304228
2017-10-12 07:55:16 | berker.peksag | set | messages: + msg304216
2017-10-12 07:43:25 | serhiy.storchaka | set | messages: + msg304211
2017-10-12 06:18:55 | serhiy.storchaka | set | messages: + msg304207
2017-10-12 06:00:37 | berker.peksag | set | messages: + msg304206
2017-10-12 05:58:58 | berker.peksag | set | stage: resolved -> patch review; pull_requests: + pull_request3938
2017-10-11 15:08:54 | vstinner | set | status: closed -> open; resolution: fixed -> (no value); messages: + msg304147; versions: + Python 3.7
2017-10-11 14:29:50 | gdr@garethrees.org | set | messages: + msg304141
2017-10-11 14:10:39 | berker.peksag | set | status: open -> closed; resolution: fixed; messages: + msg304140; stage: patch review -> resolved
2017-10-11 14:10:05 | berker.peksag | set | messages: + msg304139
2017-10-11 13:49:29 | berker.peksag | set | messages: + msg304136; versions: - Python 3.5, Python 3.7
2017-10-11 13:44:50 | berker.peksag | set | pull_requests: + pull_request3929
2017-05-24 21:38:18 | arigo | set | nosy: - arigo
2017-05-24 21:17:35 | matrixise | set | pull_requests: + pull_request1880
2017-05-24 20:57:02 | matrixise | set | pull_requests: + pull_request1879
2017-05-19 16:05:37 | serhiy.storchaka | set | messages: + msg293961
2017-05-18 19:49:11 | berker.peksag | set | pull_requests: + pull_request1749
2017-03-15 08:54:59 | ncoghlan | set | nosy: + ncoghlan; messages: + msg289649
2017-01-07 07:55:23 | berker.peksag | set | nosy: + berker.peksag; stage: patch review; type: behavior; versions: + Python 3.6, Python 3.7
2016-11-12 18:36:51 | gdr@garethrees.org | set | files: + issue28647.patch; keywords: + patch; messages: + msg280666
2016-11-12 18:21:35 | gdr@garethrees.org | set | nosy: + gdr@garethrees.org; messages: + msg280665
2016-11-09 14:58:38 | vstinner | set | messages: + msg280405
2016-11-09 13:18:28 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg280394
2016-11-09 12:47:02 | vstinner | set | messages: + msg280393
2016-11-09 11:04:55 | eryksun | set | nosy: + eryksun; messages: + msg280392
2016-11-09 10:30:40 | vstinner | set | nosy: + vstinner; messages: + msg280391
2016-11-09 10:26:05 | arigo | create |