classification
Title: binary stdio
Type: behavior Stage:
Components: IO, Windows Versions: Python 3.1, Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, benjamin.peterson, brian.curtin, gvanrossum, haypo, pitrou, v+python
Priority: normal Keywords: patch

Created on 2011-01-06 02:11 by v+python, last changed 2012-08-03 23:53 by haypo. This issue is now closed.

Files
File name Uploaded Description Edit
test.py v+python, 2011-01-06 02:11 test of binary stdout
stdio_binary.patch haypo, 2011-01-06 13:03
parser_translate_newline.patch haypo, 2011-01-06 14:30
io_binary_windows.patch haypo, 2011-01-06 21:20
parser_translate_newline-2.patch haypo, 2011-01-07 18:50
Messages (29)
msg125500 - (view) Author: Glenn Linderman (v+python) * Date: 2011-01-06 02:11
Per Antoine's request, I wrote this test code, it isn't elegant, I whipped it together quickly; but it shows the issue.  The issue may be one of my ignorance, but it does show the behavior I described in issue 4953. Here's the output from the various test parameters that might be useful in running the test.

>c:\python32\python.exe test.py test 1
['c:\\python32\\python.exe', 'test.py', '1']
All OK

>c:\python32\python.exe test.py test 2
['c:\\python32\\python.exe', 'test.py', '2']
Not OK: b'abc\r\r\ndef\r\r\n'

>c:\python32\python.exe test.py test 3
['c:\\python32\\python.exe', 'test.py', '3']
All OK

>c:\python32\python.exe test.py test 4
['c:\\python32\\python.exe', 'test.py', '4']
Not OK: b'abc\r\r\ndef\r\r\n'

>c:\python32\python.exe test.py test 1-u
['c:\\python32\\python.exe', '-u', 'test.py', '1-u']
All OK

>c:\python32\python.exe test.py test 2-u
['c:\\python32\\python.exe', '-u', 'test.py', '2-u']
All OK

>c:\python32\python.exe test.py test 3-u
['c:\\python32\\python.exe', '-u', 'test.py', '3-u']
All OK

>c:\python32\python.exe test.py test 4-u
['c:\\python32\\python.exe', '-u', 'test.py', '4-u']
All OK

>

Note that test 2 and 4, which do not use the mscvrt stuff, have double \r: one sent by the code, and another added, apparently by MSC newline processing.  test 2-u and 4-u, which are invoking the subprocess with Python's -u parameter, also do not exhibit the problem, even though the mscvrt stuff is not used.  This seems to indicate that Python's -u parameter does approximately the same thing as my windows_binary function.

Seems like if Python already has code for this, that it would be nice to either make it more easily available to the user as an API (like my windows_binary function, invoked with a single line) in the io or sys modules (since it is used to affect sys.std* files).

And it would be nice if the function "worked cross-platform", even if it is a noop on most platforms.
msg125503 - (view) Author: Glenn Linderman (v+python) * Date: 2011-01-06 02:16
tested on Windows, for those that aren't following issue 4953
msg125510 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-06 06:52
What are the results if, instead of piping through subprocess, you simply redirect stdout to a file (using "... > myfile.txt")?
msg125511 - (view) Author: Glenn Linderman (v+python) * Date: 2011-01-06 07:06
The same.  This can be tested with the same test program,

c:\python32\python.exe test.py 1 > test1.txt

similar for 2, 3, 4.  Then add -u and repeat.  All 8 cases produce the same results, either via a pipe, or with a redirected stdout.
msg125512 - (view) Author: Glenn Linderman (v+python) * Date: 2011-01-06 07:13
Actually, it seems like this "-u" behaviour, should simply be the default for Python 3.x on Windows.  The new IO subsystem seems to be able to add \r when desired anyway.  And except for Notepad, most programs on Windows can deal with \r\n or solo \n anyway.  \r\r\n doesn't cause too many problems for very many programs, but is (1) non-standard (2) wasteful of bytes (3) does cause problems for CGI programs, and likely some others... I haven't done a lot of testing with that case, but tried a few programs, and they dealt with it gracefully.
msg125513 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-06 07:20
> Actually, it seems like this "-u" behaviour, should simply be the
> default for Python 3.x on Windows.

There is nothing in "-u" which should explain the behaviour you're
observing, so either way it's a bug. Making "-u" the default is quite
orthogonal.
msg125515 - (view) Author: Glenn Linderman (v+python) * Date: 2011-01-06 07:30
Is there an easy way for me to find the code for -u?  I haven't learned my way around the Python sources much, just peeked in modules that I've needed to fix or learn something from a little.  I'm just surprised you think it is orthogonal, but I'm glad you agree it is a bug.
msg125516 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-06 07:51
> Is there an easy way for me to find the code for -u?

"-u" just passes 0 for the "buffering" argument to open() when creating
stdout and stderr. Otherwise "buffering" equals -1.
You can find equivalent code for open() in Lib/_pyio.py (the actual code
in is in C).

The practical difference is that "-u" removes the binary buffering layer
between the unicode layer and the raw file object:

$ ./python -c "import sys; print(sys.stdout, sys.stdout.buffer)"
<_io.TextIOWrapper name='<stdout>' encoding='UTF-8'> <_io.BufferedWriter name='<stdout>'>
$ ./python -u -c "import sys; print(sys.stdout, sys.stdout.buffer)"
<_io.TextIOWrapper name='<stdout>' encoding='UTF-8'> <_io.FileIO name='<stdout>' mode='wb'>

This should make absolutely no difference as to newline handling, since
that is handled exclusively in TextIOWrapper (binary layers are
transparent by construction).
msg125521 - (view) Author: Glenn Linderman (v+python) * Date: 2011-01-06 08:42
I can read and understand C well enough, having coded in it for about 40 years now... but I left C for Perl and Perl for Python, I try not to code in C when I don't have to, these days, as the P languages are more productive, overall.

But there has to be special handling somewhere for opening std*, because they are already open, unlike other files.  That is no doubt where the bug is.  Can you point me at that code?
msg125523 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-06 08:48
> But there has to be special handling somewhere for opening std*,
> because they are already open, unlike other files.  That is no doubt
> where the bug is.  Can you point me at that code?

See initstdio() in Python/pythonrun.c
msg125525 - (view) Author: Glenn Linderman (v+python) * Date: 2011-01-06 08:55
I suppose the FileIO in _io is next to look at, wherever it can be found.
msg125526 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-06 08:59
> I suppose the FileIO in _io is next to look at, wherever it can be found.

I don't get what you're looking for. FileIO is involved in both the
buffered and unbuffered cases, so it shouldn't make a difference.
In any case, please look into Modules/_io
msg125530 - (view) Author: Glenn Linderman (v+python) * Date: 2011-01-06 09:16
Found it.

The file browser doesn't tell what line number it is, but in _io/Fileio.c function fileio_init, there is code like

#ifdef O_BINARY
    flags |= O_BINARY;
#endif

#ifdef O_APPEND
    if (append)
        flags |= O_APPEND;
#endif

    if (fd >= 0) {
        if (check_fd(fd))
            goto error;
        self->fd = fd;
        self->closefd = closefd;
    }


Note that if O_BINARY is defined, it is set into the default flags for opening files by name.  But if "opening" a file by fd, the fd is copied, regardless of whether it has O_BINARY set or not.  The rest of the IO code no doubt assumes the file was opened in O_BINARY mode.  But that isn't true of MSC std* handles by default.

How -u masks or overcomes this problem is not obvious, as yet, but the root bug seems to be the assumption in the above code.  A setmode of O_BINARY should be done, probably #ifdef O_BINARY, when attaching a MS C fd to a Python IO stack.  Otherwise it is going to have \r\r\n problems, it would seem.

Alternately, in the location where the Python IO stacks are attached to std* handles, those specific std* handles should have the setmode done there... other handles, if opened by Python, likely already have it done.

Documentation for open should mention, in the description of the file parameter, that on Windows, it is important to only attach Python IO stack to O_BINARY files, or beware the consequences of two independent newline handling algorithms being applied to the data stream... or to document that setmode O_BINARY will be performed on the handles passed to open.
msg125534 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-06 09:29
Ok, actually it boils down to the following code in Modules/main.c:

    if (Py_UnbufferedStdioFlag) {
#if defined(MS_WINDOWS) || defined(__CYGWIN__)
        _setmode(fileno(stdin), O_BINARY);
        _setmode(fileno(stdout), O_BINARY);
#endif

... which explains things quite clearly! Archeology leads to r7409 by Guido in 1997:

changeset:   4916:5af9f8a98d93
branch:      trunk
user:        guido
date:        Sat Jan 11 20:28:55 1997 +0100
files:       Modules/main.c
description:
[svn r7409] On Windows, -u implies binary mode for stdin/stdout
(as well as unbuffered stdout/stderr).
msg125537 - (view) Author: Glenn Linderman (v+python) * Date: 2011-01-06 09:37
Don't find "initstdio" "stdio" in pythonrun.c.  Has it moved?  There are precious few references to stdin, stdout, stderr in that module, mostly for attaching the default encoding.
msg125539 - (view) Author: Glenn Linderman (v+python) * Date: 2011-01-06 09:43
stderr is notable by its absence in the list of O_BINARY adjustments.

So -u does do 2/3 of what my windows_binary() does :)  Should I switch my test case to use stderr to demonstrate that it doesn't help with that?  I vaguely remember that early versions of DOS didn't do stderr, but I thought by the time Windows came along, let's see, was that about 1983?, that stderr was codified for DOS/Windows.  For sure it has never been missing in WinNT 4.0 +.
msg125550 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-06 10:36
> So -u does do 2/3 of what my windows_binary() does :)  Should I switch
> my test case to use stderr to demonstrate that it doesn't help with
> that?

Well, judging by the history of this code, selectively putting -u in
binary mode may be justified by the fact that Python 2 relied on the C
runtime's stdio FILE pointers, and therefore on the C runtime's own
newline translation. I would say that Python 3 should put all stdio fds
in binary mode, regardless of the -u switch.
msg125552 - (view) Author: Glenn Linderman (v+python) * Date: 2011-01-06 11:15
Makes sense to me.  Still should document the open file parameter when passed an fd, and either tell the user that it should be O_BINARY, or that it will be O_BINARYd for them, whichever technique is chosen.  But having two newline techniques is bad, and if Python thinks it is tracking the file pointer, but Windows is doing newline translation for it, then it isn't likely tracking it correctly for random access IO.  So I think the choice should be that any fd passed in to open on Windows should get O_BINARYd immediately.
msg125553 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-01-06 13:03
Attached stdio_binary.patch always set stdin, stdout and stderr in binary mode on Windows (and not only with -u command line flag). I added the following comment, is it correct?

    /* don't translate newlines */
msg125554 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-01-06 14:30
Using my patch, the interpreter fails with a SyntaxError on \r: attached patch translates newlines to avoid this problem.
msg125561 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-01-06 16:02
> Using my patch, the interpreter fails with a SyntaxError on \r:
> attached patch translates newlines to avoid this problem.

See also issue #4628.
msg125575 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2011-01-06 18:18
Not sure why I've been added, but I agree that all fds should always be in binary mode in Python 3.
msg125578 - (view) Author: Glenn Linderman (v+python) * Date: 2011-01-06 19:00
Victor,  Thanks for your interest and patches.

msg125530 points out the location of the code where _all_ fds could be O_BINARYed, when passed in to open.  I think this would make all fds open in binary mode, per Guido's comment... he made exactly the comment I was hoping for, even though I didn't +nosy him... I believe this would catch std* along the way, and render your first patch unnecessary, but your second one would likely still be needed.
msg125590 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-01-06 21:20
New patch setting O_BINARY mode using _setmode() (on Windows) to stdin, stdout, stderr, but also to all FileIO() objects.

The patch on Py_Main() is still needed because Python may use sys.stderr when it is still a stdprinter object, before the final TextIOWrapper object is created (before the io module is loaded).

I don't know the effect of _setmode(stderr, O_BINARY) on calls to fputs(stderr, ...) in Py_FatalError().

I would appreciate a review of parser_translate_newline.patch.
msg125595 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2011-01-06 22:20
On Thu, Jan 6, 2011 at 1:20 PM, STINNER Victor <report@bugs.python.org> wrote:
> I don't know the effect of _setmode(stderr, O_BINARY) on calls to fputs(stderr, ...) in Py_FatalError().

On Windows it will write lines ending in LF only instead of CRLF. Most
tools to read text files should be able deal with that by now,
conditioned as they are by many years of Unix developers doing Windows
stuff on the side. (Somebody should test what Notepad does though. :-)

I don't know any other supported platforms where O_BINARY exists and
is not a no-op (but I haven't really looked :-).
msg125680 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-01-07 18:50
I commited io_binary_windows.patch + parser_translate_newline.patch as r87824. I fixed the patch on the parser to avoid leak on newtok if translate_newlines() fails.
msg125697 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-01-07 19:49
Tests pass on Windows 7 buildbot, the two other XP buildbots have unrelated issues. I also tested on my XP VM: all tests pass. So I close this issue.
msg125703 - (view) Author: Glenn Linderman (v+python) * Date: 2011-01-07 20:14
Thanks for your work on this Victor, and other commenters also.
msg167383 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-08-03 23:53
For the record, this change (always set stdout and stderr in binary mode on Windows) introduced (at least???) the following regressions:

 - #11272: "input() has trailing carriage return on windows", fixed in Python 3.2.1
 - #11395: "print(s) fails on Windows with long strings", fixed in Python 3.2.1
 - #13119: "Newline for print() is \n on Windows, and not \r\n as expected", will be fixed in Python 3.2.4 and 3.3 (not released yet)
History
Date User Action Args
2012-08-03 23:53:32hayposetmessages: + msg167383
2011-01-07 20:14:41v+pythonsetnosy: gvanrossum, amaury.forgeotdarc, pitrou, haypo, benjamin.peterson, v+python, brian.curtin
messages: + msg125703
2011-01-07 19:49:10hayposetstatus: open -> closed

messages: + msg125697
resolution: fixed
nosy: gvanrossum, amaury.forgeotdarc, pitrou, haypo, benjamin.peterson, v+python, brian.curtin
2011-01-07 18:50:08hayposetfiles: + parser_translate_newline-2.patch
nosy: gvanrossum, amaury.forgeotdarc, pitrou, haypo, benjamin.peterson, v+python, brian.curtin
messages: + msg125680
2011-01-06 22:20:43gvanrossumsetnosy: gvanrossum, amaury.forgeotdarc, pitrou, haypo, benjamin.peterson, v+python, brian.curtin
messages: + msg125595
2011-01-06 21:23:06pebbesetnosy: - pebbe
2011-01-06 21:20:45hayposetfiles: + io_binary_windows.patch
nosy: gvanrossum, amaury.forgeotdarc, pitrou, haypo, benjamin.peterson, v+python, brian.curtin, pebbe
messages: + msg125590
2011-01-06 19:00:42v+pythonsetnosy: gvanrossum, amaury.forgeotdarc, pitrou, haypo, benjamin.peterson, v+python, brian.curtin, pebbe
messages: + msg125578
2011-01-06 18:18:26gvanrossumsetnosy: gvanrossum, amaury.forgeotdarc, pitrou, haypo, benjamin.peterson, v+python, brian.curtin, pebbe
messages: + msg125575
2011-01-06 16:02:21hayposetnosy: gvanrossum, amaury.forgeotdarc, pitrou, haypo, benjamin.peterson, v+python, brian.curtin, pebbe
messages: + msg125561
2011-01-06 15:52:21pitrousetnosy: + benjamin.peterson
2011-01-06 14:30:55hayposetfiles: + parser_translate_newline.patch
nosy: gvanrossum, amaury.forgeotdarc, pitrou, haypo, v+python, brian.curtin, pebbe
messages: + msg125554
2011-01-06 13:03:39hayposetfiles: + stdio_binary.patch

nosy: + haypo
messages: + msg125553

keywords: + patch
2011-01-06 11:15:55v+pythonsetnosy: gvanrossum, amaury.forgeotdarc, pitrou, v+python, brian.curtin, pebbe
messages: + msg125552
2011-01-06 11:13:05pebbesetnosy: + pebbe
2011-01-06 10:36:22pitrousetnosy: gvanrossum, amaury.forgeotdarc, pitrou, v+python, brian.curtin
messages: + msg125550
2011-01-06 09:43:49v+pythonsetnosy: gvanrossum, amaury.forgeotdarc, pitrou, v+python, brian.curtin
messages: + msg125539
2011-01-06 09:37:36v+pythonsetnosy: gvanrossum, amaury.forgeotdarc, pitrou, v+python, brian.curtin
messages: + msg125537
2011-01-06 09:29:53pitrousetnosy: + gvanrossum

messages: + msg125534
versions: + Python 2.7
2011-01-06 09:16:29v+pythonsetnosy: amaury.forgeotdarc, pitrou, v+python, brian.curtin
messages: + msg125530
2011-01-06 08:59:17pitrousetnosy: amaury.forgeotdarc, pitrou, v+python, brian.curtin
messages: + msg125526
2011-01-06 08:55:11v+pythonsetnosy: amaury.forgeotdarc, pitrou, v+python, brian.curtin
messages: + msg125525
2011-01-06 08:48:17pitrousetnosy: amaury.forgeotdarc, pitrou, v+python, brian.curtin
messages: + msg125523
2011-01-06 08:42:25v+pythonsetnosy: amaury.forgeotdarc, pitrou, v+python, brian.curtin
messages: + msg125521
2011-01-06 07:51:31pitrousetnosy: amaury.forgeotdarc, pitrou, v+python, brian.curtin
messages: + msg125516
2011-01-06 07:30:24v+pythonsetnosy: amaury.forgeotdarc, pitrou, v+python, brian.curtin
messages: + msg125515
2011-01-06 07:20:27pitrousetnosy: amaury.forgeotdarc, pitrou, v+python, brian.curtin
messages: + msg125513
2011-01-06 07:13:22v+pythonsetnosy: amaury.forgeotdarc, pitrou, v+python, brian.curtin
messages: + msg125512
2011-01-06 07:06:28v+pythonsetnosy: amaury.forgeotdarc, pitrou, v+python, brian.curtin
messages: + msg125511
2011-01-06 06:52:57pitrousetnosy: amaury.forgeotdarc, pitrou, v+python, brian.curtin
messages: + msg125510
2011-01-06 06:24:19pitrousetnosy: + pitrou
2011-01-06 06:24:03pitrousetnosy: + amaury.forgeotdarc, brian.curtin

components: + Windows
versions: + Python 3.1, Python 3.2
2011-01-06 02:16:42v+pythonsettype: behavior
messages: + msg125503
components: + IO
2011-01-06 02:11:53v+pythoncreate