New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
binary stdio #55050
Comments
Per Antoine's request, I wrote this test code, it isn't elegant, I whipped it together quickly; but it shows the issue. The issue may be one of my ignorance, but it does show the behavior I described in bpo-4953. Here's the output from the various test parameters that might be useful in running the test.
Note that test 2 and 4, which do not use the mscvrt stuff, have double \r: one sent by the code, and another added, apparently by MSC newline processing. test 2-u and 4-u, which are invoking the subprocess with Python's -u parameter, also do not exhibit the problem, even though the mscvrt stuff is not used. This seems to indicate that Python's -u parameter does approximately the same thing as my windows_binary function. Seems like if Python already has code for this, that it would be nice to either make it more easily available to the user as an API (like my windows_binary function, invoked with a single line) in the io or sys modules (since it is used to affect sys.std* files). And it would be nice if the function "worked cross-platform", even if it is a noop on most platforms. |
tested on Windows, for those that aren't following bpo-4953 |
What are the results if, instead of piping through subprocess, you simply redirect stdout to a file (using "... > myfile.txt")? |
The same. This can be tested with the same test program, c:\python32\python.exe test.py 1 > test1.txt similar for 2, 3, 4. Then add -u and repeat. All 8 cases produce the same results, either via a pipe, or with a redirected stdout. |
Actually, it seems like this "-u" behaviour, should simply be the default for Python 3.x on Windows. The new IO subsystem seems to be able to add \r when desired anyway. And except for Notepad, most programs on Windows can deal with \r\n or solo \n anyway. \r\r\n doesn't cause too many problems for very many programs, but is (1) non-standard (2) wasteful of bytes (3) does cause problems for CGI programs, and likely some others... I haven't done a lot of testing with that case, but tried a few programs, and they dealt with it gracefully. |
There is nothing in "-u" which should explain the behaviour you're |
Is there an easy way for me to find the code for -u? I haven't learned my way around the Python sources much, just peeked in modules that I've needed to fix or learn something from a little. I'm just surprised you think it is orthogonal, but I'm glad you agree it is a bug. |
"-u" just passes 0 for the "buffering" argument to open() when creating The practical difference is that "-u" removes the binary buffering layer $ ./python -c "import sys; print(sys.stdout, sys.stdout.buffer)"
<_io.TextIOWrapper name='<stdout>' encoding='UTF-8'> <_io.BufferedWriter name='<stdout>'>
$ ./python -u -c "import sys; print(sys.stdout, sys.stdout.buffer)"
<_io.TextIOWrapper name='<stdout>' encoding='UTF-8'> <_io.FileIO name='<stdout>' mode='wb'> This should make absolutely no difference as to newline handling, since |
I can read and understand C well enough, having coded in it for about 40 years now... but I left C for Perl and Perl for Python, I try not to code in C when I don't have to, these days, as the P languages are more productive, overall. But there has to be special handling somewhere for opening std*, because they are already open, unlike other files. That is no doubt where the bug is. Can you point me at that code? |
See initstdio() in Python/pythonrun.c |
I suppose the FileIO in _io is next to look at, wherever it can be found. |
I don't get what you're looking for. FileIO is involved in both the |
Found it. The file browser doesn't tell what line number it is, but in _io/Fileio.c function fileio_init, there is code like #ifdef O_BINARY
flags |= O_BINARY;
#endif
#ifdef O_APPEND
if (append)
flags |= O_APPEND;
#endif
if (fd >= 0) {
if (check_fd(fd))
goto error;
self->fd = fd;
self->closefd = closefd;
} Note that if O_BINARY is defined, it is set into the default flags for opening files by name. But if "opening" a file by fd, the fd is copied, regardless of whether it has O_BINARY set or not. The rest of the IO code no doubt assumes the file was opened in O_BINARY mode. But that isn't true of MSC std* handles by default. How -u masks or overcomes this problem is not obvious, as yet, but the root bug seems to be the assumption in the above code. A setmode of O_BINARY should be done, probably #ifdef O_BINARY, when attaching a MS C fd to a Python IO stack. Otherwise it is going to have \r\r\n problems, it would seem. Alternately, in the location where the Python IO stacks are attached to std* handles, those specific std* handles should have the setmode done there... other handles, if opened by Python, likely already have it done. Documentation for open should mention, in the description of the file parameter, that on Windows, it is important to only attach Python IO stack to O_BINARY files, or beware the consequences of two independent newline handling algorithms being applied to the data stream... or to document that setmode O_BINARY will be performed on the handles passed to open. |
Ok, actually it boils down to the following code in Modules/main.c: if (Py_UnbufferedStdioFlag) {
#if defined(MS_WINDOWS) || defined(__CYGWIN__)
_setmode(fileno(stdin), O_BINARY);
_setmode(fileno(stdout), O_BINARY);
#endif ... which explains things quite clearly! Archeology leads to r7409 by Guido in 1997: changeset: 4916:5af9f8a98d93 |
Don't find "initstdio" "stdio" in pythonrun.c. Has it moved? There are precious few references to stdin, stdout, stderr in that module, mostly for attaching the default encoding. |
stderr is notable by its absence in the list of O_BINARY adjustments. So -u does do 2/3 of what my windows_binary() does :) Should I switch my test case to use stderr to demonstrate that it doesn't help with that? I vaguely remember that early versions of DOS didn't do stderr, but I thought by the time Windows came along, let's see, was that about 1983?, that stderr was codified for DOS/Windows. For sure it has never been missing in WinNT 4.0 +. |
Well, judging by the history of this code, selectively putting -u in |
Makes sense to me. Still should document the open file parameter when passed an fd, and either tell the user that it should be O_BINARY, or that it will be O_BINARYd for them, whichever technique is chosen. But having two newline techniques is bad, and if Python thinks it is tracking the file pointer, but Windows is doing newline translation for it, then it isn't likely tracking it correctly for random access IO. So I think the choice should be that any fd passed in to open on Windows should get O_BINARYd immediately. |
Attached stdio_binary.patch always set stdin, stdout and stderr in binary mode on Windows (and not only with -u command line flag). I added the following comment, is it correct?
|
Using my patch, the interpreter fails with a SyntaxError on \r: attached patch translates newlines to avoid this problem. |
See also issue bpo-4628. |
Not sure why I've been added, but I agree that all fds should always be in binary mode in Python 3. |
Victor, Thanks for your interest and patches. msg125530 points out the location of the code where _all_ fds could be O_BINARYed, when passed in to open. I think this would make all fds open in binary mode, per Guido's comment... he made exactly the comment I was hoping for, even though I didn't +nosy him... I believe this would catch std* along the way, and render your first patch unnecessary, but your second one would likely still be needed. |
New patch setting O_BINARY mode using _setmode() (on Windows) to stdin, stdout, stderr, but also to all FileIO() objects. The patch on Py_Main() is still needed because Python may use sys.stderr when it is still a stdprinter object, before the final TextIOWrapper object is created (before the io module is loaded). I don't know the effect of _setmode(stderr, O_BINARY) on calls to fputs(stderr, ...) in Py_FatalError(). I would appreciate a review of parser_translate_newline.patch. |
On Thu, Jan 6, 2011 at 1:20 PM, STINNER Victor <report@bugs.python.org> wrote:
On Windows it will write lines ending in LF only instead of CRLF. Most I don't know any other supported platforms where O_BINARY exists and |
I commited io_binary_windows.patch + parser_translate_newline.patch as r87824. I fixed the patch on the parser to avoid leak on newtok if translate_newlines() fails. |
Tests pass on Windows 7 buildbot, the two other XP buildbots have unrelated issues. I also tested on my XP VM: all tests pass. So I close this issue. |
Thanks for your work on this Victor, and other commenters also. |
For the record, this change (always set stdout and stderr in binary mode on Windows) introduced (at least???) the following regressions: |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: