Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

binary stdio #55050

Closed
vpython mannequin opened this issue Jan 6, 2011 · 29 comments
Closed

binary stdio #55050

vpython mannequin opened this issue Jan 6, 2011 · 29 comments
Labels
OS-windows topic-IO type-bug An unexpected behavior, bug, or error

Comments

@vpython
Copy link
Mannequin

vpython mannequin commented Jan 6, 2011

BPO 10841
Nosy @gvanrossum, @amauryfa, @pitrou, @vstinner, @benjaminp, @briancurtin
Files
  • test.py: test of binary stdout
  • stdio_binary.patch
  • parser_translate_newline.patch
  • io_binary_windows.patch
  • parser_translate_newline-2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2011-01-07.19:49:10.518>
    created_at = <Date 2011-01-06.02:11:53.393>
    labels = ['type-bug', 'expert-IO', 'OS-windows']
    title = 'binary stdio'
    updated_at = <Date 2012-08-03.23:53:32.513>
    user = 'https://bugs.python.org/vpython'

    bugs.python.org fields:

    activity = <Date 2012-08-03.23:53:32.513>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2011-01-07.19:49:10.518>
    closer = 'vstinner'
    components = ['Windows', 'IO']
    creation = <Date 2011-01-06.02:11:53.393>
    creator = 'v+python'
    dependencies = []
    files = ['20285', '20290', '20291', '20294', '20304']
    hgrepos = []
    issue_num = 10841
    keywords = ['patch']
    message_count = 29.0
    messages = ['125500', '125503', '125510', '125511', '125512', '125513', '125515', '125516', '125521', '125523', '125525', '125526', '125530', '125534', '125537', '125539', '125550', '125552', '125553', '125554', '125561', '125575', '125578', '125590', '125595', '125680', '125697', '125703', '167383']
    nosy_count = 7.0
    nosy_names = ['gvanrossum', 'amaury.forgeotdarc', 'pitrou', 'vstinner', 'benjamin.peterson', 'v+python', 'brian.curtin']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue10841'
    versions = ['Python 3.1', 'Python 2.7', 'Python 3.2']

    @vpython
    Copy link
    Mannequin Author

    vpython mannequin commented Jan 6, 2011

    Per Antoine's request, I wrote this test code, it isn't elegant, I whipped it together quickly; but it shows the issue. The issue may be one of my ignorance, but it does show the behavior I described in bpo-4953. Here's the output from the various test parameters that might be useful in running the test.

    c:\python32\python.exe test.py test 1
    ['c:\\python32\\python.exe', 'test.py', '1']
    All OK

    c:\python32\python.exe test.py test 2
    ['c:\\python32\\python.exe', 'test.py', '2']
    Not OK: b'abc\r\r\ndef\r\r\n'

    c:\python32\python.exe test.py test 3
    ['c:\\python32\\python.exe', 'test.py', '3']
    All OK

    c:\python32\python.exe test.py test 4
    ['c:\\python32\\python.exe', 'test.py', '4']
    Not OK: b'abc\r\r\ndef\r\r\n'

    c:\python32\python.exe test.py test 1-u
    ['c:\\python32\\python.exe', '-u', 'test.py', '1-u']
    All OK

    c:\python32\python.exe test.py test 2-u
    ['c:\\python32\\python.exe', '-u', 'test.py', '2-u']
    All OK

    c:\python32\python.exe test.py test 3-u
    ['c:\\python32\\python.exe', '-u', 'test.py', '3-u']
    All OK

    c:\python32\python.exe test.py test 4-u
    ['c:\\python32\\python.exe', '-u', 'test.py', '4-u']
    All OK

    Note that test 2 and 4, which do not use the mscvrt stuff, have double \r: one sent by the code, and another added, apparently by MSC newline processing. test 2-u and 4-u, which are invoking the subprocess with Python's -u parameter, also do not exhibit the problem, even though the mscvrt stuff is not used. This seems to indicate that Python's -u parameter does approximately the same thing as my windows_binary function.

    Seems like if Python already has code for this, that it would be nice to either make it more easily available to the user as an API (like my windows_binary function, invoked with a single line) in the io or sys modules (since it is used to affect sys.std* files).

    And it would be nice if the function "worked cross-platform", even if it is a noop on most platforms.

    @vpython
    Copy link
    Mannequin Author

    vpython mannequin commented Jan 6, 2011

    tested on Windows, for those that aren't following bpo-4953

    @vpython vpython mannequin added topic-IO type-bug An unexpected behavior, bug, or error labels Jan 6, 2011
    @pitrou
    Copy link
    Member

    pitrou commented Jan 6, 2011

    What are the results if, instead of piping through subprocess, you simply redirect stdout to a file (using "... > myfile.txt")?

    @vpython
    Copy link
    Mannequin Author

    vpython mannequin commented Jan 6, 2011

    The same. This can be tested with the same test program,

    c:\python32\python.exe test.py 1 > test1.txt

    similar for 2, 3, 4. Then add -u and repeat. All 8 cases produce the same results, either via a pipe, or with a redirected stdout.

    @vpython
    Copy link
    Mannequin Author

    vpython mannequin commented Jan 6, 2011

    Actually, it seems like this "-u" behaviour, should simply be the default for Python 3.x on Windows. The new IO subsystem seems to be able to add \r when desired anyway. And except for Notepad, most programs on Windows can deal with \r\n or solo \n anyway. \r\r\n doesn't cause too many problems for very many programs, but is (1) non-standard (2) wasteful of bytes (3) does cause problems for CGI programs, and likely some others... I haven't done a lot of testing with that case, but tried a few programs, and they dealt with it gracefully.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 6, 2011

    Actually, it seems like this "-u" behaviour, should simply be the
    default for Python 3.x on Windows.

    There is nothing in "-u" which should explain the behaviour you're
    observing, so either way it's a bug. Making "-u" the default is quite
    orthogonal.

    @vpython
    Copy link
    Mannequin Author

    vpython mannequin commented Jan 6, 2011

    Is there an easy way for me to find the code for -u? I haven't learned my way around the Python sources much, just peeked in modules that I've needed to fix or learn something from a little. I'm just surprised you think it is orthogonal, but I'm glad you agree it is a bug.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 6, 2011

    Is there an easy way for me to find the code for -u?

    "-u" just passes 0 for the "buffering" argument to open() when creating
    stdout and stderr. Otherwise "buffering" equals -1.
    You can find equivalent code for open() in Lib/_pyio.py (the actual code
    in is in C).

    The practical difference is that "-u" removes the binary buffering layer
    between the unicode layer and the raw file object:

    $ ./python -c "import sys; print(sys.stdout, sys.stdout.buffer)"
    <_io.TextIOWrapper name='<stdout>' encoding='UTF-8'> <_io.BufferedWriter name='<stdout>'>
    $ ./python -u -c "import sys; print(sys.stdout, sys.stdout.buffer)"
    <_io.TextIOWrapper name='<stdout>' encoding='UTF-8'> <_io.FileIO name='<stdout>' mode='wb'>

    This should make absolutely no difference as to newline handling, since
    that is handled exclusively in TextIOWrapper (binary layers are
    transparent by construction).

    @vpython
    Copy link
    Mannequin Author

    vpython mannequin commented Jan 6, 2011

    I can read and understand C well enough, having coded in it for about 40 years now... but I left C for Perl and Perl for Python, I try not to code in C when I don't have to, these days, as the P languages are more productive, overall.

    But there has to be special handling somewhere for opening std*, because they are already open, unlike other files. That is no doubt where the bug is. Can you point me at that code?

    @pitrou
    Copy link
    Member

    pitrou commented Jan 6, 2011

    But there has to be special handling somewhere for opening std*,
    because they are already open, unlike other files. That is no doubt
    where the bug is. Can you point me at that code?

    See initstdio() in Python/pythonrun.c

    @vpython
    Copy link
    Mannequin Author

    vpython mannequin commented Jan 6, 2011

    I suppose the FileIO in _io is next to look at, wherever it can be found.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 6, 2011

    I suppose the FileIO in _io is next to look at, wherever it can be found.

    I don't get what you're looking for. FileIO is involved in both the
    buffered and unbuffered cases, so it shouldn't make a difference.
    In any case, please look into Modules/_io

    @vpython
    Copy link
    Mannequin Author

    vpython mannequin commented Jan 6, 2011

    Found it.

    The file browser doesn't tell what line number it is, but in _io/Fileio.c function fileio_init, there is code like

    #ifdef O_BINARY
        flags |= O_BINARY;
    #endif
    
    #ifdef O_APPEND
        if (append)
            flags |= O_APPEND;
    #endif
    
        if (fd >= 0) {
            if (check_fd(fd))
                goto error;
            self->fd = fd;
            self->closefd = closefd;
        }

    Note that if O_BINARY is defined, it is set into the default flags for opening files by name. But if "opening" a file by fd, the fd is copied, regardless of whether it has O_BINARY set or not. The rest of the IO code no doubt assumes the file was opened in O_BINARY mode. But that isn't true of MSC std* handles by default.

    How -u masks or overcomes this problem is not obvious, as yet, but the root bug seems to be the assumption in the above code. A setmode of O_BINARY should be done, probably #ifdef O_BINARY, when attaching a MS C fd to a Python IO stack. Otherwise it is going to have \r\r\n problems, it would seem.

    Alternately, in the location where the Python IO stacks are attached to std* handles, those specific std* handles should have the setmode done there... other handles, if opened by Python, likely already have it done.

    Documentation for open should mention, in the description of the file parameter, that on Windows, it is important to only attach Python IO stack to O_BINARY files, or beware the consequences of two independent newline handling algorithms being applied to the data stream... or to document that setmode O_BINARY will be performed on the handles passed to open.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 6, 2011

    Ok, actually it boils down to the following code in Modules/main.c:

        if (Py_UnbufferedStdioFlag) {
    #if defined(MS_WINDOWS) || defined(__CYGWIN__)
            _setmode(fileno(stdin), O_BINARY);
            _setmode(fileno(stdout), O_BINARY);
    #endif

    ... which explains things quite clearly! Archeology leads to r7409 by Guido in 1997:

    changeset: 4916:5af9f8a98d93
    branch: trunk
    user: guido
    date: Sat Jan 11 20:28:55 1997 +0100
    files: Modules/main.c
    description:
    [svn r7409] On Windows, -u implies binary mode for stdin/stdout
    (as well as unbuffered stdout/stderr).

    @vpython
    Copy link
    Mannequin Author

    vpython mannequin commented Jan 6, 2011

    Don't find "initstdio" "stdio" in pythonrun.c. Has it moved? There are precious few references to stdin, stdout, stderr in that module, mostly for attaching the default encoding.

    @vpython
    Copy link
    Mannequin Author

    vpython mannequin commented Jan 6, 2011

    stderr is notable by its absence in the list of O_BINARY adjustments.

    So -u does do 2/3 of what my windows_binary() does :) Should I switch my test case to use stderr to demonstrate that it doesn't help with that? I vaguely remember that early versions of DOS didn't do stderr, but I thought by the time Windows came along, let's see, was that about 1983?, that stderr was codified for DOS/Windows. For sure it has never been missing in WinNT 4.0 +.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 6, 2011

    So -u does do 2/3 of what my windows_binary() does :) Should I switch
    my test case to use stderr to demonstrate that it doesn't help with
    that?

    Well, judging by the history of this code, selectively putting -u in
    binary mode may be justified by the fact that Python 2 relied on the C
    runtime's stdio FILE pointers, and therefore on the C runtime's own
    newline translation. I would say that Python 3 should put all stdio fds
    in binary mode, regardless of the -u switch.

    @vpython
    Copy link
    Mannequin Author

    vpython mannequin commented Jan 6, 2011

    Makes sense to me. Still should document the open file parameter when passed an fd, and either tell the user that it should be O_BINARY, or that it will be O_BINARYd for them, whichever technique is chosen. But having two newline techniques is bad, and if Python thinks it is tracking the file pointer, but Windows is doing newline translation for it, then it isn't likely tracking it correctly for random access IO. So I think the choice should be that any fd passed in to open on Windows should get O_BINARYd immediately.

    @vstinner
    Copy link
    Member

    vstinner commented Jan 6, 2011

    Attached stdio_binary.patch always set stdin, stdout and stderr in binary mode on Windows (and not only with -u command line flag). I added the following comment, is it correct?

    /* don't translate newlines */
    

    @vstinner
    Copy link
    Member

    vstinner commented Jan 6, 2011

    Using my patch, the interpreter fails with a SyntaxError on \r: attached patch translates newlines to avoid this problem.

    @vstinner
    Copy link
    Member

    vstinner commented Jan 6, 2011

    Using my patch, the interpreter fails with a SyntaxError on \r:
    attached patch translates newlines to avoid this problem.

    See also issue bpo-4628.

    @gvanrossum
    Copy link
    Member

    Not sure why I've been added, but I agree that all fds should always be in binary mode in Python 3.

    @vpython
    Copy link
    Mannequin Author

    vpython mannequin commented Jan 6, 2011

    Victor, Thanks for your interest and patches.

    msg125530 points out the location of the code where _all_ fds could be O_BINARYed, when passed in to open. I think this would make all fds open in binary mode, per Guido's comment... he made exactly the comment I was hoping for, even though I didn't +nosy him... I believe this would catch std* along the way, and render your first patch unnecessary, but your second one would likely still be needed.

    @vstinner
    Copy link
    Member

    vstinner commented Jan 6, 2011

    New patch setting O_BINARY mode using _setmode() (on Windows) to stdin, stdout, stderr, but also to all FileIO() objects.

    The patch on Py_Main() is still needed because Python may use sys.stderr when it is still a stdprinter object, before the final TextIOWrapper object is created (before the io module is loaded).

    I don't know the effect of _setmode(stderr, O_BINARY) on calls to fputs(stderr, ...) in Py_FatalError().

    I would appreciate a review of parser_translate_newline.patch.

    @gvanrossum
    Copy link
    Member

    On Thu, Jan 6, 2011 at 1:20 PM, STINNER Victor <report@bugs.python.org> wrote:

    I don't know the effect of _setmode(stderr, O_BINARY) on calls to fputs(stderr, ...) in Py_FatalError().

    On Windows it will write lines ending in LF only instead of CRLF. Most
    tools to read text files should be able deal with that by now,
    conditioned as they are by many years of Unix developers doing Windows
    stuff on the side. (Somebody should test what Notepad does though. :-)

    I don't know any other supported platforms where O_BINARY exists and
    is not a no-op (but I haven't really looked :-).

    @vstinner
    Copy link
    Member

    vstinner commented Jan 7, 2011

    I commited io_binary_windows.patch + parser_translate_newline.patch as r87824. I fixed the patch on the parser to avoid leak on newtok if translate_newlines() fails.

    @vstinner
    Copy link
    Member

    vstinner commented Jan 7, 2011

    Tests pass on Windows 7 buildbot, the two other XP buildbots have unrelated issues. I also tested on my XP VM: all tests pass. So I close this issue.

    @vstinner vstinner closed this as completed Jan 7, 2011
    @vpython
    Copy link
    Mannequin Author

    vpython mannequin commented Jan 7, 2011

    Thanks for your work on this Victor, and other commenters also.

    @vstinner
    Copy link
    Member

    vstinner commented Aug 3, 2012

    For the record, this change (always set stdout and stderr in binary mode on Windows) introduced (at least???) the following regressions:

    • bpo-11272: "input() has trailing carriage return on windows", fixed in Python 3.2.1
    • bpo-11395: "print(s) fails on Windows with long strings", fixed in Python 3.2.1
    • bpo-13119: "Newline for print() is \n on Windows, and not \r\n as expected", will be fixed in Python 3.2.4 and 3.3 (not released yet)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    OS-windows topic-IO type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants