Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WindowsConsoleIO readall() fails if first line starts with Ctrl+Z #72349

Closed
eryksun opened this issue Sep 15, 2016 · 7 comments
Closed

WindowsConsoleIO readall() fails if first line starts with Ctrl+Z #72349

eryksun opened this issue Sep 15, 2016 · 7 comments
Assignees
Labels
3.7 (EOL) end of life OS-windows stdlib Python modules in the Lib dir topic-IO type-bug An unexpected behavior, bug, or error

Comments

@eryksun
Copy link
Contributor

eryksun commented Sep 15, 2016

BPO 28162
Nosy @pfmoore, @tjguk, @zware, @eryksun, @zooba
PRs
  • [Do Not Merge] Convert Misc/NEWS so that it is managed by towncrier #552
  • Files
  • issue_28162_01.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/zooba'
    closed_at = <Date 2016-10-08.19:44:31.832>
    created_at = <Date 2016-09-15.03:25:17.633>
    labels = ['3.7', 'type-bug', 'library', 'expert-IO', 'OS-windows']
    title = 'WindowsConsoleIO readall() fails if first line starts with Ctrl+Z'
    updated_at = <Date 2017-03-31.16:36:32.838>
    user = 'https://github.com/eryksun'

    bugs.python.org fields:

    activity = <Date 2017-03-31.16:36:32.838>
    actor = 'dstufft'
    assignee = 'steve.dower'
    closed = True
    closed_date = <Date 2016-10-08.19:44:31.832>
    closer = 'steve.dower'
    components = ['Library (Lib)', 'Windows', 'IO']
    creation = <Date 2016-09-15.03:25:17.633>
    creator = 'eryksun'
    dependencies = []
    files = ['44766']
    hgrepos = []
    issue_num = 28162
    keywords = ['patch']
    message_count = 7.0
    messages = ['276508', '276841', '276845', '276848', '277093', '278320', '278321']
    nosy_count = 6.0
    nosy_names = ['paul.moore', 'tim.golden', 'python-dev', 'zach.ware', 'eryksun', 'steve.dower']
    pr_nums = ['552']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue28162'
    versions = ['Python 3.6', 'Python 3.7']

    @eryksun
    Copy link
    Contributor Author

    eryksun commented Sep 15, 2016

    For a console readall(), if the first line starts with '\x1a' (i.e. Ctrl+Z), it breaks out of its read loop before incrementing len. Thus the input isn't handled properly as EOF, for which the check requires len > 0. Instead it ends up calling WideCharToMultiByte with len == 0, which fails as follows:

        >>> sys.stdin.buffer.raw.read()
        ^Z
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        OSError: [WinError 87] The parameter is incorrect

    @eryksun eryksun added 3.7 (EOL) end of life stdlib Python modules in the Lib dir OS-windows topic-IO type-bug An unexpected behavior, bug, or error labels Sep 15, 2016
    @zooba
    Copy link
    Member

    zooba commented Sep 17, 2016

    Unfortunately, very hard to test this because it requires interacting with an actual console. But it's an easy fix.

    @zooba zooba self-assigned this Sep 17, 2016
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 17, 2016

    New changeset d0bab9fda568 by Steve Dower in branch '3.6':
    Issue bpo-28161: Opening CON for write access fails
    https://hg.python.org/cpython/rev/d0bab9fda568

    New changeset 187a114b9ef4 by Steve Dower in branch 'default':
    Issue bpo-28161: Opening CON for write access fails
    https://hg.python.org/cpython/rev/187a114b9ef4

    @zooba
    Copy link
    Member

    zooba commented Sep 17, 2016

    The fix here was changing "n > 0 && buf[len] == '\x1a'" into "n == 0 || buf[len] == '\x1a'" when returning an empty bytes. Previously we were falling into the conversion code which wasn't happy with n == 0.

    @zooba zooba closed this as completed Sep 17, 2016
    @eryksun
    Copy link
    Contributor Author

    eryksun commented Sep 21, 2016

    For breaking out of the readall while loop, you only need to check if the current read is empty:

        /* when the read is empty we break */
        if (n == 0)
            break;
    

    Also, the logic is wrong here:

        if (len == 0 || buf[0] == '\x1a' && _buflen(self) == 0) {
            /* when the result starts with ^Z we return an empty buffer */
            PyMem_Free(buf);
            return PyBytes_FromStringAndSize(NULL, 0);
        }

    This is true when len is 0 or when buf[0] is Ctrl+Z and _buflen(self) is 0. Since buf[0] shouldn't ever be Ctrl+Z here (low-level EOF handling is abstracted in read_console_w), it's never checking the internal buffer. We can easily see this going wrong here:

        >>> a = sys.stdin.buffer.raw.read(1); b = sys.stdin.buffer.raw.read()
        Ā^Z
        >>> a
        b'\xc4'
        >>> b
        b''

    It misses the remaining byte in the internal buffer.

    This check can be simplified as follows:

        rn = _buflen(self);
        if (len == 0 && rn == 0) {
            /* return an empty buffer */
            PyMem_Free(buf);
            return PyBytes_FromStringAndSize(NULL, 0);
        }

    After this the code assumes that len isn't 0, which leads to more WideCharToMultiByte failure cases.

    In the last conversion it's overwrite bytes_size without including rn.

    I'm not sure what's going on with _PyBytes_Resize(&bytes, n * sizeof(wchar_t)). ISTM, it should be resized to bytes_size, and make sure this includes rn.

    Finally, _copyfrombuf is repeatedly overwriting buf[0] instead of writing to buf[n].

    With the attached patch, the behavior seems correct now:

        >>> sys.stdin.buffer.raw.read()
        ^Z
        b''
    
        >>> sys.stdin.buffer.raw.read()
        abc^Z
        ^Z
        b'abc\x1a\r\n'

    Split U+0100:

        >>> a = sys.stdin.buffer.raw.read(1); b = sys.stdin.buffer.raw.read()
        Ā^Z
        >>> a
        b'\xc4'
        >>> b
    b'\x80'

    Split U+1234:

        >>> a = sys.stdin.buffer.raw.read(1); b = sys.stdin.buffer.raw.read()
        ሴ^Z
        >>> a
        b'\xe1'
        >>> b
        b'\x88\xb4'

    The buffer still can't handle splitting an initial non-BMP character, stored as a surrogate pair. Both codes end up as replacement characters because they aren't transcoded as a unit.

    Split U+00010000:

        >>> a = sys.stdin.buffer.raw.read(1); b = sys.stdin.buffer.raw.read()
        𐀀^Z
        ^Z
        >>> a
        b'\xef'
        >>> b
        b'\xbf\xbd\xef\xbf\xbd\x1a\r\n'

    @eryksun eryksun reopened this Sep 21, 2016
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 8, 2016

    New changeset 4d4aefa52f49 by Steve Dower in branch '3.6':
    Issue bpo-28162: Fixes Ctrl+Z handling in console readall()
    https://hg.python.org/cpython/rev/4d4aefa52f49

    New changeset 947fa496ca6f by Steve Dower in branch 'default':
    Issue bpo-28162: Fixes Ctrl+Z handling in console readall()
    https://hg.python.org/cpython/rev/947fa496ca6f

    @zooba
    Copy link
    Member

    zooba commented Oct 8, 2016

    I didn't see your patch, but I made basically the same fix and added a test.

    @zooba zooba closed this as completed Oct 8, 2016
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life OS-windows stdlib Python modules in the Lib dir topic-IO type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants