New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WindowsConsoleIO readall() fails if first line starts with Ctrl+Z #72349
Comments
For a console readall(), if the first line starts with '\x1a' (i.e. Ctrl+Z), it breaks out of its read loop before incrementing len. Thus the input isn't handled properly as EOF, for which the check requires len > 0. Instead it ends up calling WideCharToMultiByte with len == 0, which fails as follows: >>> sys.stdin.buffer.raw.read()
^Z
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OSError: [WinError 87] The parameter is incorrect |
Unfortunately, very hard to test this because it requires interacting with an actual console. But it's an easy fix. |
New changeset d0bab9fda568 by Steve Dower in branch '3.6': New changeset 187a114b9ef4 by Steve Dower in branch 'default': |
The fix here was changing "n > 0 && buf[len] == '\x1a'" into "n == 0 || buf[len] == '\x1a'" when returning an empty bytes. Previously we were falling into the conversion code which wasn't happy with n == 0. |
For breaking out of the readall while loop, you only need to check if the current read is empty:
Also, the logic is wrong here: if (len == 0 || buf[0] == '\x1a' && _buflen(self) == 0) {
/* when the result starts with ^Z we return an empty buffer */
PyMem_Free(buf);
return PyBytes_FromStringAndSize(NULL, 0);
} This is true when len is 0 or when buf[0] is Ctrl+Z and _buflen(self) is 0. Since buf[0] shouldn't ever be Ctrl+Z here (low-level EOF handling is abstracted in read_console_w), it's never checking the internal buffer. We can easily see this going wrong here: >>> a = sys.stdin.buffer.raw.read(1); b = sys.stdin.buffer.raw.read()
Ā^Z
>>> a
b'\xc4'
>>> b
b'' It misses the remaining byte in the internal buffer. This check can be simplified as follows: rn = _buflen(self); if (len == 0 && rn == 0) {
/* return an empty buffer */
PyMem_Free(buf);
return PyBytes_FromStringAndSize(NULL, 0);
} After this the code assumes that len isn't 0, which leads to more WideCharToMultiByte failure cases. In the last conversion it's overwrite bytes_size without including rn. I'm not sure what's going on with _PyBytes_Resize(&bytes, n * sizeof(wchar_t)). ISTM, it should be resized to bytes_size, and make sure this includes rn. Finally, _copyfrombuf is repeatedly overwriting buf[0] instead of writing to buf[n]. With the attached patch, the behavior seems correct now: >>> sys.stdin.buffer.raw.read()
^Z
b''
>>> sys.stdin.buffer.raw.read()
abc^Z
^Z
b'abc\x1a\r\n' Split U+0100: >>> a = sys.stdin.buffer.raw.read(1); b = sys.stdin.buffer.raw.read()
Ā^Z
>>> a
b'\xc4'
>>> b
b'\x80' Split U+1234: >>> a = sys.stdin.buffer.raw.read(1); b = sys.stdin.buffer.raw.read()
ሴ^Z
>>> a
b'\xe1'
>>> b
b'\x88\xb4' The buffer still can't handle splitting an initial non-BMP character, stored as a surrogate pair. Both codes end up as replacement characters because they aren't transcoded as a unit. Split U+00010000: >>> a = sys.stdin.buffer.raw.read(1); b = sys.stdin.buffer.raw.read()
𐀀^Z
^Z
>>> a
b'\xef'
>>> b
b'\xbf\xbd\xef\xbf\xbd\x1a\r\n' |
New changeset 4d4aefa52f49 by Steve Dower in branch '3.6': New changeset 947fa496ca6f by Steve Dower in branch 'default': |
I didn't see your patch, but I made basically the same fix and added a test. |
Misc/NEWS
so that it is managed by towncrier #552Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: