Message277093
For breaking out of the readall while loop, you only need to check if the current read is empty:
/* when the read is empty we break */
if (n == 0)
break;
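As a rough model of that loop shape (in Python rather than the C of the actual module, with `read_all` and `read_chunk` being hypothetical names used only for illustration), the only termination condition needed is an empty read:

```python
def read_all(read_chunk):
    """Collect chunks until read_chunk() returns an empty result.

    read_chunk is a hypothetical callable standing in for one
    low-level console read; it is not part of the real module.
    """
    chunks = []
    while True:
        data = read_chunk()
        # when the read is empty we break
        if not data:
            break
        chunks.append(data)
    return b''.join(chunks)
```

For example, feeding it `iter([b'ab', b'c', b'']).__next__` stops at the empty chunk and joins what came before.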
Also, the logic is wrong here:
if (len == 0 || buf[0] == '\x1a' && _buflen(self) == 0) {
/* when the result starts with ^Z we return an empty buffer */
PyMem_Free(buf);
return PyBytes_FromStringAndSize(NULL, 0);
}
This is true when len is 0, or when buf[0] is Ctrl+Z and _buflen(self) is 0. Since buf[0] shouldn't ever be Ctrl+Z here (low-level EOF handling is abstracted in read_console_w), the check effectively reduces to len == 0, so the internal buffer is never consulted. We can easily see this going wrong here:
>>> a = sys.stdin.buffer.raw.read(1); b = sys.stdin.buffer.raw.read()
Ā^Z
>>> a
b'\xc4'
>>> b
b''
It misses the remaining byte in the internal buffer.
This check can be simplified as follows:
rn = _buflen(self);
if (len == 0 && rn == 0) {
/* return an empty buffer */
PyMem_Free(buf);
return PyBytes_FromStringAndSize(NULL, 0);
}
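To make the difference between the two conditions concrete, here is a small Python model of them. The function names `old_check` and `new_check` are hypothetical, and the arguments merely stand in for len, buf[0] and _buflen(self); this is a sketch of the boolean logic only, not of the C code.

```python
def old_check(length, first_char, buffered):
    # Mirrors: len == 0 || buf[0] == '\x1a' && _buflen(self) == 0
    # In C, as in Python, "and" binds tighter than "or".
    return length == 0 or (first_char == '\x1a' and buffered == 0)


def new_check(length, buffered):
    # Mirrors: len == 0 && rn == 0
    return length == 0 and buffered == 0


# An empty console read while one byte is still held in the internal buffer:
print(old_check(0, '', 1))  # True: returns early and drops the buffered byte
print(new_check(0, 1))      # False: continues, so the buffered byte is returned
```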
After this the code assumes that len isn't 0, which leads to more WideCharToMultiByte failure cases.
In the last conversion, bytes_size is overwritten without including rn.
I'm not sure what's going on with _PyBytes_Resize(&bytes, n * sizeof(wchar_t)). ISTM it should be resized to bytes_size, making sure that this includes rn.
Finally, _copyfrombuf is repeatedly overwriting buf[0] instead of writing to buf[n].
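The effect of writing to index 0 instead of index n can be seen in a simplified Python model. Everything here is an assumption made for illustration: the name `copy_from_buf`, the `buggy` flag, the 4-byte `SMALLBUF` size, and the shift-left draining of the internal buffer are a sketch, not the actual C implementation.

```python
SMALLBUF = 4  # assumed size of the internal byte buffer, for illustration only


def copy_from_buf(internal, dest, length, buggy=False):
    """Drain up to `length` bytes from `internal` into `dest`.

    internal and dest are bytearrays; a zero byte at internal[0]
    marks the internal buffer as empty in this model.
    """
    n = 0
    while internal[0] and length > 0:
        # The buggy variant always writes to dest[0]; the fix writes to dest[n].
        index = 0 if buggy else n
        dest[index] = internal[0]
        n += 1
        length -= 1
        # Shift the remaining buffered bytes left by one and zero-fill the end.
        internal[:] = internal[1:] + b'\x00'
    return n


dest = bytearray(2)
copy_from_buf(bytearray(b'\x88\xb4\x00\x00'), dest, 2, buggy=True)
print(bytes(dest))  # b'\xb4\x00': the second byte overwrote the first

dest = bytearray(2)
copy_from_buf(bytearray(b'\x88\xb4\x00\x00'), dest, 2)
print(bytes(dest))  # b'\x88\xb4'
```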
With the attached patch, the behavior seems correct now:
>>> sys.stdin.buffer.raw.read()
^Z
b''
>>> sys.stdin.buffer.raw.read()
abc^Z
^Z
b'abc\x1a\r\n'
Split U+0100:
>>> a = sys.stdin.buffer.raw.read(1); b = sys.stdin.buffer.raw.read()
Ā^Z
>>> a
b'\xc4'
>>> b
b'\x80'
Split U+1234:
>>> a = sys.stdin.buffer.raw.read(1); b = sys.stdin.buffer.raw.read()
ሴ^Z
>>> a
b'\xe1'
>>> b
b'\x88\xb4'
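The expected byte values in the two split cases follow directly from the UTF-8 encodings of the characters. A minimal sketch, where `split_read` is a hypothetical helper that only models "return the first bytes, keep the rest buffered":

```python
def split_read(text, first_read_size):
    """Model a short read followed by a read of whatever stays buffered."""
    encoded = text.encode('utf-8')
    return encoded[:first_read_size], encoded[first_read_size:]


print(split_read('\u0100', 1))  # (b'\xc4', b'\x80')
print(split_read('\u1234', 1))  # (b'\xe1', b'\x88\xb4')
```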
The buffer still can't handle splitting an initial non-BMP character, which is stored as a surrogate pair. Both surrogate codes end up as replacement characters (U+FFFD) because they aren't transcoded as a unit.
Split U+00010000:
>>> a = sys.stdin.buffer.raw.read(1); b = sys.stdin.buffer.raw.read()
𐀀^Z
^Z
>>> a
b'\xef'
>>> b
b'\xbf\xbd\xef\xbf\xbd\x1a\r\n'
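The replacement bytes above are what one would expect if each UTF-16 code unit is converted on its own. The following Python sketch models that under an assumption: that a lone surrogate is mapped to U+FFFD during conversion. The helper `transcode_units_separately` is hypothetical and does not call the Windows API.

```python
import struct

REPLACEMENT = '\ufffd'


def transcode_units_separately(text):
    """Convert each UTF-16 code unit to UTF-8 in isolation.

    A lone surrogate cannot be encoded by itself, so in this model
    it becomes U+FFFD (an assumption about the converter's behavior).
    """
    raw = text.encode('utf-16-le', 'surrogatepass')
    units = struct.unpack('<%dH' % (len(raw) // 2), raw)
    out = bytearray()
    for unit in units:
        if 0xD800 <= unit <= 0xDFFF:
            out += REPLACEMENT.encode('utf-8')
        else:
            out += chr(unit).encode('utf-8')
    return bytes(out)


print(transcode_units_separately('\U00010000'))  # b'\xef\xbf\xbd\xef\xbf\xbd'
print('\U00010000'.encode('utf-8'))              # b'\xf0\x90\x80\x80'
```

Transcoding the pair as a unit gives the 4-byte sequence, whereas handling the two halves separately gives two 3-byte replacement characters, matching the a and b values shown.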
Date | User | Action | Args
2016-09-21 05:46:33 | eryksun | set | recipients: + eryksun, paul.moore, tim.golden, python-dev, zach.ware, steve.dower
2016-09-21 05:46:33 | eryksun | set | messageid: <1474436793.57.0.776081344241.issue28162@psf.upfronthosting.co.za>
2016-09-21 05:46:33 | eryksun | link | issue28162 messages
2016-09-21 05:46:31 | eryksun | create |