Author xtreak
Recipients nascheme, xtreak
Date 2018-10-04.18:08:30
During iteration, codecs.getreader('utf-8')(open('test.txt', 'rb')) calls str.splitlines on the decoded data. As specified in [0], str.splitlines recognizes a superset of the universal newlines, and that superset includes '\x0b'. The reader therefore splits on '\x0b' as a valid line boundary for str, which is the documented behavior of str.splitlines.

Python 3.8.0a0 (heads/master:6f85b826b5, Oct  4 2018, 22:44:36)
[Clang 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 'first line\x0b\x0bblah blah\nsecond line\n' # returned by codecs.getreader()
>>> a.splitlines(keepends=True)
['first line\x0b', '\x0b', 'blah blah\n', 'second line\n']

# bytes.splitlines splits only on the universal newlines, so it doesn't split on b'\x0b' [1]
>>> b = b'first line\x0b\x0bblah blah\nsecond line\n' 
>>> b.splitlines(keepends=True)
[b'first line\x0b\x0bblah blah\n', b'second line\n']
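
The same difference can be reproduced without a file on disk. The following is a minimal sketch, assuming an in-memory io.BytesIO stands in for open('test.txt', 'rb'); the variable names are illustrative only.

```python
import codecs
import io

data = b'first line\x0b\x0bblah blah\nsecond line\n'

# The codecs stream reader decodes the bytes first and then splits the
# decoded text with str.splitlines, so '\x0b' ends a line.
reader = codecs.getreader('utf-8')(io.BytesIO(data))
codecs_lines = list(reader)

# Iterating the binary stream directly splits on b'\n' only.
binary_lines = list(io.BytesIO(data))

print(codecs_lines)
print(binary_lines)
```

Iterating the codecs reader gives four lines, while iterating the raw binary stream gives two.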

io.TextIOWrapper, however, accepts only None, '', '\n', '\r' and '\r\n' as its newline argument in text mode. Binary files behave differently: as noted in the readline documentation, the only line terminator they recognize is b'\n' [2]

> The line terminator is always b'\n' for binary files; for text
> files, the newline argument to open() can be used to select the line
> terminator(s) recognized.

Thus, reading 'first line\x0b\x0bblah blah\nsecond line\n' through TextIOWrapper gives ['first line\x0b\x0bblah blah\n', 'second line\n']. Trying to pass '\x0b' as the newline argument results in an illegal newline error (a ValueError) in TextIOWrapper.
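
A short sketch of the TextIOWrapper side, again assuming an in-memory io.BytesIO in place of a real file (open() in text mode returns a TextIOWrapper, so wrapping the buffer directly should behave the same way):

```python
import io

data = b'first line\x0b\x0bblah blah\nsecond line\n'

# With the default newline=None, only '\n', '\r' and '\r\n' end a line,
# so the two '\x0b' characters stay inside the first line.
text_lines = list(io.TextIOWrapper(io.BytesIO(data), encoding='utf-8'))

# '\x0b' is not one of the accepted newline values and is rejected
# when the wrapper is constructed.
try:
    io.TextIOWrapper(io.BytesIO(data), encoding='utf-8', newline='\x0b')
except ValueError as exc:
    error = exc
else:
    error = None

print(text_lines)
print(repr(error))
```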

Hope I am correct on the above analysis.

Date                 User    Action  Args
2018-10-04 18:08:31  xtreak  set     recipients: + xtreak, nascheme
2018-10-04 18:08:31  xtreak  set     messageid: <>
2018-10-04 18:08:31  xtreak  link    issue34801 messages
2018-10-04 18:08:30  xtreak  create