Title: TextIOWrapper.readline and str.splitlines have different behavior
Type: behavior Stage: needs patch
Components: Documentation, Interpreter Core, IO Versions: Python 3.2, Python 3.3, Python 2.7
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: benjamin.peterson, docs@python, pitrou, r.david.murray
Priority: normal Keywords:

Created on 2011-02-24 02:19 by benjamin.peterson, last changed 2011-02-24 02:45 by r.david.murray.

Messages (5)
msg129240 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2011-02-24 02:19
For example:
>>> 'print 1\n\x0cprint 2\n\n'.splitlines()
['print 1\n', '\x0cprint 2\n', '\n']
>>> list(io.StringIO('print 1\n\x0cprint 2\n\n'))

I'm not sure which is preferable.
msg129241 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-02-24 02:29
Your example got a little messed up.

>>> list(io.StringIO('print 1\n\x0cprint 2\n\n'))
['print 1\n', '\x0cprint 2\n', '\n']
>>> 'print 1\n\x0cprint 2\n\n'.splitlines(True)
['print 1\n', '\x0c', 'print 2\n', '\n']
>>> list(io.StringIO('print 1\x0cprint 2\n\n'))
['print 1\x0cprint 2\n', '\n']
>>> 'print 1\x0cprint 2\n\n'.splitlines(True)
['print 1\x0c', 'print 2\n', '\n']

I think splitlines has it correct.
msg129242 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-02-24 02:31
On the other hand, I believe io is documented as only recognizing \r and \n, so its behavior matches its documentation.
msg129244 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2011-02-24 02:33
I don't see that, but the chances of changing either of these are quite low, so I suppose we should just document it.
msg129245 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-02-24 02:45
"newline controls how universal newlines works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'..."

Whereas splitlines says:

"Return a list of the lines in the string, breaking at line boundaries."

So if we are fixing docs, we need to add that "line boundaries" are based on the relevant Unicode properties (see issue 7643).

And, indeed, Antoine has already pronounced on this in issue 6664.

Since this has come up more than once, it might be a good idea to add a note to the io docs saying that, by design, the io module does not recognize these additional Unicode line boundaries.
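
For anyone comparing the two behaviors, here is a minimal Python 3 sketch of the difference discussed above. It assumes io.StringIO opened with newline='' (universal newlines, no translation) as a stand-in for how the io module finds line endings. The helper name io_style_splitlines is hypothetical and not part of the standard library.

```python
import io

def io_style_splitlines(s):
    """Split s the way io's readline does with newline='':
    break only on \\n, \\r, and \\r\\n, keeping the line endings.
    (Hypothetical helper for illustration; not a stdlib function.)"""
    return list(io.StringIO(s, newline=''))

text = 'print 1\x0cprint 2\r\nprint 3\n'

# str.splitlines also treats the form feed (\x0c) as a line boundary.
print(text.splitlines(True))

# The io module leaves the form feed inside the line.
print(io_style_splitlines(text))
```

If code needs str-based splitting that agrees with file iteration, routing the string through io.StringIO like this is one option; the thread does not prescribe it.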
Date                 User               Action  Args
2011-02-24 02:45:03  r.david.murray     set     assignee: docs@python; type: behavior; components: + Documentation; versions: + Python 2.7; nosy: + docs@python, pitrou; messages: + msg129245; stage: needs patch
2011-02-24 02:33:42  benjamin.peterson  set     nosy: benjamin.peterson, r.david.murray; messages: + msg129244
2011-02-24 02:31:22  r.david.murray     set     nosy: benjamin.peterson, r.david.murray; messages: + msg129242
2011-02-24 02:29:40  r.david.murray     set     nosy: + r.david.murray; messages: + msg129241
2011-02-24 02:19:52  benjamin.peterson  create