Classification
Title: Counterintuitive str.splitlines() inconsistency.
Type: behavior
Stage: resolved
Components: (none)
Versions: Python 2.6

Process
Status: closed
Resolution: not a bug
Dependencies: (none)
Superseder: (none)
Assigned To: (none)
Nosy List: flox, r.david.murray, vencabot_teppoo
Priority: normal
Keywords: (none)

Created on 2010-01-05 02:55 by vencabot_teppoo, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (5)
msg97245 - Author: David (vencabot_teppoo) Date: 2010-01-05 02:55
Qualifier: This is the first issue that I've raised, so I apologise beforehand for any protocol flubs.

The behavior of str.splitlines() doesn't jibe with str.split() in a way I didn't expect.

In this code snippet, a server buffers input until it receives a blank line, and then it processes the input:

request_buffer = ""

while request_buffer.split("\n")[-1] != "" or request_buffer == "":
    request_buffer += self.conn.recv(1024)
    print("Got a line!")

print("Got an empty line!")
self.handleRequest(request_buffer)


I found out the hard way that this code isn't prepared to handle clients that use a different newline convention, such as those that send "\r". At that point I discovered str.splitlines() and found that, to some extent, it works as advertised: it splits lines regardless of which newline character is used.
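A small Python 3 sketch of the difference described above (the sample buffers are made up for illustration): split("\n") only recognizes "\n", so a buffer ending in a bare "\r" never yields an empty final element, while splitlines() treats "\n", "\r" and "\r\n" alike.

```python
# Python 3 sketch: how split("\n") and splitlines() see three line-end styles.
samples = ["GET /\n", "GET /\r", "GET /\r\n"]

for s in samples:
    # split("\n") leaves "GET /\r" intact for the bare-"\r" client, so the
    # original loop's test split("\n")[-1] != "" stays true and it keeps waiting.
    print(repr(s), s.split("\n"), s.splitlines())
```

Note that "\r\n" clients happen to work with the original loop, because that sequence still ends in "\n"; only the bare "\r" case breaks it.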

However, this code doesn't work:

request_buffer = ""

while request_buffer.splitlines[-1] != "" or request_buffer == "":
    request_buffer += self.conn.recv(1024)
    print("Got a line!")

print("Got an empty line!")
self.handleRequest(request_buffer)


Python raises an IndexError because -1 is out of the range of request_buffer.splitlines(). I know that str.splitlines() handles empty lines, because I've used it on longer strings to test for trailing blank lines; it only refuses to count a blank line when no other line follows it. "derp".splitlines() has a length of 1, but "".splitlines() has a length of 0. "derp\n".splitlines() also has a length of 1, which excludes the trailing blank line.
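The behavior described above can be reproduced with a short Python 3 sketch. It shows that splitlines() produces no element for the empty text after a final line break, and that indexing [-1] into the empty list for "" is what raises on the loop's first pass.

```python
# Python 3 sketch: splitlines() drops the empty text after a final line break.
for s in ["derp", "", "derp\n", "derp\n\n"]:
    print(repr(s), s.splitlines())

# On the first pass request_buffer is still "", so indexing [-1] into the
# empty list raises before the `or request_buffer == ""` check is reached.
try:
    "".splitlines()[-1]
except IndexError as exc:
    print("IndexError:", exc)   # IndexError: list index out of range
```

The "derp\n\n" case shows the trailing blank line being counted once it is itself terminated by a line break.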

In my opinion, "".splitlines() should have 1 element. "derp".splitlines() should persist as having 1 element, but "derp\n".splitlines() should have 2 elements. This would result in the same functionality as str.split("\n") (where "\n".split("\n") results in two empty-string elements), but it would have the benefit of working predictably with all line-breaking standards, which I assume was the idea all along.
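To make the proposal concrete, this Python 3 sketch prints the element counts that str.split("\n") already gives (the behavior being asked for) next to the current splitlines() counts, using the strings mentioned above.

```python
# Python 3 sketch: proposed counts (split("\n")) versus current counts (splitlines()).
samples = ["", "derp", "derp\n", "\n"]

for s in samples:
    proposed = s.split("\n")   # keeps the empty text after a final "\n"
    current = s.splitlines()   # drops it
    print(repr(s), len(proposed), len(current))
```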
msg97246 - Author: David (vencabot_teppoo) Date: 2010-01-05 03:02
I typoed when copying my second snippet.

while request_buffer.splitlines[-1] != "" or request_buffer == "":


It should be:

while request_buffer.splitlines()[-1] != "" or request_buffer == "":


This code has the problem that I'm complaining about. I only made the mistake when copying it by hand into the form.
msg97248 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2010-01-05 04:55
No apologies needed, but you probably aren't going to like the answer :)

First of all, a change like you propose would be unlikely to be accepted since it would create considerable backward-compatibility pain.

That aside, however, splitlines and split are not meant to be parallel.  They do two very different jobs.  splitlines is *line* oriented, and lines are understood to end with line ends.  The file equivalent of "" has length zero, and the unix 'wc' command reports it as having 0 lines.  A file containing "derp\n" is reported by wc to have one line, not two.  Files without a final line end are arguably broken, but all good tools accept that final line as a line, though some complain about it.  (And other tools break in various odd ways.)
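The file analogy can be checked directly with a Python 3 sketch that uses io.StringIO as a stand-in for a file. For "\n"-terminated text, readlines() on the file object agrees with splitlines(True) (which keeps the line ends), including zero lines for "" and one line for "derp\n". This comparison is limited to "\n": StringIO's default newline="\n" does not split on a bare "\r".

```python
import io

# Python 3 sketch: reading a file-like object line by line and calling
# splitlines(True) on the same text give the same lines for "\n" endings.
texts = ["", "derp\n", "derp", "a\nb\n"]

for text in texts:
    from_file = io.StringIO(text).readlines()
    from_str = text.splitlines(True)   # True = keep the line ends
    print(repr(text), from_file, from_str, from_file == from_str)
```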

If you want something parallel to split that handles line ends 'universally', try re.split with an appropriate regex.
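One way to follow that suggestion is sketched below in Python 3. The helper name split_lines_like_split and the LINE_END pattern are hypothetical, and the pattern covers only the three ASCII conventions ("\r\n", "\r", "\n"), not every boundary splitlines() recognizes. "\r\n" is listed first in the alternation so that it counts as a single break.

```python
import re

# Hypothetical pattern: "\r\n" first so it is not split into "\r" + "\n".
LINE_END = re.compile(r"\r\n|\r|\n")

def split_lines_like_split(text):
    """Split on any of \\r\\n, \\r, \\n, keeping the empty text after a
    trailing line end, the way str.split("\\n") does for "\\n"."""
    return LINE_END.split(text)

print(split_lines_like_split(""))          # ['']
print(split_lines_like_split("derp"))      # ['derp']
print(split_lines_like_split("derp\r\n"))  # ['derp', '']
print(split_lines_like_split("a\rb\nc"))   # ['a', 'b', 'c']
```

With this helper, split_lines_like_split(request_buffer)[-1] == "" is true exactly when the buffer ends with a line end of any of the three kinds.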
msg97249 - Author: David (vencabot_teppoo) Date: 2010-01-05 05:11
Thank you for the clarification, David. I thought that it might have been a calculated decision beyond my understanding, and I can rest easy knowing that this behavior isn't accidental. I was thinking that I might have to use something like a regular expression, and I probably will. Thanks for the advice.

Have a good one!
msg97252 - Author: Florent Xicluna (flox) (Python committer) Date: 2010-01-05 07:33
IMHO this code will do the trick:

while not request_buffer.endswith(('\r', '\n')):
    request_buffer += self.conn.recv(1024)
    print("Got a line!")

print("Got an empty line!")
self.handleRequest(request_buffer)
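A testable Python 3 sketch of the same endswith() idea is below. The function read_until_line_end and its recv argument are hypothetical stand-ins for the loop and self.conn.recv. The check for an empty chunk (peer closed the connection) is an addition not present in the snippet above. str.endswith accepts a tuple of suffixes, "\r\n" is covered because it ends in "\n", and "".endswith(...) is False, so the separate empty-buffer test is no longer needed. Like the original loop, this stops whenever a received chunk happens to end with a line break, which is not necessarily a blank line.

```python
def read_until_line_end(recv, bufsize=1024):
    """Accumulate str chunks from the hypothetical recv(bufsize) callable
    until the buffer ends with "\\r" or "\\n"."""
    request_buffer = ""
    while not request_buffer.endswith(("\r", "\n")):
        chunk = recv(bufsize)
        if not chunk:
            # Added check: an empty chunk means the peer closed the connection.
            raise ConnectionError("connection closed before a line end")
        request_buffer += chunk
    return request_buffer

chunks = iter(["GET / HT", "TP/1.0\r"])
print(repr(read_until_line_end(lambda n: next(chunks))))  # 'GET / HTTP/1.0\r'
```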
History
Date | User | Action | Args
2022-04-11 14:56:56 | admin | set | github: 51887
2010-01-05 07:33:59 | flox | set | nosy: + flox; messages: + msg97252
2010-01-05 05:11:34 | vencabot_teppoo | set | messages: + msg97249
2010-01-05 04:55:13 | r.david.murray | set | status: open -> closed; priority: normal; nosy: + r.david.murray; messages: + msg97248; resolution: not a bug; stage: resolved
2010-01-05 03:02:37 | vencabot_teppoo | set | messages: + msg97246
2010-01-05 02:55:17 | vencabot_teppoo | create |