Issue 7638: Counterintuitive str.splitlines() inconsistency.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/51887

classification

Title:	Counterintuitive str.splitlines() inconsistency.
Type:	behavior	Stage:	resolved
Components:		Versions:	Python 2.6

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	flox, r.david.murray, vencabot_teppoo
Priority:	normal	Keywords:

Created on 2010-01-05 02:55 by vencabot_teppoo, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (5)
msg97245	Author: David (vencabot_teppoo)	Date: 2010-01-05 02:55
Qualifier: This is the first issue that I've raised, so I apologise beforehand for any protocol flubs.

str.splitlines()'s implementation is unexpectedly inconsistent with str.split(). In this code snippet, a server buffers input until it receives a blank line, and then it processes the input:

    request_buffer = ""
    while request_buffer.split("\n")[-1] != "" or request_buffer == "":
        request_buffer += self.conn.recv(1024)
        print("Got a line!")
    print("Got an empty line!")
    self.handleRequest(request_buffer)

I found out the hard way that this code isn't prepared to handle clients that use a different "new line" standard, such as those that send "\r". I discovered str.splitlines() at that point and found that, to some extent, it works as advertised: it splits lines regardless of exactly which new line character is being used. However, this code doesn't work:

    request_buffer = ""
    while request_buffer.splitlines[-1] != "" or request_buffer == "":
        request_buffer += self.conn.recv(1024)
        print("Got a line!")
    print("Got an empty line!")
    self.handleRequest(request_buffer)

Python complains that -1 is out of request_buffer.splitlines()'s range. I know that str.splitlines() handles empty lines, because I've used it on longer strings to test for trailing blank lines before; it only refuses to count a line as blank if there isn't another line after it. "derp".splitlines() has a length of 1, but "".splitlines() has a length of 0. "derp\n".splitlines() also has a length of 1, thus excluding the trailing blank line.

In my opinion, "".splitlines() should have 1 element. "derp".splitlines() should keep its 1 element, but "derp\n".splitlines() should have 2 elements. This would give the same functionality as str.split("\n") (where "\n".split("\n") results in two empty-string elements), but it would have the benefit of working predictably with all line-breaking standards, which I assume was the idea all along.
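
The lengths quoted above can be checked directly. This is only a minimal sketch, assuming a current Python 3 interpreter; the names `samples`, `line_counts` and `split_counts` are illustrative and not part of the original report:

```python
# Compare str.splitlines() with str.split("\n") on the inputs discussed above.
samples = ["", "derp", "derp\n", "\n"]

for text in samples:
    # Show the input next to both results so the difference is visible.
    print(repr(text), text.splitlines(), text.split("\n"))

# Number of elements each method returns for every sample.
line_counts = {text: len(text.splitlines()) for text in samples}
split_counts = {text: len(text.split("\n")) for text in samples}

# "".splitlines() is an empty list, so indexing it with [-1] raises IndexError.
try:
    "".splitlines()[-1]
    empty_index_fails = False
except IndexError:
    empty_index_fails = True
```

With these inputs, splitlines() returns 0, 1, 1 and 1 elements, while split("\n") returns 1, 1, 2 and 2.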
msg97246	Author: David (vencabot_teppoo)	Date: 2010-01-05 03:02
I typoed when copying my second snippet.

    while request_buffer.splitlines[-1] != "" or request_buffer == "":

It should be:

    while request_buffer.splitlines()[-1] != "" or request_buffer == "":

This code has the problem that I'm complaining of. I only failed at copying by hand into the form.
msg97248	Author: R. David Murray (r.david.murray) *	Date: 2010-01-05 04:55
No apologies needed, but you probably aren't going to like the answer :)

First of all, a change like the one you propose would be unlikely to be accepted, since it would create considerable backward-compatibility pain.

That aside, however, splitlines and split are not meant to be parallel. They do two very different jobs. splitlines is line oriented, and lines are understood to end with line ends. The file equivalent of "" has length zero, and the unix 'wc' command reports it as having 0 lines. A file containing "derp\n" is reported by wc to have one line, not two. Files without a final line end are arguably broken, but all good tools accept that final line as a line, though some complain about it. (And other tools break in various odd ways.)

If you want something parallel to split that handles line ends 'universally', try re.split with an appropriate regex.
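
One way the re.split suggestion might look is sketched below. The pattern and the helper name `split_universal` are assumptions made for illustration, not something given in this thread, and the pattern covers only "\r\n", "\r" and "\n":

```python
import re

# Hypothetical pattern: "\r\n" is listed first so a CRLF pair counts as a
# single separator instead of two.
LINE_END = re.compile(r"\r\n|\r|\n")


def split_universal(text):
    """Split like str.split("\n"), but on "\r\n", "\r" or "\n"."""
    return LINE_END.split(text)


# Unlike splitlines(), a trailing line end yields a trailing empty string,
# and the empty string yields one empty element.
print(split_universal(""))
print(split_universal("derp\r\n"))
print(split_universal("a\rb\nc\r\n"))
```

Because the result mirrors str.split("\n"), indexing the last element with [-1] is always safe, which is the property the original loop relied on.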
msg97249	Author: David (vencabot_teppoo)	Date: 2010-01-05 05:11
Thank you for the clarification, David. I thought that it might have been a calculated decision beyond my understanding, and I can rest easy knowing that this behavior isn't accidental. I was thinking that I might have to do something like a regular expression, and I probably will. Thanks for the advice. Have a good one!
msg97252	Author: Florent Xicluna (flox) *	Date: 2010-01-05 07:33
IMHO this code will do the trick:

    while not request_buffer.endswith(('\r', '\n')):
        request_buffer += self.conn.recv(1024)
        print("Got a line!")
    print("Got an empty line!")
    self.handleRequest(request_buffer)
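
For the original goal of buffering until a blank line arrives, a check on the tail of the buffer could be sketched as follows. This is only an illustration under stated assumptions: the helper name `ends_with_blank_line` is hypothetical, the buffer is a text string, and only "\r\n", "\r" and "\n" are treated as line ends:

```python
def ends_with_blank_line(buffer):
    """Return True if the buffer ends with an empty line.

    Assumption: line ends are "\r\n", "\r" or "\n". They are normalised
    to "\n" first, so a blank line shows up as two consecutive "\n".
    """
    normalized = buffer.replace("\r\n", "\n").replace("\r", "\n")
    # A buffer holding only a line end is a blank first line.
    return normalized == "\n" or normalized.endswith("\n\n")


print(ends_with_blank_line("derp\r\n"))      # one complete line, no blank line
print(ends_with_blank_line("derp\r\n\r\n"))  # complete line followed by a blank line
```

A receive loop could then read `while not ends_with_blank_line(request_buffer):`. One caveat of this sketch: if a chunk happens to end between the "\r" and "\n" of a CRLF pair, the lone "\r" is counted as a full line end until the rest arrives.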

History
Date	User	Action	Args
2022-04-11 14:56:56	admin	set	github: 51887
2010-01-05 07:33:59	flox	set	nosy: + flox messages: + msg97252
2010-01-05 05:11:34	vencabot_teppoo	set	messages: + msg97249
2010-01-05 04:55:13	r.david.murray	set	status: open -> closed priority: normal nosy: + r.david.murray messages: + msg97248 resolution: not a bug stage: resolved
2010-01-05 03:02:37	vencabot_teppoo	set	messages: + msg97246
2010-01-05 02:55:17	vencabot_teppoo	create