Message 225755 - Python tracker


Author	terry.reedy
Recipients	barry, ezio.melotti, r.david.murray, scharron, serhiy.storchaka, terry.reedy, vstinner
Date	2014-08-23.18:24:44
Message-id	<1408818284.93.0.288576570276.issue22232@psf.upfronthosting.co.za>
In-reply-to

Content
Unless there is already another issue for improving the doc, this should at least be left open as a doc issue.

But I had the same thought as Serhiy: we should at least optionally make the code behave as the current doc says. Two possibilities:

newlines=False  If true, only split on \r, \n, and \r\n; otherwise split on all Latin-1 linebreak characters -- <list>.  {This is rather awkward.}

linebreak=True  If true, split on all Latin-1 linebreak characters -- <list>; otherwise only split on \r, \n, and \r\n.  {Better, to me}
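To make the second proposal concrete, here is a rough sketch of how it could be emulated today with a wrapper function. The name split_lines is hypothetical, and the linebreak keyword only mirrors the proposal above; neither exists in the standard library. Note also that in current Python 3, str.splitlines splits on \u2028 and \u2029 as well, so the linebreak=True branch is somewhat broader than "Latin-1 linebreak characters".

```python
import re

# Hypothetical helper illustrating the proposed `linebreak` flag.
# It is not part of the standard library.
_ASCII_NEWLINES = re.compile(r"\r\n|\r|\n")


def split_lines(text, linebreak=True):
    """Split text into lines.

    linebreak=True:  defer to str.splitlines(), which splits on every
                     line boundary it recognizes.
    linebreak=False: split only on \\r, \\n, and \\r\\n.
    """
    if linebreak:
        return text.splitlines()
    parts = _ASCII_NEWLINES.split(text)
    # Like str.splitlines(), do not report an empty final line when the
    # text is empty or ends with a line terminator.
    if parts[-1] == "":
        parts.pop()
    return parts
```

With linebreak=False, a string such as 'a\x0bb\nc' is split only at the \n, so the \x0b stays inside the first line.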

Changing both code and doc, at least in 3.5, says that both are wrong. If we agree on this, there is still the awkward issue of what to do for 3.4.  Just change the doc?  Then the email package must do something different in 3.4 to work around the code behavior. I think this may warrant a python-dev discussion.

Another issue is whether Latin-1 linebreaks should be privileged.  Why not implement the full Unicode linebreak algorithm?

An additional complication is that in 2.x, .splitlines acts as advertised.

>>> 'a\x0ab\x0bc\x0cd\x0dda\x0d\x0a1c\x1c1d\x1d1e\x1e85\x85end'.splitlines()
['a', 'b\x0bc\x0cd', 'da', '1c\x1c1d\x1d1e\x1e85\x85end']
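For comparison, the same string can be run through both str.splitlines and bytes.splitlines in Python 3. This is only an illustrative sketch; the variable names are mine, and the Latin-1 encoding is chosen simply because every character in the test string fits in one byte.

```python
# The test string from the 2.x session above.
s = 'a\x0ab\x0bc\x0cd\x0dda\x0d\x0a1c\x1c1d\x1d1e\x1e85\x85end'

# str.splitlines() in Python 3 also splits on \x0b, \x0c, \x1c, \x1d,
# \x1e and \x85, not just \r, \n and \r\n.
str_lines = s.splitlines()

# bytes.splitlines() splits only on \r, \n and \r\n, which matches the
# 2.x str result shown above.
bytes_lines = s.encode('latin-1').splitlines()

print(str_lines)
print(bytes_lines)
```

In Python 3 the str version yields ten pieces ('a', 'b', 'c', 'd', 'da', '1c', '1d', '1e', '85', 'end'), while the bytes version yields the same four pieces as the 2.x output.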
