
Author lemburg
Recipients belopolsky, doerwalter, ezio.melotti, lemburg, nascheme, r.david.murray, serhiy.storchaka, vstinner, wpk, xtreak
Date 2018-10-05.08:07:02
Message-id <1538726822.65.0.545547206417.issue18291@psf.upfronthosting.co.za>
In-reply-to
Content
The Unicode .splitlines() method splits strings on what Unicode defines as linebreak characters (all code points with general category Zl or bidirectional class B).
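
As a small illustration of that behavior (not part of the original report; the sample string is made up), the snippet below uses U+001C, which has bidirectional class B, and U+2028, which has category Zl. Neither is a CR or LF, yet .splitlines() breaks on both:

```python
# Made-up sample: FILE SEPARATOR (U+001C), LINE SEPARATOR (U+2028), then a plain LF.
text = "a\x1cb\u2028c\nd"

# str.splitlines() treats all three as line boundaries.
unicode_lines = text.splitlines()

# Splitting on LF alone leaves the other two characters inside the first field.
lf_only_lines = text.split("\n")

print(unicode_lines)
print(lf_only_lines)
```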

This differs from what typical CSV parsers, or other parsers built for ASCII text files, treat as a newline. They usually break only on CR, CRLF, and LF, so it is the use of .splitlines() in this context that is wrong, not the method itself.

It may make sense to extend .splitlines() to accept a list of linebreak characters to break on, but that would make it a lot slower, and the same result can already be had by using re.split() on Unicode strings.
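
A minimal sketch of the re.split() approach, assuming the caller only wants CR, CRLF, and LF treated as line breaks. The helper name split_ascii_lines is hypothetical, and dropping the trailing empty string is a choice made here to mirror how .splitlines() handles a final newline:

```python
import re

# CRLF is listed first so it is consumed as a single break, not as CR followed by LF.
_ASCII_NEWLINE = re.compile(r"\r\n|\r|\n")


def split_ascii_lines(text):
    """Hypothetical helper: split only on CR, CRLF, and LF."""
    lines = _ASCII_NEWLINE.split(text)
    # re.split() yields a trailing "" when the text ends with a newline
    # (or is empty); drop it, as .splitlines() would.
    if lines and lines[-1] == "":
        lines.pop()
    return lines


# U+001C stays inside its field because it is not in the pattern.
print(split_ascii_lines("a\r\nb\rc\nd\x1ce\n"))
```

Other separators such as U+2028 are left untouched by this pattern, which is what a CSV-style parser would usually expect.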

Closing this as won't fix.
History
Date User Action Args
2018-10-05 08:07:02 lemburg set recipients: + lemburg, doerwalter, nascheme, belopolsky, vstinner, ezio.melotti, r.david.murray, serhiy.storchaka, wpk, xtreak
2018-10-05 08:07:02 lemburg set messageid: <1538726822.65.0.545547206417.issue18291@psf.upfronthosting.co.za>
2018-10-05 08:07:02 lemburg link issue18291 messages
2018-10-05 08:07:02 lemburg create