
Author gregory.p.smith
Recipients gregory.p.smith
Date 2015-07-10.02:18:32
Message-id <1436494713.33.0.435726120833.issue24601@psf.upfronthosting.co.za>
In-reply-to
Content
For bytes, splitlines does not consider \v (0x0b) a line break. For unicode, it does.
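
A minimal reproduction of the difference, assuming a Python 3 interpreter. The variable names are only illustrative:

```python
# Same content, once as str and once as bytes.
text = "a\x0bb"              # contains \v (LINE TABULATION, 0x0b)
data = text.encode("ascii")

str_lines = text.splitlines()    # str treats \v as a line break
bytes_lines = data.splitlines()  # bytes does not

print(str_lines)    # ['a', 'b']
print(bytes_lines)  # [b'a\x0bb']
```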

This traces back to the Objects/stringlib/ code, where unicode defers to the ascii_linebreak table in Objects/unicodeobject.c. That table contains 7 line breaks in the 0..127 character range:

static unsigned char ascii_linebreak[] = {
    0, 0, 0, 0, 0, 0, 0, 0,
/*         0x000A, * LINE FEED */
/*         0x000B, * LINE TABULATION */
/*         0x000C, * FORM FEED */
/*         0x000D, * CARRIAGE RETURN */
    0, 0, 1, 1, 1, 1, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0,
/*         0x001C, * FILE SEPARATOR */
/*         0x001D, * GROUP SEPARATOR */
/*         0x001E, * RECORD SEPARATOR */
    0, 0, 0, 0, 1, 1, 1, 0,


Whereas Objects/stringlib/stringdefs.h, used by bytes, only considers \r and \n.
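
The two sets can also be compared from Python 3 without reading the C source. This is only a sketch: `ascii_line_breaks` is a hypothetical helper written for this report, not something in CPython, and it probes each code point by checking whether a three-character sample gets split in two.

```python
def ascii_line_breaks(make_sample):
    """Return the code points in 0..127 that splitlines() treats as a break.

    make_sample(c) must build 'a' + <code point c> + 'b' in the type under test.
    """
    return [c for c in range(128) if len(make_sample(c).splitlines()) == 2]

str_breaks = ascii_line_breaks(lambda c: "a" + chr(c) + "b")
bytes_breaks = ascii_line_breaks(lambda c: b"a" + bytes([c]) + b"b")

print([hex(c) for c in str_breaks])    # the 7 entries set in ascii_linebreak
print([hex(c) for c in bytes_breaks])  # only \n and \r
```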

I think these should be consistent, but making this change would likely break existing code in weird ways.

This comes up when porting from 2 to 3: a str ('') value containing one of those other characters was not split at that character by splitlines in 2.x, but is split there by splitlines in 3.x.
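
For ported code that depends on the 2.x behaviour, one possible workaround is to round-trip through bytes so that only \n, \r and \r\n count as line breaks. This is a sketch, not a proposed fix: `splitlines_ascii_only` is a hypothetical name, and it relies on the fact that UTF-8 never emits the bytes 0x0a or 0x0d except for \n and \r themselves.

```python
def splitlines_ascii_only(text):
    """Split a str on \\n, \\r and \\r\\n only, like bytes.splitlines().

    "surrogatepass" lets lone surrogates survive the round trip.
    """
    raw = text.encode("utf-8", "surrogatepass")
    return [line.decode("utf-8", "surrogatepass") for line in raw.splitlines()]

sample = "a\x0bb\nc\r\nd\n"
print(sample.splitlines())            # ['a', 'b', 'c', 'd']
print(splitlines_ascii_only(sample))  # ['a\x0bb', 'c', 'd']
```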
History
Date                 User             Action  Args
2015-07-10 02:18:33  gregory.p.smith  set     recipients: + gregory.p.smith
2015-07-10 02:18:33  gregory.p.smith  set     messageid: <1436494713.33.0.435726120833.issue24601@psf.upfronthosting.co.za>
2015-07-10 02:18:33  gregory.p.smith  link    issue24601 messages
2015-07-10 02:18:32  gregory.p.smith  create