Message246538
For bytes, \v (0x0b) is not considered a line break; for unicode (str), it is.
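A quick way to see the difference, assuming Python 3 (where a '' literal is str): the same text splits on \v as str but not once it is encoded to bytes.

```python
text = "a\vb"  # \v is LINE TABULATION, U+000B

print(text.splitlines())           # ['a', 'b']
print(text.encode().splitlines())  # [b'a\x0bb']
```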
This traces back to the Objects/stringlib/ code, where unicode defers to the ascii_linebreak table in Objects/unicodeobject.c. That table marks 7 code points in the 0..127 range as line breaks:
    static unsigned char ascii_linebreak[] = {
        0, 0, 0, 0, 0, 0, 0, 0,
    /*         0x000A, * LINE FEED */
    /*         0x000B, * LINE TABULATION */
    /*         0x000C, * FORM FEED */
    /*         0x000D, * CARRIAGE RETURN */
        0, 0, 1, 1, 1, 1, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0,
    /*         0x001C, * FILE SEPARATOR */
    /*         0x001D, * GROUP SEPARATOR */
    /*         0x001E, * RECORD SEPARATOR */
        0, 0, 0, 0, 1, 1, 1, 0,
        /* ... rows for 0x20..0x7F omitted (all 0) ... */
    };
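The table can be cross-checked from Python itself. The sketch below (assuming Python 3; `str_breaks` is only an illustrative name) probes `str.splitlines()` with each ASCII code point placed between two letters and keeps the ones that cause a split. The result should match the 7 entries marked 1 in ascii_linebreak.

```python
# A code point counts as a line break if "a<cp>b" splits into two lines.
str_breaks = [cp for cp in range(128)
              if len(("a" + chr(cp) + "b").splitlines()) == 2]

print([hex(cp) for cp in str_breaks])
# ['0xa', '0xb', '0xc', '0xd', '0x1c', '0x1d', '0x1e']
```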
By contrast, Objects/stringlib/stringdefs.h, which is used by bytes, only considers \r and \n.
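Running the same probe against bytes shows the narrower set, and the difference between the two lists is exactly the set of code points that behave differently between the types. This is a sketch assuming Python 3; the variable names are illustrative.

```python
# Line-break code points in the ASCII range, as seen by each type.
bytes_breaks = [cp for cp in range(128)
                if len((b"a" + bytes([cp]) + b"b").splitlines()) == 2]
str_breaks = [cp for cp in range(128)
              if len(("a" + chr(cp) + "b").splitlines()) == 2]

print([hex(cp) for cp in bytes_breaks])  # ['0xa', '0xd']

# Code points that only str treats as line breaks:
only_str = [cp for cp in str_breaks if cp not in bytes_breaks]
print([hex(cp) for cp in only_str])      # ['0xb', '0xc', '0x1c', '0x1d', '0x1e']
```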
I think these should be consistent, but making this change would likely break existing code in subtle ways.
This does come up when porting from Python 2 to 3: a str ('') containing one of those other characters was not split by splitlines() in 2.x, but is split by splitlines() in 3.x.
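For ported code that relied on the 2.x behaviour, one possible workaround is to split str only on \r\n, \r and \n. `splitlines_crlf` below is a hypothetical helper, not a stdlib function. It is a minimal sketch using `re` that tries to mirror `str.splitlines()` semantics (no trailing empty element, optional `keepends`) while ignoring \v, \f, \x1c..\x1e and the non-ASCII boundaries.

```python
import re

# CR LF must be tried before CR alone so it is treated as one line ending.
_LINE_END = re.compile(r"\r\n|\r|\n")


def splitlines_crlf(text, keepends=False):
    """Split text on CR LF, CR and LF only, like bytes.splitlines()."""
    lines = []
    pos = 0
    for m in _LINE_END.finditer(text):
        # Keep or drop the matched line ending depending on keepends.
        lines.append(text[pos:m.end() if keepends else m.start()])
        pos = m.end()
    if pos < len(text):
        # Trailing text without a line ending is its own (last) line.
        lines.append(text[pos:])
    return lines


print(splitlines_crlf("a\vb\nc"))  # ['a\x0bb', 'c']
```

Matching bytes rather than str.splitlines() keeps the result identical to what encoding to bytes and splitting would give, without leaving str.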

Date | User | Action | Args
2015-07-10 02:18:33 | gregory.p.smith | set | recipients: + gregory.p.smith
2015-07-10 02:18:33 | gregory.p.smith | set | messageid: <1436494713.33.0.435726120833.issue24601@psf.upfronthosting.co.za>
2015-07-10 02:18:33 | gregory.p.smith | link | issue24601 messages
2015-07-10 02:18:32 | gregory.p.smith | create |