Message227508
Currently, regular expressions support only '\n' as a line boundary. To meet requirement RL1.6 of Unicode Technical Standard #18 [1], all Unicode line separators should be supported: '\n', '\r', '\v', '\f', '\x85', '\u2028', '\u2029', and the two-character sequence '\r\n'. It is also recommended that '.' in "dotall" mode match '\r\n' as a single unit. Finally, it is strongly recommended to support the '\R' pattern, which matches any line separator (equivalent to '(?:\r\n|(?!\r\n)[\n\v\f\r\x85\u2028\u2029])').
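Until something like '\R' exists, the equivalent pattern from the paragraph above can be spelled out explicitly with the current re module. This is a minimal sketch, assuming Python 3.4+ (for re.fullmatch); LINE_SEP and find_line_separators are hypothetical names used only for illustration:

```python
import re

# Hypothetical stand-in for the proposed '\R': either a CRLF pair, or any
# single RL1.6 line separator that does not start a CRLF pair.
LINE_SEP = r'(?:\r\n|(?!\r\n)[\n\v\f\r\x85\u2028\u2029])'

def find_line_separators(text):
    """Return every line separator in text, treating '\r\n' as one unit."""
    return [m.group() for m in re.finditer(LINE_SEP, text)]

print(find_line_separators('\r\n\n\r'))     # ['\r\n', '\n', '\r']

# Splitting on the pattern drops the separators (the group is non-capturing).
print(re.split(LINE_SEP, 'a\r\nb\u2028c'))  # ['a', 'b', 'c']

# The negative lookahead stops backtracking from splitting a CRLF pair:
# LINE_SEP followed by a literal '\n' cannot match '\r\n'.
print(re.fullmatch(LINE_SEP + r'\n', '\r\n'))  # None
```

The lookahead is what makes the pattern behave as a single unit: without it, the engine could fall back to matching only the '\r' of a CRLF pair.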
>>> import re
>>> [m.start() for m in re.finditer('$', '\r\n\n\r', re.M)]
[1, 2, 4] # should be [0, 2, 3, 4]
>>> [m.start() for m in re.finditer('^', '\r\n\n\r', re.M)]
[0, 2, 3] # should be [0, 2, 3, 4]
>>> [m.group() for m in re.finditer('.', '\r\n\n\r', re.M|re.S)]
['\r', '\n', '\n', '\r'] # should be ['\r\n', '\n', '\r']
>>> [m.group() for m in re.finditer(r'\R', '\r\n\n\r')]
[] # should be ['\r\n', '\n', '\r']
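The results marked "should be" in the session above can be approximated today with explicit lookarounds. This is only a sketch, assuming the RL1.6 separator set listed above; LINE_END, LINE_START and DOT_CRLF are hypothetical names, and these patterns stand in for re.M '$', re.M '^' and re.S '.' rather than changing how those behave:

```python
import re

# Hypothetical replacements, not part of the re module.
# End of line: before any separator, except between '\r' and '\n'
# (a '\n' preceded by '\r' is the second half of a CRLF pair), or at the end.
LINE_END = r'(?=[\v\f\r\x85\u2028\u2029]|(?<!\r)\n|\Z)'

# Start of line: at the start, after any separator other than '\r',
# or after a '\r' that is not followed by '\n' (i.e. not inside CRLF).
LINE_START = r'(?:\A|(?<=[\n\v\f\x85\u2028\u2029])|(?<=\r)(?!\n))'

# "Dotall" dot that consumes CRLF as one unit; use together with re.S.
DOT_CRLF = r'(?:\r\n|(?!\r\n).)'

text = '\r\n\n\r'
print([m.start() for m in re.finditer(LINE_END, text)])        # [0, 2, 3, 4]
print([m.start() for m in re.finditer(LINE_START, text)])      # [0, 2, 3, 4]
print([m.group() for m in re.finditer(DOT_CRLF, text, re.S)])  # ['\r\n', '\n', '\r']
```

Each pattern treats '\r\n' as one separator by refusing to report a boundary between its '\r' and '\n', while two lone '\r' characters still count as two separate line ends.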
[1] http://www.unicode.org/reports/tr18/#RL1.6

Date | User | Action | Args
2014-09-25 06:56:27 | serhiy.storchaka | set | recipients: + serhiy.storchaka, pitrou, ezio.melotti, mrabarnett
2014-09-25 06:56:27 | serhiy.storchaka | set | messageid: <1411628187.04.0.177569242065.issue22491@psf.upfronthosting.co.za>
2014-09-25 06:56:26 | serhiy.storchaka | link | issue22491 messages
2014-09-25 06:56:26 | serhiy.storchaka | create |