Author serhiy.storchaka
Recipients ezio.melotti, mrabarnett, pitrou, serhiy.storchaka
Date 2014-09-25.06:56:26
Message-id <1411628187.04.0.177569242065.issue22491@psf.upfronthosting.co.za>
In-reply-to
Content
Currently regular expressions support only '\n' as a line boundary. To meet Unicode standard requirement RL1.6 [1], all Unicode line separators should be supported: '\n', '\r', '\v', '\f', '\x85', '\u2028', '\u2029' and the two-character sequence '\r\n'. It is also recommended that '.' in "dotall" mode match '\r\n' as a single unit. Supporting the '\R' pattern, which matches any line separator, is strongly recommended as well (it is equivalent to '(?:\r\n|(?!\r\n)[\n\v\f\r\x85\u2028\u2029])').

>>> [m.start() for m in re.finditer('$', '\r\n\n\r', re.M)]
[1, 2, 4]  # should be [0, 2, 3, 4]
>>> [m.start() for m in re.finditer('^', '\r\n\n\r', re.M)]
[0, 2, 3]  # should be [0, 2, 3, 4]
>>> [m.group() for m in re.finditer('.', '\r\n\n\r', re.M|re.S)]
['\r', '\n', '\n', '\r']  # should be ['\r\n', '\n', '\r']
>>> [m.group() for m in re.finditer(r'\R', '\r\n\n\r')]
[]  # should be ['\r\n', '\n', '\r']
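
Until something like this is supported natively, the requested behaviour can be approximated with ordinary lookarounds. The following is only a rough sketch: LINE_SEP, LINE_END and LINE_START are names invented here, not part of the re module, and the patterns are one possible reading of RL1.6 rather than a proposed implementation.

```python
import re

# All three names are hypothetical helpers for this sketch only.

# Emulates the proposed \R: '\r\n' as one unit, otherwise a single separator.
LINE_SEP = re.compile(r'(?:\r\n|(?!\r\n)[\n\v\f\r\x85\u2028\u2029])')

# Emulates '$' in multiline mode: just before a separator (but never
# between '\r' and '\n'), or at the end of the string.
LINE_END = re.compile(r'(?=\r\n|[\v\f\r\x85\u2028\u2029])|(?<!\r)(?=\n)|\Z')

# Emulates '^' in multiline mode: at the start of the string, or just
# after a separator (again never between '\r' and '\n').
LINE_START = re.compile(r'\A|(?<=[\n\v\f\x85\u2028\u2029])|(?<=\r)(?!\n)')

text = '\r\n\n\r'
seps = [m.group() for m in LINE_SEP.finditer(text)]
ends = [m.start() for m in LINE_END.finditer(text)]
starts = [m.start() for m in LINE_START.finditer(text)]

print(seps)    # ['\r\n', '\n', '\r']
print(ends)    # [0, 2, 3, 4]
print(starts)  # [0, 2, 3, 4]
```

These give the "should be" results listed above for '$', '^' and '\R'. The '.' case is not covered, since matching '\r\n' as a single unit cannot be expressed by a character class alone.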

[1] http://www.unicode.org/reports/tr18/#RL1.6
History
Date User Action Args
2014-09-25 06:56:27 serhiy.storchaka set recipients: + serhiy.storchaka, pitrou, ezio.melotti, mrabarnett
2014-09-25 06:56:27 serhiy.storchaka set messageid: <1411628187.04.0.177569242065.issue22491@psf.upfronthosting.co.za>
2014-09-25 06:56:26 serhiy.storchaka link issue22491 messages
2014-09-25 06:56:26 serhiy.storchaka create