classification
Title: MULTILINE confuses re.split
Type: behavior
Stage: resolved
Components: Regular Expressions
Versions: Python 2.7
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: re.sub confusion between count and flags args (issue 11957)
Assigned To:
Nosy List: dabrahams, ezio.melotti, mrabarnett, serhiy.storchaka
Priority: normal
Keywords:

Created on 2012-08-02 14:58 by dabrahams, last changed 2014-10-29 16:13 by vstinner. This issue is now closed.

Messages (5)
msg167228 - (view) Author: Dave Abrahams (dabrahams) Date: 2012-08-02 14:58
compare the output of

$ python -c "open('/tmp/tst','w').write(100*'x\n');import re;print len(re.split('\n(?=x)', open('/tmp/tst').read()))"
100

with

$ python -c "open('/tmp/tst','w').write(100*'x\n');import re;print len(re.split('\n(?=x)', open('/tmp/tst').read(), re.MULTILINE))"
9
msg167240 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-08-02 17:05
re.split = split(pattern, string, maxsplit=0, flags=0)
    Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings.  If
    capturing parentheses are used in pattern, then the text of all
    groups in the pattern are also returned as part of the resulting
    list.  If maxsplit is nonzero, at most maxsplit splits occur,
    and the remainder of the string is returned as the final element
    of the list.

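The third positional argument is maxsplit, not flags. So maxsplit=0 in your first example and maxsplit=8 (re.MULTILINE is 8) in your second example, which stops after 8 splits and returns 9 items. This is not a bug; it is a misunderstanding of the argument order.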
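
To illustrate the point above, here is a minimal sketch in Python 3 syntax (the original report uses Python 2.7) that reproduces both counts in memory instead of going through /tmp/tst. The names `text`, `pattern`, `accidental`, and `intended` are illustrative only. The first call passes `maxsplit=int(re.MULTILINE)` by keyword, which should be equivalent to what the positional call in the report ends up doing.

```python
import re

# Same data as in the report: 100 lines, each containing a single "x".
text = 100 * "x\n"
pattern = r"\n(?=x)"

# re.MULTILINE has the integer value 8.
assert int(re.MULTILINE) == 8

# What the report's second command effectively asks for: the flag value
# lands in the maxsplit slot, so at most 8 splits are performed.
accidental = re.split(pattern, text, maxsplit=int(re.MULTILINE))

# What was presumably intended: pass the flag by keyword.
intended = re.split(pattern, text, flags=re.MULTILINE)

print(len(accidental))  # 9   (8 splits -> 9 pieces)
print(len(intended))    # 100 (99 splits -> 100 pieces)
```

Passing `flags=` by keyword avoids the mix-up. Note that re.MULTILINE only changes how `^` and `$` match, so for this particular pattern it makes no difference to the result.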
msg167243 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2012-08-02 17:28
There are actually 2 issues here:

1. The third argument is 'maxsplit', the fourth is 'flags'.

2. It never splits on a zero-width match. See issue 3262.
msg167282 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-08-03 02:41
See also #11957.
msg167283 - (view) Author: Dave Abrahams (dabrahams) Date: 2012-08-03 02:47
Dang!  Thanks, and sorry for wasting everyone's time on this.
History
Date User Action Args
2014-10-29 16:13:16 vstinner set superseder: re.sub confusion between count and flags args; resolution: not a bug -> duplicate
2012-08-04 21:07:14 r.david.murray link issue15536 superseder
2012-08-03 02:47:47 dabrahams set messages: + msg167283
2012-08-03 02:41:48 ezio.melotti set status: open -> closed; type: behavior; messages: + msg167282; resolution: not a bug; stage: resolved
2012-08-02 17:29:00 mrabarnett set messages: + msg167243
2012-08-02 17:05:06 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg167240
2012-08-02 14:58:56 dabrahams create