Message334573
> I cannot see why changing the order of the alternation should have this effect.
The first regex, r'(a|ab)*?b', tries the alternatives from left to right [1] and stops at the first one that matches, "a". Roughly, the regex simplifies to r'(a)*?b', giving 'a' in the captured group.
The second regex, r'(ab|a)*?b', tries the alternatives from left to right [1] and stops at the first one that matches, "ab". Roughly, the regex simplifies to r'(ab)*?b', giving '' in the captured group.
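For anyone who wants to reproduce the comparison, here is a minimal sketch. It assumes the target string is "ab" (this message does not state it), and describe() is a hypothetical helper, not part of the re module. No expected captures are listed, because the group(1) result for the second pattern is exactly what is in question and may differ between Python versions.

```python
import re

TEXT = "ab"  # assumed target string; not stated in this message


def describe(pattern, text=TEXT):
    """Return (whole match, group 1, span) for the first match, or None."""
    m = re.search(pattern, text)
    if m is None:
        return None
    return m.group(0), m.group(1), m.span()


# Print both orderings side by side so the captures can be compared.
for pattern in (r"(a|ab)*?b", r"(ab|a)*?b"):
    print(pattern, describe(pattern))
```

Comparing the second element of each tuple shows whether the captured group depends on the order of the alternatives on the interpreter at hand.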
From there, I'm not clear on how a non-greedy Kleene star interacts with capturing groups and with the overall span(). A starting point would be to look at the re.DEBUG output for each pattern [2][3].
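The dumps can be regenerated with something like the sketch below. debug_dump() is a hypothetical helper; it relies only on re.compile() printing its debug listing to standard output when re.DEBUG is passed. The exact opcode listing varies between Python versions, so it may not match the output quoted in this message line for line.

```python
import contextlib
import io
import re


def debug_dump(pattern):
    """Capture the text that re.compile(pattern, re.DEBUG) prints to stdout."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        re.compile(pattern, re.DEBUG)
    return buf.getvalue()


# Show the parse tree (and, on recent versions, the compiled code) for both patterns.
for pattern in (r"(a|ab)*?b", r"(ab|a)*?b"):
    print("re.DEBUG output for", repr(pattern))
    print(debug_dump(pattern))
```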
[1] From the re docs for the alternation operator:
As the target string is scanned, REs separated by '|' are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match. In other words, the '|' operator is never greedy.
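A small illustration of the documented behavior, without repetition or capturing groups. The variable names are only for this sketch.

```python
import re

# The leftmost alternative that matches wins, even if a later one is longer.
first = re.match(r"a|ab", "ab").group()   # 'a'
second = re.match(r"ab|a", "ab").group()  # 'ab'

# A later alternative is still tried if the rest of the pattern fails:
# fullmatch() requires the whole string, so 'a' alone is rejected.
full = re.fullmatch(r"a|ab", "ab").group()  # 'ab'

print(first, second, full)
```

The last line is the relevant case for this issue: the pattern as a whole can force backtracking into the alternation, which is where the captured group has to be updated consistently.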
[2] re.DEBUG output for r'(a|ab)*?b'
 0. INFO 4 0b0 1 MAXREPEAT (to 5)
 5: REPEAT 19 0 MAXREPEAT (to 25)
 9.   MARK 0
11.   LITERAL 0x61 ('a')
13.   BRANCH 3 (to 17)
15.     JUMP 7 (to 23)
17:   branch 5 (to 22)
18.     LITERAL 0x62 ('b')
20.     JUMP 2 (to 23)
22:   FAILURE
23:   MARK 1
25: MIN_UNTIL
26. LITERAL 0x62 ('b')
28. SUCCESS
[3] re.DEBUG output for r'(ab|a)*?b'
MIN_REPEAT 0 MAXREPEAT
  SUBPATTERN 1 0 0
    LITERAL 97
    BRANCH
      LITERAL 98
    OR
LITERAL 98

 0. INFO 4 0b0 1 MAXREPEAT (to 5)
 5: REPEAT 19 0 MAXREPEAT (to 25)
 9.   MARK 0
11.   LITERAL 0x61 ('a')
13.   BRANCH 5 (to 19)
15.     LITERAL 0x62 ('b')
17.     JUMP 5 (to 23)
19:   branch 3 (to 22)
20.     JUMP 2 (to 23)
22:   FAILURE
23:   MARK 1
25: MIN_UNTIL
26. LITERAL 0x62 ('b')
28. SUCCESS
Date | User | Action | Args
2019-01-30 16:59:26 | rhettinger | set | recipients: + rhettinger, ezio.melotti, mrabarnett, malin, davisjam
2019-01-30 16:59:24 | rhettinger | set | messageid: <1548867564.27.0.765343075837.issue35859@roundup.psfhosted.org>
2019-01-30 16:59:24 | rhettinger | link | issue35859 messages
2019-01-30 16:59:24 | rhettinger | create |