Title: SRE bug with capturing groups in alternatives in repeats
Type:
Stage:
Components: Regular Expressions
Versions: Python 2.3
Status: closed
Resolution: accepted
Dependencies:
Superseder:
Assigned To: niemeyer
Nosy List: glchapman, niemeyer
Priority: normal
Keywords:

Created on 2003-04-21 17:16 by glchapman, last changed 2022-04-10 16:08 by admin. This issue is now closed.

Files:
rep_alts_patch.txt (uploaded by glchapman, 2003-04-21 17:16)

Messages (3)
msg15562 - Author: Greg Chapman (glchapman) Date: 2003-04-21 17:16
SRE does not always correctly handle groups in 
alternatives in repeats.  For example:

>>> re.match('((a)|b)*', 'abc').groups()
('b', '')

Group 2 should obviously never be an empty string.  As I 
understand it, the rule for groups inside a repeat is that 
they should have the last value they matched during the 
iterations of the repeat (or None if they never match), so 
in the above case Group 2 should be 'a'.  To fix this, it 
appears that (when inside a repeat) the BRANCH 
opcode must call mark_save before trying an alternative 
and then call mark_restore if the alternative fails.  The 
attached patch does this.
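
The rule described above (a group inside a repeat keeps the last value it matched, or None if it never matched) can be checked against a current `re` module with a short script. This is only a sketch: the helper name `last_groups` is made up for illustration, and the two extra patterns are variations on the one in the report, not part of the attached patch.

```python
import re

def last_groups(pattern, text):
    """Return the groups of a match anchored at the start of text, or None."""
    m = re.match(pattern, text)
    return m.groups() if m else None

# The pattern from the report: group 2 last matched 'a' on the first iteration.
print(last_groups(r'((a)|b)*', 'abc'))       # ('b', 'a')

# Group 2 matches on two iterations; the later value 'b' is kept.
print(last_groups(r'(([ab])|c)*', 'abc'))    # ('c', 'b')

# Group 2 never matches, so it is reported as None, not ''.
print(last_groups(r'((d)|[ab])*', 'abc'))    # ('b', None)
```

On an interpreter that still has the bug, the first call would be expected to show `('b', '')` as in the report.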

msg15563 - Author: Gustavo Niemeyer (niemeyer) * (Python committer) Date: 2003-04-27 12:35
Good catch, Greg!

Just for reference, here are two tests to confirm that
you're right:

perl -e '"abc" =~ /^((a)|b)*/; print "$1 $2\n";'
echo "abc" | sed -r -e "s/^((a)|b)*/\1 \2|/"

The only change I made was to port your tests to

Applied as:

Modules/_sre.c: 2.94
Lib/test/ 1.40

msg15564 - Author: Gustavo Niemeyer (niemeyer) * (Python committer) Date: 2003-04-27 14:26
Greg, I'm going to change the fix slightly, moving the
mark_save() to outside of the for loop.
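
For readers unfamiliar with the idea, here is a toy Python model of saving the group marks once before the alternatives are tried and restoring them when an alternative fails. It is not the `_sre.c` code: the function `toy_match`, the `marks` list layout and the hard-coded pattern `((a)|b)*` are all assumptions made for illustration, and it shows only one plausible way a stale mark could produce the empty string from the report.

```python
def toy_match(text, restore_on_failure):
    """Toy model of matching '((a)|b)*' at the start of text.

    marks[0:2] hold the start/end of group 1, marks[2:4] those of group 2.
    """
    marks = [None, None, None, None]
    pos = 0
    while pos < len(text):
        before_iteration = list(marks)
        marks[0] = pos                  # open group 1
        saved = list(marks)             # stand-in for mark_save, taken once
        # Alternative 1: (a)
        marks[2] = pos                  # open group 2 before testing 'a'
        if text[pos] == "a":
            pos += 1
            marks[3] = pos              # close group 2
            marks[1] = pos              # close group 1
            continue
        if restore_on_failure:
            marks = list(saved)         # stand-in for mark_restore
        # Alternative 2: b
        if text[pos] == "b":
            pos += 1
            marks[1] = pos              # close group 1
            continue
        marks = before_iteration        # no alternative matched: undo iteration
        break

    def group(start, end):
        if start is None or end is None:
            return None
        return text[start:end]

    return group(marks[0], marks[1]), group(marks[2], marks[3])

print(toy_match("abc", restore_on_failure=False))  # ('b', '')
print(toy_match("abc", restore_on_failure=True))   # ('b', 'a')
```

Without the restore, the failed attempt at `(a)` on the second iteration leaves group 2 with a new start mark and an old end mark, which yields the empty slice. With the restore, group 2 keeps the span it matched on the first iteration.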

Date                 User       Action  Args
2022-04-10 16:08:16  admin      set     github: 38344
2003-04-21 17:16:52  glchapman  create