classification
Title: Python2.0 re module: greedy regexp bug
Components: Regular Expressions

process
Status: closed
Resolution: wont fix
Assigned To: effbot
Nosy List: effbot
Priority: normal

Created on 2001-03-02 04:37 by anonymous, last changed 2022-04-10 16:03 by admin. This issue is now closed.

Messages (2)
msg3649 - Author: Nobody/Anonymous (nobody) Date: 2001-03-02 04:37
Python 2.0 fails to correctly handle patterns that mix greedy
and non-greedy quantifiers. In some cases the non-greedy
groups end up matching more text than the greedy ones!

Here is an example:

Test-bed system: Sun Solaris 2.6
Python 2.0 (#1, Dec  9 2000, 12:35:40) 
[GCC 2.95.2 19991024 (release)] on sunos5

Please run the following code to see the bug:

import re
str='first_XXXX_last'
#                   \1  \2    \3
mo = re.search(r'^(.*?)(X*)(.*?)$', str)
# above, the groups \1 and \3 are NON-greedy regexp
# while group \2 is greedy.
# Unfortunately Python-2.0 demonstrates buggy behavior
# here:
print "BUGGY result:", mo.groups()
print "should be   : ('first_', 'XXXX', '_last')"
#EOF

When I run the code above it prints the following:

BUGGY result: ('', '', 'first_XXXX_last')
should be   : ('first_', 'XXXX', '_last')
>>> 

Thanks,
--Leo  <slonika@yahoo.com>
msg3650 - Author: Fredrik Lundh (effbot) (Python committer) Date: 2001-03-02 08:14
you forgot that X* and .*? may both match empty strings.

try changing one of them to a +, and the expression will 
work as you expected.

Cheers /F
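
The reply above can be checked with a short script. This is only a sketch, written for Python 3 (the original report used Python 2.0 print statements); the variable names `text`, `original`, `with_plus`, and `lazy_plus` are made up for illustration. It suggests that the choice of quantifier matters: turning X* into X+ gives the split the reporter expected, while changing only the leading .*? to .+? does not, because X* is still free to match the empty string.

```python
import re

text = 'first_XXXX_last'

# Original pattern: all three groups may match the empty string, so the
# lazy first group and X* both stay empty at position 0, and the last
# lazy group has to grow until $ can match.
original = re.search(r'^(.*?)(X*)(.*?)$', text).groups()
print(original)    # ('', '', 'first_XXXX_last')

# Requiring at least one X forces the lazy first group to expand until
# the run of X's starts; X+ then greedily takes all of them.
with_plus = re.search(r'^(.*?)(X+)(.*?)$', text).groups()
print(with_plus)   # ('first_', 'XXXX', '_last')

# Changing only the leading lazy group to .+? is not enough here:
# it takes one character, and X* again matches the empty string.
lazy_plus = re.search(r'^(.+?)(X*)(.*?)$', text).groups()
print(lazy_plus)   # ('f', '', 'irst_XXXX_last')
```

In each case the engine reports the first overall match it finds while trying alternatives in order, so a group that is allowed to be empty will be empty whenever the rest of the pattern can still succeed.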
History
Date                 User       Action  Args
2022-04-10 16:03:48  admin      set     github: 34046
2001-03-02 04:37:49  anonymous  create