
Author jfrechet
Date 2007-01-29.22:35:21
Content
Hi!

re.finditer() seems to incorrectly increment the current position immediately after matching a zero-length substring.  For example:

>>> [m.groups() for m in re.finditer(r'(^z*)|(\w+)', 'abc')]
[('', None), (None, 'bc')]

What happened to the 'a'?  I expected this result:

[('', None), (None, 'abc')]

Perl agrees with me:

% perl -le 'print defined($1)?"\"$1\"":"undef",",",defined($2)?"\"$2\"":"undef" while "abc" =~ /(^z*)|(\w+)/g'
"",undef
undef,"abc"

Similarly, if I remove the ^:

>>> [m.groups() for m in re.finditer(r'(z*)|(\w+)', 'abc')]
[('', None), ('', None), ('', None), ('', None)]

Now all of the letters have fallen through the cracks!  I expected this result:

[('', None), (None, 'abc'), ('', None)]

Again, perl agrees:

% perl -le 'print defined($1)?"\"$1\"":"undef",",",defined($2)?"\"$2\"":"undef" while "abc" =~ /(z*)|(\w+)/g' 
"",undef
undef,"abc"
"",undef
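
For anyone who wants to check this on their own interpreter, here is a minimal sketch that runs both patterns and says whether the result matches what I observed or what I expected. The names `CASES`, `finditer_groups` and `classify` are made up for this illustration, and the outcome may differ between Python versions.

```python
import re
import sys

# Each entry: (pattern, result observed in this report, result I expected).
CASES = [
    (r'(^z*)|(\w+)',
     [('', None), (None, 'bc')],
     [('', None), (None, 'abc')]),
    (r'(z*)|(\w+)',
     [('', None)] * 4,
     [('', None), (None, 'abc'), ('', None)]),
]


def finditer_groups(pattern, text='abc'):
    """Return the groups() tuple of every match re.finditer() yields."""
    return [m.groups() for m in re.finditer(pattern, text)]


def classify(pattern, observed, expected):
    """Label the interpreter's behaviour for one pattern."""
    actual = finditer_groups(pattern)
    if actual == expected:
        return 'expected'
    if actual == observed:
        return 'observed in report'
    return 'something else'


if __name__ == '__main__':
    print(sys.version)
    for pattern, observed, expected in CASES:
        print(pattern, finditer_groups(pattern), classify(pattern, observed, expected))
```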

If this bug has already been reported, I apologize -- I wasn't able to find it here.  I haven't looked at the code for the re module, but this looks like the sort of bug that could have been introduced accidentally by logic meant to prevent the same zero-length match from being returned forever.
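
To make that guess concrete, here is a minimal sketch of the kind of loop I have in mind. It is purely hypothetical: `naive_finditer` is a name I made up, and it is not taken from the re module source, which I have not read. It only assumes that the scanner steps one character forward after any zero-length match, and that assumption alone is enough to reproduce both results above.

```python
import re


def naive_finditer(pattern, text):
    """Hypothetical scanner, NOT the actual re module implementation."""
    compiled = re.compile(pattern)
    pos = 0
    while pos <= len(text):
        m = compiled.search(text, pos)
        if m is None:
            break
        yield m
        if m.end() == m.start():
            # Zero-length match: step one character forward so the same
            # empty match is not returned forever.  This is the step that
            # skips over the 'a' (and, without the ^, every letter).
            pos = m.end() + 1
        else:
            pos = m.end()


print([m.groups() for m in naive_finditer(r'(^z*)|(\w+)', 'abc')])
# [('', None), (None, 'bc')]
print([m.groups() for m in naive_finditer(r'(z*)|(\w+)', 'abc')])
# [('', None), ('', None), ('', None), ('', None)]
```

A scanner that instead retried at the same position, but refused another zero-length match there, would give the results I expected.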

Thanks,
Jacques
History
Date                 User   Action  Args
2007-08-23 14:51:37  admin  link    issue1647489 messages
2007-08-23 14:51:37  admin  create