Message 31129 - Python tracker



Author	jfrechet
Date	2007-01-29.22:35:21

Content

Hi!

re.finditer() seems to incorrectly increment the current position immediately after matching a zero-length substring.  For example:

>>> [m.groups() for m in re.finditer(r'(^z*)|(\w+)', 'abc')]
[('', None), (None, 'bc')]

What happened to the 'a'?  I expected this result:

[('', None), (None, 'abc')]

Perl agrees with me:

% perl -le 'print defined($1)?"\"$1\"":"undef",",",defined($2)?"\"$2\"":"undef" while "abc" =~ /(z*)|(\w+)/g' 
"",undef
undef,"abc"
"",undef

Similarly, if I remove the ^:

>>> [m.groups() for m in re.finditer(r'(z*)|(\w+)', 'abc')]
[('', None), ('', None), ('', None), ('', None)]

Now all of the letters have fallen through the cracks!  I expected this result:

[('', None), (None, 'abc'), ('', None)]

Again, Perl agrees:

% perl -le 'print defined($1)?"\"$1\"":"undef",",",defined($2)?"\"$2\"":"undef" while "abc" =~ /(z*)|(\w+)/g' 
"",undef
undef,"abc"
"",undef

If this bug has already been reported, I apologize -- I wasn't able to find it here. I haven't looked at the code for the re module, but this looks like the sort of bug that could have been introduced accidentally while trying to prevent the same zero-length match from being returned forever.
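
To illustrate that guess, here is a minimal sketch of a scanner that always steps one character past an empty match. The helper name naive_finditer is hypothetical and this is not the actual code of the re module; it only shows that such a rule would produce the results reported above.

```python
import re

def naive_finditer(pattern, string):
    """Hypothetical scanner: yield successive matches, stepping one
    character past any empty match so the loop cannot repeat it."""
    regex = re.compile(pattern)
    pos = 0
    while pos <= len(string):
        m = regex.search(string, pos)
        if m is None:
            break
        yield m
        if m.end() == m.start():
            # Empty match: unconditionally skip one character.
            # This is the suspected cause of the lost 'a'.
            pos = m.end() + 1
        else:
            pos = m.end()

print([m.groups() for m in naive_finditer(r'(^z*)|(\w+)', 'abc')])
print([m.groups() for m in naive_finditer(r'(z*)|(\w+)', 'abc')])
```

With this rule the character right after an empty match is never offered to the second alternative, so the first call loses the 'a' and the second call returns four empty matches.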

Thanks,
Jacques

History
Date	User	Action	Args
2007-08-23 14:51:37	admin	link	issue1647489 messages
2007-08-23 14:51:37	admin	create