Message31129
Hi!
re.finditer() seems to incorrectly increment the current position immediately after matching a zero-length substring. For example:
>>> [m.groups() for m in re.finditer(r'(^z*)|(\w+)', 'abc')]
[('', None), (None, 'bc')]
What happened to the 'a'? I expected this result:
[('', None), (None, 'abc')]
Perl agrees with me:
% perl -le 'print defined($1)?"\"$1\"":"undef",",",defined($2)?"\"$2\"":"undef" while "abc" =~ /(^z*)|(\w+)/g'
"",undef
undef,"abc"
Similarly, if I remove the ^:
>>> [m.groups() for m in re.finditer(r'(z*)|(\w+)', 'abc')]
[('', None), ('', None), ('', None), ('', None)]
Now all of the letters have fallen through the cracks! I expected this result:
[('', None), (None, 'abc'), ('', None)]
Again, perl agrees:
% perl -le 'print defined($1)?"\"$1\"":"undef",",",defined($2)?"\"$2\"":"undef" while "abc" =~ /(z*)|(\w+)/g'
"",undef
undef,"abc"
"",undef
If this bug has already been reported, I apologize -- I wasn't able to find it here. I haven't looked at the code for the re module, but this seems like the sort of bug that might have been accidentally introduced in order to try to prevent the same zero-length match from being returned forever.
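To make the expected behaviour concrete, here is a minimal sketch of the scanning rule I have in mind: after a zero-length match, retry at the same position but refuse another zero-length match there, and only advance when nothing else matches. This is not the re module's actual code. The helper name scan() is hypothetical, and it only handles a top-level alternation whose branches are passed in as separate patterns, so it is an illustration of the rule rather than a general fix.

```python
import re

def scan(alternatives, text):
    """Yield tuples shaped like m.groups() for '(alt1)|(alt2)|...'.

    Hypothetical helper: each alternative is tried in order at the
    current position, which only approximates real backtracking.
    """
    compiled = [re.compile(a) for a in alternatives]
    pos = 0
    forbid_empty = False  # True right after a zero-length match at pos
    while pos <= len(text):
        for index, pattern in enumerate(compiled):
            m = pattern.match(text, pos)
            # Skip a failed branch, or a second empty match at the same spot.
            if m is None or (forbid_empty and m.end() == pos):
                continue
            groups = [None] * len(compiled)
            groups[index] = m.group()
            yield tuple(groups)
            forbid_empty = (m.end() == pos)
            pos = m.end()
            break
        else:
            # No branch matched here: only now step forward one character.
            pos += 1
            forbid_empty = False

print(list(scan([r'^z*', r'\w+'], 'abc')))
print(list(scan([r'z*', r'\w+'], 'abc')))
```

With this rule the 'a' is never skipped, because the position is not incremented until every branch has had a chance to produce a non-empty match there.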
Thanks,
Jacques

Date | User | Action | Args
2007-08-23 14:51:37 | admin | link | issue1647489 messages
2007-08-23 14:51:37 | admin | create |