Message15565
SRE is broken in some subtle ways when you combine
capturing groups with assertions. For example:
>>> re.match('((?!(a)c)[ab])*', 'abc').groups()
('b', '')
In the above, group 2, '(a)', is reported as having matched an
empty string. Or worse:
>>> re.match('(a)((?!(b)*))*', 'abb').groups()
('b', None, None)
Here group 1, '(a)', is reported as having matched 'b'.
Although Perl reports matches for groups in negative
assertions, I think it is better to adopt the PCRE rule
that these groups are always reported as unmatched
outside the assertion (inside the assertion, if used with
backreferences, they should behave as normal). This
would make the handling of subpatterns in negative
assertions consistent with that of subpatterns in
branches:
>>> re.match('(a)c|ab', 'ab').groups()
(None,)
In the above, although '(a)' matches before the branch
fails, the failure of the branch means '(a)' is considered
not to have matched.
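The branch rule can be checked directly with a short script. This is only an illustration of the behaviour described above: the second pattern and the helper name `groups_for` are made up for this example and do not come from the patch.

```python
import re


def groups_for(pattern, text):
    """Return the groups of a match at the start of text (hypothetical helper)."""
    m = re.match(pattern, text)
    return m.groups() if m else None


# '(a)' matches inside the first branch, but the branch then fails on 'c',
# so group 1 is reported as unmatched once the second branch succeeds.
single = groups_for('(a)c|ab', 'ab')

# Same idea with two groups in the failing branch: both are discarded,
# and only the group in the branch that actually matched is reported.
double = groups_for('(a)(b)x|(a)b', 'ab')

print(single)  # (None,)
print(double)  # (None, None, 'a')
```

The proposal is that groups inside a negative assertion should be discarded in the same way once the assertion has finished.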
Anyway, the attached patch is an effort to fix this
problem by saving the values of marks before calling the
assertion, and then restoring them afterwards (thus
undoing whatever might have been done in the assertion).
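The save-and-restore idea can be sketched as a toy model in Python. This is not the patch itself (which changes the C matching engine); the flat `marks` list layout and the names `negative_assertion` and `inner_sets_group_then_fails` are assumptions made purely for illustration.

```python
from typing import Callable, List, Optional

# Hypothetical model: marks[2*i] and marks[2*i + 1] hold the start and end
# offsets of group i + 1, or None while the group is unmatched.
Marks = List[Optional[int]]


def negative_assertion(marks: Marks, inner: Callable[[Marks], bool]) -> bool:
    """Run inner as a negative assertion without leaking its group marks."""
    saved = list(marks)          # snapshot before entering the assertion
    try:
        matched = inner(marks)   # inner may record marks as it goes
    finally:
        marks[:] = saved         # undo whatever the assertion recorded
    return not matched           # a negative assertion succeeds when inner fails


def inner_sets_group_then_fails(marks: Marks) -> bool:
    # Stands in for '(a)c' tried against 'ab': '(a)' matches, then 'c' fails.
    marks[0], marks[1] = 0, 1
    return False


marks: Marks = [None, None]
ok = negative_assertion(marks, inner_sets_group_then_fails)
print(ok, marks)  # True [None, None]
```

Restoring in a `finally` clause means the marks are reset whether the inner pattern matches, fails, or raises, so nothing recorded inside the assertion is visible to the rest of the match.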
Date | User | Action | Args
2007-08-23 14:12:38 | admin | link | issue725149 messages
2007-08-23 14:12:38 | admin | create |