Message43246
The attached patch fixes two bugs in _sre.c; it also
does a bit of reorganization.
First the bugs. Report 672491 points out that lastindex is
calculated differently in 2.3 than in previous versions.
This patch restores the previous behavior. Since
lastindex cannot be restored (when backtracking) from
lastmark alone, it is now saved and restored
independently (by the LASTMARK_SAVE and
LASTMARK_RESTORE macros).
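For reference, here is a small sketch of how lastindex behaves as documented for the re module in current Python releases. The patterns are illustrative only and are not taken from report 672491.

```python
import re

# (a)(b): group 2 is the last capturing group to be matched.
flat = re.match(r'(a)(b)', 'ab')
print(flat.lastindex)          # 2

# ((a)(b)): the outer group 1 is the last one to close.
nested = re.match(r'((a)(b))', 'ab')
print(nested.lastindex)        # 1

# a(b)?: the optional group takes no part in the match.
unmatched = re.match(r'a(b)?', 'a')
print(unmatched.lastindex)     # None
```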
The second bug appears when minimizing repeats are
combined with assertions:
>>> re.match('([ab]*?)(?=(b)?)c', 'abc').groups()
('ab', 'b')
The second group should be None, since the 'b' is
consumed by the first group. To fix this, it is necessary
to save lastmark before attempting to match the tail in
OP_MIN_UNTIL and to restore it if the tail fails to match.
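A quick check of the expected result, assuming an interpreter that already includes a fix for this bug (on an unfixed build the second group would come back as 'b'):

```python
import re

# The minimizing repeat first tries '' and 'a' for group 1. On the 'a'
# attempt the lookahead captures 'b' in group 2, but the tail 'c' then
# fails, so that capture has to be discarded when the engine backtracks.
m = re.match(r'([ab]*?)(?=(b)?)c', 'abc')
print(m.groups())   # ('ab', None)
```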
The reorganization has to do with the handling of the
SRE_STATE's lastmark and mark array. The mark
array tracks the start and end of capturing groups;
lastmark is the highest index in the array so far
encountered. Previously, whenever lastmark was
restored back to a lower value (in 2.3a2 this is done in
the lastmark_restore function), the tail of the mark array
was NULLed out (using memset). This patch adopts the
rule that all indexes greater than lastmark are invalid, so
restoring lastmark does not also require clearing the
tail. To ensure that indexes <= lastmark have valid
pointers, OP_MARK checks if lastmark is being
increased by more than one; if so, it NULLs out the
intervening pointers. This rule also required changes to
the GROUPREF opcodes and the state_getslice
function to ensure that they do not access indexes
greater than lastmark. For consistency, lastmark is
now initialized to -1, to indicate that no entries in the
mark array are valid.
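The rule can be sketched with a toy model in Python. This is not the C code from _sre.c: the class MarkState, its method names, and the 0-based group numbering (group g uses entries 2g and 2g+1) are assumptions made for illustration.

```python
class MarkState:
    """Toy model of the mark array and lastmark (hypothetical names)."""

    def __init__(self, size):
        self.mark = [None] * size   # entries above lastmark may be stale
        self.lastmark = -1          # -1 means no entry is valid

    def set_mark(self, i, pos):
        # Analogue of OP_MARK: when lastmark grows by more than one,
        # clear the skipped entries so every index <= lastmark is valid.
        if i > self.lastmark:
            for j in range(self.lastmark + 1, i):
                self.mark[j] = None
            self.lastmark = i
        self.mark[i] = pos

    def restore(self, saved_lastmark):
        # No clearing of the tail: entries above lastmark are just ignored.
        self.lastmark = saved_lastmark

    def get_span(self, group):
        # Analogue of the state_getslice check: never read past lastmark.
        start, end = 2 * group, 2 * group + 1
        if end > self.lastmark:
            return None
        if self.mark[start] is None or self.mark[end] is None:
            return None
        return (self.mark[start], self.mark[end])


state = MarkState(6)
state.set_mark(0, 0)
state.set_mark(1, 2)            # group 0 spans [0, 2)
saved = state.lastmark          # remember lastmark before trying more
state.set_mark(2, 2)
state.set_mark(3, 3)            # group 1 spans [2, 3)
before = state.get_span(1)      # (2, 3)

state.restore(saved)            # backtrack: only lastmark changes
after = state.get_span(1)       # None, although mark[2] and mark[3] are stale
kept = state.get_span(0)        # (0, 2)

state.set_mark(5, 9)            # jump from 1 to 5 clears entries 2, 3 and 4
cleared = state.mark[2:5]       # [None, None, None]
```

Restoring is a single assignment here, and the cost of clearing moves to the less common case where a mark index skips ahead.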
Needless to say, the reorganization is not necessary to
fix the bugs; it may be a bad idea. It seems to be
marginally faster than a version that fixes the bugs but is
similar to the current code (including a memset inside
the LASTMARK_RESTORE macro).
One other thing. I have removed a test for string ==
Py_None from state_getslice, since I can’t find any way
for string to be Py_None at that point (string is always
the object providing the text to be searched; if it were
Py_None, an exception should be raised by the
getstring function called by state_init). Perhaps I
missed something?
Date | User | Action | Args
2007-08-23 15:21:49 | admin | link | issue712900 messages
2007-08-23 15:21:49 | admin | create |