This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author glchapman
Recipients
Date 2003-03-31.19:54:33
SpamBayes Score
Marked as misclassified
Message-id
In-reply-to
Content
The attached patch fixes two bugs in _sre.c; it also 
does a bit of reorganization.  

First the bugs.  672491 points out that lastindex is 
calculated differently in 2.3 than in previous versions.  
This patch restores the previous behavior.  Since 
lastindex cannot be restored (when backtracking) from 
lastmark alone, it is now saved and restored 
independently (by the LASTMARK_SAVE and 
RESTORE macros).

The second bug appears when minimizing repeats are 
combined with assertions:

>>> re.match('([ab]*?)(?=(b)?)c', 'abc').groups()
('ab', 'b')

The second group should be None, since the 'b' is 
consumed by the first group.  To fix this, it is necessary 
to save lastmark before attempting to match the tail in 
OP_MIN_UNTIL and to restore it if the tail fails to match.

The reorganization has to do with the handling of the 
SRE_STATE's lastmark and mark array.  The mark 
array tracks the start and end of capturing groups; 
lastmark is the highest index in the array so far 
encountered.  Previously, whenever lastmark was 
restored back to a lower value (in 2.3a2 this is done in 
the lastmark_restore function), the tail of the mark array 
was NULLed out (using memset).  This patch adopts the 
rule that all indexes greater than lastmark are invalid, so 
restoring lastmark does not also require clearing the 
tail.  To ensure that indexes <= lastmark have valid 
pointers, OP_MARK checks if lastmark is being 
increased by more than one; if so, it NULLs out the 
intervening pointers.  This rule also required changes to 
the GROUPREF opcodes and the state_getslice 
function to ensure that they do not access indexes 
greater than lastmark.  For consistency, lastmark is 
now initialized to –1, to indicate that no entries in the 
mark array are valid.

Needless to say, the reorganization is not necessary to 
fix the bugs; it may be a bad idea.  It seems to be 
marginally faster than a version that fixes the bugs but is 
similar to the current code (including a memset inside 
the LASTMARK_RESTORE macro).

One other thing.  I have removed a test for string == 
Py_None from state_getslice, since I can’t find any way 
for string to be Py_None at that point (string is always 
the object providing the text to be searched; if it were 
Py_None, an exception should be raised by the 
getstring function called by state_init).  Perhaps I 
missed something?
 
History
Date User Action Args
2007-08-23 15:21:49adminlinkissue712900 messages
2007-08-23 15:21:49admincreate