Author glchapman
Date 2003-06-16.18:44:34
SpamBayes Score
Marked as misclassified
Logged In: YES 

FWIW, one way to efficiently match this sort of pattern 
(where you have literal text at the end of something 
complex) would be to use the match once and don't 
backtrack operator ('(?>pattern)').  SRE doesn't have the 
(?> operator, but it can be emulated by '(?=(pattern))\1'.  
So one way to avoid the exponential time would be to use 
something like this:

rx = re.compile(
   ,         #trailing literal
   re.I | re.X

By the way, it would be fairly easy to add the (?> operator to 
SRE; it's almost identical to a postive lookahead assertion.  
The only difference is that, upon successfully matching the 
subexpression inside the parentheses, the position in the 
string being matched is advanced to the end of the text 
matched by the subexpression.

Also, in case anyone's interested in why Perl is able to fail so 
quickly here, it appears that the Perl engine keeps track of 
positions (in the target text) where something like 
'(pattern)+' has already been tried and has failed, so it can 
quickly fail if backtracking causes an attempt to match again 
at that position (at least that's my interpretation of the 
numerous "already tried at this position..." messages in the 
trace from running the supplied pattern and text with "use 
re 'debug';").
Date User Action Args
2007-08-23 14:13:52adminlinkissue753711 messages
2007-08-23 14:13:52admincreate