Author chriscog
Date 2000-08-23.04:23:14
Content
Ah yes, I understand what's going on here. For those who are just tuning in, the part of the RE that is taking so much time is this:

\s+.+

The problem is that \s+ can consume one or more whitespace characters, and the .+ consumes the rest. Since .+ can match whitespace too, the RE engine doesn't know how many characters the \s+ should consume, and tries every combination exhaustively. This doesn't take much time in itself, but when coupled with the (...)+ group, it's instantly of the order n^2.
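To make the ambiguity concrete, here is a rough sketch that counts how many ways \s+ and .+ can divide a piece of text between them. It assumes a modern Python (re.fullmatch did not exist in 1.5.2), and count_splits is just a name made up for this illustration, not anything from the re module.

```python
import re

def count_splits(text):
    # Count the ways the text can be cut into a head that matches
    # one-or-more whitespace and a tail that matches one-or-more
    # non-newline characters. Each cut is a candidate the engine
    # may have to try when the overall match fails.
    ways = 0
    for i in range(1, len(text)):
        head, tail = text[:i], text[i:]
        if re.fullmatch(r"\s+", head) and re.fullmatch(r".+", tail):
            ways += 1
    return ways

print(count_splits(" " * 5 + "x"))   # 5
print(count_splits(" " * 10 + "x"))  # 10
```

The count grows with the length of the whitespace run, and that is the work that gets repeated each time the surrounding group retries.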

I can solve this problem by changing the \s+ to simply \s, since I don't really care whether I match one or more whitespace characters, as long as there's at least one. tim_one suggested using [ \t] instead, since it also makes sure I don't gobble a \n.
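A small sketch of the difference between the three spellings, again assuming a modern Python. The sample strings and the matched helper are made up for illustration; the full RE from the original report is not reproduced here.

```python
import re

def matched(pattern, text):
    # Return the text matched by the first hit, or None if no match.
    m = re.search(pattern, text)
    return m.group(0) if m else None

line = "key    value"

# All three fragments match the same span on an ordinary line,
# because .+ happily absorbs any whitespace the first part leaves.
for pat in (r"\s+.+", r"\s.+", r"[ \t].+"):
    print(pat, repr(matched(pat, line)))

# \s also matches a newline, so it can cross a line boundary;
# [ \t] cannot.
print(repr(matched(r"\s.+", "a\nb")))
print(repr(matched(r"[ \t].+", "a\nb")))
```

With a single \s or [ \t] there is only one way to divide the whitespace between the two parts, so the engine has nothing to retry at that point.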

As an aside, I think there have been recent optimisations in the PCRE code (external to Python), as I cannot reproduce the problem on my box at home, which also uses Python 1.5.2. I can only guess that it was compiled with a later version of PCRE.
History
Date User Action Args
2007-08-23 13:50:07 admin link issue212521 messages
2007-08-23 13:50:07 admin create