Message30015
I haven't dug very far into the code, but suspect this isn't
a bug in the regex code.
The pattern uses lots of .*? subpatterns, and this often
means the pattern takes a long time to fail if it isn't
going to match. The regex engine matches the <link> group,
and then there's a .*?, followed by <b>. The engine looks
at every character and if it sees a <b>, tries another .*?.
This is O(n**2) where n is the number of characters in the
string being searched, and that string is 93,000 characters
long. If you limit the string to 5K or so, the match fails
pretty quickly.
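
A rough way to see the effect, as a sketch only: the pattern and input below are made up for illustration and are not the ones from the report. The closing tag `</never>` never occurs, so every `<b>` the first `.*?` stops at triggers a fresh scan of the rest of the string by the second `.*?`.

```python
import re
import time

# Hypothetical pattern, not the one from the report: two lazy wildcards
# followed by a closing tag that never occurs in the input.
PATTERN = re.compile(r"<link>(.*?)<b>(.*?)</never>", re.DOTALL)


def time_failed_search(n_tags):
    """Return (match, seconds) for a search that is guaranteed to fail."""
    # The input holds n_tags copies of "<b>x</b>", so the first .*? has
    # n_tags places to stop, and each one rescans the remaining text.
    text = "<link>" + "<b>x</b>" * n_tags
    start = time.perf_counter()
    match = PATTERN.search(text)
    return match, time.perf_counter() - start


if __name__ == "__main__":
    # Doubling the input should roughly quadruple the time to fail.
    for n in (500, 1000, 2000):
        match, seconds = time_failed_search(n)
        print(n, match, f"{seconds:.4f}s")
```

Exact timings depend on the machine and Python version, so none are quoted here; the thing to look at is how the time grows as the input doubles.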
I strongly suggest working with the HTML as a parsed document
instead of matching it with a regex. You could run the HTML
through tidy to convert it to XHTML and use ElementTree on
the resulting XML.
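
A minimal sketch of the ElementTree half of that suggestion, assuming the page has already been converted to well-formed XHTML. The sample document, the `links_and_bold` helper, and the assumption that each link and its bold text share a `<p>` are all invented for illustration; the real page structure will differ.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace, so lookups need a prefix mapping.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical stand-in for tidy's output; not taken from the report.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<p><a href="http://example.com/a">first</a> <b>bold one</b></p>
<p><a href="http://example.com/b">second</a> <b>bold two</b></p>
</body>
</html>"""


def links_and_bold(xhtml_text):
    """Return (href, bold text) pairs for paragraphs containing both."""
    root = ET.fromstring(xhtml_text)
    results = []
    for para in root.iterfind(".//x:p", XHTML_NS):
        link = para.find("x:a", XHTML_NS)
        bold = para.find("x:b", XHTML_NS)
        if link is not None and bold is not None:
            results.append((link.get("href"), bold.text))
    return results


if __name__ == "__main__":
    for href, text in links_and_bold(SAMPLE):
        print(href, text)
```

Walking the tree this way visits each element once, so the cost grows linearly with the size of the document rather than quadratically.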
Date | User | Action | Args
2007-08-23 14:43:06 | admin | link | issue1566086 messages
2007-08-23 14:43:06 | admin | create |