This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: infinite loop in re.match
Type: Stage:
Components: Regular Expressions Versions:
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: effbot Nosy List: dmaurer, effbot, gvanrossum, tim.peters
Priority: normal Keywords:

Created on 2001-07-10 09:44 by dmaurer, last changed 2022-04-10 16:04 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
S2.py dmaurer, 2001-07-10 09:44 bug reproduction
Messages (5)
msg5339 - (view) Author: Dieter Maurer (dmaurer) Date: 2001-07-10 09:44
Python 2.0.1 "re.match" enters infinite loop for
a regular expression that was carefully designed not
to lead to loops (an easy expression).

See attached file.
msg5340 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2001-07-10 17:01
Logged In: YES 
user_id=31435

Assigned to /F, but I don't see a bug here.  While your 
regexp should be speedy *when* applied to a string that 
actually matches (the regexp does not match the long string 
in your test case), you create an exponential number of 
backtracking possibilities by repeating \s* at both ends of 
the first repeated group (in effect, you're specifying an 
unbounded number of instances of the highly ambiguous 
\s*\s*).  So when the regexp doesn't match, it can be 
unboundedly slow.  This minor rewrite doesn't have that 
problem (it simply moves the first \s* "out of the loop"):

r='</[^>]*>\s*(<[?][^>]*>\s*)+<[^/?][^>]*>'

BTW, your test string doesn't match because it ends with

</list>

It *would* match if it ended with

<list>

It's unclear whether you expected it to match.
msg5341 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2001-07-10 19:58
Logged In: YES 
user_id=6380

A general comment for folks who report problems with regular
expressions: Komodo has a tool that debugs a regular
expression, by stepping through its execution step by step.
Very educational, and would probably help understanding
what's going on in cases like the case here reported.
msg5342 - (view) Author: Fredrik Lundh (effbot) * (Python committer) Date: 2001-07-11 10:39
Logged In: YES 
user_id=38376

See Tim's analysis for the correct solution.

(time to add a "canned response"?)
msg5343 - (view) Author: Dieter Maurer (dmaurer) Date: 2001-07-11 10:56
Logged In: YES 
user_id=265829

Thank you all!

I learned something...
History
Date User Action Args
2022-04-10 16:04:11adminsetgithub: 34720
2001-07-10 09:44:17dmaurercreate