Title: infinite loop in re module
Type: resource usage Stage:
Components: Regular Expressions Versions: Python 2.4
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: brett.cannon Nosy List: andresriancho, brett.cannon
Priority: normal Keywords:

Created on 2007-09-27 19:18 by andresriancho, last changed 2007-09-28 03:23 by brett.cannon. This issue is now closed.

File name Uploaded Description Edit
a.txt andresriancho, 2007-09-27 19:18
Messages (5)
msg56178 - (view) Author: Andres Riancho (andresriancho) Date: 2007-09-27 19:18
dz0@dz0cybsec:/tmp$ python
Python 2.4.4 (#2, Aug 16 2007, 02:03:40) 
[GCC 4.1.3 20070812 (prerelease) (Debian 4.1.2-15)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.findall( '.*://(.*):(.*)@.*' , file('a.txt').read() )

===!infinite loop here!===
===!execute "kill pid"!===


dz0@dz0cybsec:/tmp$ dpkg -l python
| Status=Not/Installed/Config-files/Unpacked/Failed-config/Half-installed
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err:
||/ Name                              Version                          
ii  python                            2.4.4-6                          
An interactive high-level object-oriented language (default version)

dz0@dz0cybsec:/tmp$ uname -a
Linux dz0cybsec 2.6.21-2-k7 #1 SMP Wed Jul 11 04:29:08 UTC 2007 i686

See attached a.txt file.
msg56180 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2007-09-27 21:04
I am not convinced this an infinite loop but more of a poor-performing
regex over a large string.

You have three greedy quantifiers in that regex.  That means the first
one is going to consume the entire file, and then you begin back-off
looking for '://'.  Once that is found you consume the rest of the
string again and then back off looking for ':'.  Then the rest of the
string is consumed and then back-off begins for '@' which should fail
since I can't find a single instance of '@'.

But that is a lot of backing off over a large string which has a ton of
partial matches before failure occurs when looking for the '@'.  It
looks like you are trying to match a URL, which means you probably want
something more specific than '.' for those greedy quantifiers.

Closing as invalid.
msg56181 - (view) Author: Andres Riancho (andresriancho) Date: 2007-09-27 21:10
Have you tested it ?
Is the re.findall() finishing it's work ? I left it working for 5
minutes or more, and got no response.

msg56182 - (view) Author: Andres Riancho (andresriancho) Date: 2007-09-28 02:14
I think this should be reopened. The findall call is running for 3 hours
now. I think that it's a clear case of an infinite loop.
msg56184 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2007-09-28 03:23
If you chop out a bunch of the text it finishes fine.

I more succinct example that triggers the infinite recursion is
necessary before I will believe this is not just because the regex has
horrible performance thanks to its backtracking requirements.
Date User Action Args
2007-09-28 03:23:45brett.cannonsetmessages: + msg56184
2007-09-28 02:14:08andresrianchosetmessages: + msg56182
2007-09-27 21:10:58andresrianchosetmessages: + msg56181
2007-09-27 21:04:08brett.cannonsetstatus: open -> closed
assignee: brett.cannon
resolution: not a bug
messages: + msg56180
nosy: + brett.cannon
2007-09-27 19:18:55andresrianchocreate