This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Regexp findall freezes
Type: crash Stage:
Components: Library (Lib) Versions: Python 3.2, Python 2.6
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: alex, complex, ezio.melotti, mrabarnett, vstinner
Priority: normal Keywords:

Created on 2011-03-24 22:29 by complex, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
regexp_freeze.py complex, 2011-03-24 22:29 Script to reproduce the freeze
Messages (7)
msg132048 - (view) Author: Viktor Ferenczi (complex) Date: 2011-03-24 22:29
Finding all matches of a expression freezes:

{{{
fviktor@sirius:~$ python3.2
Python 3.2 (r32:88445, Mar  8 2011, 01:24:57) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> text = '\\   = 0) & (lag < 1000) & (registered = 1) & !computer & (autocolor = 0) &'
>>> rx = re.compile(r'(<(?:(?:[^<>]*)|(?:"[^"]*"))*>)')
>>> rx.findall(text)

It freezes at this point with 100% CPU load. So I pressed Ctrl-C to break it, which works:

^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyboardInterrupt
>>> 
}}}

It is freezing on Python 2.6.6 as well, so it seems to be an old issue just (re)discovered.

The regexp is ugly, I know. It can be written much simpler (r'(<.*?>|".*?")'), which is working fine. But this issue points out a possible vulnerability: DOS attack due to freezing a Python application utilizing an affected regexp to parse user input.

I wasn't able to narrow down it further, but this issue is also depending on the text parsed, not only on the regexp pattern itself.
msg132049 - (view) Author: Viktor Ferenczi (complex) Date: 2011-03-24 22:33
Fixing my typo in this bug report:

"Finding all matches of a regular expression freezes:"

(I'm not allowed to edit the report's body itself.)
msg132050 - (view) Author: Alex Gaynor (alex) * (Python committer) Date: 2011-03-24 22:59
Yes, this is known as catastrophic backtracking, and there isn't really a solution for it, some regexps can't be efficiently matched.
msg132051 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-03-24 23:06
If I read correctly '(<(?:(?:[^<>]*)|(?:"[^"]*"))*>)', it is something like (A*|B)*. Regex like (A*)* is *very* slow. It can easily be optimized to A*. Or for (A*|B)* => (A|B)*.

So '(<(?:(?:[^<>]*)|(?:"[^"]*"))*>)' can be optimized to '(<(?:(?:[^<>])|(?:"[^"]*"))*>)'.

I hope that it does match the same thing :-)

I wrote a library to optimize regular expression, but you are unliky: it doesn't support (?:...) yet :-)
https://bitbucket.org/haypo/hachoir/wiki/hachoir-regex
msg132114 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2011-03-25 16:36
Alex is correct.

This part:

    [^<>]*

can match an empty string, and it's nested with a repeated group. It stalls, repeatedly matching an empty string.

Incidentally, my regex implementation (available on PyPI) returns [].
msg133095 - (view) Author: Viktor Ferenczi (complex) Date: 2011-04-05 22:33
I found this ugly regexp in old code and optimized this, certainly. I'll need to search for more of this in the code to make sure it won't freeze next time.

I think you can close this ticket, since it is only another way to write an infinite loop in Python, not a real bug.
msg133097 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-05 22:55
Ok.
History
Date User Action Args
2022-04-11 14:57:15adminsetgithub: 55874
2011-04-05 22:55:07vstinnersetstatus: open -> closed
resolution: not a bug
messages: + msg133097
2011-04-05 22:33:19complexsetmessages: + msg133095
2011-03-25 16:36:09mrabarnettsetmessages: + msg132114
2011-03-25 07:51:32ezio.melottisetnosy: + ezio.melotti, mrabarnett
2011-03-24 23:06:59vstinnersetnosy: + vstinner
messages: + msg132051
2011-03-24 22:59:03alexsetnosy: + alex
messages: + msg132050
2011-03-24 22:33:38complexsetmessages: + msg132049
2011-03-24 22:29:03complexcreate