This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Re module's match() fails to halt on a particular input
Type: behavior Stage: resolved
Components: Regular Expressions Versions: Python 3.4, Python 2.7
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: ceridwen, ezio.melotti, mrabarnett, serhiy.storchaka
Priority: normal Keywords:

Created on 2015-02-27 20:18 by ceridwen, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
mwe.py ceridwen, 2015-02-27 20:18 Minimal working example to illustrate infinite loop.
Messages (3)
msg236829 - (view) Author: Ceridwen (ceridwen) Date: 2015-02-27 20:18
Attached is a three-line script whose third line, a call to match() for a compiled regex, fails to halt on Python 2.7 and on 3.4 (after changing ur in front of the regex string to r to accommodate the change in Unicode handling).  I found it by stepping through the debugger so I suspect the problem is in the C backend, not the Python.
msg236833 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-27 20:39
Minimal example for your case is

re.match(r'(\w+\s*)+\s+\d+', 'Compiling Regular Patterns to Sequential Machines')

It doesn't halt, it just needs too long time to finish. Your should rewrite the regular expression to avoid catastrophic backtracking (http://www.regular-expressions.info/catastrophic.html). For example rewrite the start of your expression as

re.match(r'\w[\w\s]*\s\d+', 'Compiling Regular Patterns to Sequential Machines')
msg236843 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2015-02-27 21:31
The problem is with the "(\w+\s*)+". \s* can match an empty string, so when matching a word it's like "(\w+)+".

If you have n letters, there are 2**n ways it could match, and if what follows never matches, it'll try all of them.

It _will_ finish, eventually...
History
Date User Action Args
2022-04-11 14:58:13adminsetgithub: 67729
2015-03-02 09:08:10serhiy.storchakasetstatus: open -> closed
resolution: wont fix
stage: resolved
2015-02-27 21:31:54mrabarnettsetmessages: + msg236843
2015-02-27 20:39:30serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg236833
2015-02-27 20:18:35ceridwencreate