classification
Title: Small case which hangs
Type: Stage:
Components: Regular Expressions Versions: Python 2.7
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Adding a new regex module (compatible with re)
View: 2636
Assigned To: niemeyer Nosy List: georg.brandl, goatchurch, mrabarnett, niemeyer, rsc, timehorse
Priority: low Keywords:

Created on 2007-05-18 20:12 by goatchurch, last changed 2010-08-21 23:53 by georg.brandl. This issue is now closed.

Messages (6)
msg32073 - (view) Author: Julian Todd (goatchurch) Date: 2007-05-18 20:12
import re
s = "Add.1, 2020 and Add.1, 2021-2023, 2025, 2028 and 2029 and Add.1) R"
r = "(?:\s|,|and|Add\S*?|Parts?|\([^\)]*\)|[IV\-\d]+)*$"
print "It's going to crash"
print re.search(r, s)
print "It hasn't crashed"
msg32074 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2007-05-18 20:55
I assume this is one of the pathological cases where the RE algorithm backtracks "to death". That's not really a bug, but a limitation.
msg32075 - (view) Author: Julian Todd (goatchurch) Date: 2007-05-18 22:37
I can shorten it further to: 

s = "Add.1, 2020 Add.1, 20212023, 2025, 2028 and 2029 Add.1) R"
r = "(?:\s|,|and|Add\S*?|[\d]+)*$"

Sometimes it finishes after a long time when you shorten the s-string from the front.  If you take the "R" from the end it goes through straight away.  It's even very slow with match.  
msg73983 - (view) Author: Jeffrey C. Jacobs (timehorse) Date: 2008-09-28 19:12
Tested on 2.6rc2 and slow but successful.  Issue 1662851 may be related.
msg81421 - (view) Author: Matthew Barnett (mrabarnett) * Date: 2009-02-08 21:34
This problem has been addressed in issue #2636.

Although the extra checks certainly aren't foolproof, neither of the
examples given are slow.
msg114612 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-08-21 23:53
If at all, this will be fixed by applying #2636.
History
Date User Action Args
2010-08-21 23:53:41georg.brandlsetstatus: open -> closed
resolution: duplicate
superseder: Adding a new regex module (compatible with re)
messages: + msg114612
2009-02-08 21:34:12mrabarnettsetnosy: + mrabarnett
messages: + msg81421
2008-09-28 19:12:53timehorsesetmessages: + msg73983
2008-09-28 19:04:25timehorsesetnosy: + timehorse
versions: + Python 2.7, - Python 2.4
2008-04-24 21:12:05rscsetnosy: + rsc
2007-05-18 20:12:09goatchurchcreate