classification
Title: re search infinite loop
Type: Stage:
Components: Regular Expressions Versions: Python 2.4, Python 2.3, Python 2.5
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Check for signals during regular expression matches
View: 846388
Assigned To: niemeyer Nosy List: christian.heimes, donallen, mrabarnett, niemeyer, schmir
Priority: normal Keywords:

Created on 2006-03-12 14:46 by donallen, last changed 2009-02-08 21:41 by mrabarnett. This issue is now closed.

Files
File name Uploaded Description
test.csv donallen, 2006-03-12 14:46
Messages (5)
msg27760 - Author: Don Allen (donallen) Date: 2006-03-12 14:46
Given the attached test.csv file, the following program
loops forever (it can't even be interrupted with ^C):

import re

orig = open('test.csv')

file_contents = orig.read()
orig.close()

find_line = re.compile(r'^(".*")?(,(".*")?)*\n')
search_result = find_line.search(file_contents)
print search_result.span()

The corresponding Tcl program works correctly:

set orig [open test.csv r]

set file_contents [read $orig]
close $orig

regexp -indices {^(".*")?(,(".*")?)*\n} $file_contents indices
puts "Indices were $indices"

Both tests were run on a TP G41 running Gentoo Linux.

msg27761 - Author: Don Allen (donallen) Date: 2006-03-12 15:22
If you eliminate the \n at the end of the regular
expression, the Python program works correctly (for this
example). I am trying to use regular expressions to parse the
.csv files generated by Microsoft Outlook, which contain
EOLs inside fields, so I'm using this regexp to find the EOLs
that are *not* inside fields. That means I need the \n; I'll
have to go to Plan B, I suppose.
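For the underlying task described above (finding the record-ending newlines in an Outlook CSV export whose fields contain newlines), Python's standard `csv` module tracks quoting itself and may make the regular expression unnecessary. The sketch below is one possible Plan B, not something proposed in this thread: it assumes the export uses standard double-quote quoting (the module's default `excel` dialect), it targets Python 3, and `read_records` is a hypothetical helper name. For a real file, open it with `open('test.csv', newline='')` as the `csv` documentation recommends.

```python
import csv
import io


def read_records(text):
    """Split CSV text into rows of fields.

    The csv reader decides which newlines end a record and which sit
    inside quoted fields, so embedded EOLs stay in their field.
    """
    # newline='' stops StringIO from translating line endings, so
    # newlines inside quoted fields reach the reader unchanged.
    return list(csv.reader(io.StringIO(text, newline='')))


sample = '"a","b\nc",d\n"e",f\n'
for row in read_records(sample):
    print(row)
# ['a', 'b\nc', 'd']
# ['e', 'f']
```

Because the reader consumes input line by line and keeps reading while a quoted field is still open, the same call works unchanged on a file object.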
msg59626 - Author: Ralf Schmitt (schmir) Date: 2008-01-09 21:22
You're the victim of two issues here:

1. Regular expression matching can take a long time; see:
http://bugs.python.org/issue1662581

2. Regular expression matching was not interruptible:
http://bugs.python.org/issue846388
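To illustrate point 1: in the reported pattern, `.*` can run across commas and quotes, so `(,(".*")?)*` likely gives a backtracking matcher a very large number of ways to split the same text, and when the trailing `\n` cannot be matched it may try them all. One way around this, sketched below under assumptions not taken from this thread, is a pattern in which each character can be consumed in essentially one way: quoted fields use `[^"]` instead of `.` (so they may contain commas and newlines), embedded quotes are assumed to be doubled as `""` (RFC 4180 style), and unquoted fields exclude commas, quotes and newlines. `FIELD`, `RECORD` and `first_record_span` are hypothetical names, and the code targets Python 3.

```python
import re

# Hypothetical field pattern, assuming RFC 4180-style quoting:
#   quoted:   "..." where an embedded quote is written "" (newlines allowed)
#   unquoted: no comma, quote or newline
FIELD = r'(?:"[^"]*(?:""[^"]*)*"|[^,"\n]*)'

# One record: fields separated by commas, ended by a newline outside quotes.
RECORD = re.compile(FIELD + r'(?:,' + FIELD + r')*\n')


def first_record_span(text):
    """Return the (start, end) span of the first record, or None."""
    # match() anchors at position 0, like ^ without re.MULTILINE.
    m = RECORD.match(text)
    return m.span() if m else None


sample = '"a","b\nc",d\n"e",f\n'
print(first_record_span(sample))  # (0, 12): the \n inside "b\nc" is skipped
```

Since a quoted field can only end at a quote that is followed by a comma or newline, and an unquoted field can never contain a separator, a failed match has few alternatives to backtrack through, unlike the nested `.*` repetitions in the original pattern.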
msg59649 - Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-01-10 01:55
Duplicate and partly fixed
msg81422 - Author: Matthew Barnett (mrabarnett) * Date: 2009-02-08 21:41
This problem has been addressed in issue #2636.

Although the extra checks certainly aren't foolproof, some regular
expressions that used to be slow no longer are.
History
Date | User | Action | Args
2009-02-08 21:41:53 | mrabarnett | set | nosy: + mrabarnett; messages: + msg81422
2008-01-10 01:55:51 | christian.heimes | set | status: open -> closed; resolution: duplicate; superseder: Check for signals during regular expression matches; messages: + msg59649; nosy: + christian.heimes
2008-01-09 21:22:13 | schmir | set | nosy: + schmir; messages: + msg59626; versions: + Python 2.5, Python 2.3
2006-03-12 14:46:30 | donallen | create |