classification
Title: Regex.finditer infinite loops with certain input
Type: crash Stage: resolved
Components: Regular Expressions Versions: Python 3.4, Python 3.5, Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Alan Mislove, ezio.melotti, mrabarnett, serhiy.storchaka
Priority: normal Keywords:

Created on 2016-03-13 10:15 by Alan Mislove, last changed 2016-03-13 11:51 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
bug.py Alan Mislove, 2016-03-13 10:15 Example demonstrating bug
Messages (4)
msg261692 - (view) Author: Alan Mislove (Alan Mislove) Date: 2016-03-13 10:15
I found a regex and input that causes re.finditer() to appear to get into an infinite loop.  Please see the attached minimal python script that triggers the behavior.  I've verified the bug exists on 2.7.6, 3.4.0, and 3.5.1; I haven't yet be able to test whether it also exists in the latest 3.6.
msg261693 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-03-13 10:36
This is finite loop. Your script just needs too much time to finish.
msg261694 - (view) Author: Alan Mislove (Alan Mislove) Date: 2016-03-13 10:45
Thanks for the quick reply, Serhiy! 

You're right -- after letting it run for even longer, it does complete.  Sorry for the trouble.  I have two quick followup questions:

1.  Would this be considered a performance bug?  On my machine, it runs for over 20 seconds before completing.

2.  This regex was a much-simplified version of a larger one (I was trying to isolate the problem for the bug report).  When running the larger regex on the same input, it ran for multiple days without completing.  Should I send the larger regex along, or is this issue out of scope?
msg261697 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-03-13 11:51
This is known issue. Some regular expressions has quadratic or even exponential complexity. Existing implementation can't be easy fixed. You can try an regular expressions engine that use completely different algorithms, i.e. re2 (https://pypi.python.org/pypi/re2).

Or rewrite your regular expression. You can ask for help on the comp.lang.python newsgroup (news:comp.lang.python). It is also accessible as a mailing list (https://mail.python.org/mailman/listinfo/python-list). There is a Web-interface (http://dir.gmane.org/gmane.comp.python.general).
History
Date User Action Args
2016-03-13 11:51:35serhiy.storchakasetmessages: + msg261697
2016-03-13 10:45:52Alan Mislovesetmessages: + msg261694
2016-03-13 10:36:51serhiy.storchakasetstatus: open -> closed

nosy: + serhiy.storchaka
messages: + msg261693

resolution: not a bug
stage: resolved
2016-03-13 10:15:14Alan Mislovecreate