classification
Title: Regular expressions: (?:X|\s+)*$ takes a long time
Type: performance Stage:
Components: Versions: Python 2.7
process
Status: closed Resolution: wont fix
Dependencies: Superseder: the re module can perform poorly: O(2**n) versus O(n**2)
View: 1662581
Assigned To: Nosy List: alex, ericp, ezio.melotti, pitrou, terry.reedy
Priority: normal Keywords:

Created on 2012-01-06 21:43 by ericp, last changed 2012-11-07 17:43 by mark.dickinson. This issue is now closed.

Messages (2)
msg150770 - (view) Author: Eric Promislow (ericp) Date: 2012-01-06 21:43
This regular expression takes a few seconds to be evaluated
against any text:

(.*?)((?:X|\s+)*)$

This reg ex is much faster:

(.*?)((?:X|\s)*)$

To be fair, Ruby's performance with the first regex is the same as Python's. PHP and JavaScript both fail to match the first regex
at all.  Only Perl evaluates both regexes nearly instantly.
msg151221 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-01-14 04:10
I believe it is a known fact that repeated repeats, like (...+)*, make for slow matching (if they work at all) with the current re engine.
[I would not be surprised if Perl does some special casing to (in effect at least) rewrite the re to your second version.] This is not going to be improved in 2.7, nor immediately in 3.x. You can try the regex module on pypi, but it may act the same. I suspect there are similar issues like this on the tracker. Best to write the re properly.

[Antoine or Ezio: If you think I am mistaken in closing this, please reopen.]
History
Date User Action Args
2012-11-07 17:43:13mark.dickinsonsetsuperseder: the re module can perform poorly: O(2**n) versus O(n**2)
2012-01-14 04:10:05terry.reedysetstatus: open -> closed

nosy: + terry.reedy, ezio.melotti, pitrou
messages: + msg151221

resolution: wont fix
2012-01-06 21:46:33alexsetnosy: + alex
2012-01-06 21:43:55ericpcreate