classification
Title: RecursionError in re with '(' * 500
Type: behavior Stage: resolved
Components: Library (Lib), Regular Expressions Versions: Python 3.6, Python 3.4, Python 3.5
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: The Compiler, effbot, ezio.melotti, mrabarnett, pitrou, serhiy.storchaka, terry.reedy
Priority: normal Keywords:

Created on 2015-11-04 09:02 by The Compiler, last changed 2016-10-16 08:25 by serhiy.storchaka. This issue is now closed.

Messages (7)
msg254040 - (view) Author: Florian Bruhin (The Compiler) * Date: 2015-11-04 09:02
I just found this thanks to Hypothesis[1]:

    >>> import re
    >>> re.compile('(' * 500)
    Traceback (most recent call last): 
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python3.5/re.py", line 224, in compile
        return _compile(pattern, flags)
      File "/usr/lib/python3.5/re.py", line 293, in _compile
        p = sre_compile.compile(pattern, flags)
      File "/usr/lib/python3.5/sre_compile.py", line 536, in compile
        p = sre_parse.parse(p, flags)
      File "/usr/lib/python3.5/sre_parse.py", line 829, in parse
        p = _parse_sub(source, pattern, 0)
      File "/usr/lib/python3.5/sre_parse.py", line 437, in _parse_sub
        itemsappend(_parse(source, state))
      File "/usr/lib/python3.5/sre_parse.py", line 778, in _parse
        p = _parse_sub(source, state)
      File "/usr/lib/python3.5/sre_parse.py", line 437, in _parse_sub
        itemsappend(_parse(source, state))
      File "/usr/lib/python3.5/sre_parse.py", line 778, in _parse
        p = _parse_sub(source, state)
      File "/usr/lib/python3.5/sre_parse.py", line 437, in _parse_sub
        itemsappend(_parse(source, state))
      [...]
      File "/usr/lib/python3.5/sre_parse.py", line 493, in _parse
        subpattern = SubPattern(state)
    RecursionError: maximum recursion depth exceeded

It seems a maximum recursion error has been treated as a bug before (like in issue401612), so I'm guessing this isn't intended here either.

[1] https://hypothesis.readthedocs.org/
msg254042 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-04 09:27
This isn't a bug, this is a limitation of the implementation. Regular expression parser is recursive and Python has a limit for recursion depth. You can increase this limit if needed.

>>> import sys, re
>>> sys.setrecursionlimit(2000)
>>> re.compile('(' * 500)
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/sre_parse.py", line 437, in _parse_sub
    itemsappend(_parse(source, state))
  File "/home/serhiy/py/cpython/Lib/sre_parse.py", line 778, in _parse
    p = _parse_sub(source, state)
  File "/home/serhiy/py/cpython/Lib/sre_parse.py", line 437, in _parse_sub
    itemsappend(_parse(source, state))
...
  File "/home/serhiy/py/cpython/Lib/sre_parse.py", line 778, in _parse
    p = _parse_sub(source, state)
  File "/home/serhiy/py/cpython/Lib/sre_parse.py", line 437, in _parse_sub
    itemsappend(_parse(source, state))
  File "/home/serhiy/py/cpython/Lib/sre_parse.py", line 781, in _parse
    source.tell() - start)
sre_constants.error: missing ), unterminated subpattern at position 499
msg254046 - (view) Author: Florian Bruhin (The Compiler) * Date: 2015-11-04 10:26
I see how it's not possible to compile that pattern as it's using recursion - I don't mind that.

However, I think this should be handled and re-raised as a re.error ("Exception raised [...] when some other error occurs during compilation or matching.").

I think no matter what the pattern is, it's quite unexpected to get anything other than a re.error (or a TypeError) from re.compile.
msg254060 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-04 15:54
re.compile can also raise ValueError and OverflowError. Agree that may be these exceptions can be converted to re.error.

But it can also raise MemoryError, and KeyboardInterrupt. And these exceptions can't be converted to re.error.

RecursionError is closer to the latter case. It depends not only on the pattern itself, but on the place where it is called. A pattern that is compiled successfully at top level can produce a RecursionError if it is compiled deeply in the recursion.

>>> import re
>>> def r(n):
...     if not n:
...         return re.compile('x')
...     return r(n-1)
... 
>>> r(990)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in r
  File "<stdin>", line 4, in r
...
  File "<stdin>", line 4, in r
  File "<stdin>", line 3, in r
  File "/home/serhiy/py/cpython/Lib/re.py", line 224, in compile
    return _compile(pattern, flags)
  File "/home/serhiy/py/cpython/Lib/re.py", line 293, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/home/serhiy/py/cpython/Lib/sre_compile.py", line 555, in compile
    p = sre_parse.parse(p, flags)
  File "/home/serhiy/py/cpython/Lib/sre_parse.py", line 822, in parse
    source = Tokenizer(str)
  File "/home/serhiy/py/cpython/Lib/sre_parse.py", line 225, in __init__
    self.__next()
RecursionError: maximum recursion depth exceeded
>>> re.compile('x')
re.compile('x')
msg254236 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-11-06 22:22
Is the recursion limit hit enough in real-world problems that we should consider replacing recursive calls and the implicit C call stack with loops and an explicit stack?
msg254261 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-07 07:52
I think it is possible to make the parser non-recursive, but without reports about such real-world problems it looks premature and just will complicate the code. For years before 3.5 there was a limitation on only 100 capturing groups.
msg278752 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-16 08:25
Closed as not a bug. Unlike to issue401612 this is not an infinite recursion. Raising RecursionError depends on the context (it is more likely in deeply nested call) and there is a common workaround -- just increase maximum recursion depth using sys.setrecursionlimit(). There is nothing specific to regular expressions.
History
Date User Action Args
2016-10-16 08:25:41serhiy.storchakasetstatus: open -> closed
resolution: not a bug
messages: + msg278752

stage: resolved
2015-11-07 07:52:41serhiy.storchakasetmessages: + msg254261
2015-11-06 22:22:05terry.reedysetnosy: + terry.reedy
messages: + msg254236
2015-11-04 15:54:55serhiy.storchakasetmessages: + msg254060
2015-11-04 10:26:25The Compilersetmessages: + msg254046
2015-11-04 09:27:11serhiy.storchakasetmessages: + msg254042
2015-11-04 09:07:10SilentGhostsetnosy: + mrabarnett

components: + Regular Expressions
versions: + Python 3.6
2015-11-04 09:02:56The Compilercreate