Author martin.panter
Recipients ezio.melotti, martin.panter, mrabarnett
Date 2016-08-19.12:06:59
Message-id <1471608420.14.0.872966965651.issue27800@psf.upfronthosting.co.za>
Content
The documentation for the “re” module says that repetition codes like {4} and “*” operate on the preceding regular expression. But even though “a{4}” is a valid expression, the obvious way to apply a “*” repetition to it fails:

>>> re.compile("a{4}*")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/proj/python/cpython/Lib/re.py", line 223, in compile
    return _compile(pattern, flags)
  File "/home/proj/python/cpython/Lib/re.py", line 292, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/home/proj/python/cpython/Lib/sre_compile.py", line 555, in compile
    p = sre_parse.parse(p, flags)
  File "/home/proj/python/cpython/Lib/sre_parse.py", line 792, in parse
    p = _parse_sub(source, pattern, 0)
  File "/home/proj/python/cpython/Lib/sre_parse.py", line 406, in _parse_sub
    itemsappend(_parse(source, state))
  File "/home/proj/python/cpython/Lib/sre_parse.py", line 610, in _parse
    source.tell() - here + len(this))
sre_constants.error: multiple repeat at position 4

As a workaround, I found I can wrap the inner repetition in a non-capturing group, (?:...):

>>> re.compile("(?:a{4})*")
re.compile('(?:a{4})*')

The problems with the workaround are (a) it is far from obvious, and (b) it adds more complicated syntax. Either this limitation should be documented, or if there is no good reason for it, it should be lifted. It is not clear if my workaround is entirely valid, or if I just found a way to bypass some sanity check.
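To make the comparison concrete, here is a minimal sketch that checks both spellings and shows which input lengths the grouped form accepts. The helper name compiles() is hypothetical, introduced only for this illustration; it says nothing about why the parser rejects the direct form.

```python
import re

def compiles(pattern):
    """Return True if `pattern` compiles, False if re rejects it."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

# The direct form from the report, and the non-capturing-group workaround.
direct = compiles("a{4}*")
wrapped = compiles("(?:a{4})*")

grouped = re.compile("(?:a{4})*")
# fullmatch() only succeeds when the whole string is consumed, so this
# lists the run lengths of "a" that the grouped pattern accepts.
accepted = [n for n in range(10) if grouped.fullmatch("a" * n)]

print(direct, wrapped, accepted)
```

The grouped pattern accepts only runs of "a" whose length is a multiple of four (0, 4 and 8 in this range), which is what one would expect "a{4}" repeated zero or more times to mean.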

My original use case was scanning a base-64 encoding for Issue 27799:

# Without the second level of brackets, this raises a "multiple repeat" error
chunk_re = br'(?: (?: [^A-Za-z0-9+/=]* [A-Za-z0-9+/=] ){4} )*'
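
A minimal sketch of how that pattern might be used. It assumes the pattern is compiled with re.VERBOSE (the spaces inside it would otherwise be matched literally), which the snippet above does not state. The sample input is made up for illustration.

```python
import re

# Assumption: re.VERBOSE is intended, so the spaces in the pattern are ignored.
chunk_re = re.compile(
    br'(?: (?: [^A-Za-z0-9+/=]* [A-Za-z0-9+/=] ){4} )*',
    re.VERBOSE,
)

# Two complete 4-character groups separated by a newline, followed by an
# incomplete group of only two alphabet characters.
data = b"QUJD\nREVG\nQQ"

m = chunk_re.match(data)
complete = m.group()          # the prefix made of whole 4-character groups
leftover = data[m.end():]     # whatever did not fit into a whole group

print(complete, leftover)
```

Each inner group skips any bytes outside the base-64 alphabet and then consumes one alphabet byte. The outer repetition therefore stops after the last complete group of four, leaving the incomplete tail unmatched.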

History
Date                 User           Action  Args
2016-08-19 12:07:00  martin.panter  set     recipients: + martin.panter, ezio.melotti, mrabarnett
2016-08-19 12:07:00  martin.panter  set     messageid: <1471608420.14.0.872966965651.issue27800@psf.upfronthosting.co.za>
2016-08-19 12:07:00  martin.panter  link    issue27800 messages
2016-08-19 12:06:59  martin.panter  create