Title: re does not allow back references in {} matching operator
Type: enhancement Stage: resolved
Components: Regular Expressions Versions: Python 3.5
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, hardkrash, mrabarnett
Priority: normal Keywords:

Created on 2014-02-18 20:43 by hardkrash, last changed 2014-09-18 10:01 by serhiy.storchaka. This issue is now closed.

Messages (4)
msg211551 - Author: steven Michalske (hardkrash) Date: 2014-02-18 20:43
I am writing a regular expression to match the following text, where the first line gives the number of lines that follow it and should be captured.

d = """num interesting lines: 3

# I only want to match the interesting lines.

m = re.match(".+?: (\d+)\n((?:.+\n){\1})", d)
# prints: None
# Expected a match object.
# Causes Exception.
# Expected: ('3', '1\n2\n3\n')

# Works with hard coded match length.
m = re.match(".+?: (\d+)\n((?:.+\n){3})", d)
('3', '1\n2\n3\n')

The workaround is to use two regular expressions: one to extract the desired length and a second to extract the interesting lines.
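A minimal sketch of that two-step workaround, for illustration only. The sample text (including the trailing "not interesting" line) and the names header, count, body and result are assumptions; the sample data is not fully reproduced in this report.

```python
import re

# Hypothetical sample text: a header line giving a count, followed by
# that many interesting lines and one line that should be ignored.
d = "num interesting lines: 3\n1\n2\n3\nnot interesting\n"

# Step 1: extract the count from the header line.
header = re.match(r".+?: (\d+)\n", d)
count = int(header.group(1))

# Step 2: build a second pattern with the count as a literal repeat,
# and match it starting where the header match ended.
body = re.compile(r"((?:.+\n){%d})" % count).match(d, header.end())

result = (header.group(1), body.group(1))
print(result)
```

The count is converted with int() before being interpolated, so only a number can end up inside the braces of the second pattern.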
msg211554 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2014-02-18 21:01
I don't know of any regex implementation that lets you do that.
msg211581 - Author: steven Michalske (hardkrash) Date: 2014-02-19 01:04
The RE compiler will not error out, with a back reference in there...
It treats the {\1} as a literal {\1} in the string.

In [180]: re.match("(\d) fo.{\1}", '3 foo{\1}').group(0)
Out[180]: '3 foo{\x01}'
msg211588 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2014-02-19 02:40

If it's not a valid repeat, then it's treated as a literal.

Perl does the same.

By the way, "\1" isn't a group reference; it's the same as "\x01". You should be either doubling the backslashes ("\\1") or using a raw string literal (r"\1").
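The two points above can be checked with a short sketch. The test strings and the name literal_match are assumptions chosen for illustration, and the behaviour shown is what CPython's re module is understood to do, not a documented guarantee.

```python
import re

# In an ordinary string literal, "\1" is an octal escape: one character, U+0001.
assert "\1" == "\x01"
assert len("\1") == 1

# A raw string (or a doubled backslash) keeps the backslash, so the
# regex engine sees a two-character group reference.
assert r"\1" == "\\1"
assert len(r"\1") == 2

# With a raw string, \1 really is a back reference to group 1.
assert re.match(r"(a)\1", "aa").group(0) == "aa"

# Even so, {\1} is not a valid repeat, so the braces are matched
# literally and \1 matches the text captured by group 1.
literal_match = re.match(r"(\d) fo.{\1}", "3 foo{3}").group(0)
print(literal_match)
```

Here the pattern matches the literal text "{3}" after "foo"; it does not repeat the preceding "." three times.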
Date                 User              Action  Args
2014-09-18 10:01:17  serhiy.storchaka  set     status: open -> closed; resolution: not a bug; stage: resolved
2014-02-19 02:40:41  mrabarnett        set     messages: + msg211588
2014-02-19 01:04:02  hardkrash         set     messages: + msg211581
2014-02-18 21:01:35  mrabarnett        set     type: behavior -> enhancement; messages: + msg211554
2014-02-18 20:43:28  hardkrash         create