classification
Title: Regex compilation crashed if I change order of alternatives under quantifier
Type: behavior Stage:
Components: Regular Expressions Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Renji, ezio.melotti, mrabarnett
Priority: normal Keywords:

Created on 2021-01-09 01:25 by Renji, last changed 2021-01-09 03:04 by Renji.

Messages (5)
msg384703 - Author: (Renji) Date: 2021-01-09 01:25
I can compile the expression "((a)|b\2)*", and it successfully returns captures from the first repetition and the second repetition at the same time. But if I write it as "(b\2|(a))*", I get an "invalid group reference 2 at position 3" error. Either the first or the second behavior is incorrect.
Output of python3 --version: Python 3.7.3

import re

text = "aba"
# Fails to compile with "invalid group reference 2 at position 3":
# match = re.search(r"(b\2|(a))*", text)
match = re.search(r"((a)|b\2)*", text)
if match:
    # Prints: aba ba a
    print(match.group(0) + " " + match.group(1) + " " + match.group(2))
msg384708 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2021-01-09 02:00
It's not a crash. It's complaining that you're referring to group 2 before defining it. The re module doesn't support forward references to groups, but only backward references to them.
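
A minimal sketch of the distinction described above, for anyone who wants to reproduce it. The helper name try_compile is hypothetical and not part of the re module; it only wraps re.compile and returns the error text instead of letting the exception propagate.

```python
import re


def try_compile(pattern):
    """Return None if the pattern compiles, otherwise the re.error message."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


# Group 2 is defined before \2 appears in the pattern text: backward reference.
backward = try_compile(r"((a)|b\2)*")
# \2 appears in the pattern text before group 2 is defined: forward reference.
forward = try_compile(r"(b\2|(a))*")

print("backward reference:", backward)
print("forward reference: ", forward)
```

The check happens when the pattern is compiled, so it depends only on where the group and the reference sit in the pattern text, not on the order in which the alternatives are tried during matching.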
msg384709 - Author: (Renji) Date: 2021-01-09 02:11
In my example, the reference and the capture group are in two different alternatives. They don't follow each other; they are executed in arbitrary order. If this isn't supported in one case, why is it supported in the other?
msg384711 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2021-01-09 02:38
Example 1:

    ((a)|b\2)*
     ^^^       Group 2

    ((a)|b\2)*
          ^^   Reference to group 2

    The reference refers backwards to the group.

Example 2:

    (b\2|(a))*
         ^^^   Group 2

    (b\2|(a))*
      ^^       Reference to group 2

    The reference refers forwards to the group.

As I said, the re module doesn't support forward references to groups.

If you have a regex where forward references are unavoidable, try the 3rd-party 'regex' module instead. It's available on PyPI.
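
As a rough illustration of how the accepted form from Example 1 behaves at match time, the sketch below runs it against a few inputs. The helper groups_for and the sample strings are assumptions made for this illustration only; they do not come from the report.

```python
import re

# Example 1 from above: group 2 precedes its reference in the pattern text.
PATTERN = re.compile(r"((a)|b\2)*")


def groups_for(text):
    """Return (group 0, group 1, group 2) for a full match, or None."""
    m = PATTERN.fullmatch(text)
    if m is None:
        return None
    return (m.group(0), m.group(1), m.group(2))


for sample in ("aba", "abaa", "ba"):
    print(repr(sample), "->", groups_for(sample))
```

With "ba" there is no full match, because \2 is reached before any repetition has captured group 2, and a reference to a group that has not participated in the match fails. With "aba" the second repetition reuses the "a" captured by the first one.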
msg384712 - Author: (Renji) Date: 2021-01-09 03:04
I thought a "forward reference" was something like "\1 (abcd)", not "some sort of reference in the second repetition to data from the first repetition".

OK. In other words, references from one repetition to another are supported, but with purely formal restrictions, and removing these restrictions isn't planned. Then this issue may be closed.
History
Date                 User        Action  Args
2021-01-09 03:04:21  Renji       set     messages: + msg384712
2021-01-09 02:38:37  mrabarnett  set     messages: + msg384711
2021-01-09 02:11:40  Renji       set     messages: + msg384709
2021-01-09 02:00:02  mrabarnett  set     messages: + msg384708
2021-01-09 01:25:30  Renji       create