
Author tristanlatr
Recipients ezio.melotti, mrabarnett, tristanlatr
Date 2021-10-29.18:35:49
Message-id <1635532549.63.0.469467756484.issue45674@roundup.psfhosted.org>
In-reply-to
Content
Since Python 3.7, sre_parse.parse() no longer creates SubPattern instances from which the original expression can be reconstructed when the expression contains non-capturing groups.

In Python 3.6: 

>>> import sre_parse
>>> sre_parse.parse("(?:foo (?:bar) | (?:baz))").dump()
SUBPATTERN None 0 0
  BRANCH
    LITERAL 102
    LITERAL 111
    LITERAL 111
    LITERAL 32
    SUBPATTERN None 0 0
      LITERAL 98
      LITERAL 97
      LITERAL 114
    LITERAL 32
  OR
    LITERAL 32
    SUBPATTERN None 0 0
      LITERAL 98
      LITERAL 97
      LITERAL 122


In Python 3.7 and beyond: 

>>> import sre_parse
>>> sre_parse.parse("(?:foo (?:bar) | (?:baz))").dump()
BRANCH
  LITERAL 102
  LITERAL 111
  LITERAL 111
  LITERAL 32
  LITERAL 98
  LITERAL 97
  LITERAL 114
  LITERAL 32
OR
  LITERAL 32
  LITERAL 98
  LITERAL 97
  LITERAL 122

This behaviour makes it impossible to write a correct colorizer for regular expressions using the sre_parse module on Python 3.7 and later, because the non-capturing groups are no longer present in the parsed tree. I'm not a regex expert, so I cannot say whether this change has any effect on the matching itself, but if I trust regex101, it will add a capturing group in place of the non-capturing group.
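
The difference can also be checked programmatically instead of by reading the dump. The following is a minimal sketch, not part of the original report: count_subpatterns is a hypothetical helper, and the import fallback assumes that newer Python versions expose the parser as the private module re._parser while older ones only provide sre_parse.

```python
try:
    # Assumption: on newer Python versions the parser lives in the private
    # module re._parser, and importing sre_parse emits a DeprecationWarning.
    from re import _parser as sre_parse
except ImportError:
    import sre_parse


def count_subpatterns(node):
    """Hypothetical helper: count SUBPATTERN nodes anywhere in a parsed tree."""
    count = 0
    if isinstance(node, sre_parse.SubPattern):
        # A SubPattern holds a list of (opcode, argument) pairs.
        for op, av in node.data:
            if op == sre_parse.SUBPATTERN:
                count += 1
            count += count_subpatterns(av)
    elif isinstance(node, (list, tuple)):
        # Arguments may nest further SubPattern objects inside lists or tuples.
        for item in node:
            count += count_subpatterns(item)
    return count


tree = sre_parse.parse("(?:foo (?:bar) | (?:baz))")
print(count_subpatterns(tree))
```

Going by the dumps above, this should report 3 on Python 3.6 and 0 on Python 3.7 and later, which is the information a colorizer would need to recover the group boundaries.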