This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Since Python 3.7, sre_parse.parse() does not create SubPattern instances that can be used to reproduce the original expression if it contains non-capturing groups
Type: behavior
Stage: resolved
Components: Regular Expressions
Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: AlexWaygood, ezio.melotti, mrabarnett, serhiy.storchaka, tristanlatr
Priority: normal
Keywords:

Created on 2021-10-29 18:35 by tristanlatr, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (3)
msg405327 - Author: Tristan (tristanlatr) Date: 2021-10-29 18:35
Since Python 3.7, sre_parse.parse() does not create SubPattern instances that can be used to reproduce the original expression if it contains non-capturing groups.

In Python 3.6: 

>>> import sre_parse
>>> sre_parse.parse("(?:foo (?:bar) | (?:baz))").dump()
SUBPATTERN None 0 0
  BRANCH
    LITERAL 102
    LITERAL 111
    LITERAL 111
    LITERAL 32
    SUBPATTERN None 0 0
      LITERAL 98
      LITERAL 97
      LITERAL 114
    LITERAL 32
  OR
    LITERAL 32
    SUBPATTERN None 0 0
      LITERAL 98
      LITERAL 97
      LITERAL 122


In Python 3.7 and beyond: 

>>> import sre_parse
>>> sre_parse.parse("(?:foo (?:bar) | (?:baz))").dump()
BRANCH
  LITERAL 102
  LITERAL 111
  LITERAL 111
  LITERAL 32
  LITERAL 98
  LITERAL 97
  LITERAL 114
  LITERAL 32
OR
  LITERAL 32
  LITERAL 98
  LITERAL 97
  LITERAL 122

This behaviour makes it impossible to write a correct colorizer for regular expressions using the sre_parse module from Python 3.7 onward. I'm not a regex expert, so I cannot say whether this change has any effect on the matching itself, but if I trust regex101, it will add a capturing group in place of the non-capturing group.
msg405329 - Author: Alex Waygood (AlexWaygood) (Python triager) Date: 2021-10-29 18:50
Bug fixes are only being applied to Python >= 3.9, but I've reproduced this output on 3.11.
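
For anyone wanting to repeat the check on several interpreters, here is a minimal sketch that lists the top-level opcodes of the parsed pattern. It assumes the private parser is importable as re._parser (Python 3.11 and later) or as sre_parse (earlier versions); both are internal modules and may change. The helper name top_level_ops is made up for illustration.

```python
# Assumption: both modules below are private CPython internals, not a
# supported API. re._parser exists on 3.11+, sre_parse on older versions.
try:
    from re import _parser as sre_parse
except ImportError:
    import sre_parse

PATTERN = "(?:foo (?:bar) | (?:baz))"


def top_level_ops(pattern):
    """Return the names of the top-level opcodes of the parsed pattern."""
    tree = sre_parse.parse(pattern)
    # tree.data is a list of (opcode, argument) tuples; each opcode is a
    # named integer constant carrying a .name attribute.
    return [op.name for op, _ in tree.data]


if __name__ == "__main__":
    print(top_level_ops(PATTERN))
```

Going by the dumps in the first message, this should report a single SUBPATTERN node on 3.6 and a single BRANCH node on 3.7 and later.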
msg405350 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2021-10-29 22:06
sre_parse.parse() is an internal function and this behaviour is an implementation detail.

This change enabled some optimizations which did not work with non-capturing groups before. It did not affect the matching itself.
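
The claim that matching is unaffected can be checked with the public re API alone. The sketch below is an illustration, not part of the original report: it compares the pattern from the first message with a hypothetical capturing variant and shows that the non-capturing form defines no groups.

```python
import re

# Pattern from the report, plus a capturing variant added for comparison.
NON_CAPTURING = "(?:foo (?:bar) | (?:baz))"
CAPTURING = "(foo (bar) | (baz))"


def group_count(pattern):
    """Number of capturing groups the compiled pattern defines."""
    return re.compile(pattern).groups


if __name__ == "__main__":
    print(group_count(NON_CAPTURING))  # 0
    print(group_count(CAPTURING))      # 3
    # Both alternatives still match, and no groups are captured.
    print(re.fullmatch(NON_CAPTURING, "foo bar ") is not None)  # True
    print(re.fullmatch(NON_CAPTURING, " baz").groups())         # ()
```

So dropping the SUBPATTERN nodes from the internal parse tree does not turn non-capturing groups into capturing ones; only the tree's shape changes.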
History
Date | User | Action | Args
2022-04-11 14:59:51 | admin | set | github: 89837
2021-10-29 22:06:23 | serhiy.storchaka | set | status: open -> closed; nosy: + serhiy.storchaka; messages: + msg405350; resolution: not a bug; stage: resolved
2021-10-29 18:50:43 | AlexWaygood | set | nosy: + AlexWaygood; messages: + msg405329; versions: + Python 3.9, Python 3.10, Python 3.11, - Python 3.7
2021-10-29 18:35:49 | tristanlatr | create |