
Author matpi
Recipients ezio.melotti, malin, matpi, mrabarnett
Date 2020-06-16.10:13:21
Message-id <1592302401.69.0.875137830465.issue40980@roundup.psfhosted.org>
In-reply-to
Content
Agreed to some extent, but there is the difference that group names are embedded in the pattern, which has to be bytes if the target is bytes.

My use case is an all-bytes, no-string project where I construct a large regular expression at startup, with semi-dynamic group names.

So it seems natural to keep everything in bytes when concatenating the regular expression, including the group names.

But then the group names that I receive back are strings, so I cannot look them up directly in the set of group names that I used to create the expression in the first place.
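
A minimal sketch of that mismatch, assuming two hypothetical group names (`alpha`, `beta`) that are kept as bytes; the names and the pattern are only illustrative:

```python
import re

# Hypothetical group names, kept as bytes like the rest of the project.
names = {b"alpha", b"beta"}

# Build an all-bytes pattern: (?P<alpha>alpha)|(?P<beta>beta)
pattern = b"|".join(b"(?P<" + n + b">" + n + b")" for n in sorted(names))
rx = re.compile(pattern)

m = rx.match(b"beta")
last = m.lastgroup            # the name of the matching group, as str
index_keys = list(rx.groupindex)  # group names known to the pattern, as str

found_directly = last in names              # str looked up in a set of bytes
found_after_encode = last.encode() in names  # works only after re-encoding
```

The matched data (`m.group()`) is bytes, but the names reported by `lastgroup`, `groupindex` and `groupdict()` are str, so the direct lookup fails and only the re-encoded one succeeds.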

Of course I can live with it by storing them as strings in the first place and calling encode() on them during concatenation, but it does not feel "natural".
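
The workaround could look roughly like this. It is only a sketch: the `GROUPS` table and the `build_pattern` helper are hypothetical names, and the sub-patterns are made up for illustration.

```python
import re

# Hypothetical table: group names stored as str, sub-patterns stored as bytes.
GROUPS = {
    "number": rb"\d+",
    "word": rb"[a-z]+",
}

def build_pattern(groups):
    """Concatenate a bytes pattern, encoding each str group name on the way."""
    parts = [
        b"(?P<" + name.encode("ascii") + b">" + sub + b")"
        for name, sub in groups.items()
    ]
    return re.compile(b"|".join(parts))

rx = build_pattern(GROUPS)
m = rx.match(b"42")
# m.lastgroup is a str, so it can be looked up in GROUPS without conversion.
matched_sub = GROUPS[m.lastgroup]
```

The cost is that the names are the only str objects in an otherwise all-bytes code path, and they have to be encoded at every place where the pattern is assembled.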

Furthermore, even if it is "just a name", a non-ASCII group name raises an error in a bytes pattern, even when it is encoded:

```
>>> re.compile("(?P<" + "é" + ">)")
re.compile('(?P<é>)')
>>> re.compile(b"(?P<" + "é".encode() + b">)")
Traceback (most recent call last):
  File "<pyshell#9>", line 1, in <module>
    re.compile(b"(?P<" + "é".encode() + b">)")
  File "/usr/lib/python3.8/re.py", line 252, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python3.8/re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.8/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.8/sre_parse.py", line 948, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.8/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib/python3.8/sre_parse.py", line 703, in _parse
    raise source.error(msg, len(name) + 1)
re.error: bad character in group name 'é' at position 4
```

So no, it's not really "just a name", considering that in Python "é" is a valid name.
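
To make the asymmetry concrete, here is a small sketch. The two helpers are hypothetical, and the ASCII-plus-identifier check is my assumption about what a bytes pattern will accept, not a rule taken from the re documentation:

```python
import re

def is_bytes_safe_group_name(name: str) -> bool:
    # Assumption: a bytes pattern only accepts ASCII identifiers as group names.
    return name.isascii() and name.isidentifier()

def compiles_as_bytes(name: str) -> bool:
    """Try the name in a bytes pattern, UTF-8 encoded as in the session above."""
    try:
        re.compile(b"(?P<" + name.encode("utf-8") + b">)")
    except re.error:
        return False
    return True

valid_python_name = "é".isidentifier()                    # valid identifier in Python
accepted_in_str = re.compile("(?P<é>)").groupindex["é"]  # fine in a str pattern
accepted_in_bytes = compiles_as_bytes("é")                # rejected in a bytes pattern
```

A check like `is_bytes_safe_group_name` could be run on the semi-dynamic names at startup, so that a bad name is reported before the large expression is compiled.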
History
Date                 User   Action  Args
2020-06-16 10:13:21  matpi  set     recipients: + matpi, ezio.melotti, mrabarnett, malin
2020-06-16 10:13:21  matpi  set     messageid: <1592302401.69.0.875137830465.issue40980@roundup.psfhosted.org>
2020-06-16 10:13:21  matpi  link    issue40980 messages
2020-06-16 10:13:21  matpi  create