Author matpi
Recipients ezio.melotti, malin, matpi, mrabarnett
Date 2020-06-16.11:19:48
Message-id <1592306388.53.0.413990454089.issue40980@roundup.psfhosted.org>
In-reply-to
Content
Of course, an inconvenience in my program is not in itself a reason to change the language. I just wanted to show that the current behavior gives unexpected results.

"\xe9" doesn't look like proper UTF-8 to me. It is the Latin-1 encoding of "é", whereas UTF-8 gives two bytes:

```
>>> "é".encode("latin-1")
b'\xe9'
>>> "é".encode()
b'\xc3\xa9'
```
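
To make the point checkable, here is a minimal sketch (the variable names are only illustrative) showing that the single byte `b'\xe9'` is rejected by the UTF-8 decoder and only round-trips through Latin-1:

```python
# The single byte 0xE9 is the Latin-1 encoding of U+00E9 (e with acute).
raw = b"\xe9"

# In UTF-8, 0xE9 is the lead byte of a three-byte sequence, so decoding it
# on its own fails.
try:
    raw.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

print(is_valid_utf8)                    # False
print(ascii(raw.decode("latin-1")))     # '\xe9'
print("\u00e9".encode("utf-8"))         # b'\xc3\xa9'
```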

Let's try another one: how would you use Δ ("\u0394") as a group name in a bytes pattern?


```
>>> "Δ".encode()
b'\xce\x94'
>>> "Δ".encode("latin-1")
Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    "Δ".encode("latin-1")
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0394' in position 0: ordinal not in range(256)
>>> import re
>>> re.match(b'(?P<\xce\x94>)', b'').groupdict()
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in <module>
    re.match(b'(?P<\xce\x94>)', b'').groupdict()
  File "/usr/lib/python3.8/re.py", line 191, in match
    return _compile(pattern, flags).match(string)
  File "/usr/lib/python3.8/re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.8/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.8/sre_parse.py", line 948, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.8/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib/python3.8/sre_parse.py", line 703, in _parse
    raise source.error(msg, len(name) + 1)
re.error: bad character in group name 'Î\x94' at position 4
>>> re.match(b'(?P<\u0394>)', b'').groupdict()
Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    re.match(b'(?P<\u0394>)', b'').groupdict()
  File "/usr/lib/python3.8/re.py", line 191, in match
    return _compile(pattern, flags).match(string)
  File "/usr/lib/python3.8/re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.8/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.8/sre_parse.py", line 948, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.8/sre_parse.py", line 443, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/usr/lib/python3.8/sre_parse.py", line 703, in _parse
    raise source.error(msg, len(name) + 1)
re.error: bad character in group name '\\u0394' at position 4
```
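
The following is a small sketch of the same comparison that avoids the long tracebacks. The helper `try_group_name` is a hypothetical name introduced here, not part of `re`. The first error message above ('Î\x94') is consistent with the bytes of the group name being read as Latin-1, and that is the assumption this sketch illustrates; the exact wording of the error may differ between Python versions.

```python
import re

def try_group_name(pattern):
    """Return the group names of `pattern`, or the re.error text if it fails to compile."""
    try:
        return sorted(re.compile(pattern).groupindex)
    except re.error as exc:
        return "re.error: " + str(exc)

utf8_name = "\u0394".encode("utf-8")       # b'\xce\x94'
as_latin1 = utf8_name.decode("latin-1")    # the two characters shown in the error above

# The UTF-8 bytes, read as Latin-1, do not form a valid identifier.
print(ascii(as_latin1), as_latin1.isidentifier())

str_result = try_group_name("(?P<\u0394>)")                  # str pattern: accepted
bytes_result = try_group_name(b"(?P<" + utf8_name + b">)")   # bytes pattern: rejected
ascii_result = try_group_name(b"(?P<delta>)")                # ASCII name: accepted

print(ascii(str_result))
print(ascii(bytes_result))
print(ascii_result)
```

So the name works in a str pattern, but I see no way to spell it in a bytes pattern.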
History
Date User Action Args
2020-06-16 11:19:48 matpi set recipients: + matpi, ezio.melotti, mrabarnett, malin
2020-06-16 11:19:48 matpi set messageid: <1592306388.53.0.413990454089.issue40980@roundup.psfhosted.org>
2020-06-16 11:19:48 matpi link issue40980 messages
2020-06-16 11:19:48 matpi create