
Author matpi
Recipients malin, matpi
Date 2020-06-17.00:26:29
Message-id <1592353590.53.0.149226661557.issue40980@roundup.psfhosted.org>
Content
I just had an "aha moment": what `re` claims is that, rather than doing what I suggested:

> ```
> # consider the following bytestring pattern
> >>> p = b"(?P<\xc3\xba>)"
> 
> # what character does the group name correspond to?
> # maybe we can try to infer it by decoding the bytestring?
> # let's try to do it with the default encoding... that's natural, right?
> >>> p.decode()
> '(?P<ú>)'
> ```

the actual way to know which group name is represented would be to look at the (Unicode) string with the same "graphical representation":

```
# consider the following bytestring pattern
>>> p = b"(?P<\xc3\xba>)"

# what character does the group name correspond to?
# to discover it, we instead consider the string that "looks the same":
>>> "(?P<\xc3\xba>)"
'(?P<Ãº>)'

# ok so the group name will be "Ãº"
```
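For reference, here is a minimal sketch of what `re` itself reports for this pattern. It assumes CPython and, to my understanding of the `re` docs, that versions before 3.12 decode the group name of a bytes pattern as latin-1 (3.11 additionally emits a `DeprecationWarning`), while 3.12 and later reject non-ASCII group names in bytes patterns with `re.error`; the helper therefore returns `None` in that case. `group_names` is a hypothetical helper, not part of `re`.

```python
import re
import warnings


def group_names(pattern: bytes):
    """Return the group names re derives from a bytes pattern,
    or None if this interpreter rejects the pattern."""
    with warnings.catch_warnings():
        # Silence any warning some versions emit for non-ASCII
        # group names in bytes patterns.
        warnings.simplefilter("ignore")
        try:
            compiled = re.compile(pattern)
        except re.error:
            return None
    # groupindex maps each (str) group name to its group number.
    return list(compiled.groupindex)


print(group_names(b"(?P<\xc3\xba>)"))
# Pre-3.12 (assumed): ['Ãº'], i.e. latin-1, never ['ú']; 3.12+: None
```

An ASCII-only name such as `b"(?P<abc>x)"` gives `['abc']` on every version, which is why the question only arises for non-ASCII bytes.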

Going from bytes to strings this _naively_, by mapping each byte to the code point with the same numeric value (an encoding which happens to be called latin-1), makes IMHO as much sense as saying that 0x10, 0b10 and 0o10 should be the same value just because they "look the same" in the source code.

This is like throwing away everything we have learned about Unicode, and about how a code point is fundamentally different from the bytes stored in memory.
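To make the latin-1 point concrete, a small sketch using only the standard `bytes.decode` codecs: latin-1 maps every byte value n to the code point U+00nn, which is exactly why the str literal that "looks the same" as the bytes literal equals the latin-1 decoding, while UTF-8 reads the same two bytes as a single character.

```python
raw = b"\xc3\xba"  # the group-name bytes from the pattern above

latin1 = raw.decode("latin-1")
utf8 = raw.decode("utf-8")

# latin-1: one code point per byte, with the same numeric value.
print([hex(ord(c)) for c in latin1])  # ['0xc3', '0xba']
# UTF-8: the two bytes encode a single code point, U+00FA ('ú').
print([hex(ord(c)) for c in utf8])    # ['0xfa']

# The str literal that "looks the same" is exactly the latin-1 decoding.
print("\xc3\xba" == latin1)  # True

# This holds for every byte value, not just these two.
print(all(bytes([b]).decode("latin-1") == chr(b) for b in range(256)))  # True
```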