
Author matpi
Recipients malin, matpi
Date 2020-06-16.20:37:53
Message-id <1592339873.63.0.277699758327.issue40980@roundup.psfhosted.org>
In-reply-to
Content
You questioned my knowledge of encodings. Let's quote from one of the most famous introductory articles on the subject (https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/):

> It does not make sense to have a string without knowing what encoding it uses

So I have a bytestring that comes from somewhere. Maybe it was originally encoded as utf-8 or cp1250 or something else, but I won't tell or don't know; the only thing I swear is that it was originally a valid Python identifier.
Now I pass it as a group name to re.match (it was a valid Python identifier, so that has to be alright per the docs) and I get back a (unicode) string.
re.match, how dare you give me back a string when _you have no clue what my bytestring originally represented, or rather what it was originally encoded with_?
Maybe re.match will even crash, because it wrongly assumes the bytestring to have been latin-1 encoded!
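
To make the complaint concrete, here is a minimal sketch of the behaviour in question. The pattern and the group name `word` are made up for illustration, and only the ASCII case is shown, since that is the part that behaves the same regardless of Python version:

```python
import re

# A bytes pattern with a named group, matched against bytes input.
m = re.match(rb"(?P<word>\w+)", b"hello")

# The matched value stays bytes, as expected for a bytes pattern...
value_type = type(m.group("word"))

# ...but the group name itself is handed back as a str.
group_names = list(m.groupdict())
key_type = type(group_names[0])

print(value_type, key_type, group_names)
```

So the value is still bytes, while the name has silently crossed over to text.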

So: latin-1 is an arbitrary choice that is no better than any other, and the fact that it "naturally" converts bytes to unicode code points is an implementation detail.
If you want to keep it that way, it ought (cf. the quote above) to be made clear in the docs that group names come out as strings decoded from latin-1, with all the restrictions that follow from that choice.
But the more logical way would be to give up this arbitrary encoding altogether.
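
A small sketch of why the choice of latin-1 is arbitrary from the caller's point of view. It uses only `bytes.decode` and `str.encode`, not `re` itself, and the sample character is just an example: latin-1 maps every byte to the code point with the same value, so decoding never fails, but it does not recover text that was encoded with anything else.

```python
# "\u00e9" is LATIN SMALL LETTER E WITH ACUTE; assume the name was utf-8 encoded.
raw = "\u00e9".encode("utf-8")          # two bytes: 0xC3 0xA9

# Decoding with the encoding actually used recovers the original text.
as_utf8 = raw.decode("utf-8")

# Decoding as latin-1 always "works", but yields two unrelated characters.
as_latin1 = raw.decode("latin-1")

# latin-1 is a plain byte -> code point mapping for all 256 byte values.
all_bytes = bytes(range(256))
one_to_one = all_bytes.decode("latin-1") == "".join(chr(b) for b in all_bytes)

print(repr(as_utf8), repr(as_latin1), one_to_one)
```

The byte-to-code-point mapping is what makes latin-1 look "natural", yet the result is only meaningful if the bytes really were latin-1 to begin with.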
History
Date User Action Args
2020-06-16 20:37:53 matpi set recipients: + matpi, malin
2020-06-16 20:37:53 matpi set messageid: <1592339873.63.0.277699758327.issue40980@roundup.psfhosted.org>
2020-06-16 20:37:53 matpi link issue40980 messages
2020-06-16 20:37:53 matpi create