Author matpi
Recipients malin, matpi
Date 2020-06-16.13:51:43
Message-id <1592315503.45.0.107578900502.issue40980@roundup.psfhosted.org>
Content
> It seems you don't know some knowledge of encoding yet.

I don't have to be ashamed of my knowledge of encoding. Yet you are right that I was missing a subtlety: latin-1 maps one-to-one onto the first 256 Unicode code points, rather than being a completely arbitrary encoding. Thank you for that.
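For reference, that property of latin-1 can be checked directly. This is only an illustrative sketch; the variable names are my own:

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Decoding as latin-1 never fails: each byte becomes the code point
# with the same numeric value.
decoded = all_bytes.decode("latin-1")
is_identity = all(ord(char) == byte for char, byte in zip(decoded, all_bytes))

# Encoding the result gives back the original bytes unchanged.
round_trips = decoded.encode("latin-1") == all_bytes

print(is_identity, round_trips)
```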

So what you are saying is that group names in bytes regexes can only be written directly as raw bytes (with no explicit encoding step), so de facto they are limited to the latin-1 subset.

Very well.

But then, once again:

1) Why convert them to str when returning them? They were bytes going in, so bytes they should remain... **By converting them you are choosing an arbitrary encoding, even if it is the "natural" one.**
2) This limitation to the latin-1 subset is not compatible with the documentation, which says that valid Python identifiers are valid group names. If that were really the case, I would expect to be able to use any string for which .isidentifier() is true as a group name, programmatically.
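
Both points can be sketched as follows. This is a minimal illustration, not a definitive reproduction: the pattern, the group name `year` and the candidate name `ú` are examples of my own, and the outcome of the non-ASCII case is assumed to vary between Python versions, so it is recorded rather than asserted:

```python
import re

# Point 1: a bytes pattern with a plain ASCII group name.
pattern = re.compile(rb"(?P<year>\d{4})")
match = pattern.match(b"2020-06-16")

# The matched value stays bytes, but the group name comes back as str.
group_dict = match.groupdict()
name_types = {type(name).__name__ for name in pattern.groupindex}
value_types = {type(value).__name__ for value in group_dict.values()}
print(group_dict, name_types, value_types)

# Point 2: "ú" is a valid identifier, but what happens when it is used
# as a group name in a bytes pattern depends on the interpreter version.
candidate = "ú"
print(candidate.isidentifier())
try:
    compiled = re.compile(b"(?P<" + candidate.encode("utf-8") + b">a)")
    non_ascii_result = list(compiled.groupindex)
except re.error as exc:
    non_ascii_result = f"rejected: {exc}"
print(non_ascii_result)
```

If the second pattern compiles, the name it reports is not `ú` but whatever the UTF-8 bytes happen to spell when read as latin-1, which is exactly the arbitrary choice point 1 objects to.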
History
Date                 User   Action  Args
2020-06-16 13:51:43  matpi  set     recipients: + matpi, malin
2020-06-16 13:51:43  matpi  set     messageid: <1592315503.45.0.107578900502.issue40980@roundup.psfhosted.org>
2020-06-16 13:51:43  matpi  link    issue40980 messages
2020-06-16 13:51:43  matpi  create