Message 371718 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	matpi
Recipients	malin, matpi
Date	2020-06-17.08:10:16
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1592381417.36.0.0407511496227.issue40980@roundup.psfhosted.org>
In-reply-to

Content

Because UTF-8 is Python's default encoding: for source files, and for str.encode() and bytes.decode(). Practically everywhere.
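A minimal illustration, assuming any Python 3 interpreter: calling str.encode() and bytes.decode() with no encoding argument uses UTF-8, and sys.getdefaultencoding() reports the same. The sample string is arbitrary.

```python
import sys

# Python 3 reports UTF-8 as its default string encoding.
print(sys.getdefaultencoding())  # utf-8

text = "café"
# With no argument, str.encode() uses UTF-8: 'é' becomes two bytes.
encoded = text.encode()
print(encoded)  # b'caf\xc3\xa9'

# bytes.decode() with no argument also uses UTF-8, so the round trip is exact.
decoded = encoded.decode()
print(decoded == text)  # True
```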

If you ask around, "I have a bytestring and I need a string; what do I do?", latin-1 will not be the first answer. Moreover, the correct answer should be "it depends on the encoding", which re happily ignores by simply assuming one.
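A small sketch of the behaviour under discussion, assuming a Python 3 interpreter: with a bytes pattern the matched data stays bytes, but the group names come back as str, so re must have decoded them from the pattern bytes with some encoding of its own choosing (latin-1, per this thread). The name "word" is purely illustrative; it is ASCII, where latin-1 and UTF-8 agree, so the choice only becomes visible with non-ASCII bytes in a name.

```python
import re

# A bytes pattern with a named group (the name "word" is arbitrary).
pattern = re.compile(rb"(?P<word>\w+)")
match = pattern.search(b"hello world")

# The matched data stays bytes...
print(match.group("word"))  # b'hello'

# ...but the group names are str, decoded from the bytes pattern by re.
print(dict(pattern.groupindex))  # {'word': 1}
print(match.groupdict())  # {'word': b'hello'}
```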

Saying "just strip that b prefix, it's fine" cannot be taken seriously.

Yes, latin-1 will never raise an error when decoding a bytestring, because it covers all 256 byte values. But citing that as the reason to use it instead of another encoding forgets why we have Unicode in the first place. **It is just pretending that Unicode never was a thing**. Being able to decode any bytestring does not mean it will not return garbage _when the bytestring is not latin-1-encoded in the first place_.
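To make the point concrete, here is a minimal sketch assuming only the standard codecs: UTF-8 bytes decoded as latin-1 succeed silently but produce mojibake, while strict UTF-8 decoding refuses bytes that are not valid UTF-8. The sample text is arbitrary.

```python
# UTF-8 bytes for "café": 'é' is encoded as the two bytes 0xC3 0xA9.
data = "café".encode("utf-8")

# latin-1 maps each byte to one code point, so it never raises;
# here it silently produces garbage instead of "café".
as_latin1 = data.decode("latin-1")
print(as_latin1)  # cafÃ©

# Decoding with the encoding the bytes were actually written in is correct.
as_utf8 = data.decode("utf-8")
print(as_utf8)  # café

# Every possible byte value decodes under latin-1...
all_bytes = bytes(range(256))
print(len(all_bytes.decode("latin-1")))  # 256

# ...whereas strict UTF-8 reports bytes that are not valid UTF-8.
try:
    b"\xff".decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 rejects it:", exc.reason)  # invalid start byte
```

"Never fails" and "is correct" are different properties: latin-1 gives the first for every input, but the second only for data that really is latin-1.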

Take a look at the Unicode HOWTO in the documentation: https://docs.python.org/3/howto/unicode.html
It has 7 references to latin-1, and none of them says that latin-1 is the way to go because it is so much better than anything else.

latin-1 used to be prominent in the 2.x world. It is time to recognize that this is over, and that we can no longer ignore that encoding is a thing.

History
Date	User	Action	Args
2020-06-17 08:10:17	matpi	set	recipients: + matpi, malin
2020-06-17 08:10:17	matpi	set	messageid: <1592381417.36.0.0407511496227.issue40980@roundup.psfhosted.org>
2020-06-17 08:10:17	matpi	link	issue40980 messages
2020-06-17 08:10:16	matpi	create