
Author matpi
Recipients malin, matpi
Date 2020-06-17.08:10:16
Message-id <1592381417.36.0.0407511496227.issue40980@roundup.psfhosted.org>
In-reply-to
Content
Because utf-8 is Python's default encoding, e.g. for source files and for decode() and encode(). It is literally everywhere.
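As a quick illustration of that default, here is a minimal sketch for Python 3. The sample string and variable names are only for illustration.

```python
import sys

text = "café"

# With no argument, str.encode() and bytes.decode() both use UTF-8.
encoded = text.encode()
decoded = encoded.decode()

# The same result as naming the encoding explicitly.
same_as_explicit = encoded == text.encode("utf-8")

# The interpreter-wide default used for these conversions.
default_encoding = sys.getdefaultencoding()

print(encoded, decoded, same_as_explicit, default_encoding)
```

In UTF-8 the "é" takes two bytes (0xC3 0xA9), so the encoded form is five bytes long for a four-character string.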

If you ask around "I have a bytestring, I need a string, what do I do?", using latin-1 will not be the first answer (and moreover, the correct answer should be "it depends on the encoding", which re happily ignores by just asserting one).

Saying "just strip that b prefix, it's fine" cannot be taken seriously.

Yes, latin-1 will never raise an error when decoding a bytestring, because it covers all 256 byte values, but saying that this is the reason why it should be used instead of another encoding is forgetting why we have Unicode in the first place. **It is just pretending that Unicode never was a thing**. The fact that it can decode any bytestring does not mean that it will not return garbage _when the bytestring is not latin-1-encoded in the first place_.
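To make the "no error, but garbage" point concrete, here is a small sketch. The sample data is made up for illustration; it only assumes the bytes were really produced as utf-8.

```python
# Bytes that were actually encoded as UTF-8.
raw = "é".encode("utf-8")            # two bytes: 0xC3 0xA9

# latin-1 maps every byte to the code point with the same value,
# so decoding never fails...
all_bytes = bytes(range(256))
never_fails = [ord(c) for c in all_bytes.decode("latin-1")] == list(range(256))

# ...but the result is wrong when the data was not latin-1.
wrong = raw.decode("latin-1")        # two characters instead of one
right = raw.decode("utf-8")

# The reverse mistake is at least detected: a lone latin-1 "é" byte
# is not valid UTF-8, so the decoder raises instead of guessing.
try:
    b"\xe9".decode("utf-8")
    utf8_rejected = False
except UnicodeDecodeError:
    utf8_rejected = True

print(repr(wrong), repr(right), never_fails, utf8_rejected)
```

The silent success of latin-1 is exactly the problem: the mismatch goes unnoticed, whereas the strict utf-8 decoder reports it.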

Take a look at the documentation: https://docs.python.org/3/howto/unicode.html
7 references to latin-1, none saying that latin-1 is the way to go because it is so much better than anything else.

latin-1 used to be prominent in the 2.x world. It is time to recognize that this is over, and that we can no longer ignore that encoding is a thing.