
Author vstinner
Recipients ezio.melotti, ncoghlan, r.david.murray, vstinner
Date 2015-04-18.22:29:23
> "if you are using the C locale you or the OS are broken anyway, so we'll just pass the bytes through"

Exactly. Even though you use Unicode, the Python 3 str type, you are effectively storing text as raw bytes (in a custom format: undecodable bytes are kept as surrogate characters).
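To illustrate the "custom format", here is a minimal sketch of what the surrogateescape error handler does to a byte that the codec cannot decode. The sample bytes and the choice of the ASCII codec (standing in for the C locale encoding) are assumptions made for the example.

```python
# Latin-1 encoded bytes; 0xE9 is not valid ASCII.
raw = b"caf\xe9"

# With surrogateescape, the undecodable byte 0xE9 is stored as the
# lone surrogate U+DCE9 instead of raising UnicodeDecodeError.
text = raw.decode("ascii", errors="surrogateescape")

# Encoding with the same handler restores the original byte unchanged.
roundtrip = text.encode("ascii", errors="surrogateescape")
```

The str object is a real Python 3 str, but the surrogate in it is only a carrier for the original byte, which is why the data is effectively passed through as raw bytes.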

> I'm not entirely convinced this won't cause issues, but I suppose it might not cause any more issues that having things break due to the C locale does.

The most obvious issue is the comeback of mojibake. Since you manipulate raw bytes, it's easy to concatenate two byte strings encoded in two different encodings.
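A small sketch of how this kind of mojibake arises, assuming the two encodings involved are UTF-8 and Latin-1 (the variable names are illustrative only):

```python
# The same character encoded in two different encodings.
part_utf8 = "\u00e9".encode("utf-8")      # b'\xc3\xa9'
part_latin1 = "\u00e9".encode("latin-1")  # b'\xe9'

# Nothing stops the concatenation: both are just bytes.
mixed = part_utf8 + part_latin1

# Read as Latin-1, the UTF-8 half turns into two wrong characters.
as_latin1 = mixed.decode("latin-1")

# Read as UTF-8, the Latin-1 half is not decodable at all.
try:
    mixed.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

Neither encoding decodes the mixed string correctly, and the bytes themselves carry no hint about which part used which encoding.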

The question is not how bad it is to manipulate text as bytes. The problem is that a working application written for Python 2 starts to randomly fail (on non-ASCII characters) on Python 3 when the LC_CTYPE locale is the POSIX locale ("C"). The first question is: should I keep Python 2 or write my application in a language which doesn't force me to understand Unicode?
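The failure mode can be modeled without changing the locale. The sketch below assumes that the C locale implies a strict ASCII encoding for output, and uses a hypothetical helper, write_like_c_locale, as a stand-in for writing to such a stream; it does not touch the real locale settings.

```python
def write_like_c_locale(text):
    """Encode text as a strict ASCII stream would (assumed model of the C locale)."""
    return text.encode("ascii")  # default "strict" error handler

# Pure ASCII data works, so the application appears to be fine.
ascii_result = write_like_c_locale("hello")

# The first non-ASCII character makes the same code path fail.
try:
    write_like_c_locale("h\u00e9llo")
    failed = False
except UnicodeEncodeError:
    failed = True
```

This is why the failures look random: they depend on the data, not on the code, and only show up once a non-ASCII character reaches the encoder.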
Date                 User      Action  Args
2015-04-18 22:29:23  vstinner  set     recipients: + vstinner, ncoghlan, ezio.melotti, r.david.murray
2015-04-18 22:29:23  vstinner  set     messageid: <>
2015-04-18 22:29:23  vstinner  link    issue23993 messages
2015-04-18 22:29:23  vstinner  create