Author vstinner
Recipients Sworddragon, a.badger, bkabrda, ezio.melotti, ishimoto, jwilk, larry, loewis, martin.panter, ncoghlan, pitrou, python-dev, r.david.murray, serhiy.storchaka, vstinner
Date 2014-04-27.23:57:39
Message-id <1398643059.92.0.0558590506668.issue19977@psf.upfronthosting.co.za>
In-reply-to
Content
> We should not overcomplicate this. I suggest that we simply use utf-8 under the C locale.

Please open a new issue if you would prefer UTF-8. You will have to solve different technical issues. I tried to list some of them in issues #19846 and #19847.

In short, you should always decode and encode "OS data" with the same encoding. Python's "file system encoding" is the locale encoding because some places use PyUnicode_DecodeLocale[AndSize]() (e.g. to decode the PYTHONWARNINGS environment variable). Another common case is PyUnicode_DecodeFSDefaultAndSize() being called before the Python codec is loaded. See also the _Py_wchar2char() and _Py_char2wchar() functions, which use the locale encoding and are called in many places.
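To illustrate the "same encoding" rule, here is a minimal Python-level sketch. It is not how the C functions above work internally. The roundtrip() helper and the sample file name are hypothetical, made up for this example. It decodes undecodable bytes with surrogateescape and shows that encoding back with the same codec restores the original bytes, while switching codecs in between silently changes them:

```python
import os


def roundtrip(raw: bytes, decode_with: str, encode_with: str) -> bytes:
    """Decode OS bytes to str, then encode back (hypothetical helper).

    surrogateescape is used in both directions, so bytes that cannot be
    decoded are kept as lone surrogates instead of raising an error.
    """
    text = raw.decode(decode_with, "surrogateescape")
    return text.encode(encode_with, "surrogateescape")


# Hypothetical file name: "cafe" with an e-acute stored as Latin-1.
# The byte 0xE9 is not valid UTF-8 on its own.
RAW = b"caf\xe9.txt"

# Same encoding both ways: the original bytes come back unchanged.
same = roundtrip(RAW, "utf-8", "utf-8")

# Decode with one encoding, encode with another: the bytes change,
# so the result no longer names the same file.
mixed = roundtrip(RAW, "latin-1", "utf-8")

print(same == RAW)
print(mixed == RAW)

# os.fsdecode() and os.fsencode() always pair up the same encoding and
# error handler (surrogateescape on POSIX), so the round trip is expected
# to be lossless there.
if os.name == "posix":
    print(os.fsencode(os.fsdecode(RAW)) == RAW)
```

If one part of the interpreter decoded with the locale encoding and another part encoded with UTF-8, you would get the "mixed" case above for any non-ASCII byte that the first codec decodes successfully.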

I'm now closing the issue because the initial point (use the surrogateescape error handler) is implemented in Python 3.5, and backporting such a major change to the Python 3.4 branch is risky right now.