Message 217332 - Python tracker


Author	vstinner
Recipients	Sworddragon, a.badger, bkabrda, ezio.melotti, ishimoto, jwilk, larry, loewis, martin.panter, ncoghlan, pitrou, python-dev, r.david.murray, serhiy.storchaka, vstinner
Date	2014-04-27.23:57:39
Message-id	<1398643059.92.0.0558590506668.issue19977@psf.upfronthosting.co.za>

Content
> We should not overcomplicate this. I suggest that we simply use utf-8 under the C locale.

Please open a new issue if you would prefer UTF-8. You will have to solve several technical issues. I tried to list some of them in issues #19846 and #19847.

In short, you should always decode and encode "OS data" with the same encoding. Python's "file system encoding" is the locale encoding because PyUnicode_DecodeLocale[AndSize]() is used in some places (e.g. to decode the PYTHONWARNINGS environment variable). A common case is PyUnicode_DecodeFSDefaultAndSize() being called before the Python codec is loaded. See also the _Py_wchar2char() and _Py_char2wchar() functions, which use the locale encoding and are used in many places.
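
To illustrate the "same encoding" rule at the Python level, here is a minimal sketch. It is not the C code path discussed above: roundtrip() is a hypothetical helper written for this example, and the codec names are only stand-ins for "the locale encoding" and "some other encoding".

```python
# Sketch only: roundtrip() is a hypothetical helper, not a CPython API.
def roundtrip(raw, decode_with, encode_with):
    """Decode OS bytes to str, then encode the str back to bytes."""
    text = raw.decode(decode_with, "surrogateescape")
    return text.encode(encode_with, "surrogateescape")


raw = b"caf\xc3\xa9"  # "café" encoded as UTF-8

# Same encoding both ways: the original bytes survive, even when the
# codec (ASCII here) cannot represent them and has to escape them.
assert roundtrip(raw, "ascii", "ascii") == raw
assert roundtrip(raw, "utf-8", "utf-8") == raw

# Decode as UTF-8 but encode as ASCII: U+00E9 is a real character, not an
# escaped byte, so the ASCII codec rejects it.
try:
    roundtrip(raw, "utf-8", "ascii")
    mismatch_failed = False
except UnicodeEncodeError:
    mismatch_failed = True
assert mismatch_failed

# Decode as Latin-1 but encode as UTF-8: no error, but the bytes change.
assert roundtrip(raw, "latin-1", "utf-8") != raw
```

The last two cases show the two ways a mismatch can go wrong: a hard error, or silently different bytes (for a file name, that means a different file).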

I'm now closing the issue because the initial point (using the surrogateescape error handler) is implemented in Python 3.5, and backporting such a major change to the Python 3.4 branch is risky right now.
