
classification
Title: Encoding issues with the locale encoding
Type:
Stage: resolved
Components: Unicode
Versions: Python 3.7, Python 3.6
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: ezio.melotti, vstinner
Priority: normal
Keywords: patch

Created on 2018-01-15 11:20 by vstinner, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked
PR 5193 merged vstinner, 2018-01-15 16:52
Messages (6)
msg309963 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-15 11:20
Python 3.6 doesn't use the right encoding in os.strerror(), time.strftime(), locale.localeconv(), time.tzname, etc. on macOS, FreeBSD and other platforms.

See my fix for locale encodings in bpo-29240: commit 7ed7aead9503102d2ed316175f198104e0cd674c, and test_all_locales.py attached to bpo-29240.

See also bpo-31900 for the locale.localeconv() encoding issue when the LC_NUMERIC encoding is different from the LC_CTYPE encoding.
msg309964 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-15 11:21
I'm not sure that locale.bindtextdomain() uses the right encoding either. I propose the following fix:

diff --git a/Modules/_localemodule.c b/Modules/_localemodule.c
index 324b694b83..1de17d3620 100644
--- a/Modules/_localemodule.c
+++ b/Modules/_localemodule.c
@@ -555,7 +555,7 @@ PyIntl_bindtextdomain(PyObject* self,PyObject*args)
         PyErr_SetFromErrno(PyExc_OSError);
         return NULL;
     }
-    result = PyUnicode_DecodeLocale(current_dirname, NULL);
+    result = PyUnicode_DecodeFSDefault(current_dirname);
     Py_XDECREF(dirname_bytes);
     return result;
 }
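
The patch above swaps a locale-based decoder for the filesystem-default decoder. As a rough Python-level illustration (not part of the patch), the sketch below prints the encodings the two strategies would rely on in the current process. It assumes that os.fsdecode() is a reasonable stand-in for PyUnicode_DecodeFSDefault() and that locale.getpreferredencoding(False) approximates the encoding seen by a locale-based decoder; the helper name describe_encodings is hypothetical.

```python
import locale
import os
import sys


def describe_encodings():
    """Return the encodings the two decoding strategies would rely on."""
    return {
        # Encoding of the current LC_CTYPE locale (locale-based decoding).
        "locale": locale.getpreferredencoding(False),
        # Encoding and error handler used by os.fsdecode() / os.fsencode().
        "filesystem": sys.getfilesystemencoding(),
        "filesystem_errors": sys.getfilesystemencodeerrors(),
    }


info = describe_encodings()
print(info)

# A pure-ASCII directory name decodes to the same text under either strategy;
# the two only diverge for non-ASCII bytes.
dirname = os.fsdecode(b"/usr/share/locale")
print(dirname)
```

The two encodings may differ after locale.setlocale() changes LC_CTYPE at runtime, which is the situation where the choice of decoder matters.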
msg309965 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-15 11:25
Another issue: _Py_DecodeUTF8Ex() creates surrogate pairs with 16-bit wchar_t (on Windows), whereas input bytes should be escaped. I'm quite sure that it's a bug.
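
To make the two terms in the message above concrete, here is a minimal Python sketch. It does not reproduce the _Py_DecodeUTF8Ex() bug itself; it only shows, using the standard codecs, what an escaped input byte looks like (a lone surrogate in the range U+DC80..U+DCFF produced by the surrogateescape error handler) compared with a UTF-16 surrogate pair (two 16-bit code units that together encode one non-BMP character).

```python
# surrogateescape: an undecodable byte 0xNN becomes the lone surrogate U+DCNN.
escaped = b"\xff".decode("utf-8", errors="surrogateescape")
print(ascii(escaped))

# The escape is reversible: encoding restores the original byte.
restored = escaped.encode("utf-8", errors="surrogateescape")

# Surrogate pair: with 16-bit code units, U+10000 is stored as the
# high surrogate 0xD800 followed by the low surrogate 0xDC00.
pair = "\U00010000".encode("utf-16-le")
units = [int.from_bytes(pair[i:i + 2], "little") for i in range(0, len(pair), 2)]
print([hex(u) for u in units])
```

Both mechanisms use code points from the surrogate range, which is presumably why mixing them up in a 16-bit wchar_t decoder is easy to do and hard to spot.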
msg309994 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-15 16:08
Example of bug on FreeBSD 11:

haypo@freebsd$ LC_ALL=C ./python -c 'import locale, os; locale.setlocale(locale.LC_ALL, "fr_FR.ISO8859-1"); print(ascii(os.strerror(2)))'

'Fichier ou r\udce9pertoire inexistant'

Expected result:

haypo@freebsd$ LC_ALL=fr_FR.ISO8859-1 ./python -c 'import locale, os; locale.setlocale(locale.LC_ALL, ""); print(ascii(os.strerror(2)))'

'Fichier ou r\xe9pertoire inexistant'
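
The difference between the two outputs above can be reproduced in pure Python. This is a sketch under one assumption: that in the buggy case the message bytes were decoded as ASCII with the surrogateescape error handler (the encoding of the C locale at startup) instead of ISO-8859-1. The byte string below is written by hand to match the expected French message; it is not read from the C library.

```python
# Bytes an ISO-8859-1 French locale would produce for this message
# (0xE9 is "e acute" in ISO-8859-1).
raw = b"Fichier ou r\xe9pertoire inexistant"

# Assumed buggy path: decode with the wrong encoding, escaping unknown bytes.
wrong = raw.decode("ascii", errors="surrogateescape")

# Expected path: decode with the encoding of the current locale.
right = raw.decode("iso-8859-1")

print(ascii(wrong))  # 'Fichier ou r\udce9pertoire inexistant'
print(ascii(right))  # 'Fichier ou r\xe9pertoire inexistant'

# The escaped form still round-trips back to the original bytes.
roundtrip = wrong.encode("ascii", errors="surrogateescape")
```

The lone surrogate U+DCE9 in the first result is the escaped form of byte 0xE9, so no data is lost, but the string is not the text a user expects to see.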
msg310021 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-01-15 22:43
New changeset b92c159efada05b3a5ff9d0dbce3fcb2334631f6 by Victor Stinner in branch '3.6':
[3.6] bpo-32555: Fix locale encodings (#5193)
https://github.com/python/cpython/commit/b92c159efada05b3a5ff9d0dbce3fcb2334631f6
msg320156 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-06-21 10:58
freebsd$ LC_ALL=fr_FR.ISO8859-1 ./python -c 'import locale, os; locale.setlocale(locale.LC_ALL, ""); print(ascii(os.strerror(2)))'
'Fichier ou r\xe9pertoire inexistant'

I manually ran this test on FreeBSD: it passes on Python 3.6, 3.7 and master. I'm closing the issue.


> Another issue: _Py_DecodeUTF8Ex() creates surrogate pairs with 16-bit wchar_t (on Windows), whereas input bytes should be escaped. I'm quite sure that it's a bug.

I created bpo-33928 for that one.
History
Date | User | Action | Args
2022-04-11 14:58:56 | admin | set | github: 76736
2018-06-21 10:58:44 | vstinner | set | status: open -> closed; resolution: fixed; messages: + msg320156; stage: patch review -> resolved
2018-01-15 22:43:27 | vstinner | set | messages: + msg310021
2018-01-15 16:52:15 | vstinner | set | keywords: + patch; stage: patch review; pull_requests: + pull_request5047
2018-01-15 16:08:30 | vstinner | set | messages: + msg309994
2018-01-15 11:25:43 | vstinner | set | messages: + msg309965
2018-01-15 11:21:24 | vstinner | set | messages: + msg309964
2018-01-15 11:20:42 | vstinner | create |