classification
Title: Mac OS X: don't use the locale encoding but UTF-8 to encode and decode filenames
Type: Stage:
Components: Macintosh Versions: Python 3.4, Python 3.2, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: haypo Nosy List: asvetlov, haypo, pitrou, python-dev, ronaldoussoren, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2012-11-05 22:18 by haypo, last changed 2012-12-03 13:14 by haypo. This issue is now closed.

Files
File name Uploaded
macosx.patch haypo, 2012-11-05 22:18
macosx-2.patch haypo, 2012-11-12 13:32
Messages (13)
msg174943 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-11-05 22:18
Since changeset 45079ad1e260 (issue #4388), command line arguments on Mac OS X are decoded from UTF-8 instead of the locale encoding. The functions of Python/fileutils.c, however, still use the locale encoding.

This mismatch does not work: see issue #16218. On Mac OS X, in the command line "python script.py", the filename "script.py" is decoded from UTF-8 (by _Py_DecodeUTF8_surrogateescape) but is then passed to _Py_fopen(), which encodes the filename to the locale encoding (e.g. ISO-8859-1 if the $LANG, $LC_CTYPE and $LC_ALL environment variables are not set). The result is mojibake, and Python fails to open the script.
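
The mismatch can be illustrated in pure Python. This is only a rough sketch: the real code path runs in C (_Py_DecodeUTF8_surrogateescape, then _Py_fopen), the file name is a hypothetical example, and ISO-8859-1 is assumed as the locale encoding, as in the case described above.

```python
# Hypothetical file name "café.py" as UTF-8 bytes, the form in which
# it arrives on the command line on Mac OS X.
argv_bytes = "caf\u00e9.py".encode("utf-8")

# Step 1: the command line argument is decoded from UTF-8.
decoded = argv_bytes.decode("utf-8", "surrogateescape")

# Step 2: the file-opening code re-encodes the name with the locale
# encoding (assumed here to be ISO-8859-1).
reencoded = decoded.encode("iso-8859-1", "surrogateescape")

# The bytes that would be handed to fopen() differ from the original name.
mismatch = argv_bytes != reencoded
print(argv_bytes, reencoded, mismatch)
```

Using the same encoding for both steps avoids the mismatch, which is the point of the attached patch.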

The attached patch modifies the functions of Python/fileutils.c so that, on Mac OS X, they use UTF-8 instead of the locale encoding to encode and decode filenames.

I don't know yet whether Modules/getpath.c should also decode paths from UTF-8 instead of the locale encoding on Mac OS X. We may expose _Py_decode_filename().
msg175441 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-11-12 13:32
macosx-2.patch changes the _Py_wchar2char() and _Py_char2wchar() functions so that, on Mac OS X, every function that used the locale encoding now uses UTF-8/surrogateescape, not only the file-related functions of fileutils.h. The patch also simplifies the code: python.c no longer needs a specific #ifdef __APPLE__:

-#ifdef __APPLE__
-        argv_copy[i] = _Py_DecodeUTF8_surrogateescape(argv[i], strlen(argv[i]));
-#else
         argv_copy[i] = _Py_char2wchar(argv[i], NULL);
-#endif
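
As a rough illustration of why UTF-8/surrogateescape is suitable for arbitrary OS data, the sketch below uses Python's codec machinery rather than the C helpers. The byte string is a made-up example containing one byte that is not valid UTF-8.

```python
# Hypothetical OS data: valid UTF-8 for "é" followed by a stray 0xff byte.
raw = b"caf\xc3\xa9\xff.py"

# Undecodable bytes are mapped to lone surrogates (U+DC80..U+DCFF)
# instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same codec and error handler restores the bytes.
roundtrip = text.encode("utf-8", "surrogateescape")

print(ascii(text))
print(roundtrip == raw)
```

Because decoding and encoding use the same codec and error handler, the original bytes survive the round trip even when they are not valid UTF-8.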

msg175477 - (view) Author: Roundup Robot (python-dev) Date: 2012-11-12 22:03
New changeset 48fbdaf3a849 by Victor Stinner in branch 'default':
Issue #16416: OS data are now always encoded/decoded to/from
http://hg.python.org/cpython/rev/48fbdaf3a849
msg175478 - (view) Author: Roundup Robot (python-dev) Date: 2012-11-12 22:48
New changeset f3e512b5ffb3 by Victor Stinner in branch 'default':
Issue #16416: Fix error handling in _Py_wchar2char() _Py_char2wchar() functions
http://hg.python.org/cpython/rev/f3e512b5ffb3
msg175479 - (view) Author: Roundup Robot (python-dev) Date: 2012-11-12 23:06
New changeset 1b97cc71a05e by Victor Stinner in branch 'default':
Issue #16416: Fix Misc/NEWS entry, mention Mac OS X
http://hg.python.org/cpython/rev/1b97cc71a05e
msg175480 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-11-12 23:07
@Serhiy: Thanks for your review, I missed it before my first commit.
msg175481 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-12 23:15
Victor, are you going to backport this to 3.3?
msg175482 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-11-12 23:18
> Victor, are you going to backport this to 3.3?

I'm waiting for the buildbot results, and maybe also for the fix for issue #16455 (which affects the tests on undecodable bytes).
msg176490 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-11-27 20:19
Victor, could you please backport to 3.3?
msg176797 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-12-02 16:08
Ping.
msg176836 - (view) Author: Roundup Robot (python-dev) Date: 2012-12-03 11:49
New changeset c838c9b117f1 by Victor Stinner in branch '3.2':
Issue #16416: On Mac OS X, operating system data are now always
http://hg.python.org/cpython/rev/c838c9b117f1

New changeset 26c4748351cb by Victor Stinner in branch '3.3':
(Merge 3.2) Issue #16416: On Mac OS X, operating system data are now always
http://hg.python.org/cpython/rev/26c4748351cb
msg176840 - (view) Author: Roundup Robot (python-dev) Date: 2012-12-03 13:13
New changeset af6fd3ca6de9 by Victor Stinner in branch '3.2':
Issue #16416: Fix compilation error
http://hg.python.org/cpython/rev/af6fd3ca6de9
msg176841 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-12-03 13:14
The issue should now be fixed in Python 3.2, 3.3 and 3.4.
History
Date User Action Args
2012-12-03 13:14:59 haypo set status: open -> closed; resolution: fixed; messages: + msg176841; versions: + Python 3.2, Python 3.3
2012-12-03 13:13:13 python-dev set messages: + msg176840
2012-12-03 11:49:13 python-dev set messages: + msg176836
2012-12-02 16:08:32 pitrou set messages: + msg176797
2012-11-27 20:19:54 pitrou set assignee: ronaldoussoren -> haypo; messages: + msg176490; nosy: + pitrou
2012-11-12 23:18:23 haypo set messages: + msg175482
2012-11-12 23:15:05 serhiy.storchaka set messages: + msg175481
2012-11-12 23:07:43 haypo set messages: + msg175480
2012-11-12 23:06:18 python-dev set messages: + msg175479
2012-11-12 22:48:34 python-dev set messages: + msg175478
2012-11-12 22:03:44 python-dev set nosy: + python-dev; messages: + msg175477
2012-11-12 13:32:54 haypo set files: + macosx-2.patch; messages: + msg175441
2012-11-07 11:58:43 asvetlov set nosy: + asvetlov
2012-11-05 22:20:18 haypo set nosy: + serhiy.storchaka
2012-11-05 22:18:23 haypo create