classification
Title: sys.path[0] is incorrect if PYTHONFSENCODING is used
Type:
Stage:
Components: Unicode
Versions: Python 3.2
process
Status: closed
Resolution: fixed
Dependencies: 10039
Superseder:
Assigned To:
Nosy List: vstinner
Priority: normal
Keywords: patch

Created on 2010-10-02 12:14 by vstinner, last changed 2010-10-13 22:25 by vstinner. This issue is now closed.

Files
File name Uploaded Description
realpath_fs_encoding-2.patch vstinner, 2010-10-07 23:10
Messages (7)
msg117870 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-10-02 12:14
In the following example, sys.path[0] should be '/home/SHARE/SVN/py3k\udcc3\udca9' when PYTHONFSENCODING=ascii is set (my locale and filesystem encodings are both utf-8), but it is not:

$ cd /home/SHARE/SVN/py3ké
$ echo "import sys; print(sys.path[0])" > x.py
$ ./python x.py
/home/SHARE/SVN/py3ké
$ PYTHONFSENCODING=ascii ./python x.py
/home/SHARE/SVN/py3ké

The problem is that PySys_SetArgvEx() inserts argv[0] at sys.path[0], but argv[0] is decoded using the locale encoding (by _Py_char2wchar() in main()), whereas the paths in sys.path are supposed to be decoded with, and encodable to, the encoding returned by sys.getfilesystemencoding().

Either the argv array should be decoded using the filesystem encoding (see issue #9992), or argv[0] should be redecoded (encoded to the locale encoding, then decoded from the filesystem encoding; see issue #9630).
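The redecoding step described above can be sketched in Python. This is only an illustration of the encode/decode round trip, not the C code in main() or PySys_SetArgvEx(); the helper name redecode() is hypothetical, and the encodings are passed explicitly as an assumption that mirrors the example (utf-8 locale, ascii filesystem encoding).

```python
def redecode(path, locale_encoding, fs_encoding):
    """Hypothetical helper: undo the locale decoding, then decode again
    with the filesystem encoding, keeping undecodable bytes as surrogates."""
    # Recover the original bytes that were decoded with the locale encoding.
    raw = path.encode(locale_encoding, "surrogateescape")
    # Decode those bytes the way sys.path entries are expected to be decoded.
    return raw.decode(fs_encoding, "surrogateescape")


# argv[0] as decoded from a utf-8 locale (the "é" is U+00E9).
locale_decoded = "/home/SHARE/SVN/py3k\u00e9"

fixed = redecode(locale_decoded, "utf-8", "ascii")
print(ascii(fixed))  # '/home/SHARE/SVN/py3k\udcc3\udca9'

# The redecoded path encodes back to the original bytes with the
# filesystem encoding, which is what makes it usable in sys.path.
assert fixed.encode("ascii", "surrogateescape") == locale_decoded.encode("utf-8")
```

The two bytes of "é" in UTF-8 (0xC3 0xA9) are not valid ASCII, so surrogateescape maps them to the lone surrogates U+DCC3 and U+DCA9, which is the value expected for sys.path[0] in the report.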
msg118088 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-10-06 23:37
> The problem is that PySys_SetArgvEx() ...

Not only PySys_SetArgvEx(). There is another issue with RunMainFromImporter(), which does: sys.path[0] = filename
msg118090 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-10-07 01:26
See also issue #10039.
msg118101 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-10-07 11:33
This issue depends on issue #10039.
msg118102 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-10-07 11:41
r85302: _wrealpath() and _Py_wreadlink() support surrogates in the input path.

--

realpath_fs_encoding.patch: patch _wrealpath() to encode the resulting path with the filesystem encoding (with surrogateescape) instead of the locale encoding. This patch is incomplete: it doesn't fix the issue for non-Windows platforms without the realpath() function.

redecode_filename.patch (from issue #10039) + realpath_fs_encoding.patch fix this issue on my Linux (Debian Sid) box.
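As a rough Python-level analogue of what realpath_fs_encoding.patch is described as doing (this is not the C patch itself), the sketch below resolves a path through its byte form using the filesystem encoding with surrogateescape. The function name realpath_fs() and its fs_encoding parameter are hypothetical and exist only for illustration.

```python
import os
import sys


def realpath_fs(path, fs_encoding=None):
    """Hypothetical helper: resolve a path via bytes, converting with the
    filesystem encoding (surrogateescape) rather than the locale encoding."""
    if fs_encoding is None:
        fs_encoding = sys.getfilesystemencoding()
    # Surrogates produced by an earlier surrogateescape decode become raw bytes again.
    raw = path.encode(fs_encoding, "surrogateescape")
    # os.path.realpath() accepts bytes and returns bytes.
    resolved = os.path.realpath(raw)
    # Decode the result with the same encoding so it stays round-trippable.
    return resolved.decode(fs_encoding, "surrogateescape")


print(ascii(realpath_fs(os.getcwd())))
```

Keeping the same encoding and error handler on both sides of the realpath() call is what lets a path containing surrogates survive the round trip unchanged.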
msg118151 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-10-07 23:10
I just created Python/fileutils.c; the patch is updated for this new file.
msg118595 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-10-13 22:20
Fixed by r85430 (remove PYTHONFSENCODING), see #9992.
History
Date                 User      Action  Args
2010-10-13 22:25:10  vstinner  set     status: open -> closed
2010-10-13 22:20:25  vstinner  set     resolution: fixed; messages: + msg118595
2010-10-07 23:11:05  vstinner  set     files: - realpath_fs_encoding.patch
2010-10-07 23:10:58  vstinner  set     files: + realpath_fs_encoding-2.patch; messages: + msg118151
2010-10-07 11:41:50  vstinner  set     files: + realpath_fs_encoding.patch; keywords: + patch; messages: + msg118102
2010-10-07 11:33:05  vstinner  set     dependencies: + python é.py fails with UnicodeEncodeError if PYTHONFSENCODING is used; messages: + msg118101
2010-10-07 01:26:20  vstinner  set     messages: + msg118090
2010-10-06 23:37:41  vstinner  set     messages: + msg118088
2010-10-02 12:14:21  vstinner  create