classification
Title: Python3: use ASCII for the file system encoding on initfsencoding() failure
Type: Stage:
Components: Interpreter Core, Unicode Versions: Python 3.2
process
Status: closed Resolution: not a bug
Dependencies: 8611 8715 Superseder:
Assigned To: Nosy List: Arfrever, lemburg, loewis, pitrou, vstinner
Priority: normal Keywords: patch

Created on 2010-05-15 12:39 by vstinner, last changed 2010-10-19 23:55 by vstinner. This issue is now closed.

Files
File name Uploaded Description
fsencoding_ascii-2.patch vstinner, 2010-05-16 01:13
Messages (5)
msg105804 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-15 12:39
I introduced initfsencoding() in #8610 to ensure that Py_FileSystemEncoding is never NULL anymore. In the discussion, Marc Lemburg noticed that falling back to UTF-8 when nl_langinfo(CODESET) fails is a bad idea: ASCII is better (I agree).

We cannot fall back to ASCII yet because there are two other problems that have to be fixed before that:

 - Python3 doesn't support surrogates in module filenames: see #8611
 - If Py_FileSystemEncoding is NULL, the encoding functions fall back to UTF-8 (PyUnicode_GetDefaultEncoding()). #8715 proposes a new PyUnicode_EncodeFSDefault() function to fix this problem

The attached patch is a partial fix for this issue.
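The difference between the two fallbacks can be illustrated at the Python level. The sketch below is only an illustration, not the interpreter's C code: it assumes, for the sake of the example, that the real file system encoding is Latin-1 and that detection failed, and the file name is made up.

```python
# Illustration only: assume the real file system encoding is Latin-1,
# but the interpreter could not detect it and has to pick a fallback.
name = "caf\u00e9.txt"  # hypothetical file name containing one non-ASCII character

# Fallback to UTF-8: encoding succeeds, but the resulting bytes are
# misread by a Latin-1 system (mojibake).
utf8_bytes = name.encode("utf-8")
as_seen_by_system = utf8_bytes.decode("latin-1")

# Fallback to ASCII: encoding fails loudly instead of producing a wrong name.
try:
    name.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(ascii(as_seen_by_system))
print(ascii_failed)
```

With the UTF-8 fallback the wrong name is created silently; with the ASCII fallback the error is visible at the point of the call.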
msg105820 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-15 16:34
PyUnicode_AsEncodedString() contains a special path for the file system encoding. I don't think that it is still needed, but I don't know how to check that. => read msg105810
msg105842 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-16 01:13
Version 2:
 - #8715 has been committed: patch PyUnicode_EncodeFSDefault()
 - fix the documentation according to the changes
msg111758 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-07-28 01:29
I tried the patch on my import_unicode branch and it doesn't work if the locale encoding is not ASCII (as the current code doesn't work if the locale encoding is not UTF-8, #8611).

If Py_FileSystemUnicodeEncoding is NULL: PyUnicode_EncodeFSDefault() should use wcstombs() and PyUnicode_DecodeFSDefault() should use mbstowcs(). They may reuse _Py_wchar2char() and _Py_char2wchar().

"ascii" should be used in initfsencoding().
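The encode/decode pair discussed in this message has a rough Python-level counterpart in os.fsencode() and os.fsdecode() (available since Python 3.2). The sketch below shows only the str-to-bytes and bytes-to-str round trip through the file system encoding; it is not the C implementation, and the helper name fs_roundtrip is hypothetical.

```python
import os
import sys


def fs_roundtrip(name):
    """Encode a str file name to bytes, then decode it back to str."""
    raw = os.fsencode(name)   # str -> bytes, using the file system encoding
    return os.fsdecode(raw)   # bytes -> str, using the same encoding


# The encoding actually used depends on the platform and the locale.
print(sys.getfilesystemencoding())
print(fs_roundtrip("example.txt"))
```

A pure-ASCII name is used here so that the round trip does not depend on the locale of the machine running the example.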
msg119180 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-10-19 23:55
initfsencoding() now raises a fatal error when get_codeset() fails. Using an encoding different from the locale encoding when get_codeset() fails only leads to mojibake and encoding issues; it's not a good idea. Closing this issue as invalid.
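The "fail instead of guessing" policy described above can be sketched in Python. This is an assumption-laden illustration, not the C code of initfsencoding(): the function name init_fs_encoding, the get_codeset callable passed as an argument, and the use of RuntimeError in place of a fatal interpreter error are all hypothetical.

```python
import codecs


def init_fs_encoding(get_codeset):
    """Return the normalized codec name for the locale codeset.

    Hypothetical sketch: never fall back to another encoding; fail hard
    if the codeset is missing or is not a codec known to Python.
    """
    codeset = get_codeset()
    if not codeset:
        # Stand-in for the fatal error raised by the interpreter.
        raise RuntimeError("unable to get the locale encoding")
    try:
        return codecs.lookup(codeset).name
    except LookupError:
        raise RuntimeError("unknown locale encoding: %r" % (codeset,))


print(init_fs_encoding(lambda: "UTF-8"))
```

Failing early keeps the file system encoding and the locale encoding identical, which is the property the closing message argues for.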
History
Date User Action Args
2010-10-19 23:55:25 vstinner set status: open -> closed; resolution: not a bug; messages: + msg119180
2010-07-28 01:29:23 vstinner set messages: + msg111758
2010-05-16 01:13:13 vstinner set files: - fsencoding_ascii.patch
2010-05-16 01:13:07 vstinner set files: + fsencoding_ascii-2.patch; messages: + msg105842
2010-05-15 16:34:19 vstinner set messages: + msg105820
2010-05-15 12:40:20 vstinner set nosy: + lemburg, loewis, pitrou, Arfrever; dependencies: + Python3 doesn't support locale different than utf8 and an non-ASCII path (POSIX), Create PyUnicode_EncodeFSDefault() function
2010-05-15 12:39:05 vstinner create