Author vstinner
Recipients Sworddragon, larry, lemburg, loewis, ncoghlan, pitrou, r.david.murray, serhiy.storchaka, terry.reedy, vstinner
Date 2013-12-08.22:22:15
Message-id <1386541336.36.0.656118740651.issue19846@psf.upfronthosting.co.za>
In-reply-to
Content
>> Or said differently, the filesystem encoding is different than the
>> locale encoding.

> Indeed, but the FS encoding and the IO encoding are the same.
> "locale encoding" doesn't really matter here, as we are assuming that
> it's wrong.

Oh, I realize that the term "FS encoding" is not clear. When I wrote "FS encoding", I meant sys.getfilesystemencoding(), which is mbcs on Windows, UTF-8 on Mac OS X, and (currently) the locale encoding on other platforms (UNIX, e.g. Linux/FreeBSD/Solaris/AIX).
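
To see which values are in play on a given machine, a minimal sketch along these lines may help. The helper name describe_encodings is hypothetical, and the values it reports depend on the platform, the environment and the Python version, so no particular output is implied:

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings discussed above (hypothetical helper)."""
    return {
        # What this message calls the "FS encoding".
        "filesystem": sys.getfilesystemencoding(),
        # The locale encoding, without changing the current locale.
        "locale": locale.getpreferredencoding(False),
        # The IO encoding of stdout (None if stdout has no encoding).
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print("%-10s %s" % (name, value))
```

Running it once normally and once with LC_ALL=C in the environment is a quick way to compare the two situations.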

--

IMO there are two different points in this issue:

(a) which encoding should be used when the C locale is used: the encoding announced by the OS through nl_langinfo(CODESET) (the current choice), or an arbitrary, optimistic "utf-8" encoding?

(b) for technical reasons, Python reuses the C codec during Python initialization to decode and encode OS data, and so currently Python *must* use the locale encoding for its "filesystem encoding"

Before I can pronounce on point (a), I would like to see a patch fixing point (b). I'm not against fixing point (b). I'm just saying that it's not trivial, and obviously it must be fixed before the status of point (a) can change. I even gave clues on how to fix point (b).
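
For point (a), the encoding that the OS announces for the C locale can be queried from Python itself. The following is only a sketch: announced_codeset is a hypothetical helper, locale.nl_langinfo() exists only on POSIX platforms, and the string returned for the C locale differs between C libraries:

```python
import locale


def announced_codeset(locale_name="C"):
    """Return nl_langinfo(CODESET) for the given LC_CTYPE locale.

    Returns None where locale.nl_langinfo() is unavailable (e.g. Windows).
    """
    if not hasattr(locale, "nl_langinfo"):
        return None
    # Remember the current LC_CTYPE setting so it can be restored.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    print("C locale codeset:", announced_codeset("C"))
```

The helper restores LC_CTYPE afterwards because setlocale() changes process-wide state.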

--

asciilocale.patch has many issues. Try running the Python test suite with this patch applied to see what I mean. Examples of failures:

======================================================================
FAIL: test_non_ascii (test.test_cmd_line.CmdLineTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/haypo/prog/python/default/Lib/test/test_cmd_line.py", line 140, in test_non_ascii
    assert_python_ok('-c', command)
  File "/home/haypo/prog/python/default/Lib/test/script_helper.py", line 69, in assert_python_ok
    return _assert_python(True, *args, **env_vars)
  File "/home/haypo/prog/python/default/Lib/test/script_helper.py", line 55, in _assert_python
    "stderr follows:\n%s" % (rc, err.decode('ascii', 'ignore')))
AssertionError: Process return code is 1, stderr follows:
Unable to decode the command from the command line:
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc3' in position 12: surrogates not allowed

======================================================================
FAIL: test_ioencoding_nonascii (test.test_sys.SysModuleTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/haypo/prog/python/default/Lib/test/test_sys.py", line 603, in test_ioencoding_nonascii
    self.assertEqual(out, os.fsencode(test.support.FS_NONASCII))
AssertionError: b'' != b'\xc3\xa6'

======================================================================
FAIL: test_nonascii (test.test_warnings.CEnvironmentVariableTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/haypo/prog/python/default/Lib/test/test_warnings.py", line 774, in test_nonascii
    "['ignore:Deprecaci\xf3nWarning']".encode('utf-8'))
AssertionError: b"['ignore:Deprecaci\\udcc3\\udcb3nWarning']" != b"['ignore:Deprecaci\xc3\xb3nWarning']"

======================================================================
FAIL: test_nonascii (test.test_warnings.PyEnvironmentVariableTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/haypo/prog/python/default/Lib/test/test_warnings.py", line 774, in test_nonascii
    "['ignore:Deprecaci\xf3nWarning']".encode('utf-8'))
AssertionError: b"['ignore:Deprecaci\\udcc3\\udcb3nWarning']" != b"['ignore:Deprecaci\xc3\xb3nWarning']"


The test_warnings failures are probably #9988; the test_cmd_line failure may be #9992.

There may be other issues: the Python test suite only has a few tests for non-ASCII characters.
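
The '\udcc3' and '\udcb3' characters in the failures above are what the surrogateescape error handler produces when UTF-8 bytes are decoded with an ASCII codec. The sketch below reproduces that mismatch in isolation. It assumes nothing about the patch itself, and the function names are hypothetical:

```python
def decode_as_ascii(raw):
    """Decode bytes as ASCII, mapping each non-ASCII byte to a lone surrogate."""
    return raw.decode("ascii", "surrogateescape")


def strict_utf8_error(text):
    """Return the UnicodeEncodeError raised by a strict UTF-8 encode, or None."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError as exc:
        return exc
    return None


# UTF-8 bytes for a non-ASCII string, as they would arrive from the OS.
raw = "Deprecaci\xf3nWarning".encode("utf-8")

# Decoded with the wrong (ASCII) codec: b'\xc3\xb3' becomes '\udcc3\udcb3'.
decoded = decode_as_ascii(raw)

# A strict UTF-8 encode rejects lone surrogates.
error = strict_utf8_error(decoded)

# Encoding with the surrogateescape handler restores the original bytes.
roundtrip = decoded.encode("ascii", "surrogateescape")

if __name__ == "__main__":
    print(ascii(decoded))
    print(error)
    print(roundtrip == raw)
```

The bytes survive only when the surrogateescape handler is also used on the encode side, which is why decoding and encoding OS data with different settings produces errors like the ones shown.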

--

If anything is changed, I would prefer to have more than a few months of testing to make sure that it doesn't break anything. So I set the version field to Python 3.5.
History
Date                 User      Action  Args
2013-12-08 22:22:16  vstinner  set     recipients: + vstinner, lemburg, loewis, terry.reedy, ncoghlan, pitrou, larry, r.david.murray, Sworddragon, serhiy.storchaka
2013-12-08 22:22:16  vstinner  set     messageid: <1386541336.36.0.656118740651.issue19846@psf.upfronthosting.co.za>
2013-12-08 22:22:16  vstinner  link    issue19846 messages
2013-12-08 22:22:15  vstinner  create