classification
Title: Python launcher does not support unicode characters
Type: behavior
Stage: resolved
Components: Interpreter Core
Versions: Python 3.4, Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: asvetlov, ezio.melotti, gklein, haypo, jcea, jkloth, koobs, pitrou, python-dev, serhiy.storchaka, skrah, tim.golden, turncc
Priority: normal
Keywords: 3.3regression, patch

Created on 2012-10-13 14:24 by turncc, last changed 2016-06-22 19:18 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description
pythonrun_filename_decoding.patch serhiy.storchaka, 2012-10-20 07:55
pythonrun_filename_decoding_2.patch serhiy.storchaka, 2012-10-24 23:54
pythonrun_filename_decoding_test.patch serhiy.storchaka, 2012-11-02 18:12 Fix the test
pythonrun_filename_decoding_test_2.patch serhiy.storchaka, 2012-11-03 13:43
test_non_ascii.patch haypo, 2012-11-04 23:35
Messages (65)
msg172807 - (view) Author: Turn (turncc) Date: 2012-10-13 14:24
If the py.exe arguments contain non-ASCII characters, execution fails. The script file name or path may contain non-ASCII characters.
msg173359 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2012-10-19 19:48
Confirming that this doesn't happen on 2.7

py -2 £.py succeeds
py -3 £.py gives:

 python: failed to set __main__.__loader__
msg173373 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-20 07:25
I can reproduce this on Linux (3.3+ only):

$ name=$(printf "\xff")
$ echo "print('Hello, world')" >$name
$ ./python $name
python: failed to set __main__.__loader__

The issue is in the PyRun_SimpleFileExFlags() function, which gets a raw char * as the file name (the documentation says it is in the filesystem encoding, sys.getfilesystemencoding()), but this name is then decoded from UTF-8 in set_main_loader().
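The mismatch described above can be illustrated in pure Python. This is only a sketch, not the C code in pythonrun.c: the helper names are hypothetical, and it assumes the script name arrives as the single byte 0xff, as in the shell example. It compares strict UTF-8 decoding with the surrogateescape error handler that Python uses for filesystem names on Unix.

```python
raw_name = b"\xff"  # same byte as produced by printf "\xff" in the example


def decode_strict_utf8(raw):
    """Decode with strict UTF-8; return None if the bytes are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


def decode_fs_style(raw, encoding="utf-8"):
    """Decode with surrogateescape so undecodable bytes become lone surrogates."""
    return raw.decode(encoding, "surrogateescape")


strict_result = decode_strict_utf8(raw_name)
escaped = decode_fs_style(raw_name)
# Encoding with the same handler restores the original bytes.
round_trip = escaped.encode("utf-8", "surrogateescape")

print(strict_result, ascii(escaped), round_trip)
```

The strict decode is the step that cannot succeed for such a name, while the surrogateescape variant keeps the original byte recoverable.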
msg173374 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-20 07:55
Here is a patch which fixes the filename decoding error in PyRun_SimpleFileExFlags().
msg173376 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-10-20 09:16
The patch looks correct, but a test is missing.
msg173382 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-20 10:31
Where do we have tests for launching Python? I can't find any. runpy is not affected.
msg173724 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-24 23:54
Test added.
msg174408 - (view) Author: Roundup Robot (python-dev) Date: 2012-11-01 12:52
New changeset 02d25098ad57 by Andrew Svetlov in branch '3.3':
Issue #16218: Support non ascii characters in python launcher.
http://hg.python.org/cpython/rev/02d25098ad57

New changeset 1267d64c14b3 by Andrew Svetlov in branch 'default':
Merge issue #16218: Support non ascii characters in python launcher.
http://hg.python.org/cpython/rev/1267d64c14b3
msg174409 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-11-01 12:52
Fixed. Thanks, Serhiy.
msg174427 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2012-11-01 16:23
I'm not especially familiar with this code, but I'm just trying to understand: how come filename_obj isn't decref'd on normal exit?
msg174430 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-11-01 16:37
Vinay, it's processed in
PyObject_CallFunction(loader_type, "sN", "__main__", filename_obj)
Please note the "sN" format instead of "sO".
"N" means a PyObject* is passed but, unlike with "sO", that object is not increfed.
msg174433 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2012-11-01 17:21
> Please note the "sN" format instead of "sO".

I see. Thanks.
msg174521 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-11-02 14:34
Some of the buildbots are failing with the new test:

======================================================================
FAIL: test_non_utf8 (test.test_cmd_line_script.CmdLineTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Lib/test/test_cmd_line_script.py", line 373, in test_non_utf8
    importlib.machinery.SourceFileLoader)
  File "/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Lib/test/test_cmd_line_script.py", line 126, in _check_script
    rc, out, err = assert_python_ok(*run_args)
  File "/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Lib/test/script_helper.py", line 54, in assert_python_ok
    return _assert_python(True, *args, **env_vars)
  File "/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Lib/test/script_helper.py", line 46, in _assert_python
    "stderr follows:\n%s" % (rc, err.decode('ascii', 'ignore')))
AssertionError: Process return code is 1, stderr follows:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 15-20: ordinal not in range(128)

----------------------------------------------------------------------
Ran 23 tests in 8.959s
msg174529 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012-11-02 14:51
Reopening bug.

Quite a few buildbots are failing with this patch. Please, commit a new version or revert.
msg174531 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-11-02 14:57
I see. Sorry, my fault.
Give me the weekend to figure out why it fails.
Thanks.
msg174549 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-02 18:12
I was not able to reproduce this error; I got other errors. The issue is not in the Python interpreter; the test is broken. Here is a patch that might solve the issue on some platforms (it needs testing on Windows).

I guess all command line tests fail when the path to the temporary directory contains non-ASCII characters.
msg174560 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-11-02 19:33
Serhiy, your original example from msg173373 still fails on
FreeBSD:

$ name=$(printf "\xff")
$ echo "print('Hello, world')" >$name
$ ./python $name
UnicodeEncodeError: 'ascii' codec can't encode character '\xff' in position 0: ordinal not in range(128)
[41257 refs]
msg174568 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-02 20:29
> Serhiy, your original example from msg173373 still fails on
> FreeBSD:

Thank you for the report. I have no idea what happened (note that the
error is on encoding, not decoding). Can you please show me the results of
sys.getdefaultencoding(), sys.getfilesystemencoding(),
locale.getpreferredencoding(True), locale.getpreferredencoding(False),
and the output of the locale command?
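The values requested above can be collected in one go with a small script. This is a minimal sketch; the function name collect_encoding_info is hypothetical and not part of any patch in this issue.

```python
import locale
import sys


def collect_encoding_info():
    """Return the encoding settings requested above, keyed by the call used."""
    return {
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        "sys.getfilesystemencoding()": sys.getfilesystemencoding(),
        "locale.getpreferredencoding(True)": locale.getpreferredencoding(True),
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
    }


info = collect_encoding_info()
for call, value in info.items():
    print("%s = %r" % (call, value))
```

The output of the locale shell command still has to be gathered separately.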
msg174571 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-11-02 20:40
This is it:

>>> 
>>> sys.getdefaultencoding()
'utf-8'
>>> sys.getfilesystemencoding()
'ascii'
>>> locale.getpreferredencoding(True)
'US-ASCII'
>>> locale.getpreferredencoding(False)
'US-ASCII'
>>> 

$ locale
LANG=
LC_CTYPE="C"
LC_COLLATE="C"
LC_TIME="C"
LC_NUMERIC="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=
msg174573 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-11-02 20:51
Perhaps we have to skip the tests if the filesystem encoding doesn't support wide characters.
I'm not sure which way is best: should we skip if sys.getfilesystemencoding() is not utf8, or is it better to try to encode the path and skip if that fails?
I think the latter is better.
msg174577 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-11-02 21:03
On FreeBSD both Serhiy's original test case as well as the unit test work
if the locale is ISO8859-15:

>>> sys.getdefaultencoding()
'utf-8'
>>> sys.getfilesystemencoding()
'iso8859-15'
>>> locale.getpreferredencoding(True)
'ISO8859-15'
>>> locale.getpreferredencoding(False)
'ISO8859-15'

Naturally, if the locale is utf-8 the test works as well.
msg174581 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-11-02 21:17
Looking at the last message from Stefan, I think we should first check that cmdpath can be encoded with sys.getfilesystemencoding() and skip the test if that fails.
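The "try to encode and skip on failure" idea could look roughly like the sketch below. The class, the helper encodable() and the fixed values for fs_encoding and script_name are hypothetical, chosen so the skip path is exercised; a real test would use sys.getfilesystemencoding() and the actual script path.

```python
import unittest


def encodable(path, encoding):
    """Return True if path can be encoded strictly with the given encoding."""
    try:
        path.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


class NonAsciiScriptTest(unittest.TestCase):
    # Hypothetical values; a real test would query the running interpreter.
    fs_encoding = "ascii"
    script_name = "\u00a3.py"

    def test_run_script(self):
        if not encodable(self.script_name, self.fs_encoding):
            self.skipTest("file name not encodable with %s" % self.fs_encoding)
        # A real test would create the script and run it here.


suite = unittest.defaultTestLoader.loadTestsFromTestCase(NonAsciiScriptTest)
result = unittest.TestResult()
suite.run(result)
print(len(result.skipped), "skipped")
```

Skipping inside the test body rather than with a decorator keeps the check next to the path it applies to, which matters when the path is only known at run time.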
msg174587 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-11-02 21:36
That sounds good for Unix.


For Windows I'm getting a more informative error message than from the
buildbot output if I run the test via an ssh client:

======================================================================
FAIL: test_non_utf8 (test.test_cmd_line_script.CmdLineTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\stefan\pydev\cpython\lib\test\test_cmd_line_script.py", line 373, in test_non_utf8
    importlib.machinery.SourceFileLoader)
  File "C:\Users\stefan\pydev\cpython\lib\test\test_cmd_line_script.py", line 129, in _check_script
    expected_package, expected_loader)
  File "C:\Users\stefan\pydev\cpython\lib\test\test_cmd_line_script.py", line 113, in _check_output
    self.assertIn(printed_file.encode('utf-8'), data)
AssertionError: b"__file__=='c:\\\\users\\\\stefan\\\\appdata\\\\local\\\\temp\\\\tmpr6shx4\\\\\\udcf1\\udcea\\udcf0\\udce8\\udcef\\udcf2.py'" not found in b"__loader__==<class '_frozen_importlib.SourceFileLoader'>\r\n__file__=='<encoding error>'\r\n__package__==None\r\nsys.argv[0]=='c:\\\\users\\\\stefan\\\\appdata\\\\local\\\\temp\\\\tmpr6shx4\\\\\\udcf1\\udcea\\udcf0\\udce8\\udcef\\udcf2.py'\r\nsys.path[0]=='c:\\\\users\\\\stefan\\\\appdata\\\\local\\\\temp\\\\tmpr6shx4'\r\ncwd=='C:\\\\Users\\\\stefan\\\\pydev\\\\cpython\\\\build\\\\test_python_2424'\r\n"



It looks to me as if on Windows perhaps some utf-8 encoding steps should
be skipped because the file name *is* unicode on Windows.
msg174588 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-11-02 21:43
I will fix it tomorrow at Kiev Python sprint.
msg174590 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-02 21:54
> For Windows I'm getting a more informative error message than from the
> buildbot output if I run the test via an ssh client:

Try with my last patch (pythonrun_filename_decoding_test.patch). It
also fixes the failure on Linux with an 8-bit locale.

$ LC_ALL=en_US.ISO-8859-1 LANG=en_US.ISO-8859-1 LANGUAGE= ./python -m test -m test_non_utf8 test_cmd_line_script
[1/1] test_cmd_line_script
test test_cmd_line_script failed -- Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/test/test_cmd_line_script.py", line 373, in test_non_utf8
    importlib.machinery.SourceFileLoader)
  File "/home/serhiy/py/cpython/Lib/test/test_cmd_line_script.py", line 129, in _check_script
    expected_package, expected_loader)
  File "/home/serhiy/py/cpython/Lib/test/test_cmd_line_script.py", line 113, in _check_output
    self.assertIn(printed_file.encode('utf-8'), data)
AssertionError: b"__file__=='/tmp/tmpda64hd/\\udcf1\\udcea\\udcf0\\udce8\\udcef\\udcf2.py'" not found in b"__loader__==<class '_frozen_importlib.SourceFileLoader'>\n__file__=='/tmp/tmpda64hd/\\xf1\\xea\\xf0\\xe8\\xef\\xf2.py'\n__package__==None\nsys.argv[0]=='/tmp/tmpda64hd/\\xf1\\xea\\xf0\\xe8\\xef\\xf2.py'\nsys.path[0]=='/tmp/tmpda64hd'\ncwd=='/home/serhiy/py/cpython/build/test_python_3546'\n"
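One plausible reading of the traceback above is that the same six bytes were decoded under two different encodings: the expected string contains lone surrogates, while the child process printed Latin-1 characters. The sketch below only illustrates that difference; the bytes are inferred from the escapes in the traceback, and the cp1251 line is an assumption about what the name was meant to spell.

```python
raw = b"\xf1\xea\xf0\xe8\xef\xf2"  # bytes inferred from the escapes in the traceback

# Not valid UTF-8, so every byte is escaped to a lone surrogate.
as_utf8 = raw.decode("utf-8", "surrogateescape")
# Every byte is a valid Latin-1 character, so nothing is escaped.
as_latin1 = raw.decode("iso-8859-1", "surrogateescape")
# In a Cyrillic code page the same bytes form a readable word.
as_cp1251 = raw.decode("cp1251", "surrogateescape")

print(ascii(as_utf8))
print(ascii(as_latin1))
print(ascii(as_cp1251))
```

Comparing repr() output between parent and child therefore only works when both sides decode the name with the same encoding and error handler.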
msg174595 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-11-02 22:53
Serhiy Storchaka <report@bugs.python.org> wrote:
> Try with my last patch (pythonrun_filename_decoding_test.patch). It
> also fixes the failure on Linux with an 8-bit locale.

Unfortunately your last patch does not work on Windows. -- I'm too lazy
to step through the domain specific language of test_cmd_line_script.py.
Is this what is supposed to be tested:

Python 3.4.0a0 (default:b2bd62d1644f+, Nov  2 2012, 22:56:48) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '\udcf1\udcea\udcf0\udce8\udcef\udcf2'
>>> f = open(s, "w")
>>> f.write('print("hello world")\n')
>>> f.close()

C:\Users\stefan\pydev\cpython>PCbuild\amd64\python_d.exe ïïïïï�
hello world

Because that just works without the complex test machinery. :)
msg174603 - (view) Author: Roundup Robot (python-dev) Date: 2012-11-03 10:50
New changeset 884c2e93d3f7 by Andrew Svetlov in branch 'default':
Issue #16218: Fix broken test for supporting nonascii characters in python launcher
http://hg.python.org/cpython/rev/884c2e93d3f7
msg174604 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-11-03 10:51
I'd like to follow Stefan's suggestion.
The new test is simple and it works.
msg174606 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-11-03 11:23
I think this is what went wrong on Windows in the previous test (see
Lib/test/test_cmd_line_script.py:43):

>>> s = '\udcf1\udcea\udcf0\udce8\udcef\udcf2'
>>> f = open(s, "w")
>>> f.write("print('%s\\n' % __file__)")
>>> f.close()

C:\Users\stefan\pydev\cpython>PCbuild\amd64\python_d.exe ïïïïï�
<encoding error>

So __file__ isn't set correctly, which looks like a bug to me. I'm not sure
whether it should be part of this issue or if we should open a new one.
msg174611 - (view) Author: Roundup Robot (python-dev) Date: 2012-11-03 12:37
New changeset 95d1adf144ee by Andrew Svetlov in branch 'default':
Issue #16218: skip test if filesystem doesn't support required encoding
http://hg.python.org/cpython/rev/95d1adf144ee
msg174620 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-03 13:43
Andrew, you shod a flea.

1. Now the test is skipped on non-Cyrillic-compatible locales (such as en_US.ISO-8859-1).
2. On a UTF-8 locale the test does not test the bug (it passes even without the patch).

Here is a new patch. It should fail on FreeBSD with an ASCII locale (because there is a bug that is not yet fixed), and I don't know how it will behave on Windows. Temporarily, you can explicitly skip the test for that case:

    @unittest.skipIf(sys.platform.startswith('freebsd') and
                     sys.getfilesystemencoding() == 'ascii',
                     'skip on FreeBSD with ASCII filesystem encoding')
msg174841 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-11-04 23:13
test_cmd_line_script.test_non_utf8() is failing on Mac OS X since the changeset 95d1adf144ee.

======================================================================
FAIL: test_non_utf8 (test.test_cmd_line_script.CmdLineTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Volumes/bay2/buildslave/cpython/3.x.snakebite-mountainlion-amd64/build/Lib/test/test_cmd_line_script.py", line 381, in test_non_utf8
    rc, out, _ = assert_python_ok(*run_args)
  File "/Volumes/bay2/buildslave/cpython/3.x.snakebite-mountainlion-amd64/build/Lib/test/script_helper.py", line 54, in assert_python_ok
    return _assert_python(True, *args, **env_vars)
  File "/Volumes/bay2/buildslave/cpython/3.x.snakebite-mountainlion-amd64/build/Lib/test/script_helper.py", line 46, in _assert_python
    "stderr follows:\n%s" % (rc, err.decode('ascii', 'ignore')))
AssertionError: Process return code is 2, stderr follows:
/Volumes/bay2/buildslave/cpython/3.x.snakebite-mountainlion-amd64/build/python.exe: can't open file '<unprintable file name>': [Errno 92] Illegal byte sequence

http://buildbot.python.org/all/builders/AMD64%20Mountain%20Lion%20%5BSB%5D%203.x/builds/404/steps/test/logs/stdio
msg174842 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-11-04 23:18
> @unittest.skipIf(sys.platform.startswith('freebsd') and
>                  sys.getfilesystemencoding() == 'ascii',
>                  'skip on FreeBSD with ASCII filesystem encoding')

Such a skip is not a good idea. Many OSes use the Latin1 encoding when the C locale is used (even if the ASCII encoding is announced :-/): Solaris, FreeBSD, Mac OS X, etc.

pythonrun_filename_decoding_test_2.patch: 'surrogateescape' error handler is not used on Windows (and must not be used), whereas the initial issue was reported on Windows.
msg174844 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-11-04 23:35
I propose a test with a single non-ASCII character, which should be supported by more code pages/locale encodings. It also checks the value of __file__. I only ran the test on Linux with the UTF-8 locale encoding.
msg174864 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-11-05 06:19
I like the last patch from Victor. It works on Windows also.
msg174865 - (view) Author: Roundup Robot (python-dev) Date: 2012-11-05 06:20
New changeset 56df0d4f0011 by Andrew Svetlov in branch 'default':
Issue #16218: Fix test for issue again
http://hg.python.org/cpython/rev/56df0d4f0011
msg174871 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-11-05 07:22
How does the test which has been committed even test the Python launcher? It only calls assert_python_ok(), which should use the regular Python interpreter.
msg174874 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-11-05 07:46
Well, the fix (and the test) is related to a bug in Python itself (./Python/pythonrun.c).
pylauncher should be tested also, you are right.
msg174876 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-05 07:55
Such a test is not enough.

1. It is skipped on locales which do not support "£" (cp1006, cp1250, cp1251, cp737, cp852, cp855, cp866, cp874, cp949, euc_kr, gb2312, gbk, hz, iso2022_kr, iso8859_10, iso8859_11, iso8859_16, iso8859_2, iso8859_4, iso8859_5, iso8859_6, johab, koi8_r, koi8_u, mac_arabic, mac_farsi, ptcp154, tis_620).  But the bug still exists on such locales.

2. It tests nothing on a UTF-8 locale (the test passes even when the bug is not fixed).

We should test every filename which can be used in the file system, even if it cannot be decoded using the current locale or the UTF-8 encoding.  On Unix filenames are byte sequences and we should use non_ascii_bytes.decode(sys.getfilesystemencoding(), 'surrogateescape') as the script name.  On Windows it could possibly be chr(k), where k is the minimal code > 127 such that chr(k).encode('mbcs') does not fail (I am not sure).
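The two strategies proposed above could be sketched as follows. The function pick_script_name and its parameters are hypothetical. Since the 'mbcs' codec exists only on Windows, the encoding is passed in explicitly so the sketch can run anywhere, with an ANSI code page such as cp1252 standing in for 'mbcs'.

```python
def pick_script_name(encoding, raw=b"\xff", windows=False):
    """Pick a non-ASCII script name for the given filesystem encoding.

    Unix style: decode raw bytes with surrogateescape, so any byte works.
    Windows style: return the first character above 127 that the encoding
    accepts, or None if there is no such character.
    """
    if not windows:
        return raw.decode(encoding, "surrogateescape")
    for k in range(128, 0x10000):
        ch = chr(k)
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            continue
        return ch
    return None


unix_utf8 = pick_script_name("utf-8")
unix_ascii = pick_script_name("ascii")
win_cp1252 = pick_script_name("cp1252", windows=True)
win_ascii = pick_script_name("ascii", windows=True)
print(ascii(unix_utf8), ascii(unix_ascii), ascii(win_cp1252), win_ascii)
```

A name chosen this way never depends on one particular character such as "£" being present in the locale encoding.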
msg174877 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-11-05 08:07
> It tests nothing on a UTF-8 locale (the test passes even when the bug is not fixed).

The issue is about Windows and UTF-8 is never used as filesystem encoding on Windows.
msg174878 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-11-05 08:16
The test is still failing on Mac OS X:




======================================================================
FAIL: test_non_ascii (test.test_cmd_line_script.CmdLineTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Volumes/bay2/buildslave/cpython/3.x.snakebite-mountainlion-amd64/build/Lib/test/test_cmd_line_script.py", line 380, in test_non_ascii
    rc, stdout, stderr = assert_python_ok(script_name)
  File "/Volumes/bay2/buildslave/cpython/3.x.snakebite-mountainlion-amd64/build/Lib/test/script_helper.py", line 54, in assert_python_ok
    return _assert_python(True, *args, **env_vars)
  File "/Volumes/bay2/buildslave/cpython/3.x.snakebite-mountainlion-amd64/build/Lib/test/script_helper.py", line 46, in _assert_python
    "stderr follows:\n%s" % (rc, err.decode('ascii', 'ignore')))
AssertionError: Process return code is 2, stderr follows:
/Volumes/bay2/buildslave/cpython/3.x.snakebite-mountainlion-amd64/build/python.exe: can't open file './@test_63568_tmp.py': [Errno 2] No such file or directory

http://buildbot.python.org/all/builders/AMD64%20Mountain%20Lion%20%5BSB%5D%203.x/builds/410/steps/test/logs/stdio

--

If I remember correctly, the command line is always decoded from UTF-8/surrogateescape on Mac OS X. That's why we have the function _Py_DecodeUTF8_surrogateescape() (for bootstrap reasons).

Such an example should not work if the locale encoding is not UTF-8 on Mac OS X:
---
arg = _Py_DecodeUTF8_surrogateescape(...);
filename = _Py_wchar2char(arg);
fp = fopen(filename, "r");
---

run_file() uses a different strategy:

        unicode = PyUnicode_FromWideChar(filename, wcslen(filename));
        if (unicode != NULL) {
            bytes = PyUnicode_EncodeFSDefault(unicode);
            Py_DECREF(unicode);
        }
        if (bytes != NULL)
            filename_str = PyBytes_AsString(bytes);
        else {
            PyErr_Clear();
            filename_str = "<encoding error>";
        }

run_file() looks to be right. Py_Main() should use similar code.

We should probably not encode and then decode the filename in each function, but this is another problem.
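For readers less familiar with the C API, the fallback in the run_file() snippet above can be mirrored in Python. This is a rough analogue under stated assumptions, not the interpreter's code: encode_script_name is a hypothetical helper, and the error handler is a parameter because the handler used by the filesystem encoding differs between platforms.

```python
PLACEHOLDER = b"<encoding error>"  # same fallback text as in the C snippet


def encode_script_name(name, fs_encoding, errors="surrogateescape"):
    """Encode a script name, falling back to a placeholder on failure."""
    try:
        return name.encode(fs_encoding, errors)
    except UnicodeEncodeError:
        return PLACEHOLDER


# A lone surrogate is turned back into its original byte.
unix_style = encode_script_name("\udcff.py", "utf-8")
# With a strict handler the same kind of name cannot be encoded.
strict_style = encode_script_name("\udcf1.py", "cp1252", errors="strict")
# surrogateescape does not help for a real character the codec lacks.
unsupported = encode_script_name("\u00a3.py", "ascii")

print(unix_style, strict_style, unsupported)
```

The placeholder branch would be consistent with the `<encoding error>` value of `__file__` reported on Windows earlier in this issue.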
msg174881 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-05 08:50
> The issue is about Windows and UTF-8 is never used as filesystem encoding
> on Windows.

The issue exists on Linux as I reported in msg173373.
msg174898 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-11-05 12:12
"It is skipped on locales which do not support "£" (cp1006, cp1250, cp1251, cp737, cp852, cp855, cp866, cp874, cp949, euc_kr, gb2312, gbk, hz, iso2022_kr, iso8859_10, iso8859_11, iso8859_16, iso8859_2, iso8859_4, iso8859_5, iso8859_6, johab, koi8_r, koi8_u, mac_arabic, mac_farsi, ptcp154, tis_620).  But the bug still exists on such locales."

This issue is not specific to this test: I created issue #16414 to improve the situation.
msg174899 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-11-05 12:14
>>> It tests nothing on a UTF-8 locale (the test passes even when the bug is not fixed).
>> The issue is about Windows and UTF-8 is never used as filesystem encoding on Windows.
> The issue exists on Linux as I reported in msg173373.

I don't understand your problem. Non-ASCII filenames were already supported with UTF-8 locale encoding. The new test checks that there is no regression with UTF-8 locale encoding. The test pass without the fix because it was not supported.
msg174901 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-05 12:34
> Non-ASCII filenames were already supported with UTF-8 locale encoding.

Test the example in msg173373.  It fails without fix.
msg174944 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-11-05 22:20
I created the issue #16416 to fix the Mac OS X case.
msg175185 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-08 19:14
I think something like CommonTest.test_nonascii_abspath() in Lib/test/test_genericpath.py should be used here.
msg175255 - (view) Author: Kubilay Kocak (koobs) Date: 2012-11-10 01:16
If there's not another revision of the test patch in the wings, can 56df0d4f0011 also be applied to 3.3, as tests are still failing on at least koobs-freebsd and koobs-freebsd-clang buildbots.
msg175270 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-11-10 10:36
>> Non-ASCII filenames were already supported with UTF-8 locale encoding.
>
> Test the example in msg173373.  It fails without fix.

Oh, I didn't understand that, sorry. I created #16444 to test also UTF-8 locale encoding with undecodable filenames (undecodable from UTF-8 in *strict* mode, not by os.fsencode() which uses surrogateescape).
msg175273 - (view) Author: Roundup Robot (python-dev) Date: 2012-11-10 11:07
New changeset 6b8a8bc6ba9c by Victor Stinner in branch 'default':
Issue #16444, #16218: Use TESTFN_UNDECODABLE on UNIX
http://hg.python.org/cpython/rev/6b8a8bc6ba9c
msg175274 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-11-10 11:08
"If there's not another revision of the test patch in the wings, can 56df0d4f0011 also be applied to 3.3, as tests are still failing on at least koobs-freebsd and koobs-freebsd-clang buildbots."

I just applied the patch of the issue #16444. I will check 3.4 buildbots, and then backport to older Python versions (at least 3.3).
msg175290 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-11-10 17:20
> If there's not another revision of the test patch in the wings, can
> 56df0d4f0011 also be applied to 3.3, as tests are still failing on at
> least koobs-freebsd and koobs-freebsd-clang buildbots.

Let me insist on what koobs just said. The Windows buildbots are still 
broken on 3.3, so this either needs fixing or reverting.
msg175295 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012-11-10 20:26
The OpenIndiana 3.3 and 3.x buildbots have been broken for a week too.

I suggest reverting this patch and using the custom buildbots to "debug it" before committing again. A week, and counting; it is about time.

Feel free to hammer my OpenIndiana custom buildbots.
msg175414 - (view) Author: Roundup Robot (python-dev) Date: 2012-11-12 00:24
New changeset 6017f09ead53 by Victor Stinner in branch '3.3':
Issue #16218, #16444: Backport improvment on tests for non-ASCII characters
http://hg.python.org/cpython/rev/6017f09ead53
msg175435 - (view) Author: Kubilay Kocak (koobs) Date: 2012-11-12 10:58
Back to green for all branches on FreeBSD, thank you Victor
msg175436 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-11-12 11:07
The "Mountain Lion" bots still fail. :)
msg175437 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-11-12 11:14
> Back to green for all branches on FreeBSD, thank you Victor

FreeBSD buildbots are green because I disabled the test on undecodable bytes! See issue #16455 which proposes a fix for FreeBSD and OpenIndiana.

> The "Mountain Lion" bots still fail. :)

Yeah I know, see the issue #16416 which has a patch. I plan to commit it to 3.4, wait for buildbots, and then backport to 3.3.

--

Python 3.3 handles non-ASCII almost everywhere. Python 3.4 will probably handle non-ASCII everywhere.

Handling *undecodable* bytes is really hard. We cannot use the same code for UNIX and Windows. If we store data as bytes, it solves the issue, but we don't support any Unicode character on Windows anymore. If we store data as Unicode, it's the opposite (ok for Windows, decode error on UNIX).
msg176872 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-12-04 02:32
> New changeset c25635b137cc by Victor Stinner in branch 'default':
> Issue #16455: On FreeBSD and Solaris, if the locale is C, the
> http://hg.python.org/cpython/rev/c25635b137cc

This changeset should fix this issue on FreeBSD and Solaris: see the issue #16455 for more information.
msg178118 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-12-25 11:35
Victor, have you finished all the work for this issue?
Can it be closed?
msg178171 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-12-25 23:04
The issue is now fixed on all platforms for Python 3.4. Please keep the
issue open until all changes are backported to Python 3.3 or even Python
3.2.
msg178173 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2012-12-25 23:20
I'll assign the issue to you then. Is that OK?
msg178234 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-12-26 16:24
Status of the different issues:

#16416, Mac OS X: 3.2, 3.3, 3.4
#16455, FreeBSD and Solaris: 3.4
#16218, set_main_loader: 3.3, 3.4
#16218, test_cmd_line_script: 3.4 (3.3 has an old copy of the test)
#16414, add support.TESTFN_NONASCII: 3.4
#16444, use support.TESTFN_NONASCII: 3.4
msg178869 - (view) Author: Roundup Robot (python-dev) Date: 2013-01-03 00:59
New changeset 41658a4fb3cc by Victor Stinner in branch '3.2':
Issue #16218, #16414, #16444: Backport FS_NONASCII, TESTFN_UNDECODABLE,
http://hg.python.org/cpython/rev/41658a4fb3cc

New changeset 4d40c1ce8566 by Victor Stinner in branch '3.3':
(Merge 3.2) Issue #16218, #16414, #16444: Backport FS_NONASCII,
http://hg.python.org/cpython/rev/4d40c1ce8566
msg178871 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-01-03 01:08
> I'll assign the issue to you then. Is that OK?

Sure.

I backported all changesets related to this issue to Python 3.2 and 3.3. So I can finally close this issue.
msg179564 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2013-01-10 16:29
Thanks!
History
Date User Action Args
2016-06-22 19:18:10 serhiy.storchaka set stage: commit review -> resolved
2013-01-10 16:29:28 asvetlov set messages: + msg179564
2013-01-03 01:08:38 haypo set status: open -> closed; assignee: haypo ->; resolution: fixed; messages: + msg178871
2013-01-03 00:59:43 python-dev set messages: + msg178869
2012-12-26 16:24:00 haypo set messages: + msg178234
2012-12-25 23:20:58 asvetlov set assignee: asvetlov -> haypo; messages: + msg178173
2012-12-25 23:04:19 haypo set messages: + msg178171
2012-12-25 11:35:19 asvetlov set messages: + msg178118
2012-12-04 02:32:43 haypo set messages: + msg176872
2012-11-12 11:14:46 haypo set messages: + msg175437
2012-11-12 11:07:23 skrah set messages: + msg175436
2012-11-12 10:58:36 koobs set messages: + msg175435
2012-11-12 00:24:15 python-dev set messages: + msg175414
2012-11-10 20:26:41 jcea set messages: + msg175295
2012-11-10 17:20:59 pitrou set messages: + msg175290
2012-11-10 11:08:21 haypo set messages: + msg175274
2012-11-10 11:07:35 python-dev set messages: + msg175273
2012-11-10 10:36:16 haypo set messages: + msg175270
2012-11-10 01:16:25 koobs set nosy: + koobs; messages: + msg175255
2012-11-08 19:14:11 serhiy.storchaka set messages: + msg175185
2012-11-05 22:20:11 haypo set messages: + msg174944
2012-11-05 12:34:25 serhiy.storchaka set messages: + msg174901
2012-11-05 12:14:42 haypo set messages: + msg174899
2012-11-05 12:12:49 haypo set messages: + msg174898
2012-11-05 08:50:38 serhiy.storchaka set messages: + msg174881
2012-11-05 08:16:55 haypo set messages: + msg174878
2012-11-05 08:07:24 haypo set messages: + msg174877
2012-11-05 07:55:53 serhiy.storchaka set messages: + msg174876
2012-11-05 07:46:26 asvetlov set messages: + msg174874
2012-11-05 07:22:26 pitrou set nosy: + pitrou; messages: + msg174871
2012-11-05 06:20:25 python-dev set messages: + msg174865
2012-11-05 06:19:41 asvetlov set messages: + msg174864
2012-11-04 23:35:09 haypo set files: + test_non_ascii.patch; messages: + msg174844
2012-11-04 23:18:23 haypo set messages: + msg174842
2012-11-04 23:14:00 haypo set messages: + msg174841
2012-11-03 13:43:43 serhiy.storchaka set files: + pythonrun_filename_decoding_test_2.patch; messages: + msg174620
2012-11-03 12:37:47 python-dev set messages: + msg174611
2012-11-03 11:23:54 skrah set messages: + msg174606
2012-11-03 10:51:30 asvetlov set messages: + msg174604
2012-11-03 10:50:18 python-dev set messages: + msg174603
2012-11-03 07:27:45 Ramchandra Apte set title: Python launcher does not support non ascii characters -> Python launcher does not support unicode characters
2012-11-02 22:53:41 skrah set messages: + msg174595
2012-11-02 21:54:52 serhiy.storchaka set messages: + msg174590
2012-11-02 21:43:12 asvetlov set messages: + msg174588
2012-11-02 21:36:02 skrah set messages: + msg174587
2012-11-02 21:17:59 asvetlov set messages: + msg174581
2012-11-02 21:03:59 skrah set messages: + msg174577
2012-11-02 20:51:46 asvetlov set messages: + msg174573
2012-11-02 20:40:04 skrah set messages: + msg174571
2012-11-02 20:30:01 serhiy.storchaka set messages: + msg174568
2012-11-02 19:37:38 vinay.sajip set nosy: - vinay.sajip
2012-11-02 19:33:26 skrah set messages: + msg174560
2012-11-02 18:12:07 serhiy.storchaka set files: + pythonrun_filename_decoding_test.patch; messages: + msg174549
2012-11-02 14:58:30 brian.curtin set nosy: - brian.curtin
2012-11-02 14:57:40 asvetlov set assignee: asvetlov; messages: + msg174531
2012-11-02 14:51:26 jcea set status: closed -> open; resolution: fixed -> (no value); messages: + msg174529; stage: resolved -> commit review
2012-11-02 14:42:42 jcea set nosy: + jcea
2012-11-02 14:35:19 skrah link issue16387 superseder
2012-11-02 14:34:45 skrah set nosy: + skrah; messages: + msg174521
2012-11-01 17:21:44 vinay.sajip set messages: + msg174433
2012-11-01 16:37:00 asvetlov set messages: + msg174430
2012-11-01 16:23:15 vinay.sajip set nosy: + vinay.sajip; messages: + msg174427
2012-11-01 12:52:51 asvetlov set status: open -> closed; nosy: + asvetlov; messages: + msg174409; resolution: fixed; stage: patch review -> resolved
2012-11-01 12:52:16 python-dev set nosy: + python-dev; messages: + msg174408
2012-10-24 23:54:23 serhiy.storchaka set files: + pythonrun_filename_decoding_2.patch; messages: + msg173724; stage: test needed -> patch review
2012-10-20 10:31:12 serhiy.storchaka set messages: + msg173382
2012-10-20 10:02:47 ezio.melotti set nosy: + ezio.melotti; type: crash -> behavior; stage: test needed
2012-10-20 09:16:09 haypo set nosy: + haypo; messages: + msg173376
2012-10-20 07:55:02 serhiy.storchaka set files: + pythonrun_filename_decoding.patch; keywords: + patch; messages: + msg173374
2012-10-20 07:25:18 serhiy.storchaka set versions: + Python 3.4; nosy: + serhiy.storchaka; messages: + msg173373; components: + Interpreter Core, - Windows; keywords: + 3.3regression
2012-10-19 19:48:27 tim.golden set messages: + msg173359
2012-10-19 19:19:28 gklein set nosy: + gklein
2012-10-14 20:46:21 merwok set nosy: + tim.golden, brian.curtin
2012-10-13 14:33:19 jkloth set nosy: + jkloth
2012-10-13 14:24:38 turncc create