test_cmd_line fails on MacOS X #48638
Comments
There is one test failure with Python 3.0rc3 built on Mac OS X 10.4.11:
% python.exe Lib/test/test_cmd_line.py
======================================================================
Traceback (most recent call last):
  File "Lib/test/test_cmd_line.py", line 143, in test_run_code
    0)
AssertionError: 1 != 0
Ran 10 tests in 2.074s
FAILED (failures=1)
Traceback (most recent call last):
  File "Lib/test/test_cmd_line.py", line 151, in <module>
    test_main()
  File "Lib/test/test_cmd_line.py", line 147, in test_main
    test.support.run_unittest(CmdLineTest)
  File ".../Python-3.0rc3/Lib/test/support.py", line 698, in run_unittest
    _run_suite(suite)
  File ".../Python-3.0rc3/Lib/test/support.py", line 681, in _run_suite
    raise TestFailed(err)
test.support.TestFailed: Traceback (most recent call last):
  File "Lib/test/test_cmd_line.py", line 143, in test_run_code
    0)
AssertionError: 1 != 0
The results for this code snippet:
        # Test handling of non-ascii data
        if sys.getfilesystemencoding() != 'ascii':
            command = "assert(ord('\xe9') == 0xe9)"
            self.assertEqual(
                self.exit_code('-c', command),
                0)
are:
% python.exe
Python 3.0rc3 (r30rc3:67312, Nov 21 2008, 14:20:38)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getfilesystemencoding()
'utf-8'
>>> ord('\xe9') == 0xe9
True |
This seems to have something to do with the current locale.
$ echo $LANG
C
and the test fails. On OS X 10.5.5:
$ echo $LANG
en_GB.UTF-8
and test_cmd_line.py passes. Moreover, after doing:
$ export LANG=C
test_cmd_line.py fails on OS X 10.5 too in the same way. |
Here's a minimal failing example, which I believe captures the cause of
Python 3.0rc3+ (py3k:67335, Nov 22 2008, 09:11:58)
[GCC 4.0.1 (Apple Inc. build 5488)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, posix
>>> sys.getfilesystemencoding()
'utf-8'
>>> posix.execv(sys.executable, [sys.executable, '-c', "ord('\xe9')"])
Traceback (most recent call last):
File "<string>", line 1, in <module>
TypeError: ord() expected a character, but string of length 2 found
Clearly the single '\xe9' character is being encoded in utf8 as
The encoding happens at around line 2988 of posixmodule.c, in posix_execv. |
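The mechanism described in this comment can be illustrated with a minimal sketch. It does not touch posix_execv; it only mimics the two steps. The use of Latin-1 on the child side is an assumption made for illustration (one character per byte), standing in for whatever the child's locale decoding actually does.

```python
# 'é' is a single code point, U+00E9.
char = "\xe9"

# Parent side: the argument is encoded with the UTF-8 file system encoding,
# which yields two bytes for this one character.
raw = char.encode("utf-8")

# Child side: ASSUMPTION for illustration only. The bytes are decoded one
# byte per character (Latin-1 stands in for the child's locale decoding).
seen_by_child = raw.decode("latin-1")


def child_command_fails(text):
    """Return True if ord(text) raises TypeError, as in the traceback above."""
    try:
        ord(text)
    except TypeError:
        return True
    return False


print(len(raw), len(seen_by_child), child_command_fails(seen_by_child))
```

The child no longer sees a one-character string, so `ord()` rejects it, which matches the "string of length 2" message in the report.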
I'm not competent enough in this area to judge how serious this bug is, or
Martin, would you be able to take a look? |
There is some inconsistency in the conversions with the "command line":
|
The locale machinery on OSX is flaky. The question is what people really
I don't know whether it's possible to launch Python scripts from Finder,
Without actual testing, I would assume that command line arguments are |
It looks like your conjectures are right in both cases. I tried adding a few lines to Modules/python.c to print out the argv
Yes, it appears that the terminal encoding is applied, if I'm reading
./python.exe a.py é
with the terminal character encoding set to "Unicode (UTF-8)", Python
Yes. Python seems to receive the same string regardless of terminal |
The test was originally run with
% echo $LANG
The same failure occurs with LANG set to C
% env LANG=C ../Python-3.0rc3/python.exe Lib/test/test_cmd_line.py
test_directories (__main__.CmdLineTest) ... ok
....
FAILED (failures=1)
Traceback (most recent call last):
File "Lib/test/test_cmd_line.py", line 151, in <module>
test_main()
File "Lib/test/test_cmd_line.py", line 147, in test_main
test.support.run_unittest(CmdLineTest)
File "/Users/jean/Desktop/Python-3.0rc3/Lib/test/support.py", line
698, in run_unittest
_run_suite(suite)
File "/Users/jean/Desktop/Python-3.0rc3/Lib/test/support.py", line
681, in _run_suite
raise TestFailed(err)
test.support.TestFailed: Traceback (most recent call last):
File "Lib/test/test_cmd_line.py", line 143, in test_run_code
0)
AssertionError: 1 != 0
But the test passes in both these cases:
% env LANG=en_US.UTF-8 ../Python-3.0rc3/python.exe
% env LANG=en_GB.UTF-8 ../Python-3.0rc3/python.exe |
The results from this script
import os, sys
print('Python %s' % sys.version.split()[0])
print('env[LANG]: %s' % os.environ.get('LANG', '<not set>'))
print('default encoding: %s' % sys.getdefaultencoding())
print('filesystem encoding: %s' % sys.getfilesystemencoding())
are with Python 3.0:
Python 3.0rc3
but for Python 2.3 thru 2.6:
Python 2.6
All with Python built from source on Mac OS X 10.4.11 (Intel). |
So the obvious quick fix is, on OS X only, to set the locale to e.g.
Here's a patch that does this.
I don't like this much, though. For one thing, I don't have any reason
Anyone have any better suggestions? |
I'm opposed to this patch for the same reason.
We should manually decode the command line arguments with UTF-8 on OSX;
Regards, |
I'm now very confused. In trying to follow things of type wchar_t* around the Python source, I
But this looks wrong: on OS X, sizeof(wchar_t) is 4 and I think w is
Should I open an issue for this, or am I simply misunderstanding? |
That 'UCS-2' should be 'UTF-16', of course. |
I think you are right. However, conversion to/from wchar_t is/was |
Okay. So it's an issue, but not a blocker. Opened bpo-4474 for this. Thanks, Martin. |
"C locale (alias POSIX, ANSI_X3.4-1968) define is 7-bit char-set. After quick check into code |
This test still fails and is the only failure with Python 3.1rc2 on MacOS |
Confirming that the test fails on r77044. Tested on Mac OS X 10.4.11 (Intel).
running build_scripts
======================================================================
Traceback (most recent call last):
  File "/Users/salman/svn/python/branches/py3k/Lib/test/test_cmd_line.py", line 132, in test_run_code
    0)
AssertionError: 1 != 0
Ran 13 tests in 2.235s
FAILED (failures=1)
test test_cmd_line failed -- Traceback (most recent call last):
  File "/Users/salman/svn/python/branches/py3k/Lib/test/test_cmd_line.py", line 132, in test_run_code
    0)
AssertionError: 1 != 0
1 test failed: |
I still see this failure on Python 3 trunk with Mac OS X 10.6. |
This passes for me in Mac OS X Terminal (a UTF8 terminal) but fails in iTerm (an ascii terminal) on both 31-maint and py3k. |
This issue is specific to Mac OS X because the file system encoding is hardcoded to UTF-8 on this OS. As written in msg76244, the problem is that the encoding is different for input arguments (sys.argv) and output arguments (arguments of child processes). As written in msg76255, program arguments are encoded to the locale (terminal) encoding. In short, the problem is that subprocess, os.exec*(), etc. encode command line arguments with the file system encoding instead of the locale encoding. On Linux, it just works because the file system encoding is the locale encoding. |
What if os.system(), os.execvp() and friends used "wcstombs" (or locale.getpreferredencoding()) to convert arguments from unicode to bytes? This would at least guarantee a round trip when spawning another Python interpreter. An interesting test is to compare the effects of os.unlink(filename) and os.system('rm "%s"' % filename), where filename is non-ASCII. Does it work today? |
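The round-trip property raised in this comment can be sketched in pure Python, without spawning a process. The helper names `encode_args` and `decode_args` are hypothetical and are not part of os or subprocess; the default encoding comes from `locale.getpreferredencoding(False)`, and the use of the `surrogateescape` error handler is an assumption of this sketch.

```python
import locale


def encode_args(args, encoding=None):
    """Hypothetical helper: encode str arguments as a parent process might."""
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return [arg.encode(encoding, "surrogateescape") for arg in args]


def decode_args(raw_args, encoding=None):
    """Hypothetical helper: decode bytes arguments as a child process might."""
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return [raw.decode(encoding, "surrogateescape") for raw in raw_args]


# Same encoding on both sides: the argument survives the round trip.
same = decode_args(encode_args(["\xe9"], "utf-8"), "utf-8")

# Different encodings on each side: the child sees two characters.
mixed = decode_args(encode_args(["\xe9"], "utf-8"), "iso-8859-1")

print(same, mixed)
```

The point of the sketch is only that the round trip holds when both sides agree on the encoding, whichever encoding that is. It says nothing about which encoding the interpreter should pick.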
Any progress on this? Is the best thing to just set LANG? |
FWIW, this still happens on the latest of /branches/py3k, when LANG does not match up to the enforced fs encoding-- which for me happened when I ran the buildslave under launchd. I was finally able to reproduce it, and after doing so, verified that cmdline_encoding-2.patch on bpo-9992 fixed it. |
ixokai has the bug on Snow Leopard x86. |
For the record, this can now be reproduced under Linux by forcing different locale and filesystem encodings:
$ PYTHONFSENCODING=utf8 LANG=ISO-8859-1 ./python -m test.regrtest test_cmd_line
[1/1] test_cmd_line
test test_cmd_line failed -- Traceback (most recent call last):
File "/home/antoine/py3k/__svn__/Lib/test/test_cmd_line.py", line 109, in test_run_code
assert_python_ok('-c', command)
File "/home/antoine/py3k/__svn__/Lib/test/script_helper.py", line 35, in assert_python_ok
return _assert_python(True, *args)
File "/home/antoine/py3k/__svn__/Lib/test/script_helper.py", line 31, in _assert_python
"stderr follows:\n%s" % (rc, err.decode('ascii', 'ignore')))
AssertionError: Process return code is 1, stderr follows:
Traceback (most recent call last):
File "<string>", line 1, in <module>
TypeError: ord() expected a character, but string of length 2 found |
I opened a separate issue for Linux, bpo-9992, because some Mac OS X users say
Extract of msg111432 (bpo-8775): "To be honest, I'd say the behavior of OSX 10.4 |
This issue should be fixed by r85435 (OSX: decode command line arguments from utf-8), see bpo-9992. I will watch for the OSX buildbots. |
I don't know if it fixes the issue, but it introduces a regression. r85442 reverts it.
---
Revert r85435 (and r85440): decode command line arguments from utf-8
Python exits with a fatal error if the command line contains an undecodable argument. PyUnicode_FromString() fails at the first undecodable byte because it calls the error handler, but error handlers are not ready before Python initialization.
---
The problem is to get a function to decode a bytes string from utf-8 in main() (before Python initialization). Possibilities:
I prefer the _Py_char2wchar() solution because I'm sure that it works before Python initialization. |
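The regression described in this comment comes down to strict decoding giving up at the first undecodable byte. The sketch below shows that behaviour at the Python level, next to a tolerant decode with the `surrogateescape` error handler. It is only an analogy for the C code in main(), and the argument value is made up for illustration.

```python
# A made-up argument containing a byte that is not valid UTF-8.
undecodable = b"--name=\xff"


def decode_strict(raw):
    """Return the decoded text, or None if strict UTF-8 decoding fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Strict decoding fails on the argument as a whole.
strict_result = decode_strict(undecodable)

# surrogateescape maps the bad byte to a lone surrogate instead of failing,
# and encoding with the same handler restores the original bytes.
escaped = undecodable.decode("utf-8", "surrogateescape")
restored = escaped.encode("utf-8", "surrogateescape")

print(strict_result, ascii(escaped), restored)
```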
osx_utf8_cmdline.patch: copy of r85435. |
One solution would be to duplicate the UTF-8 decoder for OSX, incorporating surrogate escape. This should be much shorter than the full UTF-8 codec, and perhaps at least utf8_code_length could be shared. |
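To make the idea concrete, here is a rough Python sketch of a stand-alone UTF-8 decoder that escapes undecodable bytes as lone surrogates. It is not the C code from the patch. The function name `decode_utf8_surrogateescape` is hypothetical, and escaping one byte at a time as U+DC00 + byte is an assumption modelled on the built-in `surrogateescape` handler.

```python
def decode_utf8_surrogateescape(data: bytes) -> str:
    """Decode UTF-8, mapping each undecodable byte b to chr(0xDC00 + b)."""
    out = []
    i = 0
    while i < len(data):
        first = data[i]
        if first < 0x80:
            # Plain ASCII byte.
            out.append(chr(first))
            i += 1
            continue
        # Work out the sequence length and the smallest code point that
        # legitimately needs that length (to reject overlong forms).
        if 0xC2 <= first <= 0xDF:
            length, code, minimum = 2, first & 0x1F, 0x80
        elif 0xE0 <= first <= 0xEF:
            length, code, minimum = 3, first & 0x0F, 0x800
        elif 0xF0 <= first <= 0xF4:
            length, code, minimum = 4, first & 0x07, 0x10000
        else:
            length, code, minimum = 0, 0, 0
        tail = data[i + 1:i + length]
        valid = (length > 0 and len(tail) == length - 1
                 and all(0x80 <= b <= 0xBF for b in tail))
        if valid:
            for b in tail:
                code = (code << 6) | (b & 0x3F)
            # Reject overlong forms, surrogates and out-of-range values.
            valid = (minimum <= code <= 0x10FFFF
                     and not 0xD800 <= code <= 0xDFFF)
        if valid:
            out.append(chr(code))
            i += length
        else:
            # Escape only the first byte and resume at the next one.
            out.append(chr(0xDC00 + first))
            i += 1
    return "".join(out)


print(ascii(decode_utf8_surrogateescape(b"caf\xc3\xa9 \xff")))
```

Because each bad byte maps to its own surrogate, encoding the result with `'surrogateescape'` should give back the original bytes, which is the property that matters for command line arguments.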
Good idea, implemented in the attached patch [osx_utf8_cmdline-3.patch]. I tested the patch on x86 Snow Leopard with 3.x and it looks like it fixes the test_cmd_line failure (I modified some tests to manually remove the LC_ALL, LC_CTYPE and LANG environment variables). |
_Py_DecodeUTF8_surrogateescape() is a simplified version of
Hum, resizing the buffer is maybe a good idea, to avoid wasting memory. |
I committed my patch to Python 3.2 (r85765), with a specific test in test_cmd_line. Reopen the issue if the bug is not fixed. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.