test_cmd_line fails on MacOS X #48638

Closed
MrJean1 mannequin opened this issue Nov 22, 2008 · 35 comments
Labels
tests Tests in the Lib/test dir topic-unicode type-bug An unexpected behavior, bug, or error

Comments

@MrJean1
Mannequin

MrJean1 mannequin commented Nov 22, 2008

BPO 4388
Nosy @loewis, @smontanaro, @ronaldoussoren, @amauryfa, @mdickinson, @pitrou, @vstinner, @dvarrazzo, @voidspace
Files
  • issue4388.patch
  • osx_utf8_cmdline.patch
  • osx_utf8_cmdline-3.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2010-10-20.23:08:57.716>
    created_at = <Date 2008-11-22.04:54:40.590>
    labels = ['type-bug', 'tests', 'expert-unicode']
    title = 'test_cmd_line fails on MacOS X'
    updated_at = <Date 2010-10-20.23:08:57.715>
    user = 'https://bugs.python.org/MrJean1'

    bugs.python.org fields:

    activity = <Date 2010-10-20.23:08:57.715>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2010-10-20.23:08:57.716>
    closer = 'vstinner'
    components = ['Tests', 'Unicode']
    creation = <Date 2008-11-22.04:54:40.590>
    creator = 'MrJean1'
    dependencies = []
    files = ['12164', '19228', '19305']
    hgrepos = []
    issue_num = 4388
    keywords = ['patch']
    message_count = 35.0
    messages = ['76232', '76241', '76242', '76243', '76244', '76253', '76255', '76263', '76264', '76626', '76632', '76646', '76648', '76649', '76652', '76667', '89362', '96865', '104700', '104713', '106105', '106144', '106149', '115674', '118220', '118223', '118226', '118254', '118597', '118601', '118603', '118619', '119223', '119224', '119242']
    nosy_count = 13.0
    nosy_names = ['loewis', 'skip.montanaro', 'ixokai', 'ronaldoussoren', 'amaury.forgeotdarc', 'mark.dickinson', 'pitrou', 'vstinner', 'piro', 'MrJean1', 'rpetrov', 'michael.foord', 'slmnhq']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'needs patch'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue4388'
    versions = ['Python 3.1', 'Python 3.2']

    @MrJean1
    Mannequin Author

    MrJean1 mannequin commented Nov 22, 2008

    There is one test failure with Python 3.0rc3 built on MacOS X 10.4.11
    (Intel). Test test_cmd_line fails in the very last test as follows:

    % python.exe Lib/test/test_cmd_line.py
    test_directories (__main__.CmdLineTest) ... ok
    test_optimize (__main__.CmdLineTest) ... ok
    test_q (__main__.CmdLineTest) ... ok
    test_run_code (__main__.CmdLineTest) ... FAIL
    test_run_module (__main__.CmdLineTest) ... ok
    test_run_module_bug1764407 (__main__.CmdLineTest) ... ok
    test_site_flag (__main__.CmdLineTest) ... ok
    test_usage (__main__.CmdLineTest) ... ok
    test_verbose (__main__.CmdLineTest) ... ok
    test_version (__main__.CmdLineTest) ... ok

    ======================================================================
    FAIL: test_run_code (__main__.CmdLineTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "Lib/test/test_cmd_line.py", line 143, in test_run_code
        0)
    AssertionError: 1 != 0

    Ran 10 tests in 2.074s

    FAILED (failures=1)
    Traceback (most recent call last):
      File "Lib/test/test_cmd_line.py", line 151, in <module>
        test_main()
      File "Lib/test/test_cmd_line.py", line 147, in test_main
        test.support.run_unittest(CmdLineTest)
      File ".../Python-3.0rc3/Lib/test/support.py", line 698, in run_unittest
        _run_suite(suite)
      File ".../Python-3.0rc3/Lib/test/support.py", line 681, in _run_suite
        raise TestFailed(err)
    test.support.TestFailed: Traceback (most recent call last):
      File "Lib/test/test_cmd_line.py", line 143, in test_run_code
        0)
    AssertionError: 1 != 0

    The results for this code snippet:

            # Test handling of non-ascii data
            if sys.getfilesystemencoding() != 'ascii':
                command = "assert(ord('\xe9') == 0xe9)"
                self.assertEqual(
                    self.exit_code('-c', command),
                    0)

    are:

    % python.exe 
    Python 3.0rc3 (r30rc3:67312, Nov 21 2008, 14:20:38) 
    [GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys.getfilesystemencoding()
    'utf-8'
    >>> ord('\xe9') == 0xe9
    True
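    A minimal sketch for reproducing this outside the test suite, assuming the standard `subprocess` module and that clearing LC_ALL and LC_CTYPE is enough for LANG to select the child's locale. `run_non_ascii_check` is a hypothetical helper: it runs the same `-c` command as test_run_code and returns the child's exit code. On the 3.0rc3 build above the child would be expected to exit with 1; current releases, which decode command line arguments as UTF-8 on Mac OS X and handle the C locale on other POSIX systems via PEP 538 locale coercion or PEP 540 UTF-8 mode, should normally exit with 0.

```python
import os
import subprocess
import sys


def run_non_ascii_check(lang="C"):
    """Run the test_run_code check in a child interpreter under LANG=lang.

    Returns the child's exit code (0 means the assertion held in the child).
    """
    env = dict(os.environ)
    # Remove overrides so that LANG alone determines the child's locale.
    for name in ("LC_ALL", "LC_CTYPE"):
        env.pop(name, None)
    env["LANG"] = lang
    # Not a raw string: the child receives a literal U+00E9 character.
    command = "assert(ord('\xe9') == 0xe9)"
    return subprocess.call([sys.executable, "-c", command], env=env)


if __name__ == "__main__":
    print("exit code:", run_non_ascii_check())
```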

    @MrJean1 MrJean1 mannequin added tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error labels Nov 22, 2008
    @mdickinson
    Member

    This seems to have something to do with the current locale.
    On OS X 10.4.11/PPC I have:

    $ echo $LANG
    C

    and the test fails. On OS X 10.5.5:

    $ echo $LANG
    en_GB.UTF-8

    and test_cmd_line.py passes. Moreover, after doing:

    $ export LANG=C

    test_cmd_line.py fails on OS X 10.5 too in the same way.

    @mdickinson
    Member

    Here's a minimal failing example, which I believe captures the cause of
    the test_cmd_line failure. After "export LANG=C", on OS X 10.5, I get:

    Python 3.0rc3+ (py3k:67335, Nov 22 2008, 09:11:58) 
    [GCC 4.0.1 (Apple Inc. build 5488)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys, posix
    >>> sys.getfilesystemencoding()
    'utf-8'
    >>> posix.execv(sys.executable, [sys.executable, '-c', "ord('\xe9')"])
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    TypeError: ord() expected a character, but string of length 2 found

    Clearly the single '\xe9' character is being encoded in utf8 as
    b'\xc3\xa9', and the python interpreter invoked by the execv ends up
    receiving two characters here instead of one.

    The encoding happens at around line 2988 of posixmodule.c, in posix_execv.

    @mdickinson
    Member

    I'm not competent enough in this area to judge how serious this bug is, or
    determine what to do about it, but it seems as though it might potentially
    be a release blocker.

    Martin, would you be able to take a look?

    @amauryfa
    Member

    There is some inconsistency in the conversions with the "command line":

    • on input, sys.argv decodes with mbstowcs
    • on output, os.system uses utf-8, os.execv uses the
      FileSystemDefaultEncoding.

    @loewis
    Mannequin

    loewis mannequin commented Nov 22, 2008

    The locale machinery on OSX is flaky. The question is what people really
    pass for command line arguments. It would be useful to find out what
    happens in these two cases:

    1. Somebody runs "a.py ภาษาไทย" in a Terminal.app window. Most likely,
      the terminal encoding is applied, which we should assume to be UTF-8
      (although it might be different on some systems).

    2. Somebody creates a file japanese_コンテンツ in the finder, then uses
      shell completion to pass this to a Python script. Here I expect that
      UTF-8 is used even if the terminal's encoding is not UTF-8.

    I don't know whether it's possible to launch Python scripts from the Finder
    for given files; if so, it would also be interesting to find out what
    encoding is used there.

    Without actual testing, I would assume that command line arguments are
    typically encoded in UTF-8 on OSX. We should use that for argument
    processing, regardless of mbstowcs.
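    A sketch of that rule, assuming the raw argv bytes are available and that the locale encoding is passed in explicitly (a parameter here, so the example does not depend on the machine's locale). The surrogateescape error handler (PEP 383, Python 3.1) is an addition of this sketch, not part of the proposal; it keeps undecodable bytes instead of failing. `decode_cmdline_arg` is a hypothetical name.

```python
import sys


def decode_cmdline_arg(raw, platform=sys.platform, locale_encoding="ascii"):
    """Decode one raw argv entry: UTF-8 on OS X, the locale encoding elsewhere.

    surrogateescape turns undecodable bytes into lone surrogates
    (U+DC80..U+DCFF) instead of raising UnicodeDecodeError.
    """
    encoding = "utf-8" if platform == "darwin" else locale_encoding
    return raw.decode(encoding, "surrogateescape")


print(ascii(decode_cmdline_arg(b"\xc3\xa9", platform="darwin")))  # '\xe9'
print(ascii(decode_cmdline_arg(b"\xc3\xa9", platform="linux")))   # '\udcc3\udca9'
```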

    @loewis loewis mannequin removed their assignment Nov 22, 2008
    @mdickinson
    Member

    It looks like your conjectures are right in both cases.

    I tried adding a few lines to Modules/python.c to print out the argv
    entries as byte strings, before they're passed to mbstowcs. Results
    on OS X 10.5:

    1. Somebody runs "a.py ภาษาไทย" in a Terminal.app window. Most likely,
      the terminal encoding is applied, which we should assume to be UTF-8
      (although it might be different on some systems).

    Yes, it appears that the terminal encoding is applied, if I'm reading
    the results right. Trying

    ./python.exe a.py é

    with the terminal character encoding set to "Unicode (UTF-8)", Python
    receives the third argument as bytes([195, 169]). With the terminal
    encoding set to "Western (ISO Latin 1)" instead, Python receives
    bytes([233]).

    2. Somebody creates a file japanese_コンテンツ in the finder, then uses
      shell completion to pass this to a Python script. Here I expect that
      UTF-8 is used even if the terminal's encoding is not UTF-8.

    Yes. Python seems to receive the same string regardless of terminal
    encoding. (With the terminal encoding set to latin1, the tab-completed
    filename looks like garbage within Terminal, of course.)
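    For reference, the two byte sequences reported for "é" (U+00E9) are exactly its UTF-8 and Latin-1 encodings, which a two-line check in Python confirms:

```python
# Bytes Python receives for "é" with the terminal set to UTF-8 vs. Latin-1.
utf8_bytes = "\xe9".encode("utf-8")
latin1_bytes = "\xe9".encode("latin-1")
print(list(utf8_bytes), list(latin1_bytes))  # [195, 169] [233]
```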

    @MrJean1
    Mannequin Author

    MrJean1 mannequin commented Nov 22, 2008

    The test was originally run with

    % echo $LANG
    tcsh: LANG: Undefined variable.

    The same failure occurs with LANG set to C

    % env LANG=C ../Python-3.0rc3/python.exe Lib/test/test_cmd_line.py
    test_directories (__main__.CmdLineTest) ... ok
    ....
    FAILED (failures=1)
    Traceback (most recent call last):
      File "Lib/test/test_cmd_line.py", line 151, in <module>
        test_main()
      File "Lib/test/test_cmd_line.py", line 147, in test_main
        test.support.run_unittest(CmdLineTest)
      File "/Users/jean/Desktop/Python-3.0rc3/Lib/test/support.py", line 698, in run_unittest
        _run_suite(suite)
      File "/Users/jean/Desktop/Python-3.0rc3/Lib/test/support.py", line 681, in _run_suite
        raise TestFailed(err)
    test.support.TestFailed: Traceback (most recent call last):
      File "Lib/test/test_cmd_line.py", line 143, in test_run_code
        0)
    AssertionError: 1 != 0

    But the test passes in both these cases:

    % env LANG=en_US.UTF-8 ../Python-3.0rc3/python.exe Lib/test/test_cmd_line.py
    ....
    test_run_code (__main__.CmdLineTest) ... ok
    ....
    OK

    % env LANG=en_GB.UTF-8 ../Python-3.0rc3/python.exe Lib/test/test_cmd_line.py
    ....
    test_run_code (__main__.CmdLineTest) ... ok
    ....
    OK

    @MrJean1
    Mannequin Author

    MrJean1 mannequin commented Nov 22, 2008

    The results from this script

      import os, sys
      print('Python %s' % sys.version.split()[0])
      print('env[LANG]: %s' % os.environ.get('LANG', '<not set>'))
      print('default encoding: %s' % sys.getdefaultencoding())
      print('filesystem encoding: %s' % sys.getfilesystemencoding())

    are with Python 3.0:

    Python 3.0rc3
    env[LANG]: <not set>
    default encoding: utf-8
    filesystem encoding: utf-8

    but for Python 2.3 thru 2.6:

    Python 2.6
    env[LANG]: <not set>
    default encoding: ascii
    filesystem encoding: utf-8

    All with Python built from source on MacOS X 10.4.11 (Intel).

    @mdickinson
    Member

    So the obvious quick fix is, on OS X only, to set the locale to e.g.
    "en_US.UTF-8" instead of "" just before the mbstowcs call.

    Here's a patch that does this.

    I don't like this much, though. For one thing, I don't have any reason
    to believe that the particular locale "en_US.UTF-8" will be supported on
    any given installation of OS X.

    Anyone have any better suggestions?
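    Whether a given UTF-8 locale exists can only be probed at runtime. A minimal sketch using the standard `locale` module, assuming a list of candidate names (the names below are guesses, not locales guaranteed to exist): it tries each one for LC_CTYPE and restores the original setting before returning. `find_utf8_locale` is a hypothetical helper. `locale.setlocale` changes process-wide state, so a probe like this is only safe while no other thread depends on the locale.

```python
import locale


def find_utf8_locale(candidates=("en_US.UTF-8", "C.UTF-8", "UTF-8")):
    """Return the first candidate accepted for LC_CTYPE, or None.

    The original LC_CTYPE setting is restored before returning.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # not installed or not recognised on this system
            return name
        return None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


print(find_utf8_locale())  # system-dependent, possibly None
```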

    @loewis
    Mannequin

    loewis mannequin commented Nov 29, 2008

    I don't like this much, though. For one thing, I don't have any reason
    to believe that the particular locale "en_US.UTF-8" will be supported on
    any given installation of OS X.

    I'm opposed to this patch for the same reason.

    Anyone have any better suggestions?

    We should manually decode the command line arguments with UTF-8 on OSX;
    this will require yet another UTF-8 implementation (this time to
    wchar_t).

    Regards,
    Martin

    @mdickinson
    Member

    I'm now very confused.

    In trying to follow things of type wchar_t* around the Python source, I
    discovered PyUnicode_FromWideChar in unicodeobject.c. For OS X, the
    conversion lands in the following code, where w is the incoming WideChar
    array, declared as wchar_t *.

    register Py_UNICODE *u;
    register Py_ssize_t i;
    u = PyUnicode_AS_UNICODE(unicode);
    for (i = size; i > 0; i--)
        *u++ = *w++;

    But this looks wrong: on OS X, sizeof(wchar_t) is 4 and I think w is
    encoded in UTF-32. So I was expecting to see some kind of explicit
    conversion from UTF-32 to UCS-2 here. Instead, it looks as though the
    incoming values are implicitly truncated from 32 bits to 16. Doesn't this
    do the wrong thing for characters outside the BMP?

    Should I open an issue for this, or am I simply misunderstanding?
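    A small worked example of the concern. This sketch models the C copy loop in Python, assuming a 16-bit unsigned Py_UNICODE (narrow build), so that assigning a 32-bit wchar_t value keeps only its low 16 bits; the correct conversion instead splits a non-BMP code point into a UTF-16 surrogate pair. Both function names are hypothetical.

```python
def truncate_to_16_bits(code_point):
    """What `*u++ = *w++` does when Py_UNICODE is an unsigned 16-bit type."""
    return code_point & 0xFFFF


def to_utf16_units(code_point):
    """Correct conversion: non-BMP code points become a surrogate pair."""
    if code_point < 0x10000:
        return [code_point]
    offset = code_point - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


cp = 0x1F600  # outside the BMP
print(hex(truncate_to_16_bits(cp)))          # 0xf600
print([hex(u) for u in to_utf16_units(cp)])  # ['0xd83d', '0xde00']
```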

    @mdickinson
    Member

    conversion from UTF-32 to UCS-2 here

    That 'UCS-2' should be 'UTF-16', of course.

    @loewis
    Mannequin

    loewis mannequin commented Nov 30, 2008

    Should I open an issue for this, or am I simply misunderstanding?

    I think you are right. However, conversion to/from wchar_t is/was
    rarely used, and so are non-BMP characters; it's very likely that
    the problem hasn't occurred in practice (and I doubt it would occur
    in 3.0 if not fixed - there are more severe problems around).

    @mdickinson
    Member

    it's very likely that
    the problem hasn't occurred in practice (and I doubt it would occur
    in 3.0 if not fixed - there are more severe problems around).

    Okay. So it's an issue, but not a blocker. Opened bpo-4474 for this.

    Thanks, Martin.

    @rpetrov
    Mannequin

    rpetrov mannequin commented Nov 30, 2008

    The "C" locale (alias POSIX, ANSI_X3.4-1968) defines a 7-bit character set.
    mbstowcs is expected to return an error if a byte sequence contains a
    byte > 127.

    After a quick check of the code
    (http://svn.python.org/view/python/branches/py3k/Lib/test/test_cmd_line.py?rev=67193&view=auto),
    I guess that the failure comes from the command "assert(ord('\xe9') == 0xe9)" (the
    test is run only on Mac OS platforms). For a program running in the ASCII ("C")
    locale, converting the byte \xe9 to wchar_t is expected to return an error.
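    In Python terms, a strict 7-bit decoder of the kind described here accepts bytes 0x00 to 0x7F and rejects everything from 0x80 up. A sketch, using Python's ascii codec as an assumed analogue of mbstowcs() in such a locale; the earlier execv example suggests that the OS X C library did not behave this way and produced two characters instead of an error. `decode_like_c_locale` is a hypothetical name.

```python
def decode_like_c_locale(raw):
    """Strictly decode `raw` as 7-bit ASCII; return None on failure.

    Assumed stand-in for mbstowcs() in a 7-bit C locale, which reports an
    error (returns (size_t)-1) for any byte outside 0x00-0x7F.
    """
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError:
        return None


print(decode_like_c_locale(b"\x7f") is not None)  # True: last 7-bit byte
print(decode_like_c_locale(b"\xe9"))               # None
```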

    @MrJean1
    Mannequin Author

    MrJean1 mannequin commented Jun 14, 2009

    This test still fails and is the only failure with Python 3.1rc2 on MacOS
    X 10.4.11 Tiger (Intel).

    @slmnhq
    Mannequin

    slmnhq mannequin commented Dec 24, 2009

    Confirming that the test fails on r77044.

    Tested on Mac OS X 10.4.11 (Intel).

    running build_scripts
    test_cmd_line
    test_directories (test.test_cmd_line.CmdLineTest) ... ok
    test_large_PYTHONPATH (test.test_cmd_line.CmdLineTest) ... ok
    test_optimize (test.test_cmd_line.CmdLineTest) ... ok
    test_q (test.test_cmd_line.CmdLineTest) ... ok
    test_run_code (test.test_cmd_line.CmdLineTest) ... FAIL
    test_run_module (test.test_cmd_line.CmdLineTest) ... ok
    test_run_module_bug1764407 (test.test_cmd_line.CmdLineTest) ... ok
    test_site_flag (test.test_cmd_line.CmdLineTest) ... ok
    test_unbuffered_input (test.test_cmd_line.CmdLineTest) ... ok
    test_unbuffered_output (test.test_cmd_line.CmdLineTest) ... ok
    test_usage (test.test_cmd_line.CmdLineTest) ... ok
    test_verbose (test.test_cmd_line.CmdLineTest) ... ok
    test_version (test.test_cmd_line.CmdLineTest) ... ok

    ======================================================================
    FAIL: test_run_code (test.test_cmd_line.CmdLineTest)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/Users/salman/svn/python/branches/py3k/Lib/test/test_cmd_line.py", line 132, in test_run_code
        0)
    AssertionError: 1 != 0

    Ran 13 tests in 2.235s

    FAILED (failures=1)
    test test_cmd_line failed -- Traceback (most recent call last):
      File "/Users/salman/svn/python/branches/py3k/Lib/test/test_cmd_line.py", line 132, in test_run_code
        0)
    AssertionError: 1 != 0

    1 test failed:
    test_cmd_line

    @voidspace
    Contributor

    I still see this failure on Python 3 trunk with Mac OS X 10.6.

    @voidspace
    Contributor

    This passes for me in Mac OS X Terminal (a UTF8 terminal) but fails in iTerm (an ascii terminal) on both 31-maint and py3k.

    @vstinner
    Member

    This issue is specific to Mac OS X because the file system encoding is hardcoded to UTF-8 on this OS. As written in msg76244, the problem is that the encoding is different for input (sys.argv) and output arguments (arguments of child processes). As written in msg76255, program arguments are encoded to the locale (terminal) encoding.

    In short, the problem is that subprocess, os.exec*(), etc. encode command line arguments with the file system encoding instead of the locale encoding.

    On Linux, it just works because the file system encoding is the locale encoding.
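    The asymmetry can be summarised as a round-trip check. In this sketch the parent encodes with the file system encoding and the child decodes with the locale encoding; using the surrogateescape handler on the decode side is an assumption, roughly matching how Python 3.1+ decodes argv. `round_trips` is a hypothetical helper.

```python
def round_trips(arg, fs_encoding, locale_encoding):
    """True if `arg` survives parent (fs_encoding) -> child (locale_encoding)."""
    raw = arg.encode(fs_encoding)                             # os.exec*() side
    decoded = raw.decode(locale_encoding, "surrogateescape")  # child's argv
    return decoded == arg


arg = "\xe9"
print(round_trips(arg, "utf-8", "utf-8"))       # True: encodings agree (Linux)
print(round_trips(arg, "utf-8", "ascii"))       # False: OS X with LANG=C
print(round_trips(arg, "utf-8", "iso-8859-1"))  # False: OS X, Latin-1 locale
```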

    @vstinner
    Member

    I created two related issues:

    • bpo-8775: Use locale encoding to decode sys.argv, not the file system encoding
    • bpo-8776: Bytes version of sys.argv

    If bpo-8775 is fixed, it should fix this issue too.

    @amauryfa
    Member

    What if os.system(), os.execvp() and friends used "wcstombs" (or locale.getpreferredencoding()) to convert arguments from unicode to bytes? This would at least guarantee a round trip when spawning another Python interpreter.

    An interesting test is to compare the effects of os.unlink(filename) and os.system('rm "%s"' % filename), where filename is non-ascii. Does it work today?

    @smontanaro
    Contributor

    Any progress on this? Is the best thing to just set LANG?

    @ixokai
    Mannequin

    ixokai mannequin commented Oct 8, 2010

    FWIW, this still happens on the latest of /branches/py3k, when LANG does not match up to the enforced fs encoding-- which for me happened when I ran the buildslave under launchd.

    I was finally able to reproduce it, and after doing so, verified that cmdline_encoding-2.patch on bpo-9992 fixed it.

    @vstinner
    Member

    vstinner commented Oct 8, 2010

    FWIW, this still happens on the latest of /branches/py3k,
    when LANG does not match up to the enforced fs encoding

    ixokai has the bug on Snow Leopard x86.

    @pitrou
    Member

    pitrou commented Oct 8, 2010

    For the record, this can be now reproduced under Linux by forcing different locale and filesystem encodings:

    $ PYTHONFSENCODING=utf8 LANG=ISO-8859-1 ./python -m test.regrtest test_cmd_line
    [1/1] test_cmd_line
    test test_cmd_line failed -- Traceback (most recent call last):
      File "/home/antoine/py3k/__svn__/Lib/test/test_cmd_line.py", line 109, in test_run_code
        assert_python_ok('-c', command)
      File "/home/antoine/py3k/__svn__/Lib/test/script_helper.py", line 35, in assert_python_ok
        return _assert_python(True, *args)
      File "/home/antoine/py3k/__svn__/Lib/test/script_helper.py", line 31, in _assert_python
        "stderr follows:\n%s" % (rc, err.decode('ascii', 'ignore')))
    AssertionError: Process return code is 1, stderr follows:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    TypeError: ord() expected a character, but string of length 2 found

    @vstinner
    Member

    vstinner commented Oct 9, 2010

    For the record, this can be now reproduced under Linux by forcing different
    locale and filesystem encodings:

    $ PYTHONFSENCODING=utf8 LANG=ISO-8859-1 ./python -m test.regrtest test_cmd_line

    I opened a separate issue for Linux, bpo-9992, because some Mac OS X users say
    that this issue looks like a Mac OS X bug and the fix may be different.

    Extract of msg111432 (bpo-8775): "To be honest, I'd say the behavior of OSX 10.4
    is a bug and we might add a workaround on that platform that uses
    CFStringGetSystemEncoding() to fetch the actual system encoding when LANG=C."

    @vstinner
    Member

    This issue should be fixed by r85435 (OSX: decode command line arguments from utf-8), see bpo-9992.

    I will keep an eye on the OSX buildbots.

    @vstinner
    Member

    This issue should be fixed by r85435 ...
    I will watch for the OSX buildbots.

    I don't know if it fixes the issue, but it introduces a regression. r85442 reverts it.

    ---

    Revert r85435 (and r85440): decode command line arguments from utf-8

    Python exits with a fatal error if the command line contains an undecodable argument. PyUnicode_FromString() fails at the first undecodable byte because it calls the error handler, but error handlers are not ready before Python initialization.

    ---
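    At the Python level, the behaviour being asked for is what the surrogateescape error handler (PEP 383) provides: decoding never fails, and encoding back with the same handler restores the original bytes. A short illustration using only built-in codecs; the difficulty described in the commit message is that error handlers are not ready in main() before Python initialization, so the equivalent has to be done without them.

```python
raw_arg = b"caf\xff"  # 0xFF never occurs in valid UTF-8

try:
    raw_arg.decode("utf-8")  # strict: raises for the undecodable byte
except UnicodeDecodeError as exc:
    print("strict decoding failed at byte", exc.start)  # 3

escaped = raw_arg.decode("utf-8", "surrogateescape")
print(ascii(escaped))  # 'caf\udcff'
restored = escaped.encode("utf-8", "surrogateescape")
print(restored == raw_arg)  # True
```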

    The problem is finding a function that can decode a byte string from UTF-8 in main() (before Python initialization). Possibilities:

    • Use PyUnicode_DecodeUTF8Stateful() and tell it not to call the error handler but to exit immediately (return NULL), e.g. by checking a flag (a function argument or a global variable?) that says whether the error handler should be called
    • Use _Py_char2wchar() and temporarily set the locale to a UTF-8 locale. The problem is finding a UTF-8 locale: is there a UTF-8 locale which is always available?
    • Another solution?

    I prefer the _Py_char2wchar() solution because I'm sure that it works before Python initialization.

    @vstinner
    Member

    osx_utf8_cmdline.patch: copy of r85435.

    @loewis
    Mannequin

    loewis mannequin commented Oct 14, 2010

    One solution would be to duplicate the UTF-8 decoder for OSX, incorporating surrogate escape. This should be much shorter than the full UTF-8 codec, and perhaps at least utf8_code_length could be shared.

    @vstinner
    Member

    One solution would be to duplicate the UTF-8 decoder for OSX,
    incorporating surrogate escape. This should be much shorter
    than the full UTF-8 codec, and perhaps at least utf8_code_length
    could be shared.

    Good idea, implemented in the attached patch [osx_utf8_cmdline-3.patch]. I tested the patch on x86 Snow Leopard 3.x and it looks like it fixes the test_cmd_line failure (I modified some tests to manually remove the LC_ALL, LC_CTYPE and LANG environment variables).

    @vstinner
    Member

    _Py_DecodeUTF8_surrogateescape() is a simplified version of
    PyUnicode_DecodeUTF8Stateful():

    • no "consumed" argument
    • only supports the surrogateescape error handler
    • no optimization
    • doesn't resize the buffer at exit

    Hmm, resizing the buffer at exit is maybe a good idea, to avoid wasting memory.
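    A minimal Python model of such a decoder, following the feature list above: no "consumed" argument, surrogateescape as the only error handling, and no optimized paths. It is not the C code from osx_utf8_cmdline-3.patch; `decode_utf8_surrogateescape` and `_continuation_range` are hypothetical names, and `_continuation_range` loosely plays the role of the shared utf8_code_length table Martin suggested.

```python
def _continuation_range(lead):
    """Return (length, low, high) for a UTF-8 lead byte, or None if invalid.

    (low, high) is the allowed range of the second byte; it excludes overlong
    forms, encoded UTF-16 surrogates and code points above U+10FFFF.
    """
    if 0xC2 <= lead <= 0xDF:
        return 2, 0x80, 0xBF
    if lead == 0xE0:
        return 3, 0xA0, 0xBF
    if 0xE1 <= lead <= 0xEC or 0xEE <= lead <= 0xEF:
        return 3, 0x80, 0xBF
    if lead == 0xED:
        return 3, 0x80, 0x9F
    if lead == 0xF0:
        return 4, 0x90, 0xBF
    if 0xF1 <= lead <= 0xF3:
        return 4, 0x80, 0xBF
    if lead == 0xF4:
        return 4, 0x80, 0x8F
    return None


def decode_utf8_surrogateescape(data):
    """Decode UTF-8, escaping each invalid byte b (>= 0x80) as U+DC00 + b."""
    out = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:  # ASCII byte
            out.append(chr(b))
            i += 1
            continue
        info = _continuation_range(b)
        if info is not None:
            length, low, high = info
            seq = data[i:i + length]
            if (len(seq) == length and low <= seq[1] <= high
                    and all(0x80 <= c <= 0xBF for c in seq[2:])):
                cp = b & (0xFF >> (length + 1))  # payload bits of lead byte
                for c in seq[1:]:
                    cp = (cp << 6) | (c & 0x3F)
                out.append(chr(cp))
                i += length
                continue
        out.append(chr(0xDC00 + b))  # surrogateescape, then restart at i + 1
        i += 1
    return "".join(out)


print(ascii(decode_utf8_surrogateescape(b"caf\xc3\xa9 \xff")))  # 'caf\xe9 \udcff'
```

    Each invalid byte is escaped on its own and decoding restarts at the next byte; because the byte ranges reject overlong forms, encoded surrogates and values above U+10FFFF, this should agree with Python's built-in `bytes.decode("utf-8", "surrogateescape")`. Since every input byte yields at most one code point, a C version can allocate one wchar_t per input byte up front and shrink the buffer afterwards, which is the resize step mentioned above.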

    @vstinner
    Member

    I committed my patch to Python 3.2 (r85765), with a specific test in test_cmd_line. Reopen the issue if the bug is not fixed.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022

    6 participants