Author eryksun
Recipients LCY, eryksun, paul.moore, steve.dower, tim.golden, zach.ware
Date 2017-04-07.16:56:52
Message-id <1491584213.07.0.818860062036.issue30015@psf.upfronthosting.co.za>
Content
> Windows also treats full-width spaces as a delimiter when parsing 
> command line arguments.

CreateProcess has to parse the beginning of the command-line string if the lpApplicationName parameter is omitted. According to the documentation, it treats "white space" as a delimiter, but it doesn't actually say which characters are in that set. We know for an unquoted name like "python{character}spam" that it will try to execute python.exe if "{character}" is parsed as a space. Otherwise we expect CreateProcess to fail with ERROR_FILE_NOT_FOUND because it's looking for the non-existent file "python{character}spam". Here's a test that checks all characters that Unicode considers to be whitespace, which includes "ideographic space" (U+3000):

    import os
    import sys
    import subprocess

    # range() excludes its end point, so add 1 to cover sys.maxunicode itself.
    space_chars = [chr(c) for c in range(sys.maxunicode + 1) if chr(c).isspace()]
    assert '\N{IDEOGRAPHIC SPACE}' in space_chars # U+3000

    def get_create_delims():
        assert not os.path.exists('spam')
        filename = 'python{}spam'
        basepath = os.path.dirname(sys.executable)
        delims = []
        for space in space_chars:
            path = os.path.join(basepath, filename.format(space))
            assert not os.path.exists(path)
            try:
                subprocess.check_output(path, stderr=subprocess.STDOUT)
            except FileNotFoundError:
                pass # not a delimiter
            except subprocess.CalledProcessError:
                delims.append(space)
            else:
                assert False, 'python.exe should have failed'
        return delims


    >>> get_create_delims()
    ['\t', ' ']

CreateProcess considers only space and horizontal tab as white-space delimiters, at least on this Windows 10 system.

Otherwise Windows itself doesn't care about the command line. It's up to each application to parse its command line however it wants. subprocess.list2cmdline assumes an application uses argv from Microsoft's C runtime. The Windows shell function CommandLineToArgvW is supposed to follow the same rules. The following calls CommandLineToArgvW on a test command-line string for each character in the space_chars set:

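    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    shell32 = ctypes.WinDLL('shell32', use_last_error=True)

    PLPWSTR = ctypes.POINTER(wintypes.LPWSTR)
    shell32.CommandLineToArgvW.restype = PLPWSTR
    shell32.CommandLineToArgvW.argtypes = (
        wintypes.LPCWSTR, ctypes.POINTER(ctypes.c_int))
    kernel32.LocalFree.restype = wintypes.HLOCAL
    kernel32.LocalFree.argtypes = (wintypes.HLOCAL,)

    def cmdline2argv(cmdline):
        argc = ctypes.c_int()
        pargv = shell32.CommandLineToArgvW(cmdline, ctypes.byref(argc))
        if not pargv:
            raise ctypes.WinError(ctypes.get_last_error())
        try:
            # Slicing copies the strings out, so the array can be freed.
            return pargv[:argc.value]
        finally:
            kernel32.LocalFree(pargv)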

    def get_argv_delims():
        cmdline = 'test{}space'
        delims = []
        for space in space_chars:
            if len(cmdline2argv(cmdline.format(space))) > 1:
                delims.append(space)
        return delims


    >>> get_argv_delims()
    ['\t', '\n', '\x0b', '\x0c', '\r', '\x1c', '\x1d', '\x1e', '\x1f', ' ']

In addition to space and horizontal tab, CommandLineToArgvW also considers line feed, vertical tab, form feed, carriage return, file separator, group separator, record separator, and unit separator to be white-space delimiters. This disagrees with [1], which says it should be limited to space and horizontal tab, like CreateProcess. Let's test this as well:

    def get_msvc_argv_delims():
        template = '"{}" -c "import sys;print(len(sys.argv))" test{}space'
        delims = []
        for space in space_chars:
            cmdline = template.format(sys.executable, space)
            out = subprocess.check_output(cmdline)
            argc = int(out)
            if argc > 2:
                delims.append(space)
        return delims


    >>> get_msvc_argv_delims()
    ['\t', ' ']

Apparently CommandLineToArgvW is inconsistent with the C runtime in this case.
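
The three result lists can also be compared directly. The following is only a sketch: it hard-codes the outputs reported above from this one Windows 10 system (observations, not documented behavior), and the variable names are made up for illustration. It shows which code points only CommandLineToArgvW split on:

```python
# Results reported above, copied verbatim (observed on one Windows 10 system).
create_delims = ['\t', ' ']
shell_argv_delims = ['\t', '\n', '\x0b', '\x0c', '\r',
                     '\x1c', '\x1d', '\x1e', '\x1f', ' ']
crt_argv_delims = ['\t', ' ']

# Characters that only CommandLineToArgvW treated as delimiters.
shell_only = sorted(set(shell_argv_delims) - set(crt_argv_delims))
shell_only_codes = ['U+{:04X}'.format(ord(c)) for c in shell_only]
print(shell_only_codes)

# CreateProcess and the C runtime agree with each other in these results.
crt_matches_create = set(create_delims) == set(crt_argv_delims)
print(crt_matches_create)
```

Unlike the tests above, this runs on any platform because it only manipulates the recorded lists.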

On my Windows 10 system, ideographic space (U+3000) is not generally a command-line delimiter. That's not to say that some applications (and maybe localized CRTs?) don't use it that way. But I don't think it's the place of the subprocess module to handle it.
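
For reference, this is how subprocess.list2cmdline treats these characters when it builds a command line. It is a minimal sketch, assuming list2cmdline quotes only arguments that are empty or contain a space or horizontal tab, which matches the CPython implementation as far as I know. The argument values are arbitrary examples:

```python
import subprocess

# An argument containing an ASCII space or a horizontal tab gets quoted.
plain = subprocess.list2cmdline(['test', 'a b'])
tabbed = subprocess.list2cmdline(['test', 'a\tb'])

# An argument containing ideographic space (U+3000) is passed through unquoted.
ideographic = subprocess.list2cmdline(['test', 'a\u3000b'])

print(repr(plain))
print(repr(tabbed))
print(repr(ideographic))
```

With a C runtime that splits only on space and tab, the unquoted U+3000 argument still arrives as a single argument. An application that does split on U+3000 would see two arguments.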

[1]: https://msdn.microsoft.com/en-us/library/17w5ykft