Title: subprocess.list2cmdline() should not escape wrapping single/double quotes
Type: behavior Stage:
Components: Windows Versions: Python 3.9
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: eryksun, kejxu, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal Keywords:

Created on 2019-07-23 17:17 by kejxu, last changed 2019-07-24 18:47 by steve.dower.

Messages (3)
msg348342 - Author: James Xu (kejxu) Date: 2019-07-23 17:17
While working on our project, we noticed that `subprocess.Popen(command, ...)` works fine when `command` is a string that contains double quotes, for example `command = '"path to executable" --flag arg'`. However, when `command` is changed to `shlex.split(command, posix=False)`, the Popen call fails. Looking a bit into the source code, it seems that for the command above,

>>> shlex.split('"path to executable" --flag arg', posix=False)
['"path to executable"', '--flag', 'arg']

and when this list of strings gets passed into `Popen`, the double quotes get escaped again, since `subprocess.list2cmdline` does not check whether a pair of double quotes or single quotes wraps the entire string `arg`. This is the same behavior in both py2 and py3. As a result, upon execution the command becomes `'"\\"path to executable\\"" --flag arg'`:

>>> sp.list2cmdline(['"do things"'])
'"\\"do things\\""'
>>> sp.list2cmdline(['do things'])
'"do things"'
msg348343 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-07-23 18:22
The behavior of list2cmdline with double quotes is intentional. It supports passing literal quote characters in the command line for applications that use VC++ argv parsing, WINAPI CommandLineToArgvW, or in general any application that adheres to these rules when parsing its command line [1]. (Not all do -- such as cmd.exe -- in which case we have to pass a custom command line instead of relying on list2cmdline.)

If we have a command line already, then generally the best thing to do in Windows is pass it as is. Don't split it and rebuild it via list2cmdline.
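
A minimal sketch of the difference, reusing the command string from the report. It only calls `shlex.split` and `subprocess.list2cmdline`, so it runs on any platform; the `Popen` call is left as a comment because it assumes Windows and a real executable at that path.

```python
import shlex
import subprocess

command = '"path to executable" --flag arg'

# posix=False keeps the quote characters inside the first token.
tokens = shlex.split(command, posix=False)

# list2cmdline treats those quotes as literal characters and escapes them,
# so the rebuilt command line no longer matches the original string.
rebuilt = subprocess.list2cmdline(tokens)
print(rebuilt)
print(rebuilt == command)

# Building from arguments that carry no quoting of their own reproduces
# the original command line.
from_args = subprocess.list2cmdline(['path to executable', '--flag', 'arg'])
print(from_args == command)

# On Windows, the existing string can be passed unchanged:
# subprocess.Popen(command)
```

The list form is still fine when the arguments start out as separate, unquoted strings; the trouble only appears when quoting meant for the command line ends up inside a list item.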

The problem I see is using shlex.split in Windows. posix=False doesn't mean it can handle Windows command lines properly. The shlex module is meant to tokenize a command line like a Unix shell. With posix=False, quote characters aren't stripped out, i.e. it preserves the double quotes in '"spam"'. But with posix=True it's just as wrong for Windows because it tokenizes "'spam & eggs'" as ['spam & eggs']. This is wrong because single quotes generally have no special meaning in Windows command lines (certainly not for CreateProcessW, CommandLineToArgvW, and VC++ argv handling). They should be retained as literal characters. Thus the proper result in Windows is "'spam & eggs'" -> ["'spam", '&', "eggs'"].
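
The two shlex modes can be compared directly. This is a small sketch; the plain `str.split()` at the end is only a stand-in for Windows-style tokenizing of this one input, which contains no double quotes or backslashes, and is not a general Windows parser.

```python
import shlex

single_quoted = "'spam & eggs'"

# posix=True treats single quotes as shell quoting and strips them.
posix_tokens = shlex.split(single_quoted, posix=True)

# posix=False also treats them as quoting, but keeps them in the token.
non_posix_tokens = shlex.split(single_quoted, posix=False)

# For this input, splitting on whitespace gives the tokens described above,
# because single quotes are ordinary characters on a Windows command line.
windows_like_tokens = single_quoted.split()

print(posix_tokens)
print(non_posix_tokens)
print(windows_like_tokens)

# Double quotes are likewise preserved when posix=False.
print(shlex.split('"spam"', posix=False))
```

Neither shlex mode yields the three-token result, since both group the text between the single quotes into one token.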

msg348400 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-07-24 18:47
Maybe we need to finally make shlex.split() at least be consistent with list2cmdline() on Windows so it can round-trip, perhaps with a "windows=True" parameter.

I don't think it's unreasonable to aim for round-tripability. That isn't nearly as hard as "correctly quoting arguments in the absence of a specification for how to quote arguments".
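
A rough sketch of what round-tripping could look like. The function name `split_windows_cmdline` is hypothetical and is not part of shlex or subprocess, and no "windows=True" parameter exists today. The parser assumes only the backslash and double-quote rules that list2cmdline writes out; it ignores other conventions, such as treating a doubled quote inside a quoted region as a literal quote, or the separate rules some runtimes apply to the program name.

```python
import subprocess


def split_windows_cmdline(cmdline):
    """Hypothetical inverse of subprocess.list2cmdline (illustration only)."""
    args = []
    current = []
    in_quotes = False
    have_arg = False   # becomes True once an argument has started (covers "")
    backslashes = 0    # pending run of backslashes not yet emitted

    for ch in cmdline:
        if ch == '\\':
            backslashes += 1
            continue
        if ch == '"':
            # 2n backslashes + quote: n backslashes, quote toggles quoting.
            # 2n+1 backslashes + quote: n backslashes and a literal quote.
            current.append('\\' * (backslashes // 2))
            if backslashes % 2:
                current.append('"')
            else:
                in_quotes = not in_quotes
            backslashes = 0
            have_arg = True
            continue
        if backslashes:
            # Backslashes not followed by a quote are literal.
            current.append('\\' * backslashes)
            backslashes = 0
            have_arg = True
        if ch in ' \t' and not in_quotes:
            # Unquoted whitespace ends the current argument, if any.
            if have_arg:
                args.append(''.join(current))
                current = []
                have_arg = False
        else:
            current.append(ch)
            have_arg = True

    if backslashes:
        current.append('\\' * backslashes)
        have_arg = True
    if have_arg:
        args.append(''.join(current))
    return args


samples = [
    ['"do things"'],
    ['path to executable', '--flag', 'arg'],
    ['', 'a\\"b', 'dir with space\\'],
]
for sample in samples:
    cmdline = subprocess.list2cmdline(sample)
    print(cmdline, split_windows_cmdline(cmdline) == sample)
```

Under these assumptions the split recovers each sample list from its list2cmdline output, which is the round-trip property discussed above; it does not settle how arbitrary programs parse their own command lines.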
Date                 User         Action  Args
2019-07-24 18:47:58  steve.dower  set     messages: + msg348400
2019-07-23 18:22:23  eryksun      set     nosy: + eryksun; messages: + msg348343
2019-07-23 17:17:40  kejxu        create