Author klausman
Recipients docs@python, klausman
Date 2014-01-22.14:25:04
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1390400705.04.0.503333173013.issue20344@psf.upfronthosting.co.za>
In-reply-to
Content
The subprocess docs state that the first argument can be either a string or an iterable that contains the program and arguments to run. It also points out that using shell=True allows for shell constructs. It does not mention a subtlety that is introduced by Python's use of sh -c (on Unix):

Using a string that contains the full command line with args and shell=False breaks as expected:

>>> subprocess.check_output("ls foo", shell=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.3/subprocess.py", line 576, in check_output
    with Popen(*popenargs, stdout=PIPE, **kwargs) as process:
  File "/usr/lib64/python3.3/subprocess.py", line 824, in __init__
    restore_signals, start_new_session)
  File "/usr/lib64/python3.3/subprocess.py", line 1448, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'ls foo'

Using shell=True instead works as expected (since foo does not exist, ls exits with non-0 and a different Exception is raised. This, too is to spec and exactly what the docs describe.

>>> subprocess.check_output("ls foo", shell=True)
ls: cannot access foo: No such file or directory
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.3/subprocess.py", line 589, in check_output
    raise CalledProcessError(retcode, process.args, output=output)
subprocess.CalledProcessError: Command 'ls foo' returned non-zero exit status 2

Using shell=False and making the command a list of command and args does the same thing:

>>> subprocess.check_output(["ls", "foo"], shell=False)
ls: cannot access foo: No such file or directory
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.3/subprocess.py", line 589, in check_output
    raise CalledProcessError(retcode, process.args, output=output)
subprocess.CalledProcessError: Command '['ls', 'foo']' returned non-zero exit status 2

But using an iterable _and_ requesting a shell results in something very broken:

>>> subprocess.check_output(["ls", "foo"], shell=True)
[contents of my homedir are returned]
>>>

Note that the argument "foo" completely disappears, apparently. strace() reveals that "foo" is never added to the call to "ls". Instead, it becomes $0 in the shell that ls runs in. This is exactly to spec (i.e. bash is not broken). "bash -c foo bar baz" will start a shell that sets $0 to "bar", $1 to "baz" and then executes "foo". Whereas "bash -c 'foo bar baz'" will run 'foo bar baz'.

I think there are three possible fixes here:
1- Make check_output(list, shell=True) run something like "bash -c '%s'" % " ".join(list)
2- Change the docs to warn of the unintended consequences of combining lists with shell=True
3- Make check_output() throw an Exception if the first argument is a list and shell=True

The first option would make it easiest for people to "just use it", but I fear the string manipulation may become the source of bugs with security implications in the future.

The second option keeps the status quo, but people tend to not read docs, so it may not be as effective as one would like, especially since shell-True already has warnings and people tend to go "yeah I know about unverified input, but it's not a problem for me" and then ignore both warnings.

Option 3 has the disadvantage that existing scripts that _expect_ the behavior may break.
History
Date User Action Args
2014-01-22 14:25:05klausmansetrecipients: + klausman, docs@python
2014-01-22 14:25:05klausmansetmessageid: <1390400705.04.0.503333173013.issue20344@psf.upfronthosting.co.za>
2014-01-22 14:25:04klausmanlinkissue20344 messages
2014-01-22 14:25:04klausmancreate