classification
Title: subprocess.check_output() docs misrepresent what shell=True does
Type: behavior Stage: patch review
Components: Documentation, Library (Lib) Versions: Python 3.6, Python 3.4, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Tuomas.Savolainen, docs@python, klausman, martin.panter, r.david.murray
Priority: normal Keywords: patch

Created on 2014-01-22 14:25 by klausman, last changed 2015-06-20 06:25 by martin.panter.

Files
File name Uploaded Description Edit
issue20344.patch Tuomas.Savolainen, 2014-03-21 08:11 Patch to issue 20344 review
documentation20344.patch Tuomas.Savolainen, 2014-03-25 19:08 review
20344_3.patch Tuomas.Savolainen, 2014-03-27 13:05 review
20344_4.patch martin.panter, 2015-06-20 06:25 review
Messages (12)
msg208811 - (view) Author: Tobias Klausmann (klausman) Date: 2014-01-22 14:25
The subprocess docs state that the first argument can be either a string or an iterable that contains the program and arguments to run. It also points out that using shell=True allows for shell constructs. It does not mention a subtlety that is introduced by Python's use of sh -c (on Unix):

Using a string that contains the full command line with args and shell=False breaks as expected:

>>> subprocess.check_output("ls foo", shell=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.3/subprocess.py", line 576, in check_output
    with Popen(*popenargs, stdout=PIPE, **kwargs) as process:
  File "/usr/lib64/python3.3/subprocess.py", line 824, in __init__
    restore_signals, start_new_session)
  File "/usr/lib64/python3.3/subprocess.py", line 1448, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'ls foo'

Using shell=True instead works as expected (since foo does not exist, ls exits with non-0 and a different Exception is raised. This, too is to spec and exactly what the docs describe.

>>> subprocess.check_output("ls foo", shell=True)
ls: cannot access foo: No such file or directory
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.3/subprocess.py", line 589, in check_output
    raise CalledProcessError(retcode, process.args, output=output)
subprocess.CalledProcessError: Command 'ls foo' returned non-zero exit status 2

Using shell=False and making the command a list of command and args does the same thing:

>>> subprocess.check_output(["ls", "foo"], shell=False)
ls: cannot access foo: No such file or directory
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.3/subprocess.py", line 589, in check_output
    raise CalledProcessError(retcode, process.args, output=output)
subprocess.CalledProcessError: Command '['ls', 'foo']' returned non-zero exit status 2

But using an iterable _and_ requesting a shell results in something very broken:

>>> subprocess.check_output(["ls", "foo"], shell=True)
[contents of my homedir are returned]
>>>

Note that the argument "foo" completely disappears, apparently. strace() reveals that "foo" is never added to the call to "ls". Instead, it becomes $0 in the shell that ls runs in. This is exactly to spec (i.e. bash is not broken). "bash -c foo bar baz" will start a shell that sets $0 to "bar", $1 to "baz" and then executes "foo". Whereas "bash -c 'foo bar baz'" will run 'foo bar baz'.

I think there are three possible fixes here:
1- Make check_output(list, shell=True) run something like "bash -c '%s'" % " ".join(list)
2- Change the docs to warn of the unintended consequences of combining lists with shell=True
3- Make check_output() throw an Exception if the first argument is a list and shell=True

The first option would make it easiest for people to "just use it", but I fear the string manipulation may become the source of bugs with security implications in the future.

The second option keeps the status quo, but people tend to not read docs, so it may not be as effective as one would like, especially since shell-True already has warnings and people tend to go "yeah I know about unverified input, but it's not a problem for me" and then ignore both warnings.

Option 3 has the disadvantage that existing scripts that _expect_ the behavior may break.
msg208814 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-01-22 14:47
This was discussed in issue 6760.  There I advocated (and still advocate) option (3).  Unfortunately it looks like our doc update in that issue actually *lost* the documentation for your option (2) that used to be there, though it wasn't exactly clear what it meant when it was there.

So IMO we should do a doc update to add that info back in a clearer form, and also do a deprecation cycle to disallow shell=True with a sequence.  I haven't ever gotten consensus from the other devs on the latter, though.
msg214332 - (view) Author: Tuomas Savolainen (Tuomas.Savolainen) * Date: 2014-03-21 08:11
Made a patch that throws exception as suggested:
"3- Make check_output() throw an Exception if the first argument is a list and shell=True"
msg214364 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-03-21 13:38
Thanks Tuomas, but we don't have any consensus that that kind of change will be accepted.  It's just my opinion that it should be...and if it was, it would have to start with a deprecation, not raising an exception.

What we need as a patch for this issue is a documentation patch explaining why using shell=True with a list is a bad idea in the 'frequently used arguments' section.  The paragraph about 'args' in the frequently used arguments section should say something like 'using shell=True with a list containing more than one element does not currently raise an error, but it will not do what you expect; only the first element of the list will be executed on unix, and on windows the quoting is likely to be incorrect.  See the Popen constructor documentation below for a more precise description.'

(You can see why I think this usage should be deprecated, I hope ;)
msg214845 - (view) Author: Tuomas Savolainen (Tuomas.Savolainen) * Date: 2014-03-25 19:08
Created a patch that adds notice of using shell=True and iterable to the documentation. Please do comment if the formatting is wrong (this my first documentation patch).
msg214943 - (view) Author: Tobias Klausmann (klausman) Date: 2014-03-27 08:48
Hi! 

On Tue, 25 Mar 2014, Tuomas Savolainen wrote:
> Created a patch that adds notice of using shell=True and iterable to the documentation. Please do comment if the formatting is wrong (this my first documentation patch).

I'd use articles, i.e. "and a list" and "does not raise an error" 

Also, s/except/expect/

Regards,
Tobias
msg214949 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-03-27 13:03
Also, the "see below" sentence is missing.
msg214950 - (view) Author: Tuomas Savolainen (Tuomas.Savolainen) * Date: 2014-03-27 13:05
Corrected the spelling of the patch.
msg243632 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-19 22:51
“Only the first element . . . will be executed on Unix” is a bit simplistic. The behaviour is already described more fully in the “shell” documentation under <https://docs.python.org/dev/library/subprocess.html#subprocess.Popen>.

Also, “args” is allowed to be a _sequence_, but an _iterable_ is a more general concept.
msg243638 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-05-20 01:30
Can you suggest a better phrasing that is still succinct? (Maybe just 'appear to be executed' would make it accurate enough?)  But, that's why there needs to be a reference of *some* sort to the full explanation.

I'm not sure what the issue is with iterable/sequence.  check_output/Popen accepts an iterable.
msg243642 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-20 05:09
Actually David I didn’t notice your suggested wording <https://bugs.python.org/issue20344#msg214364> before. Adding that last sentence, pointing to the full description of the Popen constructor, would be fine.

My complaint about mentioning “iterable” is that iterables are not mentioned anywhere else in the documentation. I would rather leave it out or change it to “sequence”, unless the rest of the documentation is made consistent, e.g. “args . . . should be a string, or a sequence of program arguments” would also need fixing.
msg245542 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-06-20 06:25
I think it might be better to leave the platform-specific details in the full Popen description, not under Frequently Used Arguments. I suggest to use 20344_4.patch:

* Move existing pointer to Popen constructor details up to top of section
* Explain the escaping and quoting of a sequence only applies to Windows
* Remove claim that a string simply names a program without arguments; appears to be only half true under Windows
* Clarify shell=True command is normally a string, not a sequence
* Drop new description of Unix shell argument sequence in favour of existing description under Popen
* Move warning about Windows shell argument sequence under Popen
History
Date User Action Args
2017-04-11 10:31:41martin.panterlinkissue30041 superseder
2015-10-19 01:21:59martin.panterlinkissue13195 superseder
2015-06-20 06:25:51martin.pantersetfiles: + 20344_4.patch

stage: needs patch -> patch review
messages: + msg245542
versions: + Python 3.5, Python 3.6, - Python 3.3
2015-05-20 05:09:42martin.pantersetmessages: + msg243642
2015-05-20 01:30:25r.david.murraysetmessages: + msg243638
2015-05-19 22:51:09martin.pantersetnosy: + martin.panter
messages: + msg243632
2014-03-27 13:05:51Tuomas.Savolainensetfiles: + 20344_3.patch

messages: + msg214950
2014-03-27 13:03:40r.david.murraysetmessages: + msg214949
2014-03-27 08:48:33klausmansetmessages: + msg214943
2014-03-25 19:08:03Tuomas.Savolainensetfiles: + documentation20344.patch

messages: + msg214845
2014-03-21 13:38:22r.david.murraysetmessages: + msg214364
2014-03-21 08:11:51Tuomas.Savolainensetfiles: + issue20344.patch

nosy: + Tuomas.Savolainen
messages: + msg214332

keywords: + patch
2014-01-22 14:47:52r.david.murraysetversions: + Python 2.7, Python 3.4
nosy: + r.david.murray

messages: + msg208814

stage: needs patch
2014-01-22 14:25:05klausmancreate