
classification
Title: Subprocess using list vs string
Type: behavior Stage:
Components: Versions: Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: jweese, nik-sm
Priority: normal Keywords:

Created on 2020-02-20 00:11 by nik-sm, last changed 2022-04-11 14:59 by admin.

Messages (6)
msg362294 - Author: Niklas Smedemark-Margulies (nik-sm) Date: 2020-02-20 00:11
Most (all?) of the functions in subprocess (run, Popen, etc) are supposed to accept either a list or a string, but the behavior when passing a list differs (and appears to be wrong).

For example, see below - invoking the command "exit 1" should give a return code of 1, but when using a list, the return code is 0.


```
>>> import subprocess


>>> # Example using run
>>> res1 = subprocess.run('exit 1', shell=True)
>>> res1.returncode
1
>>> res2 = subprocess.run('exit 1'.split(), shell=True)
>>> res2.returncode
0


>>> # Example using Popen
>>> p1 = subprocess.Popen('exit 1', shell=True)
>>> p1.communicate()
(None, None)
>>> p1.returncode
1
>>> p2 = subprocess.Popen('exit 1'.split(), shell=True)
>>> p2.communicate()
(None, None)
>>> p2.returncode
0
```
msg362323 - Author: Jonny Weese (jweese) Date: 2020-02-20 15:39
I believe this behavior is expected (at least in posix-land).

Lib/subprocess.py L1702 shows that whenever shell=True, the args that are constructed are [unix_shell, "-c"] + args.

And so we can reproduce your behavior just using a regular shell. (This is Darwin but with a recent bash from homebrew):

$ bash -c 'exit 1'  # like subprocess string case
$ echo $?
1
$ bash -c exit 1  # like subprocess list case (note args are separated)
$ echo $?
0
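
The argument construction described above can be sketched in Python. This is a simplified paraphrase for illustration, not the actual code in Lib/subprocess.py: `posix_shell_argv` is a hypothetical helper name, and it hard-codes /bin/sh while ignoring the `executable` parameter and platform differences.

```python
def posix_shell_argv(args, unix_shell="/bin/sh"):
    """Roughly the argv that a POSIX shell=True call ends up executing.

    Hypothetical helper for illustration only.
    """
    if isinstance(args, (str, bytes)):
        # A string becomes a single element: the whole command string.
        args = [args]
    else:
        args = list(args)
    return [unix_shell, "-c"] + args


# String case: the whole command is the shell's command string.
print(posix_shell_argv("exit 1"))            # ['/bin/sh', '-c', 'exit 1']
# List case: only "exit" is the command string; "1" ends up as $0.
print(posix_shell_argv("exit 1".split()))    # ['/bin/sh', '-c', 'exit', '1']
```

In the list case the shell runs a bare `exit`, which is why the return code is 0.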
msg362333 - Author: Niklas Smedemark-Margulies (nik-sm) Date: 2020-02-20 17:09
Thanks very much for getting back to me so quickly, and for identifying the reason for the difference in behavior.

Sorry to harp on a relatively small behavior, but it cost me a few hours and it might cause confusion for others as well.

It still seems like an oversight that the body of a program invoked by `bash -c` would not be quoted. Consider the following two examples:

$ bash -c echo my critical data > file.txt
$ cat file.txt 

$ # My data was lost!

Or again in Python:

>>> import subprocess
>>> res1 = subprocess.run(['echo', 'my', 'critical', 'data', '>', 'file.txt'], shell=True, capture_output=True)
>>> res1.returncode
0
>>> exit()
$ cat file.txt
cat: file.txt: No such file or directory
$ # The file is not even created!



I know that the subsequent args are stored as bash arguments to the first executable/quoted program, for example:

$ bash -c 'echo $0' foo
foo

or

>>> res1 = subprocess.run(['echo $0', 'foo'], shell=True, capture_output=True)
>>> res1.stdout
b'foo\n'


However, it seems strange/wrong to invoke an executable via "bash -c executable arg1 arg2", rather than just "executable arg1 arg2"! In other words, the combination of `shell=True` with a sequence of args appears to behave in a surprising (or wrong) way.
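
For comparison, here is a minimal sketch of two ways the "critical data" example can actually produce the file on a POSIX system. The helper names `write_with_shell` and `write_without_shell` are made up for this illustration, and the sketch assumes an `echo` executable is on the PATH.

```python
import os
import shlex
import subprocess
import tempfile


def write_with_shell(words, path):
    """Join the words into ONE command string so the shell sees the redirect."""
    command = " ".join(shlex.quote(w) for w in words) + " > " + shlex.quote(path)
    return subprocess.run(command, shell=True, check=True)


def write_without_shell(words, path):
    """Skip the shell entirely and let Python set up the redirect."""
    with open(path, "wb") as out:
        return subprocess.run(words, stdout=out, check=True)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "file.txt")
    write_with_shell(["echo", "my", "critical", "data"], target)
    with open(target) as f:
        print(f.read(), end="")  # my critical data
```

Either a single string with `shell=True` or a list with the default `shell=False` keeps all the words together; only the list plus `shell=True` combination splits them between the command string and the shell parameters.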


---


Here's the only part of the docs I could find that discusses the interaction between `shell=True` and args:
"""
The shell argument (which defaults to False) specifies whether to use the shell as the program to execute. If shell is True, it is recommended to pass args as a string rather than as a sequence.
"""



I think there are ~2 cases here:

1) If there exist use cases for setting `shell=True` and doing "bash -c my_executable arg2 arg3", then the documentation should say something like the following:
"""
Using `shell=True` invokes the sequence of args via `bash -c`. In this case, the first argument MUST be an executable, and the subsequent arguments will be stored as bash parameters for that executable (`$0`, `$1`, etc).
"""

2) The body of the program invoked with `bash -c` should always be quoted. In this case, there should either be a code fix to quote the body, or a `ValueError` when `shell=True` and args is a sequence.
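
To make the `ValueError` variant of option 2 concrete, here is a rough sketch of what such a check could look like as a user-side wrapper. `strict_run` is a hypothetical name; nothing like it exists in the subprocess module, and this is not a proposed patch.

```python
import subprocess


def strict_run(args, *, shell=False, **kwargs):
    """Hypothetical wrapper: refuse a sequence of args when shell=True."""
    if shell and not isinstance(args, (str, bytes)):
        raise ValueError(
            "with shell=True, pass args as a single command string, "
            "not a sequence"
        )
    return subprocess.run(args, shell=shell, **kwargs)


# The string form still works and reports the expected return code.
print(strict_run("exit 1", shell=True).returncode)  # 1

# The list form is rejected instead of silently returning 0.
try:
    strict_run("exit 1".split(), shell=True)
except ValueError as exc:
    print("rejected:", exc)
```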


How does this sound from your perspective?
msg362337 - Author: Jonny Weese (jweese) Date: 2020-02-20 17:39
> it seems strange/wrong to invoke an executable via "bash -c executable arg1 arg2", rather than just "executable arg1 arg2"!

I agree it's strange to invoke a single executable that way, but remember that -c allows a string of arbitrary bash code. (It just happens that bash code that consists of a single executable calls it -- useful behavior in a shell.)

Consider:

$ bash -c 'f() { printf "%s\n" "$@"; }; f "$@"' - foo bar baz
foo
bar
baz
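
The same call can be written with subprocess, as a sketch that assumes a POSIX system where `shell=True` uses /bin/sh (the function syntax and `printf` used here are not bash-specific):

```python
import subprocess

# args[0] is the script text, args[1] becomes $0, the rest become "$@".
script = r'f() { printf "%s\n" "$@"; }; f "$@"'

result = subprocess.run(
    [script, "-", "foo", "bar", "baz"],
    shell=True,
    capture_output=True,
    text=True,
)
print(result.stdout, end="")
# foo
# bar
# baz
```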

> 1) If there exist use cases for setting `shell=True` and doing "bash -c my_executable arg2 arg3", then the documentation should say something like the following:
> """
> Using `shell=True` invokes the sequence of args via `bash -c`. In this case, the first argument MUST be an executable, and the subsequent arguments will be stored as bash parameters for that executable (`$0`, `$1`, etc).
> """

I'd be okay with clearer docs, but the given language is not quite right. For example, the actual shell call is /bin/sh (and depends on the platform). And, as described above, I think it would be too restrictive to say the first argument must be a single executable.

On the other hand, I disagree with option 2. I think raising an error would be very restrictive, and secretly quoting the argument could be surprising for (the few) people who understand the underlying shell mechanism.
msg362340 - Author: Niklas Smedemark-Margulies (nik-sm) Date: 2020-02-20 18:35
Good point - the phrasing I suggested there is not accurate, and there is more complicated behavior available than simply specifying a single executable. Here's the bash manual's info about the "-c" flag:

"""
If the -c option is present, then commands are read from the first non-option argument command_string. If there are arguments after the command_string, the first argument is assigned to $0 and any remaining arguments are assigned to the positional parameters. The assignment to $0 sets the name of the shell, which is used in warning and error messages.
"""

So the command_string provided (the first word or the first quoted expression) is interpreted as a shell program, and this program is invoked with the remaining words as its arguments. As you point out, this command_string can be a terminal expression like `true`, a function definition like you provided, an executable, or other possibilities, but in any case it will be executed with the remaining args.

(This also matches how the library code assigns `executable`: https://github.com/python/cpython/blob/master/Lib/subprocess.py#L1707)

As you say, simply slapping quotes around all the args produces a subtle difference: the arg in the position of `$0` is used as an actual positional parameter in one case, and as the shell name in the other case:

$ bash -c 'f() { printf "%s\n" "$@"; }; f "$@"' - foo bar baz
foo
bar
baz
$ bash -c 'f() { printf "%s\n" "$@"; }; f "$@" - foo bar baz'
-
foo
bar
baz

(Unless I am misunderstanding the behavior here).

It's a bit frustrating that this approach would not work to simplify the usage, but (assuming my explanation is correct) I concede that existing code may well depend on this behavior, setting the shell name with args[1] (and would not want it to become a positional parameter instead).


Improving on my first attempt, here's another possible phrasing for the docs:
"""
Using `shell=True` invokes the sequence of args via `<SHELL> -c` where <SHELL> is the chosen system shell (described elsewhere on this page). In this case, the item at args[0] is a shell program that will be invoked with the subsequent args. The item at args[1] will be stored in the shell variable `$0`, and used as the name of the shell. The subsequent items at args[2:] will be stored as shell parameters (`$1`, `$2`, etc) and available as positional parameters (e.g. using `echo $@`).
"""
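
A quick way to check this mapping from Python (a sketch, assuming a POSIX system; `show_parameters` and the values "my-shell", "a" and "b" are just names picked for the illustration):

```python
import subprocess


def show_parameters(args):
    """Run args with shell=True and return whatever the script prints."""
    done = subprocess.run(args, shell=True, capture_output=True, text=True)
    return done.stdout


# args[0] is the script, args[1] lands in $0, args[2:] land in $1, $2, ...
print(show_parameters(['echo "name=$0 first=$1 second=$2"', "my-shell", "a", "b"]), end="")
# name=my-shell first=a second=b
```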

I would certainly be happy to defer on giving a precise and thorough statement for the docs, but clarifying/highlighting this behavior definitely seems useful.

Thanks again
msg362342 - Author: Jonny Weese (jweese) Date: 2020-02-20 18:46
> So the command_string provided (the first word or the first quoted expression) is interpreted as a shell program, and this program is invoked with the remaining words as its arguments.

Correct.

> As you say, simply slapping quotes around all the args produces a subtle difference: the arg in the position of `$0` is used as an actual positional parameter in one case, and as the shell name in the other case

It is not quite just a shifting of the positional args.

$ bash -c 'f() { printf "%s\n" "$@"; }; f "$@"' - foo bar baz
=> "From a string, read this bash script, which defines a function f and then invokes f on all of its arguments. Now invoke that script with an executable name of "-" and the arguments "foo" "bar" and "baz"."

$ bash -c 'f() { printf "%s\n" "$@"; }; f "$@" - foo bar baz'
=> "From a string, read this bash script, which defines f and then invokes f on all the script arguments as well as "-" "foo" "bar" and "baz". Then invoke that script with no other arguments."
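
One way to see the difference from Python is to have the script report `$#`, the number of positional parameters it received. This is only a sketch, assuming a POSIX /bin/sh; the `run_script` helper is made up for the example.

```python
import subprocess

SCRIPT = 'echo "positional parameters: $#"'


def run_script(args):
    """Run args with shell=True and return the captured stdout."""
    done = subprocess.run(args, shell=True, capture_output=True, text=True)
    return done.stdout


# Separate list items: "-" is $0 and the script receives three parameters.
print(run_script([SCRIPT, "-", "foo", "bar", "baz"]), end="")
# positional parameters: 3

# One string: the extra words are part of the script text itself,
# and the script receives no parameters at all.
print(run_script(SCRIPT + " - foo bar baz"), end="")
# positional parameters: 0 - foo bar baz
```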
History
Date                 User    Action  Args
2022-04-11 14:59:26  admin   set     github: 83873
2020-02-20 18:46:18  jweese  set     messages: + msg362342
2020-02-20 18:35:47  nik-sm  set     messages: + msg362340
2020-02-20 17:39:45  jweese  set     messages: + msg362337
2020-02-20 17:09:17  nik-sm  set     messages: + msg362333
2020-02-20 15:39:50  jweese  set     nosy: + jweese; messages: + msg362323
2020-02-20 00:11:43  nik-sm  create