Issue 39692: Subprocess using list vs string

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/83873

classification

Title:	Subprocess using list vs string
Type:	behavior	Stage:
Components:		Versions:	Python 3.8, Python 3.7

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	jweese, nik-sm
Priority:	normal	Keywords:

Created on 2020-02-20 00:11 by nik-sm, last changed 2022-04-11 14:59 by admin.

Messages (6)
msg362294 - (view)	Author: Niklas Smedemark-Margulies (nik-sm)	Date: 2020-02-20 00:11
Most (all?) of the functions in subprocess (run, Popen, etc.) are supposed to accept either list or string, but the behavior when passing a list differs (and appears to be wrong).

For example, see below - invoking the command "exit 1" should give a return code of 1, but when using a list, the return code is 0.

```
>>> import subprocess
>>> # Example using run
>>> res1 = subprocess.run('exit 1', shell=True)
>>> res1.returncode
1
>>> res2 = subprocess.run('exit 1'.split(), shell=True)
>>> res2.returncode
0
>>> # Example using Popen
>>> p1 = subprocess.Popen('exit 1', shell=True)
>>> p1.communicate()
(None, None)
>>> p1.returncode
1
>>> p2 = subprocess.Popen('exit 1'.split(), shell=True)
>>> p2.communicate()
(None, None)
>>> p2.returncode
0
```
msg362323 - (view)	Author: Jonny Weese (jweese)	Date: 2020-02-20 15:39
I believe this behavior is expected (at least in posix-land).

Lib/subprocess.py L1702 shows that whenever shell=True, the args that are constructed are [unix_shell, "-c"] + args.

And so we can reproduce your behavior just using a regular shell. (This is Darwin but with a recent bash from homebrew):

    $ bash -c 'exit 1'  # like subprocess string case
    $ echo $?
    1
    $ bash -c exit 1  # like subprocess list case (note args are separated)
    $ echo $?
    0
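The argument construction described in this message can be checked from Python itself. This is a minimal sketch, assuming a POSIX system where `shell=True` uses `/bin/sh`; the variable names are illustrative only.

```python
import subprocess

# String form: the shell receives one command string, 'exit 1'.
string_rc = subprocess.run('exit 1', shell=True).returncode

# List form: 'exit' is the command string and '1' is assigned to $0,
# so the shell runs a bare `exit` and reports status 0.
list_rc = subprocess.run(['exit', '1'], shell=True).returncode

# What the list form amounts to on POSIX (assumes /bin/sh is the shell).
explicit_rc = subprocess.run(['/bin/sh', '-c', 'exit', '1']).returncode

print(string_rc, list_rc, explicit_rc)
```

On such a system the list form and the explicit `/bin/sh -c` call should report the same status, which differs from the string form.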
msg362333 - (view)	Author: Niklas Smedemark-Margulies (nik-sm)	Date: 2020-02-20 17:09
Thanks very much for getting back to me so quickly, and for identifying the reason for the difference in behavior.

Sorry to harp on a relatively small behavior, but it cost me a few hours and it might cause confusion for others as well.

It still seems like an oversight that the body of a program invoked by `bash -c` would not be quoted. Consider the following two examples:

    $ bash -c echo my critical data > file.txt
    $ cat file.txt
    $ # My data was lost!

Or again in Python:

    >>> import subprocess
    >>> res1 = subprocess.run(['echo', 'my', 'critical', 'data', '>', 'file.txt'], shell=True, capture_output=True)
    >>> res1.returncode
    0
    >>> exit()
    $ cat file.txt
    cat: file.txt: No such file or directory
    $ # The file is not even created!

I know that the subsequent args are stored as bash arguments to the first executable/quoted program, for example:

    $ bash -c 'echo $0' foo
    foo

or

    >>> res1 = subprocess.run(['echo $0', 'foo'], shell=True, capture_output=True)
    >>> res1.stdout
    b'foo\n'

However, it seems strange/wrong to invoke an executable via "bash -c executable arg1 arg2", rather than just "executable arg1 arg2"! In other words, the combination of `shell=True` with a sequence of args appears to behave surprisingly/wrong.

---

Here's the only part of the docs I could find that discusses the interaction between `shell=True` and args:

"""
The shell argument (which defaults to False) specifies whether to use the shell as the program to execute. If shell is True, it is recommended to pass args as a string rather than as a sequence.
"""

I think there are ~2 cases here:

1) If there exist use cases for setting `shell=True` and doing "bash -c my_executable arg2 arg3", then the documentation should say something like the following:

"""
Using `shell=True` invokes the sequence of args via `bash -c`. In this case, the first argument MUST be an executable, and the subsequent arguments will be stored as bash parameters for that executable (`$0`, `$1`, etc).
"""

2) The body of the program invoked with `bash -c` should always be quoted. In this case, there should either be a code fix to quote the body, or a `ValueError` when `shell=True` and args is a sequence.

How does this sound from your perspective?
msg362337 - (view)	Author: Jonny Weese (jweese)	Date: 2020-02-20 17:39
> it seems strange/wrong to invoke an executable via "bash -c executable arg1 arg2", rather than just "executable arg1 arg2"!

I agree it's strange to invoke a single executable that way, but remember that -c allows a string of arbitrary bash code. (It just happens that bash code that consists of a single executable calls it -- useful behavior in a shell.)

Consider:

    $ bash -c 'f() { printf "%s\n" "$@"; }; f "$@"' - foo bar baz
    foo
    bar
    baz

> 1) If there exist use cases for setting `shell=True` and doing "bash -c my_executable arg2 arg3", then the documentation should say something like the following:
> """
> Using `shell=True` invokes the sequence of args via `bash -c`. In this case, the first argument MUST be an executable, and the subsequent arguments will be stored as bash parameters for that executable (`$0`, `$1`, etc).
> """

I'd be okay with clearer docs, but the given language is not quite right. For example, the actual shell call is /bin/sh (and depends on the platform). And, as described above, I think it would be too restrictive to say the first argument must be a single executable.

On the other hand, I disagree with option 2. I think raising an error would be very restrictive, and secretly quoting the argument could be surprising for (the few) people who understand the underlying shell mechanism.
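The redirection case raised earlier in the thread follows from the same rule: `>` is shell syntax only when it is part of the command string. A minimal sketch, assuming a POSIX `/bin/sh` and Python 3.7+ (for `capture_output`); the temporary directory and variable names are illustrative only.

```python
import os
import subprocess
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, 'file.txt')

    # List form: only 'echo' is the command string. The remaining items,
    # including '>' and 'file.txt', become $0, $1, ... and are never parsed
    # as shell syntax, so no redirection happens.
    list_result = subprocess.run(
        ['echo', 'my', 'critical', 'data', '>', 'file.txt'],
        shell=True, capture_output=True, cwd=workdir,
    )
    list_created_file = os.path.exists(target)

    # String form: the whole line is shell code, so '>' redirects output.
    subprocess.run('echo my critical data > file.txt', shell=True, cwd=workdir)
    with open(target) as fh:
        string_file_contents = fh.read()

print(list_result.stdout, list_created_file, repr(string_file_contents))
```

In the list form the shell runs a bare `echo`, so the captured output is just a newline and the file is never created.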
msg362340 - (view)	Author: Niklas Smedemark-Margulies (nik-sm)	Date: 2020-02-20 18:35
Good point - the phrasing I suggested there is not accurate, and there is more complicated behavior available than simply specifying a single executable.

Here's the bash manual's info about "-c" flag:

"""
If the -c option is present, then commands are read from the first non-option argument command_string. If there are arguments after the command_string, the first argument is assigned to $0 and any remaining arguments are assigned to the positional parameters. The assignment to $0 sets the name of the shell, which is used in warning and error messages.
"""

So the command_string provided (the first word or the first quoted expression) is interpreted as a shell program, and this program is invoked with the remaining words as its arguments.

As you point out, this command_string can be a terminal expression like `true`, a function definition like you provided, an executable, or other possibilities, but in any case it will be executed with the remaining args. (This also matches how the library code assigns `executable`: https://github.com/python/cpython/blob/master/Lib/subprocess.py#L1707)

As you say, simply slapping quotes around all the args produces a subtle difference: the arg in the position of `$0` is used as an actual positional parameter in one case, and as the shell name in the other case:

    $ bash -c 'f() { printf "%s\n" "$@"; }; f "$@"' - foo bar baz
    foo
    bar
    baz

    $ bash -c 'f() { printf "%s\n" "$@"; }; f "$@" - foo bar baz'
    -
    foo
    bar
    baz

(Unless I am misunderstanding the behavior here).

It's a bit frustrating that this approach would not work to simplify the usage, but (assuming my explanation is correct) I concede that code might certainly be depending on this behavior and setting the shell name with args[1] (and they would not want this to become a positional parameter instead).

Improving on my first attempt, here's another possible phrasing for the docs:

"""
Using `shell=True` invokes the sequence of args via `<SHELL> -c` where <SHELL> is the chosen system shell (described elsewhere on this page). In this case, the item at args[0] is a shell program, that will be invoked on the subsequent args. The item at args[1] will be stored in the shell variable `$0`, and used as the name of the shell. The subsequent items at args[2:] will be stored as shell parameters (`$1`, `$2`, etc) and available as positional parameters (e.g. using `echo $@`).
"""

I would certainly be happy to defer on giving a precise and thorough statement for the docs, but clarifying/highlighting this behavior definitely seems useful.

Thanks again
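The mapping in this proposed wording (args[0] as the shell program, args[1] as `$0`, args[2:] as positional parameters) can be observed directly. A minimal sketch, assuming a POSIX `/bin/sh` and Python 3.7+; the argument values `shell-name`, `first` and `second` are placeholders chosen for illustration.

```python
import subprocess

# args[0] is the shell program; args[1] is assigned to $0;
# args[2:] become the positional parameters $1, $2, ...
result = subprocess.run(
    [r'printf "%s\n" "$0" "$@"', 'shell-name', 'first', 'second'],
    shell=True, capture_output=True,
)
lines = result.stdout.decode().splitlines()

print(lines)
```

The first printed item comes from `$0` and the rest from `"$@"`, matching the bash manual text quoted in this message.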
msg362342 - (view)	Author: Jonny Weese (jweese)	Date: 2020-02-20 18:46
> So the command_string provided (the first word or the first quoted expression) is interpreted as a shell program, and this program is invoked with the remaining words as its arguments.

Correct.

> As you say, simply slapping quotes around all the args produces a subtle difference: the arg in the position of `$0` is used as an actual positional parameter in one case, and as the shell name in the other case

It is not quite just a shifting of the positional args.

    $ bash -c 'f() { printf "%s\n" "$@"; }; f "$@"' - foo bar baz

=> "From a string, read this bash script, which defines a function f and then invokes f on all of its arguments. Now invoke that script with an executable name of "-" and the arguments "foo" "bar" and "baz"."

    $ bash -c 'f() { printf "%s\n" "$@"; }; f "$@" - foo bar baz'

=> "From a string, read this bash script, which defines f and then invokes f on all the script arguments as well as "-" "foo" "bar" and "baz". Then invoke that script with no other arguments."
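The two readings can be compared side by side from Python. A minimal sketch, assuming a POSIX `/bin/sh` that supports function definitions and Python 3.7+; `script`, `separate` and `embedded` are illustrative names.

```python
import subprocess

# A small shell script: define f, then call it on the script's arguments.
script = r'f() { printf "%s\n" "$@"; }; f "$@"'

# Arguments passed separately: '-' becomes $0, the rest become $1..$3.
separate = subprocess.run(
    [script, '-', 'foo', 'bar', 'baz'],
    shell=True, capture_output=True,
)

# Arguments embedded in the script text: the script itself receives no
# arguments, and f is called on the literal words '-', 'foo', 'bar', 'baz'.
embedded = subprocess.run(
    script + ' - foo bar baz',
    shell=True, capture_output=True,
)

print(separate.stdout.decode().splitlines())
print(embedded.stdout.decode().splitlines())
```

In the first call `-` is consumed as the shell name and never printed; in the second it is an ordinary argument to `f`.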

History
Date	User	Action	Args
2022-04-11 14:59:26	admin	set	github: 83873
2020-02-20 18:46:18	jweese	set	messages: + msg362342
2020-02-20 18:35:47	nik-sm	set	messages: + msg362340
2020-02-20 17:39:45	jweese	set	messages: + msg362337
2020-02-20 17:09:17	nik-sm	set	messages: + msg362333
2020-02-20 15:39:50	jweese	set	nosy: + jweese messages: + msg362323
2020-02-20 00:11:43	nik-sm	create