Issue 27050: Demote run() below the high level APIs in subprocess docs


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/71237

classification

Title:	Demote run() below the high level APIs in subprocess docs
Type:	enhancement	Stage:	needs patch
Components:	Documentation	Versions:	Python 3.6, Python 3.5

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:	docs@python	Nosy List:	akira, docs@python, gregory.p.smith, ncoghlan, r.david.murray, takluyver
Priority:	high	Keywords:

Created on 2016-05-18 05:11 by ncoghlan, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (12)
msg265806	Author: Nick Coghlan (ncoghlan) *	Date: 2016-05-18 05:11
The new subprocess.run() API is a swiss-army-knife API like subprocess.Popen before it:

1. You have to pass a boolean toggle to indicate whether or not you want the return code checked.
2. You have to pass magic constants to keyword arguments to indicate whether or not you want output captured.
3. You have to understand and deconstruct a complex object in order to get useful information from it.

By contrast, the actual high-level API encodes all those requests in the name of the function you call.

(This isn't a request to change anything functional; it's a request to undo the harm done to the subprocess documentation by backing away from the claim that this is a high-level API on par with call, check_call and check_output. It's not; it's just not quite as low level as subprocess.Popen.)
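To make the three points concrete, here is a minimal stdlib sketch (Python 3.5+; the child command is an arbitrary stand-in) that gets a command's output both ways. check_output() implies "check" and "capture" by its name, while run() needs check=True, stdout=PIPE, and a lookup of .stdout on the returned CompletedProcess.

```python
import subprocess
import sys

# An arbitrary child command, used only for illustration.
CMD = [sys.executable, "-c", "print('abc')"]

# High-level API: "check the exit status" and "capture stdout" are both
# implied by the function name.
out_trio = subprocess.check_output(CMD)

# run(): (1) a boolean toggle for checking, (2) a magic constant for
# capturing, (3) a CompletedProcess object that has to be deconstructed.
completed = subprocess.run(CMD, check=True, stdout=subprocess.PIPE)
out_run = completed.stdout

print(out_trio == out_run)   # True: the same bytes either way
print(completed.returncode)  # 0
```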
msg265808	Author: Nick Coghlan (ncoghlan) *	Date: 2016-05-18 05:29
Likely revised structure:

* Move the high-level convenience API back to the first section, under a subheading like "Common operations"
* Give the new run() function its own subheading, like "Flexible command invocation"
* Move the exception definitions to the Exceptions subheading
* Give the stream constants their own subheading after "Frequently Used Arguments"
* Consider whether or not the "Using the subprocess module" heading level is worthwhile, or whether it makes more sense to pull all its subheadings up one level
msg265819	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2016-05-18 07:48
FWIW I consider the whole subprocess module doc to be pretty unapproachable and in need of refactoring. Your suggestions sound like good ones. We should sit down and make it sane at PyCon.
msg265820	Author: Thomas Kluyver (takluyver) *	Date: 2016-05-18 08:08
I'm obviously biased, but I find the 'high level convenience API' less convenient than the run() function: there are three different functions for the same basic operation, they're not clearly named (check_output has nothing to do with checking output), and there are things that should be simple but that they can only do awkwardly (e.g. capturing both the output and the exit code). Once I can depend on Python >= 3.5, I hope to never use call/check_call/check_output again. Using run() might make code slightly longer, but I think it also makes it clearer. I accept that the trio can probably never be removed, but this is why I demoted them a long way down the docs. Unfortunately I won't be at PyCon this year to discuss this.
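The "capturing both output and the exit code" case can be sketched as follows (stdlib only; the failing child is an arbitrary stand-in). run() without check=True returns both together, while check_output() raises, so the same information has to be dug out of CalledProcessError.

```python
import subprocess
import sys

# A child that writes some output and then exits with status 3.
CMD = [sys.executable, "-c", "import sys; print('partial'); sys.exit(3)"]

# run() without check=True: the output and the exit status come back
# together on one CompletedProcess, and no exception is raised.
cp = subprocess.run(CMD, stdout=subprocess.PIPE)
print(cp.returncode, cp.stdout.strip())  # 3 b'partial'

# check_output() raises on a non-zero exit status, so with the trio the
# output and the exit code have to be recovered from the exception.
try:
    subprocess.check_output(CMD)
except subprocess.CalledProcessError as exc:
    trio_rc, trio_out = exc.returncode, exc.output
print(trio_rc, trio_out.strip())  # 3 b'partial'
```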
msg265828	Author: R. David Murray (r.david.murray) *	Date: 2016-05-18 14:23
Just as a data point, I always found it confusing when viewing the subprocess docs that I had to page past the 'convenience functions' to get to what I consider to be the "real" docs, the docs for Popen. I rarely use the convenience functions; I find it harder to remember how they work than to just use Popen, and in any case I usually end up writing an application-specific 'cmd' function that uses Popen. I haven't tried using run yet, so I don't really have an opinion on that. I suppose it's like carrying a swiss army knife in your pocket rather than carrying a steak knife, a pair of scissors, and a screwdriver. If you were carrying a toolbox with all the tools, it might be different, but your choices are three tools or the swiss army knife, and to make matters worse, those individual tools are really just the swiss army knife in disguise, but they look different enough to be confusing. Maybe having a table of links at the start of the doc would be the most helpful, regardless of how the docs are re-organized.
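The thread doesn't show what such an application-specific 'cmd' function looks like, so the following is a hypothetical example of the pattern: a small wrapper around Popen that returns stdout as text and raises with the captured stderr on failure. The name cmd, the data parameter and the choice of RuntimeError are all illustrative assumptions.

```python
import subprocess
import sys


def cmd(args, data=None):
    """Hypothetical helper: run args, return stdout as str, raise on failure."""
    proc = subprocess.Popen(
        args,
        stdin=subprocess.PIPE if data is not None else None,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text pipes, using the locale's encoding
    )
    # communicate() feeds stdin, drains stdout and stderr concurrently, and
    # waits for the child, avoiding the deadlock of reading one pipe at a time.
    out, err = proc.communicate(data)
    if proc.returncode != 0:
        raise RuntimeError(
            "{!r} exited with status {}: {}".format(args, proc.returncode, err.strip())
        )
    return out


print(cmd([sys.executable, "-c", "print('hello')"]).strip())  # hello
```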
msg265841	Author: Nick Coghlan (ncoghlan) *	Date: 2016-05-19 01:36
For reference, http://bugs.python.org/issue13237 covers the original change to emphasise the convenience functions over the full flexibility of Popen. The intent behind that reorganisation was to make it straightforward to get started with the module without needing to first learn what a pipe is (or much of anything about command invocation other than the value of breaking things up into lists to ensure correct quoting). The direct link to the Popen docs in the introductory text was intended to make it easy for advanced users to skip directly to the full Popen docs.

In that context, the three convenience operations are "I don't really care if this works or not", "complain loudly if this doesn't work" and "capture the command output, while complaining loudly if it doesn't work". (Not coincidentally, these are roughly comparable to the operations supported by Perl's system() command and backtick expressions, only with exceptions taking the place of the "$?" magic variable.)

Once a user understands what a pipe is well enough to have an opinion on the usefulness of operations other than the 3 basic ones, I agree run() makes for a nice improvement over using Popen objects directly (thank you Thomas!), but it's still lower level than the convenience APIs (since you need more prior knowledge in order to use it effectively).

I do like David's suggestion of an introductory table providing quick links to the rest of the documentation. Something in the style of the itertools docs would likely be most suitable: https://docs.python.org/3/library/itertools.html

Also +1 on thrashing out the details at the PyCon US sprints, although I'll aim to put an initial draft together before then (so Thomas has a chance to comment on it).
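The three convenience operations map onto call(), check_call() and check_output() roughly as in this stdlib sketch (the two child commands are arbitrary stand-ins for a succeeding and a failing command):

```python
import subprocess
import sys

ok = [sys.executable, "-c", "print('done')"]
fails = [sys.executable, "-c", "import sys; sys.exit(1)"]

# 1. "I don't really care if this works or not": call() just returns the
#    exit status and leaves it to the caller to look at it (or not).
status = subprocess.call(fails)  # 1, and no exception

# 2. "Complain loudly if this doesn't work": check_call() raises
#    CalledProcessError on a non-zero exit status.
try:
    subprocess.check_call(fails)
    complained = False
except subprocess.CalledProcessError:
    complained = True

# 3. "Capture the command output, while complaining loudly if it doesn't
#    work": check_output() returns stdout as bytes, or raises like check_call().
output = subprocess.check_output(ok)

print(status, complained, output.strip())  # 1 True b'done'
```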
msg265842	Author: Nick Coghlan (ncoghlan) *	Date: 2016-05-19 01:41
As long as I'm refactoring these docs, I may also take a look at adding some notes about text vs binary handling, since that's still problematic, as described in http://bugs.python.org/issue6135. The default behaviour is binary pipes, and setting "universal_newlines=True" switches to UTF-8 encoded text pipes, but anything else involves using binary pipes with separate encoding and decoding steps.
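A minimal illustration of the default binary pipes versus universal_newlines=True. The child writes only ASCII, so the sketch does not depend on which encoding text mode actually uses (a later reply in this thread takes up that question).

```python
import subprocess
import sys

CMD = [sys.executable, "-c", "print('abc')"]

# Default: binary pipes, so the captured output is bytes.
raw = subprocess.check_output(CMD)

# universal_newlines=True: text pipes, so the output is str, with '\r\n'
# translated to '\n' on the way in.
text = subprocess.check_output(CMD, universal_newlines=True)

# Anything else: keep the pipe binary and decode in a separate step
# (no newline translation happens here).
decoded = raw.decode("utf-8")

print(type(raw).__name__, type(text).__name__)  # bytes str
print(text == "abc\n")  # True
```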
msg267866	Author: Akira Li (akira) *	Date: 2016-06-08 15:57
> setting "universal_newlines=True" switches to UTF-8 encoded text pipes

It uses the locale.getpreferredencoding(False) encoding: something like cp1252, cp1251, etc. on Windows, and UTF-8 on *nix with proper locale settings. It is ASCII (the C/POSIX locale default) if the locale is not set, e.g. in cron, ssh, or init.d scripts. If you need a different character encoding, you could use (instead of universal_newlines=True):

    pipe = io.TextIOWrapper(process.stdout, encoding=character_encoding)

A better spelling for universal_newlines=True would be text_mode=True.

A summary table (like in the itertools module docs) would be nice.

The check_output() name is unfortunate, but it does the right thing and it is not hard to use for a beginner, once somebody discovers it, e.g. via the "Running shell command from Python and capturing the output" Stack Overflow question: http://stackoverflow.com/questions/4760215/running-shell-command-from-python-and-capturing-the-output

Compare:

    output = check_output([sys.executable, '-c', 'print("abc")'])

and

    output = run([sys.executable, '-c', 'print("abc")'], stdout=PIPE).stdout

The latter command doesn't raise an exception if the child process fails. A beginner has to know about check=True to do the right thing:

    output = run([sys.executable, '-c', 'print("abc")'], stdout=PIPE, check=True).stdout

It is easier to refer to check_output() if someone asks "How do I get a command's output in Python?"

I wish call() did what check_call() does, and the current call() behavior were achieved by the opposite parameter, e.g. no_raise_on_status=False being the default:

    rc = call(command, no_raise_on_status=True)

If we can't change the interface, then check_call() is the answer to the "Calling an external command in Python" question: http://stackoverflow.com/questions/89228/calling-an-external-command-in-python

- check_call(command) -- run the command, raise if it fails
- output = check_output(command) -- get the command's output, raise if it fails.
  To pass data to the command via its standard input, pass input=data. To get/pass text (Unicode) instead of bytes, pass universal_newlines=True.
- check_call("a -- *.jpg | b 2>&1 >output | c", shell=True) -- run a shell command as is

It is a pity that a list argument such as ["ls", "-l"] is allowed with shell=True.

These cover the most common operations with a subprocess.

Beyond these, run() is more convenient if we don't need to interact with the child process while it is running. For example, if we introduce the word PIPE (a magic object in the kernel that connects processes), then to capture both the standard output and standard error streams of the command:

    cp = run(command, stdout=PIPE, stderr=PIPE)
    output, errors = cp.stdout, cp.stderr

run() also makes it easy to get the exit status: cp.returncode. Explicit cp.stdout_text and cp.stdout_bytes attributes, regardless of the text mode, would be nice.

To interact with a child process while it is running, Popen() has to be used directly. There could be buffering and other issues (tty vs. pipe); see "Q: Why not just use a pipe (popen())?" http://pexpect.readthedocs.io/en/latest/FAQ.html#whynotpipe. Working with both stdout and stderr, or doing a non-blocking read, requires threads, or asyncio, fcntl, etc.

A couple of words should be said about killing a command started with shell=True (to kill a process tree, set the start_new_session=True parameter and call os.killpg()). The timeout option doesn't work in this case (it uses Popen.kill()). check_output(), unlike check_call(), may wait for grandchildren if they inherit the pipe. Job objects on Windows should also be mentioned, e.g. http://stackoverflow.com/questions/23434842/python-how-to-kill-child-processes-when-parent-dies
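Two of the suggestions above can be sketched with the stdlib (Python 3.5+): decoding a binary pipe with an explicit encoding through io.TextIOWrapper, and run() with input=data while capturing stdout, stderr and the exit status. The child commands are arbitrary stand-ins, and "utf-8" stands in for the message's character_encoding placeholder.

```python
import io
import subprocess
import sys

# A child that writes UTF-8 bytes directly, whatever its locale says.
CHILD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write('caf\\u00e9\\n'.encode('utf-8'))",
]

# Instead of universal_newlines=True (locale encoding), wrap the binary pipe
# in a text layer with an explicit encoding.
with subprocess.Popen(CHILD, stdout=subprocess.PIPE) as process:
    with io.TextIOWrapper(process.stdout, encoding="utf-8") as pipe:
        lines = [line.rstrip("\n") for line in pipe]

# run() with input=data, capturing both output streams plus the exit status.
ECHO = [
    sys.executable,
    "-c",
    "import sys; print(sys.stdin.read().upper()); print('note', file=sys.stderr)",
]
cp = subprocess.run(ECHO, input=b"abc", stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, errors = cp.stdout, cp.stderr

print(lines == ["caf\u00e9"])  # True
print(cp.returncode, output.strip(), errors.strip())  # 0 b'ABC' b'note'
```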
msg280729	Author: Nick Coghlan (ncoghlan) *	Date: 2016-11-14 05:13
The introduction of run() and its presentation as the preferred interface has effectively reversed much of the progress that had been made in actually making the subprocess module approachable for the simplest use cases, such as https://twitter.com/fuzzychef/status/798025538237382656 (i.e. the exact case that "subprocess.call()" handles). It does make sense to have run() as an intermediate tier of complexity between the base trio of call/check_call/check_output and the full configurability of Popen, so it isn't the introduction of the API itself that's problematic, just the way we're currently presenting it.
msg280733	Author: Thomas Kluyver (takluyver) *	Date: 2016-11-14 05:47
I still feel that having one function with various options is easier to explain than three separate functions with awkward names and limited use cases (e.g. none of them captures output without also checking the exit code). The tweeter you replied to said he didn't like subprocess.call(). If you really think the trio is a better starting point, though, you're the one with the power to change the docs ;-) There's more awkwardness in the subprocess API; I suspect that what that tweeter wants is something built around an event loop, like Node, so you can handle output incrementally using events. That's not something that we can easily fix in subprocess, because we don't have a default event loop to attach subprocesses to.
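For comparison, the standard library does include an event-loop-based option in asyncio, although the caller has to run the loop themselves. A minimal sketch, assuming Python 3.7+ (for asyncio.run) and an arbitrary child command, that handles each line of output as it arrives:

```python
import asyncio
import sys

# -u keeps the child's stdout unbuffered so lines arrive one at a time.
CHILD = [sys.executable, "-u", "-c", "for i in range(3): print('line', i)"]


async def stream_lines(args):
    """Collect the child's stdout lines as they arrive (decoded as UTF-8)."""
    proc = await asyncio.create_subprocess_exec(
        *args, stdout=asyncio.subprocess.PIPE
    )
    seen = []
    # StreamReader supports "async for", yielding one line (bytes) at a time.
    async for raw in proc.stdout:
        line = raw.decode("utf-8").rstrip()
        seen.append(line)  # a real handler would react to each line here
    await proc.wait()
    return seen


lines = asyncio.run(stream_lines(CHILD))
print(lines)  # ['line 0', 'line 1', 'line 2']
```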
msg280749	Author: Nick Coghlan (ncoghlan) *	Date: 2016-11-14 10:53
Indeed, further discussion showed that what they were after was something more along the lines of sarge.Capture: http://sarge.readthedocs.io/en/latest/overview.html#main-features That is, the ability to start the subprocess running in the background, and access the output line-by-line, but with the precise mechanics of how that works being a hidden implementation detail (just as concurrent.futures hides the details of the inter-process communication for function calls).
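A rough stdlib-only approximation of that behaviour, sketched with plain Popen. This is not sarge's API, and unlike sarge.Capture it leaves the pipe and buffering mechanics in plain view; the ticking child is an arbitrary stand-in.

```python
import subprocess
import sys

# A child that prints a line every 0.1 s; -u keeps its stdout unbuffered so
# each line is written to the pipe immediately.
CHILD = [
    sys.executable,
    "-u",
    "-c",
    "import time\nfor i in range(3):\n    print('tick', i)\n    time.sleep(0.1)",
]

received = []
# Popen returns as soon as the child has started; it keeps running in the
# background while the parent reads from the pipe.
with subprocess.Popen(CHILD, stdout=subprocess.PIPE, universal_newlines=True) as proc:
    # Iterating over the text pipe yields each line once the child writes it.
    for line in proc.stdout:
        received.append(line.rstrip("\n"))
# Leaving the with-block waits for the child, so returncode is set here.
print(proc.returncode, received)  # 0 ['tick 0', 'tick 1', 'tick 2']
```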
msg282388	Author: Nick Coghlan (ncoghlan) *	Date: 2016-12-05 07:05
I've switched to using Vinay Sajip's "sarge" for my own subprocess invocation needs, which uses a similar all-in-one "run" API (albeit quite a different approach to capturing output). Accordingly, I don't think there's anything useful to be done specifically in the context of this issue - any energy that might be spent here would likely be spent more constructively on things like making the pipe encoding easier to configure, or reducing the risk of deadlocks due to OS-level buffer size limits.

History
Date	User	Action	Args
2022-04-11 14:58:31	admin	set	github: 71237
2016-12-05 07:05:09	ncoghlan	set	status: open -> closed resolution: not a bug messages: + msg282388
2016-11-14 10:53:18	ncoghlan	set	messages: + msg280749
2016-11-14 05:47:55	takluyver	set	messages: + msg280733
2016-11-14 05:13:35	ncoghlan	set	priority: normal -> high messages: + msg280729
2016-06-08 15:57:08	akira	set	nosy: + akira messages: + msg267866
2016-05-19 01:41:08	ncoghlan	set	messages: + msg265842
2016-05-19 01:36:51	ncoghlan	set	messages: + msg265841
2016-05-18 14:23:40	r.david.murray	set	nosy: + r.david.murray messages: + msg265828
2016-05-18 08:08:41	takluyver	set	nosy: + takluyver messages: + msg265820
2016-05-18 07:48:10	gregory.p.smith	set	nosy: + gregory.p.smith messages: + msg265819
2016-05-18 05:29:52	ncoghlan	set	assignee: docs@python type: enhancement components: + Documentation versions: + Python 3.5, Python 3.6 nosy: + docs@python messages: + msg265808 stage: needs patch
2016-05-18 05:11:45	ncoghlan	create