classification
Title: subprocess seems to use local encoding and give no choice
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.3, Python 3.2, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, DawnLight, amaury.forgeotdarc, berwin22, chris.jerdonek, eric.araujo, haypo, mark, ncoghlan, pitrou, r.david.murray, segfaulthunter, srid, vadmium
Priority: normal Keywords: patch

Created on 2009-05-28 07:08 by mark, last changed 2013-12-31 23:22 by vadmium.

Files
File name Uploaded Description
subprocess.patch segfaulthunter, 2009-06-13 16:12 Add encoding and errors to subprocess.Popen. Based against trunk.
subprocess3.patch segfaulthunter, 2009-06-13 16:12 Add encoding and errors to subprocess.Popen. Based against py3k.
test_subprocess3.py.patch segfaulthunter, 2009-06-13 16:46 unittests for encoding parameter
subProcessTest.py berwin22, 2013-01-17 22:14 Workaround example. Calls itself as a sub-process.
Messages (24)
msg88466 - (view) Author: Mark Summerfield (mark) Date: 2009-05-28 07:08
When I start a process with subprocess.Popen() and pipe the stdin and
stdout, it always seems to use the local 8-bit encoding.

I tried setting process.stdin.encoding = "utf8" and the same for stdout
(where process is the subprocess object), but to no avail.

I also tried using shell=True since on Mac, Terminal.app is fine with
Unicode, but that didn't work.

So basically, I have programs that output Unicode and run fine on the
Mac terminal, but that cannot be executed by subprocess because
subprocess uses the mac_roman encoding instead of Unicode.

I wish it were possible to specify the stdin and stdout encoding that is
used; then I could use the same one on all platforms. (But perhaps it is
possible, and I just haven't figured out how?)
msg89094 - (view) Author: Sridhar Ratnakumar (srid) Date: 2009-06-08 18:08
Related discussion thread: https://answers.launchpad.net/bzr/+question/63601
msg89293 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-06-12 18:10
I propose to add two parameters (encoding, error) to the
subprocess.Popen function.
- python 2.x could build and return codecs.StreamReader objects
- python 3.x would just pass these parameters to io.TextIOWrapper

I'll try to come up with a patch.
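
A rough sketch of what that internal wrapping could look like on the Python 2
side (illustrative only, not the actual patch; 'cat' is just a stand-in command
and the variable names are mine):

    # Illustration: wrap Popen's byte pipes the way the proposed
    # encoding/errors parameters might do it internally (Python 2).
    import codecs
    import subprocess

    proc = subprocess.Popen(['cat'], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    writer = codecs.getwriter('utf-8')(proc.stdin, 'strict')
    reader = codecs.getreader('utf-8')(proc.stdout, 'strict')
    writer.write(u'caf\xe9\n')
    writer.close()        # closes proc.stdin, so the child sees EOF
    data = reader.read()  # a unicode object decoded from UTF-8
    proc.wait()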
msg89322 - (view) Author: Florian Mayer (segfaulthunter) Date: 2009-06-13 12:07
I wrote a patch to add encoding and error to subprocess.Popen in Python
2.7 (trunk).
msg89325 - (view) Author: Florian Mayer (segfaulthunter) Date: 2009-06-13 12:32
Cosmetic update.
msg89332 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-06-13 16:07
Two things:
1. The argument should be called `errors` for consistency with open()
and TextIOWrapper(), not `error`
2. You should add some unit tests.
msg89333 - (view) Author: Florian Mayer (segfaulthunter) Date: 2009-06-13 16:18
Should we also cover the unusual case where stdout, stderr and stdin
have different encodings? Right now we are assuming they are all the same.
msg97090 - (view) Author: Mark Summerfield (mark) Date: 2009-12-31 12:58
I agree with Florian Mayer that the encoding handling should be
stream-specific. You could easily be reading the stdout of some third
party program that uses, say, latin1, but want to do your own output in,
say, utf-8.

One solution builds on what Amaury Forgeot d'Arc has done (i.e., the
encoding and errors parameters) by allowing those parameters to accept
either a single string (as now) or a dict with keys 'stdin', 'stdout',
'stderr'. Of course the client might not specify all of the dict's keys,
in which case the missing ones would use the normal default (the local
8-bit encoding etc.).
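
For instance, a call under that proposal might look like this (purely
hypothetical; Popen accepts no such parameters today):

    # Hypothetical API only -- these keyword arguments do not exist.
    import subprocess
    cmd = ['some-program']  # placeholder command
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        encoding={'stdin': 'utf-8', 'stdout': 'latin-1'},  # stderr keeps the default
        errors='strict')  # a plain string would apply to every wrapped stream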
msg97092 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-12-31 13:30
I don't understand. How is the subprocess stdout related to the main
program output?
Stream-specific encoding could be useful for subprocesses that expect
latin-1 from stdin but write utf-8 to stdout. I'm not sure we should
support this.
msg97093 - (view) Author: Mark Summerfield (mark) Date: 2009-12-31 13:43
On Thu, Dec 31, 2009 at 1:30 PM, Amaury Forgeot d'Arc
<report@bugs.python.org> wrote:
>
> Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:
>
> I don't understand. How is the subprocess stdout related to the main
> program output?
> Stream-specific encoding could be useful for subprocesses that expect
> latin-1 from stdin but write utf-8 to stdout. I'm not sure we should
> support this.

Yes, you're right.

(What I had in mind was a scenario where you read one process's stdout
and wrote to another process's stdin; but of course using your errors
& encoding arguments this will work because there'll be two separate
process objects each of which can have its encoding and errors set
separately.)
msg111181 - (view) Author: Mark Lawrence (BreamoreBoy) Date: 2010-07-22 15:32
I ran the new unit tests before and after patching subprocess on Windows Vista against a 3.1 debug maintenance build; all OK apart from this at the end of the latter run.

  File "test\test_subprocess.py", line 568, in test_encoded_stderr
    self.assertEqual(p.stderr.read(), send)
AssertionError: 'ï[32943 refs]\r\n' != 'ï'

I'm sure I've seen a ref to this somewhere, can anyone remember where?
msg111466 - (view) Author: Mark Lawrence (BreamoreBoy) Date: 2010-07-24 12:22
In 2.7 and py3k test_subprocess has a class BaseTestCase which has an assertStderrEqual method.  These don't exist in 3.1.  I believe that the py3k code will need to be backported to 3.1.  Can this be done on this issue, or do we need a new one to keep things clean?
msg123020 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-01 23:14
About the topic:
> subprocess seems to use local 8-bit encoding and gives no choice
I don't understand that: by default, Python 2 and Python 3 use byte strings, so there is no encoding (nor error handler).

I don't see how you can get unicode from a process only using subprocess. But with Python 3, you can get unicode if you set universal_newlines option to True.
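
For example (Python 3; 'ls' is just a stand-in command):

    # universal_newlines=True turns the pipes into text streams decoded
    # with the locale encoding (locale.getpreferredencoding(False)).
    import subprocess
    proc = subprocess.Popen(['ls'], stdout=subprocess.PIPE,
                            universal_newlines=True)
    out = proc.communicate()[0]   # str, not bytes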

So for Python 2, it's a new feature (get unicode), and for Python 3, it's a new option to specify the encoding. The title should be changed to something like "subprocess: add an option to specify stdin, stdout and/or stderr encoding and errors" and the type should be changed to "feature request".

Or am I completely wrong?
msg123024 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-01 23:40
> ... it always seems to use the local 8-bit encoding

The locale encoding is not necessarily an 8-bit encoding; it can be a multibyte encoding like... UTF-8 :-)

--

subprocess.patch: You should maybe use io.open(process.stdout.fileno(), encoding=..., errors=...) instead of codecs.getreader/getwriter. The code would be closer to Python 3. I think that the io module has better support for unicode than the codecs reader/writer objects, and a nicer API. See:
http://bugs.python.org/issue8796#msg106339
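
Roughly, a sketch of that alternative (closefd=False is my addition here, so
closing the wrapper does not also close the fd out from under the original
pipe object):

    # Reopen the pipe's file descriptor as a text stream via the io module
    # instead of using codecs reader/writer wrappers.
    import io
    reader = io.open(process.stdout.fileno(), encoding='utf-8',
                     errors='strict', closefd=False)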

--

> ... allowing [encoding and errors] to accept either a single string
> (as now), or a dict with keys 'stdin', 'stdout', 'stderr'

I like this idea. But what about other TextIOWrapper (or other file classes) options: buffer size, newline, line_buffering, etc.?

Why not use a dict for the existing stdin, stdout and stderr arguments? Dummy example:

process = Popen(
   command,
   stdin={'file': PIPE, 'encoding': 'iso-8859-1', 'newline': False},
   stdout={'file': PIPE, 'encoding': 'utf-8', 'buffering': 0, 'line_buffering': False},
   ...)

If stdin, stdout or stderr is a dict: the default value of its 'file' key can be set to PIPE. I don't think that it's possible to choose the encoding, buffer size, or anything else if stdin, stdout or stderr is not a pipe.

With this solution, you cannot specify the encoding for stdin, stdout and stderr at once. You have at least to repeat the encoding for stdin and stdout (and use stderr=STDOUT).

--

I still hesitate to accept this feature request. Is it really needed to add extra arguments for TextIOWrapper? Can't developers create their own TextIOWrapper objects with all the interesting options?
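
For reference, wrapping the pipes manually is already possible along these
lines (a sketch; iconv is only an example command):

    # Python 3: wrap the binary pipes yourself with io.TextIOWrapper,
    # choosing a different encoding for each stream.
    import io
    import subprocess
    proc = subprocess.Popen(['iconv', '-f', 'latin-1', '-t', 'utf-8'],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    stdin_text = io.TextIOWrapper(proc.stdin, encoding='latin-1')
    stdout_text = io.TextIOWrapper(proc.stdout, encoding='utf-8')
    stdin_text.write('caf\xe9\n')
    stdin_text.close()            # flush and close the pipe; child sees EOF
    result = stdout_text.read()   # text decoded as UTF-8
    proc.wait()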

In Python 3, being able to specify the stdout encoding is an interesting feature. Controlling the buffer size is also an interesting option.

My problem is maybe the usage of a dict to specify various options. I'm not sure that it is extensible to support future needs.
msg148025 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-11-21 01:56
I discovered this same problem recently when updating the subprocess docs, and also in working on the improved shell invocation support I am proposing for 3.3 (#13238).

I initially posted an earlier variant of this suggestion as a new issue (#13442), but Victor redirected me here.

Firstly, I don't think it makes any sense to set encoding information globally for the Popen object. As a simple example, consider using Python to write a test suite for the iconv command line tool: there's only one Popen instance (for the iconv call), but different encodings for stdin and stdout.

Really, we want to be able to make full use of Python 3's layered I/O model, but we want the subprocess pipe instances to be slotted in at the lowest layer rather than creating them ourselves.

The easiest way to do that is to have a separate class that specifies the additional options for pipe creation and does the wrapping:

    class TextPipe:
        def __init__(self, *args, **kwds):
            self.args = args
            self.kwds = kwds
        def wrap_pipe(self, pipe):
            return io.TextIOWrapper(pipe, *self.args, **self.kwds)

The stream creation process would then include a new "wrap = getattr(stream_arg, 'wrap_pipe', None)" check that is similar to the existing check for subprocess.PIPE, but invokes the method to wrap the pipe after creating it.

So to read UTF-8 encoded data from a subprocess, you could just do:

    data = check_stdout(cmd, stdout=TextPipe('utf-8'), stderr=STDOUT)
msg148066 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-21 17:31
> Firstly, I don't think it makes any sense to set encoding information
> globally for the Popen object. As a simple example, consider using
> Python to write a test suite for the iconv command line tool: there's
> only one Popen instance (for the iconv call), but different encodings
> for stdin and stdout.

Isn't that the exception rather than the rule? I think it actually makes
sense, in at least 99.83% of cases ;-), to have a common encoding
setting for all streams.
(I'm not sure about the "errors" setting, though: should we use strict
for stdin/stdout and backslashreplace for stderr, as the interpreter
does?)

Perhaps the common case should be made extra easy.
msg148484 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-11-28 14:36
Is subprocess affected by PYTHONIOENCODING?
msg148494 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-11-28 15:47
> Is subprocess affected by PYTHONIOENCODING?

Yes, like any Python process.
msg148495 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-11-28 15:48
So the users can control the encoding, and this is a doc bug.
msg148496 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-11-28 16:00
If you decide this is only a doc bug, please see also related issue 12832.
msg148498 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-28 16:06
> So the users can control the encoding, and this is a doc bug.

Not really. People can control the encoding in the child process (and only if it's a Python 3 process of course).
They can't control the encoding in the parent's subprocess pipes and that's what the request (& patch) is about.
msg168191 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-08-14 11:12
> > only one Popen instance (for the iconv call), but different encodings
> > for stdin and stdout.
> Isn't that the exception rather than the rule? I think it actually makes
> sense, in at least 99.83% of cases ;-), to have a common encoding
> setting for all streams.

FWIW, I recently encountered a scenario (albeit in a test situation) where the ability to set different encodings for stdout and stderr would have been useful to me.  It was while creating a test case for issue 15595.  I was changing the locale encoding for stdout, but I also wanted to leave it unchanged for stderr because there didn't seem to be a way to control the encoding that the child used for stderr.
msg168213 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-08-14 16:40
To follow up on my previous comment, issue 15648 shows the case where I was able to change the encoding for stdout in the child process but not for stderr (handling that case would require Popen to support two encodings).
msg180157 - (view) Author: Joseph Perry (berwin22) Date: 2013-01-17 22:14
I've found a workaround by setting the PYTHONIOENCODING environment variable:

import os, shlex, subprocess

my_env = os.environ.copy()  # copy so the parent's environment is left untouched
my_env['PYTHONIOENCODING'] = 'utf-8'
p = subprocess.Popen(shlex.split(command), stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE, stdin=subprocess.PIPE, env=my_env)

I've attached an example script for testing. It calls itself recursively 10 times.
Please note the 'fix' variable.
History
Date User Action Args
2013-12-31 23:22:13  vadmium  set  nosy: + vadmium
2013-01-17 22:14:24  berwin22  set  files: + subProcessTest.py; nosy: + berwin22; messages: + msg180157
2012-08-14 16:40:29  chris.jerdonek  set  messages: + msg168213
2012-08-14 11:12:05  chris.jerdonek  set  nosy: + chris.jerdonek; messages: + msg168191
2011-11-28 16:06:47  pitrou  set  messages: + msg148498; versions: + Python 3.3
2011-11-28 16:00:44  r.david.murray  set  nosy: + r.david.murray; messages: + msg148496
2011-11-28 15:48:50  eric.araujo  set  messages: + msg148495; title: subprocess seems to use local 8-bit encoding and gives no choice -> subprocess seems to use local encoding and give no choice
2011-11-28 15:47:44  haypo  set  messages: + msg148494
2011-11-28 14:36:07  eric.araujo  set  messages: + msg148484
2011-11-26 23:36:24  Arfrever  set  nosy: + Arfrever
2011-11-22 15:16:07  eric.araujo  set  nosy: + eric.araujo
2011-11-21 17:31:13  pitrou  set  messages: + msg148066
2011-11-21 01:56:36  ncoghlan  set  nosy: + ncoghlan; messages: + msg148025
2011-11-21 01:32:25  ncoghlan  link  issue13442 superseder
2010-12-01 23:40:24  haypo  set  messages: + msg123024
2010-12-01 23:14:03  haypo  set  messages: + msg123020
2010-11-21 07:15:20  ned.deily  set  nosy: + haypo, - BreamoreBoy
2010-07-24 12:22:46  BreamoreBoy  set  messages: + msg111466
2010-07-22 15:32:10  BreamoreBoy  set  nosy: + BreamoreBoy; messages: + msg111181
2009-12-31 13:43:18  mark  set  messages: + msg97093
2009-12-31 13:30:46  amaury.forgeotdarc  set  messages: + msg97092
2009-12-31 12:58:13  mark  set  messages: + msg97090
2009-12-31 12:29:22  DawnLight  set  nosy: + DawnLight
2009-06-13 16:46:43  segfaulthunter  set  files: + test_subprocess3.py.patch
2009-06-13 16:18:47  segfaulthunter  set  messages: + msg89333
2009-06-13 16:12:58  segfaulthunter  set  files: + subprocess3.patch
2009-06-13 16:12:25  segfaulthunter  set  files: + subprocess.patch
2009-06-13 16:11:58  segfaulthunter  set  files: - subprocess3.patch
2009-06-13 16:11:53  segfaulthunter  set  files: - subprocess.patch
2009-06-13 16:07:05  pitrou  set  versions: + Python 2.7, Python 3.2, - Python 2.6, Python 3.0; nosy: + pitrou; messages: + msg89332; stage: needs patch -> patch review
2009-06-13 13:03:33  segfaulthunter  set  files: - subprocess3.patch
2009-06-13 13:03:29  segfaulthunter  set  files: + subprocess3.patch
2009-06-13 12:59:57  segfaulthunter  set  files: + subprocess3.patch
2009-06-13 12:32:37  segfaulthunter  set  files: - subprocess.patch
2009-06-13 12:32:32  segfaulthunter  set  files: + subprocess.patch; messages: + msg89325
2009-06-13 12:07:13  segfaulthunter  set  files: + subprocess.patch; nosy: + segfaulthunter; messages: + msg89322; keywords: + patch
2009-06-12 18:10:21  amaury.forgeotdarc  set  nosy: + amaury.forgeotdarc; messages: + msg89293; stage: needs patch
2009-06-08 18:08:01  srid  set  messages: + msg89094
2009-06-08 18:03:20  srid  set  versions: + Python 2.6
2009-06-08 18:03:06  srid  set  nosy: + srid
2009-05-28 07:08:42  mark  create