classification
Title: subprocess seems to use local encoding and give no choice
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.3, Python 3.2, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, DawnLight, amaury.forgeotdarc, berwin22, chris.jerdonek, eric.araujo, haypo, mark, ncoghlan, pitrou, r.david.murray, segfaulthunter, srid, vadmium
Priority: normal Keywords: patch

Created on 2009-05-28 07:08 by mark, last changed 2013-12-31 23:22 by vadmium.

Files
File name Uploaded Description
subprocess.patch segfaulthunter, 2009-06-13 16:12 Add encoding and errors to subprocess.Popen. Based against trunk.
subprocess3.patch segfaulthunter, 2009-06-13 16:12 Add encoding and errors to subprocess.Popen. Based against py3k.
test_subprocess3.py.patch segfaulthunter, 2009-06-13 16:46 unittests for encoding parameter
subProcessTest.py berwin22, 2013-01-17 22:14 Workaround example. Calls itself as a sub-process.
Messages (24)
msg88466 - (view) Author: Mark Summerfield (mark) Date: 2009-05-28 07:08
When I start a process with subprocess.Popen() and pipe the stdin and
stdout, it always seems to use the local 8-bit encoding.

I tried setting process.stdin.encoding = "utf8" and the same for stdout
(where process is the subprocess object), but to no avail.

I also tried using shell=True since on Mac, Terminal.app is fine with
Unicode, but that didn't work.

So basically, I have programs that output Unicode and run fine on the
Mac terminal, but that cannot be executed by subprocess because
subprocess uses the mac_roman encoding instead of Unicode.

I wish it were possible to specify the stdin and stdout encoding that is
used; then I could use the same one on all platforms. (But perhaps it is
possible, and I just haven't figured out how?)
msg89094 - (view) Author: Sridhar Ratnakumar (srid) Date: 2009-06-08 18:08
Related discussion thread: https://answers.launchpad.net/bzr/+question/63601
msg89293 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-06-12 18:10
I propose to add two parameters (encoding, error) to the
subprocess.Popen function.
- python 2.x could build and return codecs.StreamReader objects
- python 3.x would just pass these parameters to io.TextIOWrapper

I'll try to come up with a patch.
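
A rough sketch of what that internal wrapping could look like on the Python 2
side (illustrative only, not the actual patch; 'cat' is just a stand-in command
and the variable names are mine):

    # Illustration: wrap Popen's byte pipes the way the proposed
    # encoding/errors parameters might do it internally (Python 2).
    import codecs
    import subprocess

    proc = subprocess.Popen(['cat'], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    writer = codecs.getwriter('utf-8')(proc.stdin, 'strict')
    reader = codecs.getreader('utf-8')(proc.stdout, 'strict')
    writer.write(u'caf\xe9\n')
    writer.close()        # closes proc.stdin, so the child sees EOF
    data = reader.read()  # a unicode object decoded from UTF-8
    proc.wait()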
msg89322 - (view) Author: Florian Mayer (segfaulthunter) Date: 2009-06-13 12:07
I wrote a patch to add encoding and error to subprocess.Popen in Python
2.7 (trunk).
msg89325 - (view) Author: Florian Mayer (segfaulthunter) Date: 2009-06-13 12:32
Cosmetic update.
msg89332 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-06-13 16:07
Two things:
1. The argument should be called `errors` for consistency with open()
and TextIOWrapper(), not `error`
2. You should add some unit tests.
msg89333 - (view) Author: Florian Mayer (segfaulthunter) Date: 2009-06-13 16:18
Should we also cover the unusual case where stdout, stderr and stdin
have different encodings? Right now we are assuming they are all the same.
msg97090 - (view) Author: Mark Summerfield (mark) Date: 2009-12-31 12:58
I agree with Florian Mayer that the encoding handling should be
stream-specific. You could easily be reading the stdout of some third
party program that uses, say, latin1, but want to do your own output in,
say, utf-8.

One solution builds on what Amaury Forgeot d'Arc has done (i.e., the
encoding and errors parameters) by allowing those parameters to accept
either a single string (as now) or a dict with keys 'stdin', 'stdout',
'stderr'. Of course the client might not specify all of the dict's keys,
in which case the missing ones would use the normal default (the local
8-bit encoding etc.).
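
For instance, a call under that proposal might look like this (purely
hypothetical; Popen accepts no such parameters today):

    # Hypothetical API only -- these keyword arguments do not exist.
    import subprocess
    cmd = ['some-program']  # placeholder command
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        encoding={'stdin': 'utf-8', 'stdout': 'latin-1'},  # stderr keeps the default
        errors='strict')  # a plain string would apply to every wrapped stream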
msg97092 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-12-31 13:30
I don't understand. How is the subprocess stdout related to the main
program output?
Stream-specific encoding could be useful for subprocesses that expect
latin-1 from stdin but write utf-8 to stdout. I'm not sure we should
support this.
msg97093 - (view) Author: Mark Summerfield (mark) Date: 2009-12-31 13:43
On Thu, Dec 31, 2009 at 1:30 PM, Amaury Forgeot d'Arc
<report@bugs.python.org> wrote:
>
> Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:
>
> I don't understand. How is the subprocess stdout related to the main
> program output?
> Stream-specific encoding could be useful for subprocesses that expect
> latin-1 from stdin but write utf-8 to stdout. I'm not sure we should
> support this.

Yes, you're right.

(What I had in mind was a scenario where you read one process's stdout
and wrote to another process's stdin; but of course using your errors
& encoding arguments this will work because there'll be two separate
process objects each of which can have its encoding and errors set
separately.)
msg111181 - (view) Author: Mark Lawrence (BreamoreBoy) Date: 2010-07-22 15:32
I ran the new unit tests before and after patching subprocess on Windows Vista against a 3.1 debug maintenance build; all OK apart from this at the end of the latter run.

  File "test\test_subprocess.py", line 568, in test_encoded_stderr
    self.assertEqual(p.stderr.read(), send)
AssertionError: 'ï[32943 refs]\r\n' != 'ï'

I'm sure I've seen a ref to this somewhere, can anyone remember where?
msg111466 - (view) Author: Mark Lawrence (BreamoreBoy) Date: 2010-07-24 12:22
In 2.7 and py3k test_subprocess has a class BaseTestCase which has an assertStderrEqual method.  These don't exist in 3.1.  I believe that the py3k code will need to be backported to 3.1.  Can this be done on this issue, or do we need a new one to keep things clean?
msg123020 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-01 23:14
About the topic:
> subprocess seems to use local 8-bit encoding and gives no choice
I don't understand that: by default, Python 2 and Python 3 use byte strings, so there is no encoding (nor error handler).

I don't see how you can get unicode from a process only using subprocess. But with Python 3, you can get unicode if you set universal_newlines option to True.
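
For example (Python 3; 'ls' is just a stand-in command):

    # universal_newlines=True turns the pipes into text streams decoded
    # with the locale encoding (locale.getpreferredencoding(False)).
    import subprocess
    proc = subprocess.Popen(['ls'], stdout=subprocess.PIPE,
                            universal_newlines=True)
    out = proc.communicate()[0]   # str, not bytes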

So for Python 2, it's a new feature (get unicode), and for Python 3, it's a new option to specify the encoding. The title should be changed to something like "subprocess: add an option to specify stdin, stdout and/or stderr encoding and errors" and the type should be changed to "feature request".

Or am I completely wrong?
msg123024 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-01 23:40
> ... it always seems to use the local 8-bit encoding

The locale encoding is not necessarily an 8-bit encoding; it can be a multibyte encoding like... UTF-8 :-)

--

subprocess.patch: You should maybe use io.open(process.stdout.fileno(), encoding=..., errors=...) instead of codecs.getreader/getwriter. The code would be closer to Python 3. I think that the io module has better support for unicode than the codecs reader/writer objects, and a nicer API. See:
http://bugs.python.org/issue8796#msg106339
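
Roughly, a sketch of that alternative (closefd=False is my addition here, so
closing the wrapper does not also close the fd out from under the original
pipe object):

    # Reopen the pipe's file descriptor as a text stream via the io module
    # instead of using codecs reader/writer wrappers.
    import io
    reader = io.open(process.stdout.fileno(), encoding='utf-8',
                     errors='strict', closefd=False)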

--

> ... allowing [encoding and errors] to accept either a single string
> (as now), or a dict with keys 'stdin', 'stdout', 'stderr'

I like this idea. But what about other TextIOWrapper (or other file classes) options: buffer size, newline, line_buffering, etc.?

Why not use a dict for the existing stdin, stdout and stderr arguments? Dummy example:

process = Popen(
   command,
   stdin={'file': PIPE, 'encoding': 'iso-8859-1', 'newline': False},
   stdout={'file': PIPE, 'encoding': 'utf-8', 'buffering': 0, 'line_buffering': False},
   ...)

If stdin, stdout or stderr is a dict: the default value of its 'file' key can be set to PIPE. I don't think that it's possible to choose the encoding, buffer size, or anything else if stdin, stdout or stderr is not a pipe.

With this solution, you cannot specify the encoding for stdin, stdout and stderr at once. You have at least to repeat the encoding for stdin and stdout (and use stderr=STDOUT).

--

I still hesitate to accept this feature request. Is it really needed to add extra arguments for TextIOWrapper? Can't developers create their own TextIOWrapper objects with all the interesting options?
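
For reference, wrapping the pipes manually is already possible along these
lines (a sketch; iconv is only an example command):

    # Python 3: wrap the binary pipes yourself with io.TextIOWrapper,
    # choosing a different encoding for each stream.
    import io
    import subprocess
    proc = subprocess.Popen(['iconv', '-f', 'latin-1', '-t', 'utf-8'],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    stdin_text = io.TextIOWrapper(proc.stdin, encoding='latin-1')
    stdout_text = io.TextIOWrapper(proc.stdout, encoding='utf-8')
    stdin_text.write('caf\xe9\n')
    stdin_text.close()            # flush and close the pipe; child sees EOF
    result = stdout_text.read()   # text decoded as UTF-8
    proc.wait()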

In Python 3, being able to specify the stdout encoding is an interesting feature. Controlling the buffer size is also an interesting option.

My problem is maybe the usage of a dict to specify various options. I'm not sure that it is extensible to support future needs.
msg148025 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-11-21 01:56
I discovered this same problem recently when updating the subprocess docs, and also in working on the improved shell invocation support I am proposing for 3.3 (#13238).

I initially posted an earlier variant of this suggestion as a new issue (#13442), but Victor redirected me here.

Firstly, I don't think it makes any sense to set encoding information globally for the Popen object. As a simple example, consider using Python to write a test suite for the iconv command line tool: there's only one Popen instance (for the iconv call), but different encodings for stdin and stdout.

Really, we want to be able to make full use of Python 3's layered I/O model, but we want the subprocess pipe instances to be slotted in at the lowest layer rather than creating them ourselves.

The easiest way to do that is to have a separate class that specifies the additional options for pipe creation and does the wrapping:

    class TextPipe:
        def __init__(self, *args, **kwds):
            self.args = args
            self.kwds = kwds
        def wrap_pipe(self, pipe):
            return io.TextIOWrapper(pipe, *self.args, **self.kwds)

The stream creation process would then include a new "wrap = getattr(stream_arg, 'wrap_pipe', None)" check that is similar to the existing check for subprocess.PIPE, but invokes the method to wrap the pipe after creating it.

So to read UTF-8 encoded data from a subprocess, you could just do:

    data = check_stdout(cmd, stdout=TextPipe('utf-8'), stderr=STDOUT)
msg148066 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-21 17:31
> Firstly, I don't think it makes any sense to set encoding information
> globally for the Popen object. As a simple example, consider using
> Python to write a test suite for the iconv command line tool: there's
> only one Popen instance (for the iconv call), but different encodings
> for stdin and stdout.

Isn't that the exception rather than the rule? I think it actually makes
sense, in at least 99.83% of cases ;-), to have a common encoding
setting for all streams.
(I'm not sure about the "errors" setting, though: should we use strict
for stdin/stdout and backslashreplace for stderr, as the interpreter
does?)

Perhaps the common case should be made extra easy.
msg148484 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-11-28 14:36
Is subprocess affected by PYTHONIOENCODING?
msg148494 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-11-28 15:47
> Is subprocess affected by PYTHONIOENCODING?

Yes, like any Python process.
msg148495 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-11-28 15:48
So the users can control the encoding, and this is a doc bug.
msg148496 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-11-28 16:00
If you decide this is only a doc bug, please see also related issue 12832.
msg148498 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-28 16:06
> So the users can control the encoding, and this is a doc bug.

Not really. People can control the encoding in the child process (and only if it's a Python 3 process of course).
They can't control the encoding in the parent's subprocess pipes and that's what the request (& patch) is about.
msg168191 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-08-14 11:12
> > only one Popen instance (for the iconv call), but different encodings
> > for stdin and stdout.
> Isn't that the exception rather than the rule? I think it actually makes
> sense, in at least 99.83% of cases ;-), to have a common encoding
> setting for all streams.

FWIW, I recently encountered a scenario (albeit in a test situation) where the ability to set different encodings for stdout and stderr would have been useful to me.  It was while creating a test case for issue 15595.  I was changing the locale encoding for stdout, but I also wanted to leave it unchanged for stderr because there didn't seem to be a way to control the encoding that the child used for stderr.
msg168213 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-08-14 16:40
To follow up on my previous comment, issue 15648 shows the case where I was able to change the encoding for stdout in the child process but not for stderr (handling that case would require Popen to support two encodings).
msg180157 - (view) Author: Joseph Perry (berwin22) Date: 2013-01-17 22:14
I've found a workaround by setting the PYTHONIOENCODING environment variable:

import os, shlex, subprocess

my_env = os.environ.copy()  # copy so the parent's environment is left untouched
my_env['PYTHONIOENCODING'] = 'utf-8'
p = subprocess.Popen(shlex.split(command), stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE, stdin=subprocess.PIPE, env=my_env)

I've attached an example script for testing. It calls itself recursively 10 times.
Please note the 'fix' variable.
History
Date User Action Args
2013-12-31 23:22:13  vadmium  set  nosy: + vadmium
2013-01-17 22:14:24  berwin22  set  files: + subProcessTest.py; nosy: + berwin22; messages: + msg180157
2012-08-14 16:40:29  chris.jerdonek  set  messages: + msg168213
2012-08-14 11:12:05  chris.jerdonek  set  nosy: + chris.jerdonek; messages: + msg168191
2011-11-28 16:06:47  pitrou  set  messages: + msg148498; versions: + Python 3.3
2011-11-28 16:00:44  r.david.murray  set  nosy: + r.david.murray; messages: + msg148496
2011-11-28 15:48:50  eric.araujo  set  messages: + msg148495; title: subprocess seems to use local 8-bit encoding and gives no choice -> subprocess seems to use local encoding and give no choice
2011-11-28 15:47:44  haypo  set  messages: + msg148494
2011-11-28 14:36:07  eric.araujo  set  messages: + msg148484
2011-11-26 23:36:24  Arfrever  set  nosy: + Arfrever
2011-11-22 15:16:07  eric.araujo  set  nosy: + eric.araujo
2011-11-21 17:31:13  pitrou  set  messages: + msg148066
2011-11-21 01:56:36  ncoghlan  set  nosy: + ncoghlan; messages: + msg148025
2011-11-21 01:32:25  ncoghlan  link  issue13442 superseder
2010-12-01 23:40:24  haypo  set  messages: + msg123024
2010-12-01 23:14:03  haypo  set  messages: + msg123020
2010-11-21 07:15:20  ned.deily  set  nosy: + haypo, - BreamoreBoy
2010-07-24 12:22:46  BreamoreBoy  set  messages: + msg111466
2010-07-22 15:32:10  BreamoreBoy  set  nosy: + BreamoreBoy; messages: + msg111181
2009-12-31 13:43:18  mark  set  messages: + msg97093
2009-12-31 13:30:46  amaury.forgeotdarc  set  messages: + msg97092
2009-12-31 12:58:13  mark  set  messages: + msg97090
2009-12-31 12:29:22  DawnLight  set  nosy: + DawnLight
2009-06-13 16:46:43  segfaulthunter  set  files: + test_subprocess3.py.patch
2009-06-13 16:18:47  segfaulthunter  set  messages: + msg89333
2009-06-13 16:12:58  segfaulthunter  set  files: + subprocess3.patch
2009-06-13 16:12:25  segfaulthunter  set  files: + subprocess.patch
2009-06-13 16:11:58  segfaulthunter  set  files: - subprocess3.patch
2009-06-13 16:11:53  segfaulthunter  set  files: - subprocess.patch
2009-06-13 16:07:05  pitrou  set  versions: + Python 2.7, Python 3.2, - Python 2.6, Python 3.0; nosy: + pitrou; messages: + msg89332; stage: needs patch -> patch review
2009-06-13 13:03:33  segfaulthunter  set  files: - subprocess3.patch
2009-06-13 13:03:29  segfaulthunter  set  files: + subprocess3.patch
2009-06-13 12:59:57  segfaulthunter  set  files: + subprocess3.patch
2009-06-13 12:32:37  segfaulthunter  set  files: - subprocess.patch
2009-06-13 12:32:32  segfaulthunter  set  files: + subprocess.patch; messages: + msg89325
2009-06-13 12:07:13  segfaulthunter  set  files: + subprocess.patch; nosy: + segfaulthunter; messages: + msg89322; keywords: + patch
2009-06-12 18:10:21  amaury.forgeotdarc  set  nosy: + amaury.forgeotdarc; messages: + msg89293; stage: needs patch
2009-06-08 18:08:01  srid  set  messages: + msg89094
2009-06-08 18:03:20  srid  set  versions: + Python 2.6
2009-06-08 18:03:06  srid  set  nosy: + srid
2009-05-28 07:08:42  mark  create