|
msg88466 - (view) |
Author: Mark Summerfield (mark) |
Date: 2009-05-28 07:08 |
When I start a process with subprocess.Popen() and pipe the stdin and
stdout, it always seems to use the local 8-bit encoding.
I tried setting process.stdin.encoding = "utf8" and the same for stdout
(where process is the subprocess object), but to no avail.
I also tried using shell=True since on Mac, Terminal.app is fine with
Unicode, but that didn't work.
So basically, I have programs that output Unicode and run fine on the
Mac terminal, but that cannot be executed by subprocess because
subprocess uses the mac_roman encoding instead of Unicode.
I wish it were possible to specify the stdin and stdout encoding that is
used; then I could use the same one on all platforms. (But perhaps it is
possible, and I just haven't figured out how?)
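One way to sidestep the locale default, sketched here on the assumption that Python 3 is in use, is to leave the pipes in binary mode and do the encoding and decoding yourself. The helper name run_and_decode and the echo child are illustrative only; they are not part of subprocess.

```python
import subprocess
import sys

# Illustrative child: copies its stdin to its stdout byte for byte.
ECHO = [sys.executable, "-c",
        "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]


def run_and_decode(args, input_text, encoding="utf-8", errors="strict"):
    """Run args, send input_text in an explicit encoding, decode its stdout."""
    proc = subprocess.Popen(args, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # The pipes carry bytes, so the encoding is chosen here, not by the locale.
    out, _ = proc.communicate(input_text.encode(encoding))
    return out.decode(encoding, errors)


if __name__ == "__main__":
    # ascii() keeps the demo printable on any console encoding.
    print(ascii(run_and_decode(ECHO, "na\u00efve caf\u00e9")))
```

Because the same encoding is named explicitly on both sides, the result does not depend on the platform's default.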
|
|
msg89094 - (view) |
Author: Sridhar Ratnakumar (srid) |
Date: 2009-06-08 18:08 |
Related discussion thread: https://answers.launchpad.net/bzr/+question/63601
|
|
msg89293 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2009-06-12 18:10 |
I propose to add two parameters (encoding, error) to the
subprocess.Popen function.
- python 2.x could build and return codecs.StreamReader objects
- python 3.x would just pass these parameters to io.TextIOWrapper
I'll try to come up with a patch.
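As a rough illustration of the Python 3 half of this proposal, the pipe that Popen returns can already be wrapped by hand in io.TextIOWrapper. This is a sketch of the idea, not the patch itself; read_text and the child command are hypothetical names introduced for the example.

```python
import io
import subprocess
import sys

# Illustrative child: always writes UTF-8 bytes, whatever its locale is.
CHILD = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write('caf\\u00e9\\n'.encode('utf-8'))"]


def read_text(args, encoding, errors="strict"):
    """Start args and read its stdout through a TextIOWrapper."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE)
    # proc.stdout is a binary pipe; the wrapper performs the decoding.
    with io.TextIOWrapper(proc.stdout, encoding=encoding, errors=errors) as reader:
        text = reader.read()
    proc.wait()
    return text


if __name__ == "__main__":
    print(ascii(read_text(CHILD, "utf-8")))
```

An encoding/errors pair on Popen would presumably do the same wrapping internally for every stream that is a pipe.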
|
|
msg89322 - (view) |
Author: Florian Mayer (segfaulthunter) |
Date: 2009-06-13 12:07 |
I wrote a patch to add encoding and error to subprocess.Popen in Python
2.7 (trunk).
|
|
msg89325 - (view) |
Author: Florian Mayer (segfaulthunter) |
Date: 2009-06-13 12:32 |
Cosmetic update.
|
|
msg89332 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2009-06-13 16:07 |
Two things:
1. The argument should be called `errors` for consistency with open()
and TextIOWrapper(), not `error`
2. You should add some unit tests.
|
|
msg89333 - (view) |
Author: Florian Mayer (segfaulthunter) |
Date: 2009-06-13 16:18 |
Should we also cover the unusual case where stdout, stderr and stdin
have different encodings? Right now we are assuming they are all the same.
|
|
msg97090 - (view) |
Author: Mark Summerfield (mark) |
Date: 2009-12-31 12:58 |
I agree with Florian Mayer that the encoding handling should be
stream-specific. You could easily be reading the stdout of some third
party program that uses, say, latin1, but want to do your own output in,
say, utf-8.
One solution builds on what Amaury Forgeot d'Arc has done (i.e.,
the encoding and errors parameters) by allowing those parameters to
accept either a single string (as now) or a dict with keys 'stdin',
'stdout', 'stderr'. Of course, the client might not specify all of the
dict's keys, in which case the missing ones would use the normal
default (local 8-bit etc.).
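The string-or-dict idea could be normalised along the following lines. This is only a sketch of the suggestion: per_stream is a hypothetical helper, and None stands in for "use the normal default".

```python
STREAMS = ("stdin", "stdout", "stderr")


def per_stream(value, default=None):
    """Expand one setting, or a partial dict, into an entry for each stream."""
    if isinstance(value, dict):
        unknown = set(value) - set(STREAMS)
        if unknown:
            raise ValueError("unknown stream names: %s" % sorted(unknown))
        # Keys the caller left out fall back to the default.
        return {name: value.get(name, default) for name in STREAMS}
    # A single value (or None) applies to all three streams.
    chosen = default if value is None else value
    return {name: chosen for name in STREAMS}


if __name__ == "__main__":
    print(per_stream("utf-8"))
    print(per_stream({"stdout": "latin-1"}))
```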
|
|
msg97092 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2009-12-31 13:30 |
I don't understand. How is the subprocess stdout related to the main
program output?
Stream-specific encoding could be useful for subprocesses that expect
latin-1 from stdin but write utf-8 to stdout. I'm not sure we should
support this.
|
|
msg97093 - (view) |
Author: Mark Summerfield (mark) |
Date: 2009-12-31 13:43 |
On Thu, Dec 31, 2009 at 1:30 PM, Amaury Forgeot d'Arc
<report@bugs.python.org> wrote:
>
> Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:
>
> I don't understand. How is the subprocess stdout related to the main
> program output?
> Stream-specific encoding could be useful for subprocesses that expect
> latin-1 from stdin but write utf-8 to stdout. I'm not sure we should
> support this.
Yes, you're right.
(What I had in mind was a scenario where you read one process's stdout
and wrote to another process's stdin; but of course using your errors
& encoding arguments this will work because there'll be two separate
process objects each of which can have its encoding and errors set
separately.)
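The two-process scenario can be sketched with explicit decoding and encoding at each hop, which is roughly what a per-process encoding argument would automate. Both child commands are made up for the example: the first emits Latin-1, the second expects UTF-8 and reports what it decoded.

```python
import subprocess
import sys

# Illustrative producer: writes "café" encoded as Latin-1.
PRODUCER = [sys.executable, "-c",
            "import sys; sys.stdout.buffer.write('caf\\u00e9'.encode('latin-1'))"]
# Illustrative consumer: decodes stdin as UTF-8 and echoes an ASCII repr of it.
CONSUMER = [sys.executable, "-c",
            "import sys; data = sys.stdin.buffer.read().decode('utf-8'); "
            "sys.stdout.buffer.write(ascii(data).encode('ascii'))"]


def relay():
    """Read Latin-1 from one process and feed it to another as UTF-8."""
    producer = subprocess.Popen(PRODUCER, stdout=subprocess.PIPE)
    raw, _ = producer.communicate()
    text = raw.decode("latin-1")            # encoding of the first process
    consumer = subprocess.Popen(CONSUMER, stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE)
    out, _ = consumer.communicate(text.encode("utf-8"))  # encoding of the second
    return out.decode("ascii")


if __name__ == "__main__":
    print(relay())
```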
|
|
msg111181 - (view) |
Author: Mark Lawrence (BreamoreBoy) |
Date: 2010-07-22 15:32 |
I ran the new unit test before and after patching subprocess on Windows Vista against the 3.1 debug maintenance release. Everything passed apart from this failure at the end of the patched run:
  File "test\test_subprocess.py", line 568, in test_encoded_stderr
    self.assertEqual(p.stderr.read(), send)
AssertionError: 'ï[32943 refs]\r\n' != 'ï'
I'm sure I've seen a ref to this somewhere, can anyone remember where?
|
|
msg111466 - (view) |
Author: Mark Lawrence (BreamoreBoy) |
Date: 2010-07-24 12:22 |
In 2.7 and py3k, test_subprocess has a class BaseTestCase which has an assertStderrEqual method. These don't exist in 3.1. I believe that the py3k code will need to be backported to 3.1. Can this be done on this issue, or do we need a new one to keep things clean?
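The failing assertion above differs only by the "[32943 refs]" line that the debug build appended to stderr. A comparison that tolerates it could look roughly like this; strip_debug_refs is a hypothetical helper written for illustration and is not claimed to match the code in test_subprocess.

```python
import re

# Matches a trailing "[NNN refs]" line, with or without a line ending.
_REFS_SUFFIX = re.compile(rb"\[\d+ refs\]\r?\n?$")


def strip_debug_refs(stderr_bytes):
    """Drop a trailing '[NNN refs]' line so stderr can be compared exactly."""
    return _REFS_SUFFIX.sub(b"", stderr_bytes)


if __name__ == "__main__":
    print(strip_debug_refs(b"\xef[32943 refs]\r\n"))
```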
|
|
msg123020 - (view) |
Author: STINNER Victor (haypo) *  |
Date: 2010-12-01 23:14 |
About the topic:
> subprocess seems to use local 8-bit encoding and gives no choice
I don't understand that: by default, Python 2 and Python 3 use byte strings, so there is no encoding (nor error handler).
I don't see how you can get unicode from a process using only subprocess. But with Python 3, you can get unicode if you set the universal_newlines option to True.
So for Python 2, it's a new feature (get unicode), and for Python 3, it's a new option to specify the encoding. The title should be changed to something like "subprocess: add an option to specify stdin, stdout and/or stderr encoding and errors" and the type should be changed to "feature request".
Or am I completely wrong?
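The difference described here is easy to observe in Python 3. The snippet below is a small sketch with an illustrative child command: without options the pipe yields bytes, and with universal_newlines=True it yields str decoded with the locale's preferred encoding.

```python
import subprocess
import sys

# Illustrative child: prints a plain ASCII line.
CHILD = [sys.executable, "-c", "print('hello')"]


def capture(**popen_kwargs):
    """Return whatever the child's stdout pipe yields with the given options."""
    proc = subprocess.Popen(CHILD, stdout=subprocess.PIPE, **popen_kwargs)
    out, _ = proc.communicate()
    return out


raw = capture()                          # bytes: no encoding involved
text = capture(universal_newlines=True)  # str: decoded, newlines normalised

if __name__ == "__main__":
    print(type(raw).__name__, type(text).__name__)
```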
|
|
msg123024 - (view) |
Author: STINNER Victor (haypo) *  |
Date: 2010-12-01 23:40 |
> ... it always seems to use the local 8-bit encoding
The locale encoding is not necessarily an 8-bit encoding; it can be a multibyte encoding like... UTF-8 :-)
--
subprocess.patch: You should maybe use io.open(process.stdout.fileno(), encoding=..., errors=...) instead of codecs.getreader/getwriter. The code will be closer to Python 3. I think that the io module has a better support of unicode than codec reader/writer objects and a nicer API. See:
http://bugs.python.org/issue8796#msg106339
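A minimal sketch of the io.open() suggestion follows, written for Python 3 (where io.open is the built-in open). The helper name and child command are illustrative. The closefd=False argument is an addition of this sketch: process.stdout still owns the descriptor, so the text wrapper should not close it a second time.

```python
import io
import subprocess
import sys

# Illustrative child: writes UTF-8 bytes to its stdout.
CHILD = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write('gr\\u00fc\\u00df\\n'.encode('utf-8'))"]


def read_via_fileno(args, encoding="utf-8", errors="strict"):
    """Read a child's stdout as text by reopening the pipe's file descriptor."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE)
    # closefd=False leaves ownership of the descriptor with proc.stdout.
    with io.open(proc.stdout.fileno(), "r", encoding=encoding,
                 errors=errors, closefd=False) as reader:
        text = reader.read()
    proc.stdout.close()
    proc.wait()
    return text


if __name__ == "__main__":
    print(ascii(read_via_fileno(CHILD)))
```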
--
> ... allowing [encoding and errors] to accept either a single string
> (as now), or a dict with keys 'stdin', 'stdout', 'stderr'
I like this idea. But what about other TextIOWrapper (or other file classes) options: buffer size, newline, line_buffering, etc.?
Why not use a dict for the existing stdin, stdout and stderr arguments? Dummy example:
    process = Popen(
        command,
        stdin={'file': PIPE, 'encoding': 'iso-8859-1', 'newline': False},
        stdout={'file': PIPE, 'encoding': 'utf-8', 'buffering': 0, 'line_buffering': False},
        ...)
If stdin, stdout or stderr is a dict: the default value of its 'file' key can be set to PIPE. I don't think that it's possible to choose the encoding, buffer size, or anything else if stdin, stdout or stderr is not a pipe.
With this solution, you cannot specify the encoding for stdin, stdout and stderr at once. You have at least to repeat the encoding for stdin and stdout (and use stderr=STDOUT).
--
I still hesitate to accept this feature request. Is it really necessary to add extra arguments for TextIOWrapper? Can't developers create their own TextIOWrapper object with all the options they need?
In Python 3, being able to specify the stdout encoding is an interesting feature. Controlling the buffer size is also an interesting option.
My problem is maybe the use of a dict to specify various options. I'm not sure that it is extensible enough to support future needs.
|
|
msg148025 - (view) |
Author: Nick Coghlan (ncoghlan) *  |
Date: 2011-11-21 01:56 |
I discovered this same problem recently when updating the subprocess docs, and also in working on the improved shell invocation support I am proposing for 3.3 (#13238).
I initially posted an earlier variant of this suggestion as a new issue (#13442), but Victor redirected me here.
Firstly, I don't think it makes any sense to set encoding information globally for the Popen object. As a simple example, consider using Python to write a test suite for the iconv command line tool: there's only one Popen instance (for the iconv call), but different encodings for stdin and stdout.
Really, we want to be able to make full use of Python 3's layered I/O model, but we want the subprocess pipe instances to be slotted in at the lowest layer rather than creating them ourselves.
The easiest way to do that is to have a separate class that specifies the additional options for pipe creation and does the wrapping:
    import io

    class TextPipe:
        # Holds TextIOWrapper options until the pipe actually exists.
        def __init__(self, *args, **kwds):
            self.args = args
            self.kwds = kwds

        def wrap_pipe(self, pipe):
            return io.TextIOWrapper(pipe, *self.args, **self.kwds)
The stream creation process would then include a new "wrap = getattr(stream_arg, 'wrap_pipe', None)" check that is similar to the existing check for subprocess.PIPE, but invokes the method to wrap the pipe after creating it.
So to read UTF-8 encoded data from a subprocess, you could just do:
data = check_stdout(cmd, stdout=TextPipe('utf-8'), stderr=STDOUT)
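The iconv case can be approximated today by wrapping each pipe separately after Popen has created it. This is a sketch of the per-stream idea rather than of the proposed TextPipe hook: the child below is a Python stand-in for "iconv -f latin-1 -t utf-8" so the example stays portable, and transcode is a hypothetical name.

```python
import io
import subprocess
import sys

# Stand-in for an iconv call: reads Latin-1 on stdin, writes UTF-8 on stdout.
TRANSCODE = [sys.executable, "-c",
             "import sys; data = sys.stdin.buffer.read(); "
             "sys.stdout.buffer.write(data.decode('latin-1').encode('utf-8'))"]


def transcode(text):
    """Send text as Latin-1 and read the child's reply as UTF-8."""
    proc = subprocess.Popen(TRANSCODE, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)
    # One Popen instance, two different encodings.
    writer = io.TextIOWrapper(proc.stdin, encoding="latin-1", newline="")
    reader = io.TextIOWrapper(proc.stdout, encoding="utf-8", newline="")
    writer.write(text)
    writer.close()          # flushes and signals end of input to the child
    result = reader.read()
    reader.close()
    proc.wait()
    return result


if __name__ == "__main__":
    print(ascii(transcode("caf\u00e9")))
```

A single encoding argument on Popen could not express this pairing, which is the point of the iconv example.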
|
|
msg148066 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2011-11-21 17:31 |
> Firstly, I don't think it makes any sense to set encoding information
> globally for the Popen object. As a simple example, consider using
> Python to write a test suite for the iconv command line tool: there's
> only one Popen instance (for the iconv call), but different encodings
> for stdin and stdout.
Isn't that the exception rather than the rule? I think it actually makes
sense, in at least 99.83% of cases ;-), to have a common encoding
setting for all streams.
(I'm not sure about the "errors" setting, though: should we use strict
for stdin/stdout and backslashreplace for stderr, as the interpreter
does?)
Perhaps the common case should be made extra easy.
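To make the question about the errors setting concrete, here is a small sketch of what the two handlers do to undecodable bytes. It assumes Python 3.5 or later, where backslashreplace is also accepted for decoding; decode_streams is a hypothetical helper.

```python
def decode_streams(out_bytes, err_bytes, encoding="utf-8"):
    """Decode stdout strictly and stderr leniently, as the interpreter does."""
    out_text = out_bytes.decode(encoding, "strict")            # raises on bad bytes
    err_text = err_bytes.decode(encoding, "backslashreplace")  # escapes bad bytes
    return out_text, err_text


if __name__ == "__main__":
    print(decode_streams(b"ok", b"bad \xff byte"))
```

With strict on stderr, a single stray byte in a diagnostic message would raise UnicodeDecodeError and hide the message itself.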
|
|
msg148484 - (view) |
Author: Éric Araujo (eric.araujo) *  |
Date: 2011-11-28 14:36 |
Is subprocess affected by PYTHONIOENCODING?
|
|
msg148494 - (view) |
Author: STINNER Victor (haypo) *  |
Date: 2011-11-28 15:47 |
> Is subprocess affected by PYTHONIOENCODING?
Yes, as any Python process.
|
|
msg148495 - (view) |
Author: Éric Araujo (eric.araujo) *  |
Date: 2011-11-28 15:48 |
So the users can control the encoding, and this is a doc bug.
|
|
msg148496 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2011-11-28 16:00 |
If you decide this is only a doc bug, please see also related issue 12832.
|
|
msg148498 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2011-11-28 16:06 |
> So the users can control the encoding, and this is a doc bug.
Not really. People can control the encoding in the child process (and only if it's a Python 3 process of course).
They can't control the encoding in the parent's subprocess pipes and that's what the request (& patch) is about.
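The distinction can be seen in a short sketch that assumes the child is a Python 3 interpreter. PYTHONIOENCODING changes the bytes the child writes, while the parent's pipe still delivers plain bytes that it has to decode on its own. The helper name child_bytes is illustrative.

```python
import os
import subprocess
import sys

# Illustrative child: writes the single character U+00E9 to its text stdout.
CHILD = [sys.executable, "-c", "import sys; sys.stdout.write('\\u00e9')"]


def child_bytes(io_encoding):
    """Return the raw bytes the child emits under a given PYTHONIOENCODING."""
    env = os.environ.copy()
    env["PYTHONIOENCODING"] = io_encoding   # affects the child only
    proc = subprocess.Popen(CHILD, stdout=subprocess.PIPE, env=env)
    out, _ = proc.communicate()
    return out                              # still bytes on the parent's side


if __name__ == "__main__":
    print(child_bytes("latin-1"), child_bytes("utf-8"))
```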
|
|
msg168191 - (view) |
Author: Chris Jerdonek (chris.jerdonek) *  |
Date: 2012-08-14 11:12 |
> > only one Popen instance (for the iconv call), but different encodings
> > for stdin and stdout.
> Isn't that the exception rather than the rule? I think it actually makes
> sense, in at least 99.83% of cases ;-), to have a common encoding
> setting for all streams.
FWIW, I recently encountered a scenario (albeit in a test situation) where the ability to set different encodings for stdout and stderr would have been useful to me. It was while creating a test case for issue 15595. I was changing the locale encoding for stdout, but I also wanted to leave it unchanged for stderr because there didn't seem to be a way to control the encoding that the child used for stderr.
|
|
msg168213 - (view) |
Author: Chris Jerdonek (chris.jerdonek) *  |
Date: 2012-08-14 16:40 |
To my previous comment, issue 15648 shows the case where I was able to change the encoding for stdout in the child process but not stderr (which would require supporting two encodings in Popen to handle).
|
|
msg180157 - (view) |
Author: Joseph Perry (berwin22) |
Date: 2013-01-17 22:14 |
I've found a workaround by specifying the environment variable:
    my_env = os.environ.copy()  # copy, so the parent's own environment is left untouched
    my_env['PYTHONIOENCODING'] = 'utf-8'
    p = subprocess.Popen(shlex.split(command), stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE, env=my_env)
I've attached an example script for testing. It calls itself recursively 10 times.
Please note the 'fix' variable.
|
|
| Date | User | Action | Args |
| 2013-01-17 22:14:24 | berwin22 | set | files: + subProcessTest.py; nosy: + berwin22; messages: + msg180157 |
| 2012-08-14 16:40:29 | chris.jerdonek | set | messages: + msg168213 |
| 2012-08-14 11:12:05 | chris.jerdonek | set | nosy: + chris.jerdonek; messages: + msg168191 |
| 2011-11-28 16:06:47 | pitrou | set | messages: + msg148498; versions: + Python 3.3 |
| 2011-11-28 16:00:44 | r.david.murray | set | nosy: + r.david.murray; messages: + msg148496 |
| 2011-11-28 15:48:50 | eric.araujo | set | messages: + msg148495; title: subprocess seems to use local 8-bit encoding and gives no choice -> subprocess seems to use local encoding and give no choice |
| 2011-11-28 15:47:44 | haypo | set | messages: + msg148494 |
| 2011-11-28 14:36:07 | eric.araujo | set | messages: + msg148484 |
| 2011-11-26 23:36:24 | Arfrever | set | nosy: + Arfrever |
| 2011-11-22 15:16:07 | eric.araujo | set | nosy: + eric.araujo |
| 2011-11-21 17:31:13 | pitrou | set | messages: + msg148066 |
| 2011-11-21 01:56:36 | ncoghlan | set | nosy: + ncoghlan; messages: + msg148025 |
| 2011-11-21 01:32:25 | ncoghlan | link | issue13442 superseder |
| 2010-12-01 23:40:24 | haypo | set | messages: + msg123024 |
| 2010-12-01 23:14:03 | haypo | set | messages: + msg123020 |
| 2010-11-21 07:15:20 | ned.deily | set | nosy: + haypo, - BreamoreBoy |
| 2010-07-24 12:22:46 | BreamoreBoy | set | messages: + msg111466 |
| 2010-07-22 15:32:10 | BreamoreBoy | set | nosy: + BreamoreBoy; messages: + msg111181 |
| 2009-12-31 13:43:18 | mark | set | messages: + msg97093 |
| 2009-12-31 13:30:46 | amaury.forgeotdarc | set | messages: + msg97092 |
| 2009-12-31 12:58:13 | mark | set | messages: + msg97090 |
| 2009-12-31 12:29:22 | DawnLight | set | nosy: + DawnLight |
| 2009-06-13 16:46:43 | segfaulthunter | set | files: + test_subprocess3.py.patch |
| 2009-06-13 16:18:47 | segfaulthunter | set | messages: + msg89333 |
| 2009-06-13 16:12:58 | segfaulthunter | set | files: + subprocess3.patch |
| 2009-06-13 16:12:25 | segfaulthunter | set | files: + subprocess.patch |
| 2009-06-13 16:11:58 | segfaulthunter | set | files: - subprocess3.patch |
| 2009-06-13 16:11:53 | segfaulthunter | set | files: - subprocess.patch |
| 2009-06-13 16:07:05 | pitrou | set | versions: + Python 2.7, Python 3.2, - Python 2.6, Python 3.0; nosy: + pitrou; messages: + msg89332; stage: needs patch -> patch review |
| 2009-06-13 13:03:33 | segfaulthunter | set | files: - subprocess3.patch |
| 2009-06-13 13:03:29 | segfaulthunter | set | files: + subprocess3.patch |
| 2009-06-13 12:59:57 | segfaulthunter | set | files: + subprocess3.patch |
| 2009-06-13 12:32:37 | segfaulthunter | set | files: - subprocess.patch |
| 2009-06-13 12:32:32 | segfaulthunter | set | files: + subprocess.patch; messages: + msg89325 |
| 2009-06-13 12:07:13 | segfaulthunter | set | files: + subprocess.patch; nosy: + segfaulthunter; messages: + msg89322; keywords: + patch |
| 2009-06-12 18:10:21 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc; messages: + msg89293; stage: needs patch |
| 2009-06-08 18:08:01 | srid | set | messages: + msg89094 |
| 2009-06-08 18:03:20 | srid | set | versions: + Python 2.6 |
| 2009-06-08 18:03:06 | srid | set | nosy: + srid |
| 2009-05-28 07:08:42 | mark | create | |