classification
Title: subprocess.getstatusoutput can fail with utf8 UnicodeDecodeError
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.3, Python 3.4
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: tim.golden Nosy List: astrand, cvrebert, debatem1, gregory.p.smith, ncoghlan, ned.deily, pitrou, rosslagerwall, tim.golden, vstinner
Priority: normal Keywords: patch

Created on 2010-09-22 22:06 by debatem1, last changed 2013-11-11 15:13 by tim.golden. This issue is now closed.

Files
File name Uploaded Description Edit
issue9922-py3k.patch ned.deily, 2010-09-23 10:19
issue9922-31-rev1.patch ned.deily, 2010-12-12 08:16 31 backport (revision 1)
issue9922.34.diff tim.golden, 2013-11-03 17:23 review
Messages (12)
msg117156 - (view) Author: geremy condra (debatem1) Date: 2010-09-22 22:06
It looks like subprocess.getstatusoutput on 3.2a1 tries to coerce to UTF-8, which fails when dealing with bytes. This demonstrates the behavior nearly all the time for me on Ubuntu 10.04:

>>> import subprocess
>>> subprocess.getstatusoutput('dd if=/dev/random bs=1K count=1')

Here's the tracebacks from a few runs:

>>> subprocess.getstatusoutput('dd if=/dev/random bs=1K count=1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.2/subprocess.py", line 585, in getstatusoutput
    text = pipe.read()
  File "/usr/local/lib/python3.2/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcb in position 3: invalid continuation byte
>>> subprocess.getstatusoutput('dd if=/dev/random bs=1K count=1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.2/subprocess.py", line 585, in getstatusoutput
    text = pipe.read()
  File "/usr/local/lib/python3.2/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 2: invalid continuation byte
>>> subprocess.getstatusoutput('dd if=/dev/random bs=1K count=1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.2/subprocess.py", line 585, in getstatusoutput
    text = pipe.read()
  File "/usr/local/lib/python3.2/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xac in position 0: invalid start byte
>>> subprocess.getstatusoutput('dd if=/dev/random bs=1K count=1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.2/subprocess.py", line 585, in getstatusoutput
    text = pipe.read()
  File "/usr/local/lib/python3.2/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf1 in position 0: invalid continuation byte
>>> 

And here's the version info:

Python 3.2a1 (r32a1:83318, Aug 13 2010, 22:32:03) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
msg117176 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2010-09-23 10:19
In Python 3, subprocess.Popen returns stdout as "bytes" rather than "string" so it seems reasonable that subprocess.getstatusoutput should do the same.

>>> subprocess.Popen(['dd if=/dev/random bs=1024 count=1'], shell=True, stdout=subprocess.PIPE).communicate()[0]
1+0 records in
1+0 records out
1024 bytes transferred in 0.000142 secs (7218432 bytes/sec)
b'\x11\xfb\xe1w ...

The problem is reproducible on 3.1 and py3k.

The attached patch for py3k (with a backport to 3.1) corrects getstatusoutput to return bytes.  It also includes a test case and updates the docs to show byte output.
msg123831 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2010-12-12 08:16
Update 31 backport patch to reflect revert of unittest method names prior to 3.1.3 release.
msg129976 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-03-03 15:48
I think that it is now too late to change getstatusoutput() output type (str->bytes). I prefer Unicode and I think that most users will have to decode bytes to Unicode anyway. So the right solution is to be able to configure encoding and errors used by TextIOWrapper: see issue #6135.
msg130006 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-03-03 22:08
I don't have a strong feeling about it but it seems to me that getstatusoutput is broken now so something should needs to be changed.  If I understand your suggestion, adding a universal_newlines option to getstatusoutput similar to Popen, with a True (default) to return str False to return bytes, should provide an upwards compatible compromise.  And whatever solution is adopted for Issue6135 should be able to be applied here as well.

On the other hand, getstatusoutput was a carryover from the commands module in Python 2 and I'm not sure why it was not just removed in Python 3 as it seems to duplicate what can be done with Popen.  Perhaps it should just be deprecated and removed?
msg150072 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2011-12-22 06:07
getstatusoutput() is broken given that it doesn't work on windows yet it should.

I'd also recommend leaving the behavior as is and deprecating the function (and getoutput() while we're at it).

A related bug (#10197) also recommends deprecating getstatusoutput.
msg150091 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-12-22 12:11
Well, getoutput and getstatusoutput are arguably ugly.

However, since they are very high-level functions meant to quickly execute commands, returning str makes sense. You can't do anything smart with the output anyway, since stdout and stderr are intermingled: any binary data will be ruined by accompanying stderr output (e.g. warnings).

If you want to process the output data instead of displaying it to the user, use check_call() instead.

We could perhaps use the "surrogateescape" error handler in getoutput and getstatusoutput, but that's really putting lipstick on a pig.
msg150093 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-12-22 14:04
It is indeed unfortunate that these two functions weren't killed off in the Py3k transition instead of being migrated from the commands module to subprocess (their current implementation runs counter to the entire subprocess security design by implicitly invoking the shell).

Probably the most straightforward way to make them better behaved is to move them over to being based on subprocess.Popen:

def getstatusoutput(cmd):
    try:
        data = check_output(cmd, shell=True, universal_newlines=True, stderr=STDOUT)
        status = 0
    except CalledProcessError as ex:
        data = ex.output
        status = ex.returncode
    return status, data

def getoutput(cmd):
    return getstatusoutput(cmd)[1]
msg202035 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2013-11-03 16:28
This has nearly been fixed by the rewrite of the get[status]output code under issue10197. There is an issue, though, with the use there of universal_newlines mode, which forces check_output to return a string rather than bytes.
msg202039 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2013-11-03 17:19
The code I've just committed to issue10197 means that the get[status]output functions now pass their (few) tests on all platforms. More by chance than judgement, that change employed universal_newlines which has the effect of forcing the output of check_output to string rather than bytes.

Having just re-read all the comments, we have three positions:

a) Do nothing: these are outdated functions and anyone who has a problem with undecodable bytes will have to use one of the other functions where they have more control.

b) Use the surrogateescape encoding in every case to produce *some* kind of output rather than an exception.

c) Tweak the code to produce bytes in every case. This is actually simple: removing universal_newlines support will do this. (I already have working code for this).

I think (b) is more trouble than it's worth. So (a) Do Nothing; or (c) Produce Bytes.

Going by the law of "Status Quo wins", if no-one chimes in with a strong case for switching to bytes in a few days, I propose to close this as Won't Fix.
msg202045 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2013-11-03 17:41
If anybody using them in Python 3.3 is already depending upon them returning strings, changing it to return bytes will break their code... so that ship as unfortunately sailed.  Changing that only in 3.4 would just cause churn and make code using this more difficult to be compatible across both.

Updating the documentation to state the utf-8 decoding behavior and the caveats of that is important.  That documentation note should also mention what people should use instead if they want binary data instead of text.
msg202631 - (view) Author: Tim Golden (tim.golden) * (Python committer) Date: 2013-11-11 15:13
Closing this as won't fix. The code has been reimplemented and additional documentation has been added over at issue10197. Given that these are legacy functions, I don't propose to do any more here.
History
Date User Action Args
2013-11-11 15:13:58tim.goldensetstatus: open -> closed
resolution: wont fix
messages: + msg202631
2013-11-03 17:41:31gregory.p.smithsetmessages: + msg202045
2013-11-03 17:23:51tim.goldensetfiles: + issue9922.34.diff
2013-11-03 17:19:48tim.goldensetassignee: tim.golden
versions: + Python 3.3, Python 3.4, - Python 3.1, Python 3.2
2013-11-03 17:19:21tim.goldensetmessages: + msg202039
2013-11-03 16:28:18tim.goldensetnosy: + tim.golden
messages: + msg202035
2011-12-27 19:07:00cvrebertsetnosy: + cvrebert
2011-12-22 14:04:44ncoghlansetmessages: + msg150093
2011-12-22 12:11:29pitrousetnosy: + pitrou, ncoghlan
messages: + msg150091
2011-12-22 06:07:02rosslagerwallsetnosy: + rosslagerwall
messages: + msg150072
2011-03-03 22:08:14ned.deilysetnosy: gregory.p.smith, astrand, vstinner, ned.deily, debatem1
messages: + msg130006
2011-03-03 15:48:49vstinnersetnosy: + vstinner
messages: + msg129976
2010-12-12 08:16:40ned.deilysetfiles: - issue9922-31.patch
2010-12-12 08:16:28ned.deilysetfiles: + issue9922-31-rev1.patch

messages: + msg123831
2010-09-23 11:03:25pitrousetnosy: + gregory.p.smith
2010-09-23 10:20:03ned.deilysetfiles: + issue9922-31.patch
2010-09-23 10:19:42ned.deilysetfiles: + issue9922-py3k.patch


title: subprocess.getstatusoutput and bytes -> subprocess.getstatusoutput can fail with utf8 UnicodeDecodeError
keywords: + patch
nosy: + astrand, ned.deily
versions: + Python 3.1
messages: + msg117176
stage: patch review
2010-09-22 22:06:38debatem1create