classification
Title: subprocess.getstatusoutput can fail with utf8 UnicodeDecodeError
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.2, Python 3.1
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: astrand, cvrebert, debatem1, gregory.p.smith, haypo, ncoghlan, ned.deily, pitrou, rosslagerwall
Priority: normal Keywords: patch

Created on 2010-09-22 22:06 by debatem1, last changed 2011-12-27 19:07 by cvrebert.

Files
File name Uploaded Description Edit
issue9922-py3k.patch ned.deily, 2010-09-23 10:19
issue9922-31-rev1.patch ned.deily, 2010-12-12 08:16 31 backport (revision 1)
Messages (8)
msg117156 - (view) Author: geremy condra (debatem1) Date: 2010-09-22 22:06
It looks like subprocess.getstatusoutput on 3.2a1 tries to coerce to UTF-8, which fails when dealing with bytes. This demonstrates the behavior nearly all the time for me on Ubuntu 10.04:

>>> import subprocess
>>> subprocess.getstatusoutput('dd if=/dev/random bs=1K count=1')

Here's the tracebacks from a few runs:

>>> subprocess.getstatusoutput('dd if=/dev/random bs=1K count=1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.2/subprocess.py", line 585, in getstatusoutput
    text = pipe.read()
  File "/usr/local/lib/python3.2/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcb in position 3: invalid continuation byte
>>> subprocess.getstatusoutput('dd if=/dev/random bs=1K count=1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.2/subprocess.py", line 585, in getstatusoutput
    text = pipe.read()
  File "/usr/local/lib/python3.2/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 2: invalid continuation byte
>>> subprocess.getstatusoutput('dd if=/dev/random bs=1K count=1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.2/subprocess.py", line 585, in getstatusoutput
    text = pipe.read()
  File "/usr/local/lib/python3.2/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xac in position 0: invalid start byte
>>> subprocess.getstatusoutput('dd if=/dev/random bs=1K count=1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.2/subprocess.py", line 585, in getstatusoutput
    text = pipe.read()
  File "/usr/local/lib/python3.2/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf1 in position 0: invalid continuation byte
>>> 

And here's the version info:

Python 3.2a1 (r32a1:83318, Aug 13 2010, 22:32:03) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
msg117176 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2010-09-23 10:19
In Python 3, subprocess.Popen returns stdout as "bytes" rather than "string" so it seems reasonable that subprocess.getstatusoutput should do the same.

>>> subprocess.Popen(['dd if=/dev/random bs=1024 count=1'], shell=True, stdout=subprocess.PIPE).communicate()[0]
1+0 records in
1+0 records out
1024 bytes transferred in 0.000142 secs (7218432 bytes/sec)
b'\x11\xfb\xe1w ...

The problem is reproducible on 3.1 and py3k.

The attached patch for py3k (with a backport to 3.1) corrects getstatusoutput to return bytes.  It also includes a test case and updates the docs to show byte output.
msg123831 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2010-12-12 08:16
Update 31 backport patch to reflect revert of unittest method names prior to 3.1.3 release.
msg129976 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-03-03 15:48
I think that it is now too late to change getstatusoutput() output type (str->bytes). I prefer Unicode and I think that most users will have to decode bytes to Unicode anyway. So the right solution is to be able to configure encoding and errors used by TextIOWrapper: see issue #6135.
msg130006 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-03-03 22:08
I don't have a strong feeling about it but it seems to me that getstatusoutput is broken now so something should needs to be changed.  If I understand your suggestion, adding a universal_newlines option to getstatusoutput similar to Popen, with a True (default) to return str False to return bytes, should provide an upwards compatible compromise.  And whatever solution is adopted for Issue6135 should be able to be applied here as well.

On the other hand, getstatusoutput was a carryover from the commands module in Python 2 and I'm not sure why it was not just removed in Python 3 as it seems to duplicate what can be done with Popen.  Perhaps it should just be deprecated and removed?
msg150072 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2011-12-22 06:07
getstatusoutput() is broken given that it doesn't work on windows yet it should.

I'd also recommend leaving the behavior as is and deprecating the function (and getoutput() while we're at it).

A related bug (#10197) also recommends deprecating getstatusoutput.
msg150091 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-12-22 12:11
Well, getoutput and getstatusoutput are arguably ugly.

However, since they are very high-level functions meant to quickly execute commands, returning str makes sense. You can't do anything smart with the output anyway, since stdout and stderr are intermingled: any binary data will be ruined by accompanying stderr output (e.g. warnings).

If you want to process the output data instead of displaying it to the user, use check_call() instead.

We could perhaps use the "surrogateescape" error handler in getoutput and getstatusoutput, but that's really putting lipstick on a pig.
msg150093 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-12-22 14:04
It is indeed unfortunate that these two functions weren't killed off in the Py3k transition instead of being migrated from the commands module to subprocess (their current implementation runs counter to the entire subprocess security design by implicitly invoking the shell).

Probably the most straightforward way to make them better behaved is to move them over to being based on subprocess.Popen:

def getstatusoutput(cmd):
    try:
        data = check_output(cmd, shell=True, universal_newlines=True, stderr=STDOUT)
        status = 0
    except CalledProcessError as ex:
        data = ex.output
        status = ex.returncode
    return status, data

def getoutput(cmd):
    return getstatusoutput(cmd)[1]
History
Date User Action Args
2011-12-27 19:07:00cvrebertsetnosy: + cvrebert
2011-12-22 14:04:44ncoghlansetmessages: + msg150093
2011-12-22 12:11:29pitrousetnosy: + pitrou, ncoghlan
messages: + msg150091
2011-12-22 06:07:02rosslagerwallsetnosy: + rosslagerwall
messages: + msg150072
2011-03-03 22:08:14ned.deilysetnosy: gregory.p.smith, astrand, haypo, ned.deily, debatem1
messages: + msg130006
2011-03-03 15:48:49hayposetnosy: + haypo
messages: + msg129976
2010-12-12 08:16:40ned.deilysetfiles: - issue9922-31.patch
2010-12-12 08:16:28ned.deilysetfiles: + issue9922-31-rev1.patch

messages: + msg123831
2010-09-23 11:03:25pitrousetnosy: + gregory.p.smith
2010-09-23 10:20:03ned.deilysetfiles: + issue9922-31.patch
2010-09-23 10:19:42ned.deilysetfiles: + issue9922-py3k.patch


title: subprocess.getstatusoutput and bytes -> subprocess.getstatusoutput can fail with utf8 UnicodeDecodeError
keywords: + patch
nosy: + astrand, ned.deily
versions: + Python 3.1
messages: + msg117176
stage: patch review
2010-09-22 22:06:38debatem1create