
Author davispuh
Recipients davispuh, ezio.melotti, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Date 2016-06-02.00:52:44
Message-id <1464828766.51.0.809606583575.issue27179@psf.upfronthosting.co.za>
Content
subprocess uses wrong encoding on Windows.


On Windows 10 with Python 3.5.1
from Command Prompt (cmd.exe)
> chcp 65001
> python -c "import subprocess; subprocess.getstatusoutput('ā')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "P:\Python35\lib\subprocess.py", line 808, in getstatusoutput
    data = check_output(cmd, shell=True, universal_newlines=True, stderr=STDOUT)
  File "P:\Python35\lib\subprocess.py", line 629, in check_output
    **kwargs).stdout
  File "P:\Python35\lib\subprocess.py", line 698, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "P:\Python35\lib\subprocess.py", line 1055, in communicate
    stdout = self.stdout.read()
  File "P:\Python35\lib\encodings\cp1257.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 2: character maps to <undefined>


from PowerShell
> [Console]::OutputEncoding = [System.Text.Encoding]::UTF8
> python -c "import subprocess; subprocess.getstatusoutput('ā')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "P:\Python35\lib\subprocess.py", line 808, in getstatusoutput
    data = check_output(cmd, shell=True, universal_newlines=True, stderr=STDOUT)
  File "P:\Python35\lib\subprocess.py", line 629, in check_output
    **kwargs).stdout
  File "P:\Python35\lib\subprocess.py", line 698, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "P:\Python35\lib\subprocess.py", line 1055, in communicate
    stdout = self.stdout.read()
  File "P:\Python35\lib\encodings\cp1257.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 2: character maps to <undefined>



As you can see, even when the console's encoding is UTF-8, subprocess still decodes the output with the Windows ANSI code page (cp1257 here).
This happens because io.TextIOWrapper is created with the default encoding, which is locale.getpreferredencoding(False),
but that is wrong because it is not the console's encoding.
I've attached a patch which fixes this by using the correct console encoding, taken from sys.stdout.encoding.
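
To illustrate the idea (this is not the attached patch), here is a minimal sketch that sidesteps the default text decoding: it captures raw bytes and decodes them with an explicitly chosen encoding. The helper name run_and_decode and the child command are hypothetical and exist only for this example; the child writes the UTF-8 bytes of 'ā' directly so that the result does not depend on the console.

```python
import locale
import subprocess
import sys


def run_and_decode(args, encoding, errors="replace"):
    """Run args, capture stdout+stderr as bytes, and decode them explicitly.

    Hypothetical helper for illustration; it avoids universal_newlines=True,
    which would decode with locale.getpreferredencoding(False).
    """
    proc = subprocess.run(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return proc.returncode, proc.stdout.decode(encoding, errors)


# Child process that emits the UTF-8 bytes of U+0101 regardless of its own
# stdout encoding. The raw string keeps the command line pure ASCII.
child = [
    sys.executable,
    "-c",
    r"import sys; sys.stdout.buffer.write('\u0101'.encode('utf-8'))",
]

if __name__ == "__main__":
    # The encoding that io.TextIOWrapper would pick by default:
    print("default text encoding:", locale.getpreferredencoding(False))
    status, output = run_and_decode(child, "utf-8")
    print(status, ascii(output))
```

Decoding the same bytes with an ANSI code page such as cp1257 in strict mode is what raises the UnicodeDecodeError shown above, since byte 0x81 is undefined in that code page.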

Note that there is a separate bug: when python is executed inside a PowerShell grouping expression (parentheses), sys.stdout.encoding is wrong:

> [Console]::OutputEncoding.EncodingName
Unicode (UTF-8)
> ([Console]::OutputEncoding.EncodingName)
Unicode (UTF-8)
> python -c "import sys; print(sys.stdout.encoding)"
cp65001
> (python -c "import sys; print(sys.stdout.encoding)")
cp1257

It should still be cp65001, which is why subprocess will still fail in this case even with my patch, but that is a different bug.
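
For comparing the encodings involved, a small diagnostic sketch may help. The function name encoding_report is hypothetical; on Windows it additionally queries the console output code page through the Win32 GetConsoleOutputCP call via ctypes, and on other platforms that entry stays None.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings discussed in this report (illustrative only)."""
    report = {
        # What io.TextIOWrapper uses when no encoding is given.
        "locale_preferred": locale.getpreferredencoding(False),
        # What Python chose for its own standard output.
        "sys_stdout": getattr(sys.stdout, "encoding", None),
        # The console's output code page, Windows only.
        "console_output_cp": None,
    }
    if sys.platform == "win32":
        import ctypes

        cp = ctypes.windll.kernel32.GetConsoleOutputCP()
        # GetConsoleOutputCP returns 0 when no console is attached.
        report["console_output_cp"] = "cp%d" % cp if cp else None
    return report


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(name, "=", value)
```

Running this once directly and once inside parentheses in PowerShell should make any mismatch between the values visible.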
History
Date                 User      Action  Args
2016-06-02 00:52:46  davispuh  set     recipients: + davispuh, paul.moore, vstinner, tim.golden, ezio.melotti, zach.ware, steve.dower
2016-06-02 00:52:46  davispuh  set     messageid: <1464828766.51.0.809606583575.issue27179@psf.upfronthosting.co.za>
2016-06-02 00:52:45  davispuh  link    issue27179 messages
2016-06-02 00:52:44  davispuh  create