Author eryksun
Recipients davispuh, eryksun, ezio.melotti, martin.panter, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Date 2016-06-02.04:12:29
Message-id <1464840750.67.0.274821080849.issue27179@psf.upfronthosting.co.za>
In-reply-to
Content
There is no right encoding as far as I can see. 

If cmd.exe is attached to a console (i.e. conhost.exe), it uses the console's output codepage when writing to a pipe or file, which is the scenario that your patch attempts to address. But if you pass creationflags=CREATE_NO_WINDOW, then the new console (created without a window) uses the OEM codepage, CP_OEMCP. And if you pass creationflags=DETACHED_PROCESS (i.e. no console), cmd uses the ANSI codepage, CP_ACP. There's also a "/u" option that forces cmd to write the output of its internal commands to a pipe or file in the native Unicode encoding on Windows, UTF-16LE.
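
The cases above can be summarized in a small sketch. The helper guess_cmd_encoding and its parameters are hypothetical, not part of subprocess. The code page numbers are supplied by the caller, and the defaults are only typical US English values. The flag constants are defined locally so the snippet runs on any platform.

```python
# Win32 process creation flag values, defined locally for portability.
DETACHED_PROCESS = 0x00000008
CREATE_NO_WINDOW = 0x08000000


def guess_cmd_encoding(creationflags=0, unicode_flag=False,
                       console_output_cp=437, oem_cp=437, ansi_cp=1252):
    """Return the codec name cmd.exe is expected to use for piped output.

    Hypothetical helper: it only encodes the rules described in this
    message and does not query the system.
    """
    if unicode_flag:                      # cmd.exe was started with /u
        return "utf-16-le"
    if creationflags & DETACHED_PROCESS:  # no console at all -> ANSI
        return "cp%d" % ansi_cp
    if creationflags & CREATE_NO_WINDOW:  # new windowless console -> OEM
        return "cp%d" % oem_cp
    return "cp%d" % console_output_cp     # inherited console's output codepage
```

This covers only cmd.exe's own output, so the result is a guess for anything its child processes write.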

Note that the above only considers cmd.exe. Its child processes can write output using any encoding. You may end up with several different encodings present in the same stream. Many, if not most, programs don't use the console's current codepage when writing to a pipe or file. Commonly they default to OEM, ANSI, UTF-8, or UTF-16LE. For example, Windows Python uses ANSI for standard I/O that's not a console, unless you set PYTHONIOENCODING. 
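
To illustrate the mixed-encoding problem, here is a minimal sketch. The byte stream is made up for the example: it assumes one writer used an OEM codepage (cp850) and another used an ANSI codepage (cp1252) for the same character.

```python
text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# Hypothetical stream: the first writer used cp850, the second used cp1252.
stream = text.encode("cp850") + text.encode("cp1252")

# Decoding the whole stream with either codec garbles one of the characters.
as_oem = stream.decode("cp850", errors="replace")
as_ansi = stream.decode("cp1252", errors="replace")

# Only decoding each part with its own codec recovers the original text,
# which requires knowing where one writer stopped and the next one started.
recovered = stream[:1].decode("cp850") + stream[1:].decode("cp1252")
```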

Even if a called program cares about the console output codepage, your patch doesn't implement this robustly. It uses sys.stdout and sys.stderr, but those can be reassigned. Even sys.__stdout__ and sys.__stderr__ may be irrelevant. Python could be run via pythonw.exe, in which case the latter are None (unless it's started with non-NULL standard handles). Or python.exe could be run with standard I/O redirected to pipes or files, in which case they default to ANSI. Also, the current program or called program could change the console encoding via chcp.com, which is just an indirect way of calling the WinAPI functions SetConsoleCP and SetConsoleOutputCP.
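
One way to avoid depending on sys.stdout is to ask the console directly. The following is a sketch only. The helper name console_output_codepage is hypothetical, and it assumes that GetConsoleOutputCP returns 0 when no console is attached. On non-Windows platforms it simply returns None.

```python
import ctypes
import sys


def console_output_codepage():
    """Return the attached console's output code page, or None.

    Queries the OS instead of trusting sys.stdout or sys.__stdout__,
    which may be reassigned, redirected, or None.
    """
    if sys.platform != "win32":
        return None
    cp = ctypes.WinDLL("kernel32").GetConsoleOutputCP()
    return cp or None  # assumption: 0 means there is no attached console
```

The value can still change at any time, for example if a called program runs chcp.com, so it is a snapshot at best.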

There's no common default encoding for standard I/O on Windows, especially not a common UTF encoding, so universal_newlines=True, getoutput, and getstatusoutput may be of limited use. Preferably a calling program can set an option like cmd's "/u" or Python's PYTHONIOENCODING to force using a Unicode encoding, and then manually decode the output by wrapping stdout/stderr in instances of io.TextIOWrapper. It would help if subprocess.Popen had parameters for encoding and errors.
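
As a rough sketch of that approach, the following runs a child Python with PYTHONIOENCODING set and decodes its output explicitly. The helper name run_python_utf8 is made up for the example, and the choice of UTF-8 with errors="replace" is an assumption.

```python
import io
import os
import subprocess
import sys


def run_python_utf8(code):
    """Run a child Python that is forced to write UTF-8, and decode it as such."""
    # Force the child's standard I/O encoding instead of guessing it.
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    proc = subprocess.Popen([sys.executable, "-c", code],
                            stdout=subprocess.PIPE, env=env)
    # Decode manually rather than relying on universal_newlines=True.
    with io.TextIOWrapper(proc.stdout, encoding="utf-8",
                          errors="replace") as out:
        text = out.read()
    proc.wait()
    return text
```

The same wrapping would apply to cmd's "/u" option, with "utf-16-le" as the encoding.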
History
Date User Action Args
2016-06-02 04:12:30 eryksun set recipients: + eryksun, paul.moore, vstinner, tim.golden, ezio.melotti, martin.panter, zach.ware, steve.dower, davispuh
2016-06-02 04:12:30 eryksun set messageid: <1464840750.67.0.274821080849.issue27179@psf.upfronthosting.co.za>
2016-06-02 04:12:30 eryksun link issue27179 messages
2016-06-02 04:12:29 eryksun create