Author eryksun
Recipients Yoni Rozenshein, eryksun, giampaolo.rodola, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Date 2018-06-07.00:58:28
Message-id <1528333109.8.0.592728768989.issue33780@psf.upfronthosting.co.za>
Content
> By default, the output of cmd is encoded with the "active" 
> codepage. In Python 3.6, you can decode this using 
> encoding='oem'.

FYI, the actual encoding is not necessarily "oem".

The console codepage may have been changed from its initial value by a SetConsoleCP or SetConsoleOutputCP call, either in the current process or in another process attached to the same console (e.g. chcp.com, mode.com). For example, a batch script can switch to codepage 65001 so that CMD can read a UTF-8 encoded batch file, read UTF-8 output from an external command in a `for /f` loop, or write UTF-8 to a disk file or pipe.
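To see why assuming "oem" fails, consider the bytes a program writes after a batch script has switched the console to codepage 65001. Python's "oem" codec exists only on Windows, so this sketch uses "cp437", the OEM codepage on a typical US-English system, as a stand-in.

```python
# "café" as a console program would write it after `chcp 65001` (UTF-8 bytes).
data = "café".encode("utf-8")  # b'caf\xc3\xa9'

# Decoding with a typical OEM codepage (cp437 stands in for "oem" here)
# turns the two-byte UTF-8 sequence into two box-drawing characters.
wrong = data.decode("cp437")   # 'caf├⌐'

# Decoding with the codepage that was actually in effect recovers the text.
right = data.decode("utf-8")   # 'café'
```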

(Only switch to codepage 65001 temporarily. Using UTF-8 for legacy console I/O is buggy. CMD, PowerShell, and Python 3.6+ aren't affected, since they use the wide-character API for console I/O. But a legacy console application that implicitly uses the codepage for byte-based I/O via ReadFile and WriteFile may get invalid results, such as reading a non-ASCII character as NUL, having the entire read fail, or writing garbage to the console after output that contains non-ASCII characters.)

To accommodate applications that use the current console codepage for standard I/O, Python could add two encodings that correspond to the current value of GetConsoleCP and GetConsoleOutputCP (e.g. named "conin" and "conout"). 
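Since "conin" and "conout" are only proposed names, not existing Python codecs, here is a minimal sketch, assuming Windows and ctypes, of resolving them to an existing codec name at call time. The helpers `codepage_to_codec` and `console_codecs` are hypothetical; mapping 0 (CP_ACP) to Python's "mbcs" codec and 65001 to "utf-8" is a simplifying choice, and a given "cpNNN" name is not guaranteed to have a Python codec on every platform.

```python
import ctypes
import sys


def codepage_to_codec(cp):
    """Map a Windows codepage number to a Python codec name (hypothetical helper)."""
    if cp == 0:
        # 0 is CP_ACP, the process ANSI codepage, which Python calls "mbcs".
        return "mbcs"
    if cp == 65001:
        return "utf-8"
    return f"cp{cp}"


def console_codecs():
    """Return (input_codec, output_codec) for the attached console, or None off Windows.

    A stand-in for the proposed "conin"/"conout" encodings: the values are
    read at call time, so they track later chcp/SetConsoleCP changes.
    """
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    return (codepage_to_codec(kernel32.GetConsoleCP()),
            codepage_to_codec(kernel32.GetConsoleOutputCP()))
```

For example, output captured from a console program could be decoded with `data.decode(console_codecs()[1])` immediately after the program exits, rather than with a codepage assumed at startup.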

Additionally, we can't assume the console codepage is initially OEM. It depends on settings in the registry or in the shell shortcut for the application that allocated the console. In particular, when a process allocates a new console, either explicitly via AllocConsole or implicitly (a console application that hasn't inherited a console, or that was created with the CREATE_NEW_CONSOLE or CREATE_NO_WINDOW creation flag), the console loads custom settings from either the registry key "HKCU\Console\<window title>" or the shell shortcut (LNK file) that started the application.
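The implicit-allocation rules can be modeled as a small decision function. This is a hypothetical sketch, not a Windows API: `console_for_child` and its return values are made up for illustration, explicit AllocConsole calls are not modeled, and the flag values are the documented CreateProcess constants (Python's `subprocess` module exposes the same names on Windows). DETACHED_PROCESS is included because it leaves the child with no console at all.

```python
# Documented CreateProcess creation flags (WinBase.h), redefined here so the
# sketch runs on any platform.
DETACHED_PROCESS = 0x00000008
CREATE_NEW_CONSOLE = 0x00000010
CREATE_NO_WINDOW = 0x08000000


def console_for_child(parent_has_console, creationflags=0):
    """Hypothetical model: which console does a new console application get?

    Returns "none", "new", or "inherited". Only a "new" console loads its
    settings (including CodePage) from the registry or an LNK file.
    """
    if creationflags & DETACHED_PROCESS:
        if creationflags & CREATE_NEW_CONSOLE:
            # CreateProcess rejects this combination as an invalid parameter.
            raise ValueError("DETACHED_PROCESS cannot be combined with CREATE_NEW_CONSOLE")
        # CREATE_NO_WINDOW is ignored together with DETACHED_PROCESS.
        return "none"
    if creationflags & (CREATE_NEW_CONSOLE | CREATE_NO_WINDOW):
        # CREATE_NO_WINDOW still allocates a console; it just has no window.
        return "new"
    # Otherwise the child inherits the parent's console, if there is one.
    return "inherited" if parent_has_console else "new"
```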

If the console uses the window-title registry key, it looks for a "CodePage" DWORD value. The key name is the normalized window title, which comes from the WindowTitle field of the process parameters. The title can be set explicitly via the lpTitle field of the STARTUPINFO that's passed to CreateProcess; otherwise the system uses the executable path as the default window title. To create a valid key name, the console normalizes the title by replacing backslashes with underscores, and it also substitutes "%SystemRoot%" for the Windows directory. For example, the default configuration key for CMD is "HKCU\Console\%SystemRoot%_system32_cmd.exe".
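A minimal sketch of that normalization, plus a lookup of the "CodePage" value with the standard `winreg` module on Windows. The function names are hypothetical, and two details go beyond what's described here and are assumptions: the Windows directory is matched only as a leading path component, and case-insensitively.

```python
import os
import sys


def console_key_name(window_title, windows_dir):
    """Hypothetical sketch of the window title -> registry subkey normalization.

    Assumptions: the Windows directory is matched only as a leading path
    component, and the comparison is case-insensitive.
    """
    title = window_title
    rest = title[len(windows_dir):]
    if title.lower().startswith(windows_dir.lower()) and (not rest or rest.startswith("\\")):
        title = "%SystemRoot%" + rest
    # Backslash separates registry key names, so it becomes an underscore.
    return title.replace("\\", "_")


def read_console_codepage(window_title):
    """Return the CodePage DWORD from the per-title subkey, or None if unset."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module

    windows_dir = os.environ.get("SystemRoot", r"C:\Windows")
    subkey = "Console\\" + console_key_name(window_title, windows_dir)
    try:
        with winreg.OpenKey(winreg.HKEY_CURRENT_USER, subkey) as key:
            value, value_type = winreg.QueryValueEx(key, "CodePage")
    except FileNotFoundError:
        # Either the subkey or the CodePage value doesn't exist.
        return None
    return value if value_type == winreg.REG_DWORD else None
```

With the Windows directory at "C:\Windows", `console_key_name(r"C:\Windows\system32\cmd.exe", r"C:\Windows")` yields the CMD key name given above.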

The codepage can also be set in a shell shortcut (LNK file) [1]. When an application is started from a shell shortcut, the shell sets the STARTF_TITLEISLINKNAME flag in STARTUPINFO and sets lpTitle to the fully-qualified path of the LNK file. In this case, the console loads its settings from the LNK file instead of the window-title subkey in the registry. However, the "HKCU\Console" root key is still used for the default settings.
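The choice between the two sources can be sketched as follows. This is a hypothetical model, not a Windows API: the function name and the layer tuples are made up for illustration, STARTF_TITLEISLINKNAME uses its documented STARTUPINFO flag value, and treating the root key as the lowest-priority layer is an assumption based on the description of it as the source of default settings.

```python
STARTF_TITLEISLINKNAME = 0x00000800  # documented STARTUPINFO dwFlags value


def console_settings_layers(startup_flags, title):
    """Hypothetical model: settings sources for a new console, lowest priority first."""
    # The root key always supplies the defaults.
    layers = [("registry", r"HKCU\Console")]
    if startup_flags & STARTF_TITLEISLINKNAME:
        # lpTitle holds the fully-qualified path of the LNK file.
        layers.append(("lnk", title))
    else:
        # Per-title subkey; the title still has to be normalized
        # (backslash -> underscore, %SystemRoot%) to form the key name.
        layers.append(("registry-title", title))
    return layers
```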

Finally, if CMD is run without a console (i.e. with the DETACHED_PROCESS creation flag), the default codepage is ANSI, not OEM. This isn't hard-coded in CMD; it happens because GetConsoleCP returns 0 (i.e. CP_ACP) in this case.
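A minimal sketch, assuming Windows and Python 3.7+ (for `capture_output`), of running CMD without a console and decoding its piped output as ANSI, which Python calls "mbcs". The helper `run_cmd_detached` is hypothetical; DETACHED_PROCESS is redefined with its documented value so the module imports on any platform (Python 3.7+ also exposes it as `subprocess.DETACHED_PROCESS` on Windows).

```python
import subprocess
import sys

DETACHED_PROCESS = 0x00000008  # documented CreateProcess creation flag


def run_cmd_detached(command):
    """Run `cmd.exe /c command` with no console and return its decoded stdout."""
    if sys.platform != "win32":
        raise OSError("run_cmd_detached requires Windows")
    result = subprocess.run(
        ["cmd.exe", "/c", command],
        stdin=subprocess.DEVNULL,
        capture_output=True,
        creationflags=DETACHED_PROCESS,
    )
    # Without a console, GetConsoleCP returns 0 (CP_ACP), so CMD's output is
    # in the ANSI codepage rather than the OEM codepage.
    return result.stdout.decode("mbcs")
```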

[1]: https://msdn.microsoft.com/en-us/library/dd891330.aspx
History
Date User Action Args
2018-06-07 00:58:29 eryksun set recipients: + eryksun, paul.moore, vstinner, giampaolo.rodola, tim.golden, zach.ware, steve.dower, Yoni Rozenshein
2018-06-07 00:58:29 eryksun set messageid: <1528333109.8.0.592728768989.issue33780@psf.upfronthosting.co.za>
2018-06-07 00:58:29 eryksun link issue33780 messages
2018-06-07 00:58:28 eryksun create