
classification
Title: Windows legacy I/O mode mistakenly ignores the device encoding
Type: behavior
Stage: needs patch
Components: Interpreter Core, IO, Windows
Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: eryksun, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal
Keywords:

Created on 2020-11-04 15:20 by eryksun, last changed 2022-04-11 14:59 by admin.

Messages (5)
msg380329 - Author: Eryk Sun (eryksun) (Python triager) Date: 2020-11-04 15:20
In Python 3.8+, legacy standard I/O mode uses the process code page from GetACP instead of the correct device encoding from GetConsoleCP and GetConsoleOutputCP. For example:

    C:\>chcp 850
    Active code page: 850
    C:\>set PYTHONLEGACYWINDOWSSTDIO=1

    C:\>py -3.7 -c "import sys; print(sys.stdin.encoding)"
    cp850
    C:\>py -3.8 -c "import sys; print(sys.stdin.encoding)"
    cp1252
    C:\>py -3.9 -c "import sys; print(sys.stdin.encoding)"
    cp1252

This is based on config_init_stdio_encoding() in Python/initconfig.c, which sets config->stdio_encoding via config_get_locale_encoding(). Cannot config->stdio_encoding be set to NULL for default behavior?

Computing the encoding ahead of time would require three separate settings: config->stdin_encoding, config->stdout_encoding, and config->stderr_encoding. In addition, _Py_device_encoding would have to be duplicated as something like config_get_device_encoding(PyConfig *config, int fd, wchar_t **device_encoding).
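
A minimal sketch for comparing the three code pages from Python, assuming ctypes access to kernel32. The helper names `code_page_name` and `query_code_pages` are made up for illustration and are not CPython APIs. The Win32 calls only run on Windows; elsewhere the query returns None.

```python
import sys


def code_page_name(cp):
    """Format a Windows code page number as a "cpXXX" codec name."""
    return f"cp{cp}"


def query_code_pages():
    """Return the process and console code pages, or None off Windows."""
    if sys.platform != "win32":
        return None
    import ctypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    return {
        # Process (ANSI) code page, which legacy mode picks up in 3.8+.
        "GetACP": kernel32.GetACP(),
        # Console input and output code pages; 0 means the call failed,
        # for example because no console is attached.
        "GetConsoleCP": kernel32.GetConsoleCP(),
        "GetConsoleOutputCP": kernel32.GetConsoleOutputCP(),
    }


if __name__ == "__main__":
    pages = query_code_pages()
    if pages is None:
        print("not running on Windows")
    else:
        for api, cp in pages.items():
            print(api, code_page_name(cp) if cp else "(unavailable)")
```

In the console session from the example above, GetConsoleCP and GetConsoleOutputCP would be expected to report 850 while GetACP reports the process code page.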
msg380330 - Author: Eryk Sun (eryksun) (Python triager) Date: 2020-11-04 15:24
There's a related issue that affects opening duplicated file descriptors and opening "CON", "CONIN$", and "CONOUT$" in legacy I/O mode, but this case has always been broken. For Windows, _Py_device_encoding needs to be generalized to use _get_osfhandle and GetNumberOfConsoleInputEvents to detect and differentiate console input and output, instead of using isatty() and hard coding file descriptors 0-2.
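
One way this detection could look, sketched in Python rather than C. `console_kind` is a hypothetical helper, not an existing CPython function. It assumes that GetConsoleMode succeeds for any console handle and that GetNumberOfConsoleInputEvents succeeds only for a console input handle.

```python
import sys


def console_kind(fd):
    """Return "input", "output", or None if fd is not a console handle."""
    if sys.platform != "win32":
        return None
    import ctypes
    import msvcrt
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetConsoleMode.argtypes = (wintypes.HANDLE, wintypes.LPDWORD)
    kernel32.GetNumberOfConsoleInputEvents.argtypes = (
        wintypes.HANDLE, wintypes.LPDWORD)

    try:
        # Works for any valid descriptor, not only 0-2.
        handle = msvcrt.get_osfhandle(fd)
    except OSError:
        return None

    value = wintypes.DWORD()
    if not kernel32.GetConsoleMode(handle, ctypes.byref(value)):
        return None  # a file, pipe, or other non-console handle
    if kernel32.GetNumberOfConsoleInputEvents(handle, ctypes.byref(value)):
        return "input"
    return "output"
```

With such a helper, the input code page would apply when the result is "input" and the output code page when it is "output", regardless of the descriptor number.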
msg380333 - Author: STINNER Victor (vstinner) (Python committer) Date: 2020-11-04 15:42
> This is based on config_init_stdio_encoding() in Python/initconfig.c, which sets config->stdio_encoding via config_get_locale_encoding(). Cannot config->stdio_encoding be set to NULL for default behavior?

I would like the PyConfig structure to be fully populated, to make Python initialization more deterministic and reliable, so that PyConfig fully controls the encodings used.

The solution here is to fix config_init_stdio_encoding() to use GetConsoleCP() and GetConsoleOutputCP() to build a "cpXXX" string.

This issue seems to be a regression that I introduced in Python 3.8 with PEP 587 (PyConfig). I didn't notice this subtle case during my refactoring. Relying on os.device_encoding() when the encoding is NULL is not obvious. That's why I prefer to get PyConfig fully populated ;-)

It would be nice to get a unit test for this case.
msg380335 - Author: Eryk Sun (eryksun) (Python triager) Date: 2020-11-04 16:13
> The solution here is to fix config_init_stdio_encoding() to use 
> GetConsoleCP() and GetConsoleOutputCP() to build a "cpXXX" string.

But, as I mentioned, that's only possible by replacing config->stdio_encoding with three separate settings: config->stdin_encoding, config->stdout_encoding, and config->stderr_encoding.
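
To illustrate why a single setting is not enough, here is a small sketch in Python. The function `resolve_stdio_encodings` and its parameters are hypothetical and only model the decision, under the assumption that a stream not attached to a console falls back to the locale encoding.

```python
def resolve_stdio_encodings(console_streams, input_cp, output_cp,
                            locale_encoding):
    """Pick one encoding per standard stream.

    console_streams is the set of stream names attached to a console.
    """
    encodings = {}
    for name in ("stdin", "stdout", "stderr"):
        if name not in console_streams:
            # Redirected to a file or pipe: no device encoding applies.
            encodings[name] = locale_encoding
        elif name == "stdin":
            encodings[name] = f"cp{input_cp}"    # GetConsoleCP
        else:
            encodings[name] = f"cp{output_cp}"   # GetConsoleOutputCP
    return encodings


# stdin and stdout on a console, stderr redirected to a file.
print(resolve_stdio_encodings({"stdin", "stdout"}, 850, 437, "cp1252"))
```

Each stream can end up with a different value, which one shared config->stdio_encoding cannot represent.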
msg380343 - Author: Eryk Sun (eryksun) (Python triager) Date: 2020-11-04 16:50
> It would be nice to get a unit test for this case.

The process code page from GetACP() is either an ANSI code page or CP_UTF8 (65001). It should never be a Western OEM code page such as 850. Given that, a reliable unit test would check that the configured encoding is a particular OEM code page. For example, spawn a new interpreter in a windowless console session (i.e. creationflags=CREATE_NO_WINDOW). In that interpreter, set the session's input code page to 850 via ctypes.WinDLL('kernel32').SetConsoleCP(850) and set os.environ['PYTHONLEGACYWINDOWSSTDIO'] = '1'. Then spawn [sys.executable, '-c', 'import sys; print(sys.stdin.encoding)'], and verify that the output is 'cp850'.
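
A rough sketch of such a test, following the steps above. The class name and the CHILD script are illustrative only. Opening 'CONIN$' to hand the grandchild the session's console input is an added assumption about how the standard handles need to be wired, and the test would be expected to fail on affected versions.

```python
import subprocess
import sys
import unittest

# Runs in the intermediate interpreter, inside the windowless console session.
CHILD = r"""
import ctypes, os, subprocess, sys
kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
if not kernel32.SetConsoleCP(850):
    raise ctypes.WinError(ctypes.get_last_error())
os.environ['PYTHONLEGACYWINDOWSSTDIO'] = '1'
# Assumption: pass the session's own console input to the grandchild.
with open('CONIN$', 'rb', buffering=0) as conin:
    subprocess.run(
        [sys.executable, '-c', 'import sys; print(sys.stdin.encoding)'],
        stdin=conin, check=True)
"""


@unittest.skipUnless(sys.platform == "win32", "requires a Windows console")
class LegacyStdioEncodingTest(unittest.TestCase):
    def test_stdin_uses_console_input_code_page(self):
        proc = subprocess.run(
            [sys.executable, "-c", CHILD],
            stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True,
            creationflags=subprocess.CREATE_NO_WINDOW)
        self.assertEqual(proc.returncode, 0, proc.stderr)
        # The grandchild inherits the pipe as stdout, so its output lands here.
        self.assertEqual(proc.stdout.strip(), "cp850")


if __name__ == "__main__":
    unittest.main()
```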
History
Date                 User      Action  Args
2022-04-11 14:59:37  admin     set     github: 86427
2020-11-04 16:50:47  eryksun   set     messages: + msg380343
2020-11-04 16:13:08  eryksun   set     messages: + msg380335
2020-11-04 15:42:24  vstinner  set     nosy: + vstinner; messages: + msg380333
2020-11-04 15:24:56  eryksun   set     messages: + msg380330
2020-11-04 15:20:48  eryksun   create