Issue42261
This issue tracker has been migrated to GitHub and is currently read-only. For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2020-11-04 15:20 by eryksun, last changed 2022-04-11 14:59 by admin.
Messages (5)

msg380329 | Author: Eryk Sun (eryksun) | Date: 2020-11-04 15:20
In Python 3.8+, legacy standard I/O mode uses the process code page from GetACP instead of the correct device encoding from GetConsoleCP and GetConsoleOutputCP. For example:

    C:\>chcp 850
    Active code page: 850

    C:\>set PYTHONLEGACYWINDOWSSTDIO=1

    C:\>py -3.7 -c "import sys; print(sys.stdin.encoding)"
    cp850

    C:\>py -3.8 -c "import sys; print(sys.stdin.encoding)"
    cp1252

    C:\>py -3.9 -c "import sys; print(sys.stdin.encoding)"
    cp1252

This is based on config_init_stdio_encoding() in Python/initconfig.c, which sets config->stdio_encoding via config_get_locale_encoding(). Cannot config->stdio_encoding be set to NULL for default behavior?

Computing this ahead of time would require separate encodings: config->stdin_encoding, config->stdout_encoding, and config->stderr_encoding. And _Py_device_encoding would have to be duplicated as something like config_get_device_encoding(PyConfig *config, int fd, wchar_t **device_encoding).
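To see the mismatch without comparing several Python versions, a small diagnostic can print the process ANSI code page next to the console code pages and the stream encodings Python chose. This is a sketch using ctypes; the helper name code_page_report() is hypothetical. Run it with PYTHONLEGACYWINDOWSSTDIO=1 after chcp 850. On an affected version, the transcript above suggests sys.stdin.encoding will track GetACP rather than GetConsoleCP.

```python
import sys


def code_page_report():
    """Return the code pages relevant to this issue, or None off Windows."""
    if sys.platform != "win32":
        return None
    import ctypes

    kernel32 = ctypes.WinDLL("kernel32")

    def as_encoding(code_page):
        # GetConsoleCP/GetConsoleOutputCP return 0 if no console is attached.
        return f"cp{code_page}" if code_page else None

    return {
        # Process (ANSI) code page: what 3.8+ legacy mode ends up using.
        "GetACP": as_encoding(kernel32.GetACP()),
        # Console input/output code pages, which `chcp` changes.
        "GetConsoleCP": as_encoding(kernel32.GetConsoleCP()),
        "GetConsoleOutputCP": as_encoding(kernel32.GetConsoleOutputCP()),
        # Encodings Python actually configured for the standard streams.
        "sys.stdin.encoding": getattr(sys.stdin, "encoding", None),
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    print(code_page_report())
```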
msg380330 | Author: Eryk Sun (eryksun) | Date: 2020-11-04 15:24
There's a related issue that affects opening duplicated file descriptors and opening "CON", "CONIN$", and "CONOUT$" in legacy I/O mode, but this case has always been broken. For Windows, _Py_device_encoding needs to be generalized to use _get_osfhandle and GetNumberOfConsoleInputEvents to detect and differentiate console input and output, instead of using isatty() and hard-coding file descriptors 0-2.
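A Python sketch of that detection via ctypes, using a hypothetical helper named console_device_encoding() (a real fix would live in C inside _Py_device_encoding). It relies on GetNumberOfConsoleInputEvents succeeding only on a console input buffer, while GetConsoleMode succeeds on both input and screen (output) buffers, so any fd, not just 0-2, can be classified:

```python
import sys


def console_device_encoding(fd):
    """Return "cpNNN" if fd refers to a console, else None (sketch only)."""
    if sys.platform != "win32":
        return None
    import ctypes
    import msvcrt
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32")
    try:
        handle = wintypes.HANDLE(msvcrt.get_osfhandle(fd))
    except OSError:
        return None  # fd is not open in the C runtime
    count = wintypes.DWORD()
    mode = wintypes.DWORD()
    if kernel32.GetNumberOfConsoleInputEvents(handle, ctypes.byref(count)):
        # Only a console input buffer supports this call.
        return f"cp{kernel32.GetConsoleCP()}"
    if kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        # A console handle that is not an input buffer is a screen buffer.
        return f"cp{kernel32.GetConsoleOutputCP()}"
    return None  # file, pipe, NUL, etc.
```

This way a duplicated fd, or an fd from opening "CONIN$" or "CONOUT$", gets the input or output code page according to what the handle actually is.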
msg380333 | Author: STINNER Victor (vstinner) | Date: 2020-11-04 15:42
> This is based on config_init_stdio_encoding() in Python/initconfig.c, which sets config->stdio_encoding via config_get_locale_encoding(). Cannot config->stdio_encoding be set to NULL for default behavior?

I would like to get a fully populated PyConfig structure to make Python initialization more deterministic and reliable, so that PyConfig fully controls the encodings used.

The solution here is to fix config_init_stdio_encoding() to use GetConsoleCP() and GetConsoleOutputCP() to build a "cpXXX" string.

This issue seems to be a regression that I introduced in Python 3.8 with PEP 587 (PyConfig). I didn't notice this subtle case during my refactoring. Relying on os.device_encoding() when the encoding is NULL is not obvious. That's why I prefer to get PyConfig fully populated ;-)

It would be nice to get a unit test for this case.
msg380335 | Author: Eryk Sun (eryksun) | Date: 2020-11-04 16:13
> The solution here is to fix config_init_stdio_encoding() to use GetConsoleCP() and GetConsoleOutputCP() to build a "cpXXX" string.

But, as I mentioned, that's only possible by replacing config->stdio_encoding with three separate settings: config->stdin_encoding, config->stdout_encoding, and config->stderr_encoding.
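Why one field is not enough: the console input and output code pages can differ, and any of the three streams may be redirected independently. A sketch of a per-stream computation follows, with hypothetical names (StdioEncodings, compute_stdio_encodings()) that are not PyConfig API. It assumes, as the current _Py_device_encoding does, that fd 0 maps to the input code page and fds 1 and 2 to the output code page, and that a non-console stream falls back to the locale encoding.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class StdioEncodings:
    # Hypothetical stand-in for separate stdin/stdout/stderr encoding settings.
    stdin: str
    stdout: str
    stderr: str


def compute_stdio_encodings(
    input_cp: int,
    output_cp: int,
    is_console: Dict[int, bool],
    locale_encoding: str,
) -> StdioEncodings:
    """Pick a legacy-mode encoding for each of fds 0-2."""

    def pick(fd: int, code_page: int) -> str:
        # Console streams use the console code page; files/pipes use the locale.
        return f"cp{code_page}" if is_console.get(fd, False) else locale_encoding

    return StdioEncodings(
        stdin=pick(0, input_cp),
        stdout=pick(1, output_cp),
        stderr=pick(2, output_cp),
    )
```

For example, compute_stdio_encodings(850, 437, {0: True, 1: False, 2: True}, "cp1252") yields stdin "cp850", stdout "cp1252", and stderr "cp437": three different values that a single config->stdio_encoding cannot represent.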
msg380343 | Author: Eryk Sun (eryksun) | Date: 2020-11-04 16:50
> It would be nice to get a unit test for this case.

The process code page from GetACP() is either an ANSI code page or CP_UTF8 (65001). It should never be a Western OEM code page such as 850. Given that, a reliable unit test would check that the configured encoding is a particular OEM code page. For example, spawn a new interpreter in a windowless console session (i.e. creationflags=CREATE_NO_WINDOW). In that interpreter, set the session's input code page to 850 via ctypes.WinDLL('kernel32').SetConsoleCP(850) and set os.environ['PYTHONLEGACYWINDOWSSTDIO'] = '1'. Then spawn [sys.executable, '-c', 'import sys; print(sys.stdin.encoding)'], and verify that the output is 'cp850'.
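A minimal, untested sketch of that test. The helper legacy_stdin_encoding() and the class LegacyStdioEncodingTest are hypothetical names, not part of CPython's test suite. One detail goes beyond the description: the middle process opens "CONIN$" and passes it as the grandchild's stdin, on the assumption that the middle process's own stdin is inherited from the parent rather than attached to the new console.

```python
import subprocess
import sys
import unittest

# Runs inside the windowless console session: set the session's input code
# page, then spawn a legacy-mode interpreter whose stdin is that console.
CHILD = r"""
import ctypes, os, subprocess, sys
kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
if not kernel32.SetConsoleCP(int(sys.argv[1])):
    raise ctypes.WinError(ctypes.get_last_error())
env = dict(os.environ, PYTHONLEGACYWINDOWSSTDIO="1")
# Our stdin came from the parent, not the new console, so open the
# console input buffer explicitly and hand it to the grandchild.
with open("CONIN$") as conin:
    proc = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdin.encoding)"],
        stdin=conin, capture_output=True, text=True, env=env, check=True)
sys.stdout.write(proc.stdout)
"""


def legacy_stdin_encoding(code_page):
    """Return sys.stdin.encoding of a legacy-mode interpreter whose console
    input code page is code_page, or None when not running on Windows."""
    if sys.platform != "win32":
        return None
    result = subprocess.run(
        [sys.executable, "-c", CHILD, str(code_page)],
        capture_output=True, text=True, check=True,
        creationflags=subprocess.CREATE_NO_WINDOW)
    return result.stdout.strip()


class LegacyStdioEncodingTest(unittest.TestCase):
    @unittest.skipUnless(sys.platform == "win32", "requires Windows")
    def test_stdin_uses_console_input_code_page(self):
        # GetACP() never returns an OEM code page such as 850, so "cp850"
        # can only come from the console input code page.
        self.assertEqual(legacy_stdin_encoding(850), "cp850")
```

Run it with python -m unittest against the module that contains it. On 3.8 and 3.9 as reported above, the assertion would be expected to fail with "cp1252" on a Western-locale system.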
History

Date | User | Action | Args
---|---|---|---
2022-04-11 14:59:37 | admin | set | github: 86427
2020-11-04 16:50:47 | eryksun | set | messages: + msg380343
2020-11-04 16:13:08 | eryksun | set | messages: + msg380335
2020-11-04 15:42:24 | vstinner | set | nosy: + vstinner; messages: + msg380333
2020-11-04 15:24:56 | eryksun | set | messages: + msg380330
2020-11-04 15:20:48 | eryksun | create |