
Author eryksun
Recipients aroberge, docs@python, eryksun, jessevsilverman, methane, paul.moore, steve.dower, tim.golden, zach.ware
Date 2021-06-01.05:39:19
Message-id <1622525960.21.0.654739745293.issue44275@roundup.psfhosted.org>
Content
> PS > [System.Console]::InputEncoding = $OutputEncoding

If changing the console input codepage to UTF-8 fixes the mojibake problem, then you're probably running Python in UTF-8 mode. pydoc.tempfilepager() encodes the temporary file with the locale's preferred encoding, which normally would not be UTF-8 on Windows. There are possible variations in how your system and the console are configured, so I can't say for sure.

tempfilepager() could temporarily set the console's input codepage to UTF-8 via SetConsoleCP(65001). However, if python.exe is terminated or crashes before it can reset the codepage, the console will be left in a bad state. By bad state, I mean that console input is broken for as long as the input codepage stays set to UTF-8. Legacy console applications rely on the input codepage for reading input via ReadFile() and ReadConsoleA(), but the console host (conhost.exe or openconsole.exe) doesn't support reading input as UTF-8. It simply replaces each non-ASCII character (i.e. each character that requires 2-4 bytes as UTF-8) with a null byte, e.g. "abĀcd" is read as "ab\x00cd".

If you think the risk of crashing is negligible, and the downside of breaking legacy applications in the console session is trivial, then paging with full Unicode support is easily possible. Implement _winapi.GetConsoleCP() and _winapi.SetConsoleCP(). Write UTF-8 text to the temporary file. Change the console input codepage to UTF-8 before spawning "more.com". Revert to the original input codepage in the finally block.
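
As a rough sketch of that approach, the save/set/restore step could look something like the following. It uses ctypes as a stand-in for the proposed _winapi.GetConsoleCP() and _winapi.SetConsoleCP(), which don't exist yet. The names console_input_codepage and _win32_console_cp_functions are made up for illustration, and the get_cp/set_cp parameters are only there so the logic can be exercised without a Windows console.

```python
import contextlib
import sys

CP_UTF8 = 65001  # Windows code page identifier for UTF-8


def _win32_console_cp_functions():
    """Return (get_cp, set_cp) backed by kernel32. Windows only.

    Hypothetical helper standing in for the proposed _winapi functions.
    """
    import ctypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)

    def get_cp():
        # GetConsoleCP() returns 0 on failure, e.g. no console attached.
        return kernel32.GetConsoleCP()

    def set_cp(codepage):
        if not kernel32.SetConsoleCP(codepage):
            raise ctypes.WinError(ctypes.get_last_error())

    return get_cp, set_cp


@contextlib.contextmanager
def console_input_codepage(codepage, get_cp=None, set_cp=None):
    """Temporarily switch the console input codepage, then restore it."""
    if get_cp is None or set_cp is None:
        if sys.platform != "win32":
            raise OSError("console codepages are only available on Windows")
        get_cp, set_cp = _win32_console_cp_functions()

    original = get_cp()
    if original == 0:
        raise OSError("could not query the console input codepage")

    set_cp(codepage)
    try:
        yield original
    finally:
        # Not reached if the process is killed, which is the risk noted above.
        set_cp(original)
```

With something like this, tempfilepager() would write the temporary file as UTF-8 and spawn "more.com" inside a `with console_input_codepage(CP_UTF8):` block.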

A more conservative fix would be to change tempfilepager() to encode the file using the console's current input codepage, GetConsoleCP(). That doesn't give full Unicode support, but at least there's no mojibake.
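
For illustration, the encoding step of the conservative fix might look roughly like this. The function names are hypothetical, the codepage argument is assumed to come from GetConsoleCP(), and using the "backslashreplace" error handler and a UTF-8 fallback for unknown codepages are my assumptions, not settled design.

```python
import codecs

CP_UTF8 = 65001  # Windows code page identifier for UTF-8


def codepage_to_encoding(codepage, fallback="utf-8"):
    """Map a Windows codepage number to a Python codec name.

    Falls back when Python has no codec for the codepage, which also
    covers a value of 0 (GetConsoleCP() failed).
    """
    if codepage == CP_UTF8:
        return "utf-8"
    name = f"cp{codepage}"
    try:
        codecs.lookup(name)
    except LookupError:
        return fallback
    return name


def encode_for_console(text, codepage):
    """Encode pager text so that more.com can decode it with `codepage`.

    Characters outside the codepage become backslash escapes instead
    of mojibake.
    """
    encoding = codepage_to_encoding(codepage)
    return text.encode(encoding, errors="backslashreplace")
```

For example, with codepage 437 the text "abĀcd" is written as the bytes for "ab\u0100cd", since "Ā" has no mapping in that codepage.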

> PS > $OutputEncoding =  [System.Text.Encoding]::GetEncoding("UTF-8")

FYI, $OutputEncoding in PowerShell has nothing to do with the python.exe and more.com processes, or with the console session to which they're attached.

> PS > [System.Console]::OutputEncoding = $OutputEncoding

The console output code page is irrelevant since more.com writes wide-character text via WriteConsoleW() and decodes the file using the console input code page, GetConsoleCP(). The console output codepage from GetConsoleOutputCP() isn't used for anything here.