For posterity, for anyone who finds this old issue: I investigated this problem in the debugger on Windows 7. It turns out that more.com (the pager used by Python's help) calls MultiByteToWideChar [1] with dwFlags passed as MB_PRECOMPOSED (1), which is forbidden for UTF-8. The error message is generic and incorrectly assumes that decoding the byte string failed because of running out of memory.
You may be happy to learn that this problem is fixed in Windows 10.
[1]: https://msdn.microsoft.com/en-us/library/dd319072
Here are a few snapshots from the debugger.
more.com calls SetConsoleConversions from its init function, InitializeThings:
Breakpoint 0 hit
more!MORE::InitializeThings:
00000000`ff293058 48895c2408 mov qword ptr [rsp+8],rbx ss:00000000`0024f7a0=0000000000000000
0:000> g
Breakpoint 2 hit
ulib!WSTRING::SetConsoleConversions:
000007fe`f6498934 8a05d6a80000 mov al,byte ptr [ulib!WSTRING::_UseAnsiConversions (000007fe`f64a3210)] ds:000007fe`f64a3210=00
This makes ulib decode byte strings using the current console codepage instead of the system ANSI or OEM codepage. The intention is to let a user correctly display a text file that's in a different encoding. The decoded text is written to the console as Unicode via WriteConsoleW.
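To illustrate why the choice of codepage matters, here is a minimal Python sketch that decodes the same bytes two ways. It is only an analogy for what ulib does, not its actual code. The sample text and the use of cp437 as the system OEM codepage are assumptions (cp437 is the OEM default on US English systems).

```python
# Assumption: the file holds UTF-8 text, and the system OEM codepage is cp437.
data = "é\n".encode("utf-8")

# Decoding with the console codepage (65001, i.e. UTF-8) recovers the text.
as_console = data.decode("utf-8")

# Decoding with the OEM codepage yields mojibake: each byte of the
# two-byte UTF-8 sequence becomes a separate, unrelated character.
as_oem = data.decode("cp437")

print(repr(as_console))
print(repr(as_oem))
```

With console conversions enabled, a user who runs chcp to match the file's encoding gets the first result rather than the second.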
Here is the bad call where dwFlags (register rdx) is passed as MB_PRECOMPOSED (1), which is invalid for codepage 65001 (register rcx).
0:000> g
Breakpoint 1 hit
KERNELBASE!MultiByteToWideChar:
000007fe`fd191f00 fff3 push rbx
0:000> ? @rcx
Evaluate expression: 65001 = 00000000`0000fde9
0:000> r rdx
rdx=0000000000000001
In Windows 10 this argument is passed as 0, the correct value.
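The same call can be tried without a debugger. The following is a rough ctypes sketch, not what more.com itself executes. The helper name try_decode and the sample bytes are hypothetical. It only asks for the required buffer size, and it returns None when not run on Windows. Per the MultiByteToWideChar documentation [1], passing MB_PRECOMPOSED with codepage 65001 should make the call return 0 with the last error set to ERROR_INVALID_FLAGS.

```python
import ctypes
import sys

CP_UTF8 = 65001          # 0xfde9, the value seen in rcx above
MB_PRECOMPOSED = 0x0001  # the value seen in rdx above


def try_decode(data, flags, codepage=CP_UTF8):
    """Hypothetical helper: return (chars_needed, last_error), or None off Windows."""
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    mb2wc = kernel32.MultiByteToWideChar
    mb2wc.argtypes = (ctypes.c_uint, ctypes.c_ulong, ctypes.c_char_p,
                      ctypes.c_int, ctypes.c_wchar_p, ctypes.c_int)
    mb2wc.restype = ctypes.c_int
    ctypes.set_last_error(0)
    # A NULL output buffer with size 0 asks only for the required length.
    needed = mb2wc(codepage, flags, data, len(data), None, 0)
    return needed, ctypes.get_last_error()


for flags in (0, MB_PRECOMPOSED):
    print(flags, try_decode(b"spam", flags))
```

A return value of 0 together with a nonzero last error indicates that the call was rejected before any decoding took place.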
This problem occurs indirectly via a utility library named ulib.dll, which is used by Windows command-line utilities. It should only occur when console conversions are enabled. Otherwise ulib converts using the system OEM and ANSI codepages. I searched for other utilities that use ulib!WSTRING::SetConsoleConversions:
C:\>for %f in (C:\Windows\system32\*.exe) do @(^
More? dumpbin /imports "%f" | ^
More? findstr SetConsoleConversions && echo %f)
7FF713B8934 167 ?SetConsoleConversions@WSTRING@@SAXXZ
C:\Windows\system32\find.exe
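For anyone who prefers a script to the cmd loop above, here is a roughly equivalent Python sketch. It assumes dumpbin (from the Visual Studio tools) is on PATH and returns nothing if it is not. The function names matching_lines and scan are hypothetical.

```python
import glob
import shutil
import subprocess

SYMBOL = "SetConsoleConversions"


def matching_lines(dumpbin_output, symbol=SYMBOL):
    """Return the stripped lines of dumpbin output that mention the symbol."""
    return [line.strip() for line in dumpbin_output.splitlines()
            if symbol in line]


def scan(pattern=r"C:\Windows\system32\*.exe"):
    """Map each executable that imports the symbol to its matching lines."""
    if shutil.which("dumpbin") is None:
        return {}  # dumpbin not available; nothing to report
    hits = {}
    for path in glob.glob(pattern):
        proc = subprocess.run(["dumpbin", "/imports", path],
                              capture_output=True, text=True, errors="replace")
        lines = matching_lines(proc.stdout)
        if lines:
            hits[path] = lines
    return hits


if __name__ == "__main__":
    for path, lines in scan().items():
        for line in lines:
            print(line)
        print(path)
```

Like the findstr version, this is a plain substring match on the import listing, so it would also report any other imported name that contains SetConsoleConversions.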
I found that find.exe is also subject to this bug in Windows 7. It fails to print the result if the console is using codepage 65001:
C:\Temp\test>type test
eggs
spam
C:\Temp\test>find /n "spam" *
---------- TEST
[2]spam
C:\Temp\test>chcp 65001
Active code page: 65001
C:\Temp\test>find /n "spam" *
---------- TEST
This works correctly in Windows 10.