Author eryksun
Recipients BreamoreBoy, brian.curtin, eryksun, hct, r.david.murray, terry.reedy, tim.golden, vstinner
Date 2016-03-16.04:28:50
Message-id <1458102534.78.0.0601152409794.issue19914@psf.upfronthosting.co.za>
Content
For the benefit of anyone who finds this old issue: I investigated this problem in the debugger on Windows 7. It turns out that more.com (the pager used by Python's help) calls MultiByteToWideChar [1] with dwFlags passed as MB_PRECOMPOSED (1), which is not allowed for UTF-8 (codepage 65001). The error message is a generic one that incorrectly assumes decoding the byte string failed because it ran out of memory.

You may be happy to learn that this problem is fixed in Windows 10.

[1]: https://msdn.microsoft.com/en-us/library/dd319072

Here are a few snapshots from the debugger.

more.com calls SetConsoleConversions from its init function, InitializeThings:

    Breakpoint 0 hit
    more!MORE::InitializeThings:
    00000000`ff293058 48895c2408      mov     qword ptr [rsp+8],rbx
                ss:00000000`0024f7a0=0000000000000000
    0:000> g
    Breakpoint 2 hit
    ulib!WSTRING::SetConsoleConversions:
    000007fe`f6498934 8a05d6a80000    mov     al,byte ptr
                [ulib!WSTRING::_UseAnsiConversions
                 (000007fe`f64a3210)] ds:000007fe`f64a3210=00

This makes ulib decode byte strings using the current console codepage instead of the system ANSI or OEM codepage. The intention is to let a user correctly display a text file that's in a different encoding. The decoded text is written to the console as Unicode via WriteConsoleW.
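
To illustrate why the choice of codepage matters, here is a small Python sketch. It is not what ulib does internally. It only assumes that cp437 and cp1252 stand in for a typical OEM and ANSI codepage, and the sample text is made up:

```python
# Sample text (made up for illustration) as it would be stored in a UTF-8 file.
text = "caf\u00e9"
raw = text.encode("utf-8")

# Decoding with the codepage the file was written in round-trips cleanly.
as_utf8 = raw.decode("utf-8")

# Decoding the same bytes with an OEM (cp437) or ANSI (cp1252) codepage
# happens to succeed for these particular bytes, but the two-byte UTF-8
# sequence for the accented letter turns into two unrelated characters.
as_oem = raw.decode("cp437")
as_ansi = raw.decode("cp1252")

# ascii() keeps the output printable regardless of the console encoding.
print(ascii(as_utf8), ascii(as_oem), ascii(as_ansi))
```

Only the decode that matches the file's encoding gives back the original four characters. The other two produce five.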

Here is the bad call where dwFlags (register rdx) is passed as MB_PRECOMPOSED (1), which is invalid for codepage 65001 (register rcx).

    0:000> g
    Breakpoint 1 hit
    KERNELBASE!MultiByteToWideChar:
    000007fe`fd191f00 fff3            push    rbx
    0:000> ? @rcx
    Evaluate expression: 65001 = 00000000`0000fde9
    0:000> r rdx
    rdx=0000000000000001

In Windows 10 this argument is passed as 0, the correct value.
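
For anyone who wants to probe this without a debugger, here is a minimal ctypes sketch that makes the same kind of call from Python. The helper name `probe_flags` is hypothetical, and it only asks for the required buffer size rather than reproducing ulib's actual call. The MultiByteToWideChar documentation says that for UTF-8 dwFlags must be 0 or MB_ERR_INVALID_CHARS, and that anything else fails with ERROR_INVALID_FLAGS. I have not checked what every Windows version actually returns.

```python
import sys

CP_UTF8 = 65001                  # 0xFDE9, the value seen in rcx above
MB_PRECOMPOSED = 0x00000001      # the value seen in rdx above
MB_ERR_INVALID_CHARS = 0x00000008
ERROR_INVALID_FLAGS = 1004


def probe_flags(data, codepage, flags):
    """Return (chars_needed, last_error) from MultiByteToWideChar.

    Hypothetical helper for illustration. Returns None when not on Windows.
    """
    if sys.platform != "win32":
        return None
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.MultiByteToWideChar.argtypes = (
        wintypes.UINT,    # CodePage
        wintypes.DWORD,   # dwFlags
        ctypes.c_char_p,  # lpMultiByteStr
        ctypes.c_int,     # cbMultiByte
        wintypes.LPWSTR,  # lpWideCharStr (NULL: only query the size)
        ctypes.c_int,     # cchWideChar (0: only query the size)
    )
    kernel32.MultiByteToWideChar.restype = ctypes.c_int

    ctypes.set_last_error(0)
    needed = kernel32.MultiByteToWideChar(
        codepage, flags, data, len(data), None, 0)
    return needed, ctypes.get_last_error()


if __name__ == "__main__":
    for flags in (0, MB_PRECOMPOSED):
        print(flags, probe_flags(b"spam", CP_UTF8, flags))
```

A return value of 0 for the character count means the call failed, and the second item is then the Win32 error code.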

This problem occurs indirectly via a utility library named ulib.dll, which is used by Windows command-line utilities. It should only occur when console conversions are enabled; otherwise ulib converts using the system OEM and ANSI codepages. I searched for other utilities that import ulib!WSTRING::SetConsoleConversions:

    C:\>for %f in (C:\Windows\system32\*.exe) do @(^
    More? dumpbin /imports "%f" | ^
    More? findstr SetConsoleConversions && echo %f)
               7FF713B8934   167 ?SetConsoleConversions@WSTRING@@SAXXZ
    C:\Windows\system32\find.exe
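
If dumpbin isn't available, a rough equivalent of the loop above can be sketched in Python. The function name `find_importers` is hypothetical. It is cruder than `dumpbin /imports`, because it only checks whether the decorated name appears anywhere in the file's bytes instead of parsing the import table:

```python
from pathlib import Path

# Decorated (mangled) import name, as shown in the dumpbin output above.
SYMBOL = b"?SetConsoleConversions@WSTRING@@SAXXZ"


def find_importers(directory, symbol=SYMBOL):
    """Return names of *.exe files in `directory` whose bytes contain `symbol`.

    Hypothetical helper: a plain substring search, not a PE import-table parse.
    """
    hits = []
    for path in sorted(Path(directory).glob("*.exe")):
        try:
            data = path.read_bytes()
        except OSError:
            # Skip files that cannot be read (locked or access denied).
            continue
        if symbol in data:
            hits.append(path.name)
    return hits
```

For example, `find_importers(r"C:\Windows\system32")` would scan the same directory as the loop above.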

I found that find.exe is also subject to this bug in Windows 7. It fails to print the result if the console is using codepage 65001:

    C:\Temp\test>type test
    eggs
    spam

    C:\Temp\test>find /n "spam" *

    ---------- TEST
    [2]spam

    C:\Temp\test>chcp 65001
    Active code page: 65001

    C:\Temp\test>find /n "spam" *

    ---------- TEST

This works correctly in Windows 10.