Author eryksun
Recipients eryksun, izbyshev, methane, paul.moore, steve.dower, tim.golden, u36959, vstinner, zach.ware
Date 2020-12-23.00:50:37
Message-id <1608684637.74.0.800451087403.issue42707@roundup.psfhosted.org>
In-reply-to
Content
> How about treating only UTF-8 and leave legacy environment as-is?
> * When GetConsoleCP() returns CP_UTF8, use UTF-8 for stdin. 
> Otherwise, use ANSI.

Okay, and also use UTF-8 when GetConsoleCP() fails because there's no console (e.g. python.exe run with the DETACHED_PROCESS creation flag, or pythonw.exe).
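The selection rule discussed above could be sketched roughly as follows. This is only an illustration, not CPython's actual implementation: the names choose_stdin_encoding and get_console_input_cp are hypothetical, and returning "mbcs" to stand for the ANSI code page is an assumption. GetConsoleCP() returns 0 on failure, which the sketch treats as "no console".

```python
import ctypes
import sys

CP_UTF8 = 65001


def choose_stdin_encoding(console_cp):
    """Pick an encoding for stdin from a GetConsoleCP() result.

    Hypothetical helper. console_cp == 0 means the call failed,
    which is what happens when the process has no console.
    """
    if console_cp == 0 or console_cp == CP_UTF8:
        return "utf-8"
    # Assumption: "mbcs" (Python's codec for the ANSI code page)
    # stands in for "use ANSI" in the legacy case.
    return "mbcs"


def get_console_input_cp():
    """Return the console input code page, or 0 if unavailable.

    Hypothetical helper; off Windows it simply reports 0.
    """
    if sys.platform != "win32":
        return 0
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    return kernel32.GetConsoleCP()


if __name__ == "__main__":
    print(choose_stdin_encoding(get_console_input_cp()))
```

Keeping the decision in a pure function that takes the code page as an argument makes the rule easy to check without a console attached.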

However, using UTF-8 for the input code page is currently broken in many cases, so it should not be promoted as a recommended solution until Microsoft fixes their broken code (which should have been fixed 20 years ago; it's ridiculous). Legacy console applications rely on ReadFile and ReadConsoleA. With the input code page set to UTF-8, these calls can read only 7-bit ASCII (ordinals 0-127); every other character is converted to a null byte. For example:

    >>> import ctypes, os
    >>> kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    >>> kernel32.SetConsoleCP(65001)
    1
    >>> os.read(0, 10)
    ab¡¢£¤cd
    b'ab\x00\x00\x00\x00cd\r\n'
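For comparison, here is a small sketch of what a working UTF-8 read would have returned for that input, next to a model of the observed behavior. The variable names are illustrative, and the "one null byte per non-ASCII character" model is inferred only from the single result shown above.

```python
# Text typed at the console, including the line terminator.
typed = "ab¡¢£¤cd\r\n"

# What a correct UTF-8 read would deliver: two bytes per
# non-ASCII character in this range (U+00A1 through U+00A4).
expected = typed.encode("utf-8")

# What os.read() actually returned in the session above.
observed = b"ab\x00\x00\x00\x00cd\r\n"

# Model of the broken conversion: ASCII passes through, anything
# else collapses to a single null byte.
modeled = bytes(ord(ch) if ord(ch) < 128 else 0 for ch in typed)

print(expected)
print(modeled == observed)
```

The original characters cannot be recovered from the null bytes, so this cannot be worked around after the read.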