
Author eryksun
Recipients AGrzes, eryksun, ezio.melotti, steve.dower, tim.golden, vstinner, zach.ware
Date 2015-02-10.01:51:43
Message-id <1423533105.24.0.299289937116.issue23424@psf.upfronthosting.co.za>
In-reply-to
Content
This isn't a Python bug. The Windows console doesn't properly support UTF-8. See issue 1602 and Drekin's win-unicode-console, an alternative REPL based on the wide-character (UCS-2) console API.

FWIW, I attached a debugger to conhost.exe under Windows 7 to inspect what's happening here. In the client, the CRT's read() function calls WinAPI ReadFile. For a console handle, this calls either ReadConsoleA or (in Windows 8+) NtReadFile. Either way, most of the action happens in the server process, conhost.exe.

The server's input buffer is Unicode, which gets encoded to CP 65001 (UTF-8) by calling WideCharToMultiByte. However, the server incorrectly assumes the current codepage is a Windows ANSI codepage with a one-to-one mapping, i.e. that each 16-bit wchar_t maps to an 8-bit char in the current codepage. Since 'ł' gets UTF-8 encoded as the two-byte string b'\xc5\x82', the allocated buffer is too small by a byte. The server doesn't recover from this failure by allocating a larger buffer. It just reports back to the client process that it read 0 bytes. The CRT in turn sets the end-of-file (EOF) flag on the stdin FILE stream, which causes Python to exit 'normally'.
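
To make the size mismatch concrete, here is a rough Python model of the behavior described above. It is only a sketch, not conhost.exe's actual code: the function name simulated_console_read and the exact sizing rule (one byte per 16-bit code unit) are assumptions taken from the description, and cp1250 is used merely as an example of a single-byte ANSI codepage that contains 'ł'.

```python
def simulated_console_read(text, encoding):
    """Hypothetical model of the server-side read described above.

    Assumption: the output buffer is sized at one byte per 16-bit
    wchar_t, and a conversion that does not fit is reported to the
    client as 0 bytes read instead of being retried.
    """
    # Number of UTF-16 code units in the input (2 bytes each).
    wchar_count = len(text.encode("utf-16-le")) // 2
    buffer_size = wchar_count  # one byte per wchar_t (the faulty assumption)

    encoded = text.encode(encoding)
    if len(encoded) > buffer_size:
        # The client sees an empty read, which the CRT treats as EOF.
        return b""
    return encoded


# U+0142 is LATIN SMALL LETTER L WITH STROKE.
L_STROKE = "\u0142"

# Single-byte codepage: one wchar_t maps to one byte, so the read works.
ansi_result = simulated_console_read(L_STROKE, "cp1250")

# UTF-8: the same character needs two bytes, so the read "fails" as EOF.
utf8_result = simulated_console_read(L_STROKE, "utf-8")

# Pure ASCII still fits under UTF-8, which is why the problem only
# shows up once non-ASCII input is typed.
ascii_result = simulated_console_read("abc", "utf-8")
```

In this model, ASCII-only input round-trips under UTF-8 while any line containing a multi-byte character comes back empty, which matches the observed symptom of the interpreter exiting as if it had hit end-of-file.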
History
Date                 User     Action  Args
2015-02-10 01:51:45  eryksun  set     recipients: + eryksun, vstinner, tim.golden, ezio.melotti, zach.ware, steve.dower, AGrzes
2015-02-10 01:51:45  eryksun  set     messageid: <1423533105.24.0.299289937116.issue23424@psf.upfronthosting.co.za>
2015-02-10 01:51:45  eryksun  link    issue23424 messages
2015-02-10 01:51:43  eryksun  create