Message235655
This isn't a Python bug. The Windows console doesn't properly support UTF-8. See issue 1602 and Drekin's win-unicode-console, an alternative REPL based on the wide-character (UCS-2) console API.
FWIW, I attached a debugger to conhost.exe under Windows 7 to inspect what's happening here. In the client, the CRT's read() function calls WinAPI ReadFile. For a console handle this calls either ReadConsoleA or (in Windows 8+) NtReadFile. Either way, most of the action happens in the server process, conhost.exe.
The server's input buffer is Unicode, which gets encoded to CP 65001 (UTF-8) by calling WideCharToMultiByte. However, the server incorrectly assumes the current codepage is a Windows ANSI codepage with a one-to-one mapping, i.e. that each 16-bit wchar_t maps to a single 8-bit char in the current codepage. Since 'ł' gets UTF-8 encoded as the two-byte string b'\xc5\x82', the allocated buffer is too small by a byte. The server doesn't recover from this failure by allocating a larger buffer. It just reports back to the client process that it read 0 bytes. The CRT in turn sets the end-of-file (EOF) flag on the stdin FILE stream, which causes Python to exit 'normally'.
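To make the size mismatch concrete, here is a rough Python sketch of the arithmetic only. It is not conhost.exe's actual code; the helper names `wchar_count` and `simulated_console_read` are hypothetical, and the "report 0 bytes on overflow" branch simply mirrors the behavior described above. CP 1250 is used as an example of an ANSI codepage that contains 'ł'.

```python
def wchar_count(text: str) -> int:
    """Number of 16-bit code units the text occupies in the Unicode input buffer."""
    return len(text.encode("utf-16-le")) // 2


def simulated_console_read(text: str, codepage: str) -> int:
    """Hypothetical model of the read: returns the byte count reported to the client."""
    # Assumption being modeled: one output byte is allocated per wchar_t.
    buffer_size = wchar_count(text)
    encoded = text.encode(codepage)
    if len(encoded) > buffer_size:
        # The conversion doesn't fit and there is no retry with a larger
        # buffer, so the client is told that 0 bytes were read.
        return 0
    return len(encoded)


for cp in ("cp1250", "utf-8"):
    print(cp, simulated_console_read("ł", cp))
```

With a single-byte codepage the one-byte-per-character assumption holds and the read reports 1 byte. With UTF-8 the two-byte encoding overflows the one-byte buffer, and the simulated read reports 0, which the CRT then treats as EOF.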
Date | User | Action | Args
2015-02-10 01:51:45 | eryksun | set | recipients: + eryksun, vstinner, tim.golden, ezio.melotti, zach.ware, steve.dower, AGrzes
2015-02-10 01:51:45 | eryksun | set | messageid: <1423533105.24.0.299289937116.issue23424@psf.upfronthosting.co.za>
2015-02-10 01:51:45 | eryksun | link | issue23424 messages
2015-02-10 01:51:43 | eryksun | create |