Message290528
I'm closing this issue since Python's encodings in this case -- 852 (OEM) and 1250 (ANSI) -- both correctly map U+0159:
>>> u'\u0159'.encode('852')
'\xfd'
>>> u'\u0159'.encode('1250')
'\xf8'
You must be using an encoding that doesn't map U+0159. If you're using the console's default codepage (i.e. you haven't run chcp.com, mode.com, or called SetConsoleOutputCP), then Python started with stdout.encoding set to your locale's OEM codepage encoding. For example, if you're using a U.S. locale, it's cp437, and if you're using a Western Europe locale, it's cp850. Neither of these includes U+0159.
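To see which codepage encodings can represent a given character, a quick check along these lines may help. This is only a sketch: `can_encode` is a helper name made up for this example, and the four codec names are just the ones discussed in this message.

```python
from __future__ import print_function
import sys

def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

CH = u'\u0159'  # the character from this issue

# Codepages mentioned in this message: two that lack U+0159, two that map it.
for name in ('cp437', 'cp850', 'cp852', 'cp1250'):
    print(name, can_encode(CH, name))

# What Python picked for stdout at startup (may be None when output is redirected).
print('stdout.encoding =', sys.stdout.encoding)
```

If `stdout.encoding` turns out to be one of the codepages that reports `False`, that would explain the error when printing the character.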
We're presented with this codepage hell because the WriteFile and WriteConsoleA functions write a stream of bytes to the console, and the console needs to be told how to decode these bytes to get Unicode text. It would be nice if the console's UTF-8 implementation (codepage 65001) weren't buggy, but Microsoft has never cared enough to fix it (at least not completely; it's still broken for input in Windows 10).
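The decoding step can be imitated in pure Python. The sketch below does not touch the console at all; it only assumes that the console decodes incoming bytes with its current output codepage, and uses `bytes.decode` to stand in for that step.

```python
from __future__ import print_function

# The bytes handed to WriteFile are the same in every case; only the
# codepage used for decoding decides which character they become.
data = u'\u0159'.encode('cp852')  # the single byte '\xfd' from the session above

for codepage in ('cp852', 'cp437', 'cp850'):
    shown = data.decode(codepage)
    # Print the code point rather than the character, so that this
    # script itself cannot fail on an unencodable character.
    print('%s -> U+%04X' % (codepage, ord(shown)))
```

Only the cp852 line should report U+0159; the other two decode the same byte to a different character, which is what a mismatched console codepage would display.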
That leaves the wide-character UTF-16 function, WriteConsoleW, as the best alternative. Using this function requires bypassing Python's normal standard I/O implementation. This has been implemented as of 3.6. But for older versions you'll need to install and enable win_unicode_console.
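For illustration only, here is roughly what calling WriteConsoleW directly looks like through ctypes. This is a minimal sketch, not how 3.6 or win_unicode_console implement it: `write_console_w` is a name invented for this example, it writes one string to the standard output handle, and it simply returns False when not on Windows or when stdout is not a console (for example, when redirected to a file or pipe).

```python
import sys

def write_console_w(text):
    """Write unicode `text` with WriteConsoleW; return True on success.

    Hypothetical helper for illustration; returns False off Windows or
    when the standard output handle is not a console.
    """
    if sys.platform != 'win32':
        return False
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.WriteConsoleW.argtypes = [
        wintypes.HANDLE, wintypes.LPCWSTR, wintypes.DWORD,
        ctypes.POINTER(wintypes.DWORD), wintypes.LPVOID]
    kernel32.WriteConsoleW.restype = wintypes.BOOL

    STD_OUTPUT_HANDLE = wintypes.DWORD(-11)
    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)

    # WriteConsoleW counts UTF-16 code units, not code points.
    n_units = len(text.encode('utf-16-le')) // 2
    written = wintypes.DWORD(0)
    ok = kernel32.WriteConsoleW(handle, text, n_units,
                                ctypes.byref(written), None)
    return bool(ok)

if not write_console_w(u'\u0159\n'):
    # Fall back to bytes when there is no console to write to.
    sys.stdout.write(repr(u'\u0159'.encode('utf-8')) + '\n')
```

Because the text is passed as UTF-16, the console's output codepage plays no part in the call; whether the glyph is actually drawn still depends on the console font.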
Date | User | Action | Args
2017-03-26 13:40:02 | eryksun | set | recipients: + eryksun, paul.moore, vstinner, tim.golden, ezio.melotti, martin.panter, zach.ware, steve.dower, Robert Baker
2017-03-26 13:40:02 | eryksun | set | messageid: <1490535602.74.0.00888524520314.issue29907@psf.upfronthosting.co.za>
2017-03-26 13:40:02 | eryksun | link | issue29907 messages
2017-03-26 13:40:02 | eryksun | create |