Title: Unicode encoding failure
Type: behavior Stage: resolved
Components: Unicode, Windows Versions: Python 2.7
Status: closed Resolution: out of date
Dependencies: Superseder: windows console doesn't print or input Unicode
View: 1602
Assigned To: Nosy List: Robert Baker, eryksun, ezio.melotti, martin.panter, paul.moore, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal Keywords:

Created on 2017-03-26 02:26 by Robert Baker, last changed 2017-03-26 13:40 by eryksun. This issue is now closed.

Messages (5)
msg290503 - Author: Robert Baker (Robert Baker) Date: 2017-03-26 02:26
Using Python 2.7 (not IDLE) on Windows 10.

I have tried to use a Python 2.7 program to print the name of Czech composer Antonín Dvořák. I remembered to add the "u" before the string, but regardless of whether I encode the caron-r as a literal character (pasted from Windows Character Map) or as \u0159, it gives the error that character 0159 is undefined. This is incorrect; that character has been defined as "lower case r with caron above" for several years now. (The interpreter has no problem with the ANSI characters in the string.)
msg290506 - Author: Martin Panter (martin.panter) (Python committer) Date: 2017-03-26 06:06
I presume you are trying to print to the normal Windows console. I understand the console was not well supported until Python 3.6 (see Issue 1602). Have you tried that version?

I’ll leave this open for someone more experienced to confirm, but I suspect what you want may not be possible with 2.7.
msg290513 - Author: Paul Moore (paul.moore) (Python committer) Date: 2017-03-26 07:41
Also, you need to:

1. Ensure you are using characters that are available in the encoding that sys.stdout uses - in Python prior to 3.6, this would be your Windows *console* code page, and in 3.6+ would be UTF-8.
2. Declare the encoding of your source code if you are not using the default (which is ASCII in Python 2, and UTF-8 in Python 3).

Specifically, if you write your source in UTF-8, or use an encoding declaration or \u escapes, and you use Python 3.6, this problem will likely have gone away.
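The first point can be checked before printing. The following is a minimal sketch, not something posted in this thread: the helper name `encodable` and the ASCII fallback are assumptions made for illustration. It uses \u escapes, so the source encoding declaration on the first line is not strictly needed here; it only matters once literal non-ASCII characters appear in the file.

```python
# -*- coding: utf-8 -*-
# The declaration above only matters if the file contains literal
# non-ASCII characters; this sketch sticks to \u escapes.
import sys


def encodable(text, encoding):
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        # LookupError covers codec names the interpreter does not know.
        return False
    return True


name = u"Anton\u00edn Dvo\u0159\u00e1k"

# sys.stdout.encoding can be None (for example when output is redirected
# under Python 2); treating that as ASCII is an assumption of this sketch.
stdout_encoding = getattr(sys.stdout, "encoding", None) or "ascii"

if encodable(name, stdout_encoding):
    print(name)
else:
    # Fall back to an ASCII-safe form instead of raising UnicodeEncodeError.
    print(name.encode("ascii", "backslashreplace").decode("ascii"))
```

If the fallback branch runs, the encoding that sys.stdout uses cannot represent at least one character in the string, which is the situation described in point 1.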
msg290517 - Author: STINNER Victor (vstinner) (Python committer) Date: 2017-03-26 08:24
For Python 2, there is
msg290528 - Author: Eryk Sun (eryksun) (Python triager) Date: 2017-03-26 13:40
I'm closing this issue since Python's encodings in this case -- 852 (OEM) and 1250 (ANSI) -- both correctly map U+0159:

    >>> u'\u0159'.encode('852')
    >>> u'\u0159'.encode('1250')

You must be using an encoding that doesn't map U+0159. If you're using the console's default codepage (i.e. you haven't changed it from the command line or by calling SetConsoleOutputCP), then Python started with stdout.encoding set to your locale's OEM codepage encoding. For example, if you're using a U.S. locale, it's cp437, and if you're using a Western Europe locale, it's cp850. Neither of these includes U+0159.
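To see the difference between the four code pages named above, one could run something like the following. It is only an illustrative sketch; the function name `code_pages_supporting` is made up for this example and is not part of Python.

```python
def code_pages_supporting(char, code_pages):
    """Return the code pages from `code_pages` that can encode `char`."""
    supported = []
    for code_page in code_pages:
        try:
            char.encode(code_page)
        except UnicodeEncodeError:
            continue  # this code page has no mapping for the character
        supported.append(code_page)
    return supported


# U+0159 is LATIN SMALL LETTER R WITH CARON.
r_caron = u"\u0159"
candidates = ["cp437", "cp850", "cp852", "cp1250"]

for code_page in candidates:
    ok = code_page in code_pages_supporting(r_caron, [code_page])
    print("%-7s %s" % (code_page, "maps U+0159" if ok else "does not map U+0159"))
```

Only the Central European code pages (852 and 1250) should report a mapping, which matches the explanation above.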

We're presented with this codepage hell because the WriteFile and WriteConsoleA functions write a stream of bytes to the console, and it needs to be told how to decode these bytes to get Unicode text. It would be nice if the console's UTF-8 implementation (codepage 65001) wasn't buggy, but Microsoft has never cared enough to fix it (at least not completely; it's still broken for input in Windows 10). 

That leaves the wide-character UTF-16 function, WriteConsoleW, as the best alternative. Using this function requires bypassing Python's normal standard I/O implementation. This has been implemented as of 3.6. But for older versions you'll need to install and enable win_unicode_console.
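For versions before 3.6, enabling the package mentioned above might look like the sketch below. It assumes win_unicode_console has been installed separately (it is a third-party package, not part of the standard library); the wrapper function `enable_unicode_console` is a hypothetical name used only for this example.

```python
import sys


def enable_unicode_console():
    """Try to enable win_unicode_console; return True if it was enabled.

    Hypothetical wrapper: it does nothing on non-Windows platforms or
    when the third-party package is not installed.
    """
    if sys.platform != "win32":
        return False
    try:
        import win_unicode_console  # third-party; must be installed first
    except ImportError:
        return False
    win_unicode_console.enable()
    return True


if __name__ == "__main__":
    name = u"Anton\u00edn Dvo\u0159\u00e1k"
    if enable_unicode_console():
        # The standard streams now go through the wide-character console API.
        print(name)
    else:
        # Without the package, print an ASCII-safe form rather than risk
        # a UnicodeEncodeError from the console code page.
        print(name.encode("ascii", "backslashreplace").decode("ascii"))
```

On Python 3.6 and later the wrapper is unnecessary, since the interpreter already uses the wide-character console API.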
History
Date                 User           Action  Args
2017-03-26 13:40:02  eryksun        set     status: open -> closed; nosy: + eryksun; messages: + msg290528; stage: resolved
2017-03-26 08:24:40  vstinner       set     messages: + msg290517
2017-03-26 07:41:15  paul.moore     set     messages: + msg290513
2017-03-26 06:06:32  martin.panter  set     nosy: + ezio.melotti, paul.moore, tim.golden, vstinner, martin.panter, zach.ware, steve.dower; messages: + msg290506; superseder: windows console doesn't print or input Unicode; components: + Unicode, Windows; resolution: out of date
2017-03-26 02:26:08  Robert Baker   create