Author gwideman
Recipients benjamin.peterson, docs@python, eric.araujo, ezio.melotti, gwideman, lemburg, pitrou, tshepang, vstinner
Date 2014-03-21.03:49:52
At the moment I've run out of time to exert much forward push on this.

By way of a temporary summary, and a suggestion for regrouping: focus on what this page is intended to deliver. What concepts should readers of this page be able to distinguish and understand by the time they finish it?

To scope out the needed concepts, I suggest identifying representative Unicode-related stumbling blocks (possibly drawn from Stack Overflow questions).

Here's an example case: just trying to get trivial "beyond ASCII" functionality to work on Windows (Win7, Python 3.3):

s = 'knight \u265E'
print('Hello ' + s)

... which fails with:

"UnicodeEncodeError: 'charmap' codec can't encode character '\u265e' in position 13: character maps to <undefined>".
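To make the failure reproducible without a Windows console, here is a minimal sketch that performs the same encoding step print() performs there. It assumes the console uses code page 437 (common on US-English Windows); `encode_error` is a hypothetical helper written for this illustration.

```python
# Assumption: the Windows console's code page is 437, so Python 3.3's
# print() encodes its output with the 'cp437' codec (a charmap-based codec).

def encode_error(text, encoding):
    """Hypothetical helper: return the UnicodeEncodeError raised by
    text.encode(encoding), or None if encoding succeeds."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc
    return None

s = 'knight \u265E'
err = encode_error('Hello ' + s, 'cp437')

# The message text is pure ASCII, so printing it is safe on any console.
print(err)
print(err.encoding, err.start, err.reason)
```

The `encoding`, `start` and `reason` attributes correspond to the pieces of the message quoted above ('charmap', position 13, "character maps to <undefined>").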

A naive attempt to fix this by using s.encode() just moves the failure to the "+" operation, which now raises a TypeError because it tries to concatenate a str with a bytes object.
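A short sketch of why the s.encode() attempt backfires: encoding produces bytes, and Python 3 refuses to mix str and bytes in "+". Encoding only makes sense once, on the finished text, at the point where it leaves Python. UTF-8 here is an illustrative choice, not a recommendation for the Windows console.

```python
s = 'knight \u265E'

# s.encode() returns bytes; str + bytes raises TypeError in Python 3.
mixed_failed = False
try:
    'Hello ' + s.encode()
except TypeError as exc:
    mixed_failed = True
    print('TypeError:', exc)

# Build the complete str first, then encode it once at the output boundary.
data = ('Hello ' + s).encode('utf-8')
print(data)  # b'Hello knight \xe2\x99\x9e'
```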

What paths do programmers explore to make this code (a) run without raising an exception and produce at least some output, and (b) produce the correct output?
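A minimal sketch of the two kinds of path, again assuming a code page 437 console. For (a), an error handler substitutes characters the console's encoding cannot represent; `degrade` is a hypothetical helper name. For (b), `write_utf8` (also hypothetical) bypasses the text layer and writes UTF-8 bytes, which is only correct if whatever displays the output decodes UTF-8.

```python
import sys

text = 'Hello ' + 'knight \u265E'

# (a) No exception, some output: replace what the target encoding lacks.
def degrade(text, encoding, errors):
    """Hypothetical helper: round-trip text through `encoding`,
    substituting unencodable characters according to `errors`."""
    return text.encode(encoding, errors).decode(encoding)

print(degrade(text, 'cp437', 'replace'))           # Hello knight ?
print(degrade(text, 'cp437', 'backslashreplace'))  # Hello knight \u265e

# (b) Correct character: encode yourself and write bytes below the text
#     layer. A UTF-8 terminal shows the knight; a cp437 console shows
#     mojibake instead, so this only helps if the display decodes UTF-8.
def write_utf8(text, binary_stream):
    """Hypothetical helper: write text plus newline as UTF-8 bytes."""
    binary_stream.write((text + '\n').encode('utf-8'))
    binary_stream.flush()

if hasattr(sys.stdout, 'buffer'):  # replaced/captured stdouts may lack it
    sys.stdout.flush()             # keep ordering with earlier print() output
    write_utf8(text, sys.stdout.buffer)
```

The same choices can also be made for the whole stream rather than per string: the PYTHONIOENCODING environment variable (e.g. `cp437:backslashreplace`) sets stdout's encoding and error handler, and Python 3.7+ adds `sys.stdout.reconfigure(errors='backslashreplace')`.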

And why does it work as intended on Linux?
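One way to probe the Linux question is to inspect which codec the text layer of stdout is using, and to check whether that codec can represent the character at all. A sketch, assuming a typical Linux desktop whose locale is UTF-8; `can_encode` is a hypothetical helper.

```python
import locale
import sys

def can_encode(text, encoding):
    """Hypothetical helper: True if text survives text.encode(encoding)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# print() encodes through sys.stdout's text layer using this codec.
# On a typical Linux desktop the locale is UTF-8, so this is usually 'utf-8'.
print('stdout encoding:', getattr(sys.stdout, 'encoding', None))
print('locale preferred encoding:', locale.getpreferredencoding(False))

print(can_encode('\u265E', 'utf-8'))  # True: UTF-8 can encode any Unicode character
print(can_encode('\u265E', 'cp437'))  # False: code page 437 has no chess pieces
```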

The set of concepts identified and explained in this article needs to be sufficient to underpin an understanding of the distinct data types, encodings, decodings, translations, settings, etc. relevant to this problem, and of how to use them to get the desired result.
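As a sketch of the distinctions such an article would need to cover, the following separates the code point (what a str holds), the encoded bytes (which depend on the chosen encoding), and decoding (which is only correct with the encoding that produced the bytes). The encodings used are illustrative choices.

```python
ch = '\u265E'  # BLACK CHESS KNIGHT

# A str holds code points; the code point does not depend on any encoding.
print(hex(ord(ch)))  # 0x265e

# Encoding maps code points to bytes; different encodings give different bytes.
utf8 = ch.encode('utf-8')
utf16 = ch.encode('utf-16-le')
print(utf8)   # b'\xe2\x99\x9e'
print(utf16)  # b'^&'

# Decoding reverses the mapping, but only with the matching encoding.
assert utf8.decode('utf-8') == ch

# Decoding with the wrong encoding raises no error; it silently yields
# three unrelated characters (mojibake). ascii() keeps the output printable.
mojibake = utf8.decode('cp437')
print(ascii(mojibake))
```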

There are similar problems that occur at other Python-system boundaries, which would further illuminate the set of necessary concepts.
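Files are one such boundary. A minimal sketch, assuming the Python 3 behaviour where open() without an `encoding` argument uses the locale's preferred encoding (often cp1252 on Windows, which also lacks U+265E, and usually UTF-8 on Linux), so the same code can work on one system and fail on the other. Passing the encoding explicitly makes the bytes on disk the same everywhere.

```python
import os
import tempfile

s = 'Hello knight \u265E'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'knight.txt')

    # Explicit encoding: the file gets the same bytes on every platform.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(s)

    # Reading in binary mode shows what actually crossed the boundary.
    with open(path, 'rb') as f:
        raw = f.read()

    # Reading back as text must use the encoding that wrote the file.
    with open(path, encoding='utf-8') as f:
        round_trip = f.read()

print(raw)              # b'Hello knight \xe2\x99\x9e'
print(round_trip == s)  # True
```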

Thanks for all comments.

-- Graham