Message 31341 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	georg.brandl
Date	2007-02-25.19:43:47

Content

First of all: Python's Unicode handling is very consistent and straightforward once you know the basics. Sadly, most people don't know the difference between Unicode strings and encoded byte strings.

What you're seeing is not a bug. When you print Unicode to the console and Python can correctly detect your terminal encoding, the Unicode string is automatically encoded in that encoding.
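A minimal sketch of that automatic encoding step, written in Python 3 (where `str` is the Unicode type and `bytes` the encoded type; this message predates Python 3). The helper `print_to_fake_terminal` is hypothetical: an `io.TextIOWrapper` over a byte buffer stands in for a terminal whose encoding Python has detected (for a real console that encoding is exposed as `sys.stdout.encoding`). The same Unicode string turns into different bytes depending on that encoding.

```python
import io


def print_to_fake_terminal(text: str, encoding: str) -> bytes:
    """Hypothetical helper: print `text` to a stand-in terminal and return the bytes it receives."""
    raw = io.BytesIO()
    # The TextIOWrapper plays the role of sys.stdout: it accepts str and
    # encodes it with its own encoding before passing bytes to `raw`.
    console = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    print(text, file=console)
    console.flush()
    return raw.getvalue()


utf8_bytes = print_to_fake_terminal("caf\u00e9", "utf-8")      # b'caf\xc3\xa9\n'
latin1_bytes = print_to_fake_terminal("caf\u00e9", "latin-1")  # b'caf\xe9\n'
```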

If you output to a file, Python does not know which encoding you want, so Unicode strings are encoded with the default ASCII codec, which fails with a UnicodeEncodeError for any non-ASCII character.
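A rough Python 3 illustration of that ASCII fallback. The helper `encode_for_unknown_target` is hypothetical; it only mimics what Python 2 effectively did when printing Unicode to a redirected stdout with no known encoding. Pure ASCII text passes through, while a single non-ASCII character makes the encode fail.

```python
def encode_for_unknown_target(text: str) -> bytes:
    # Hypothetical helper mimicking Python 2's fallback when the output
    # encoding was unknown (e.g. stdout redirected to a file): plain ASCII.
    return text.encode("ascii")


ascii_ok = encode_for_unknown_target("cafe")  # b'cafe'

failure = None
try:
    encode_for_unknown_target("caf\u00e9")
except UnicodeEncodeError as exc:
    failure = exc
    print(exc.reason, "at position", exc.start)  # ordinal not in range(128) at position 3
```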

Please direct further questions to the Python mailing list or newsgroup.

The basic rule when handling Unicode is: use Unicode everywhere inside the program, and byte strings for input and output.
Your code does exactly the opposite: it takes a byte string, decodes it to Unicode, and *then* prints it.
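The rule above can be sketched in Python 3 as "decode once on input, work in Unicode, encode once on output". The input bytes and the choice of UTF-8 on both sides are assumptions for illustration; real input would come from a file, socket, or similar source whose encoding you know.

```python
# Input boundary: bytes arrive from outside (assumed to be UTF-8) and are decoded once.
incoming = b"caf\xc3\xa9"
text = incoming.decode("utf-8")  # from here on, only str (Unicode) is used

# Inside the program: manipulate Unicode text, never raw bytes.
shouted = text.upper()  # "CAFÉ"

# Output boundary: encode once, with an explicitly chosen encoding.
outgoing = shouted.encode("utf-8")  # b'CAF\xc3\x89'
```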

You should do it the other way round: use Unicode literals in your code and, when you write something to a file, *encode* it as UTF-8.
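A small Python 3 sketch of writing Unicode text to a file as UTF-8. The file lives in a throwaway temporary directory created just for the demo. It shows the explicit `encode` that this advice describes, and the equivalent text-mode `open(..., encoding="utf-8")`, which names the encoding instead of relying on the platform default.

```python
import os
import tempfile

text = "na\u00efve caf\u00e9"  # Unicode text held inside the program

path = os.path.join(tempfile.mkdtemp(), "out.txt")  # throwaway demo file

# Encode explicitly at the output boundary and write bytes in binary mode.
with open(path, "wb") as f:
    f.write(text.encode("utf-8"))

# Equivalent: a text-mode file that encodes for you, with UTF-8 named explicitly.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading back: decode at the input boundary with the same encoding.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
```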

History
Date	User	Action	Args
2007-08-23 14:52:06	admin	link	issue1668295 messages
2007-08-23 14:52:06	admin	create