Author vstinner
Recipients Sworddragon, a.badger, bkabrda, larry, lemburg, loewis, ncoghlan, pitrou, r.david.murray, serhiy.storchaka, terry.reedy, vstinner
Date 2013-12-09.10:13:15
Message-id <1386583995.79.0.249540871674.issue19846@psf.upfronthosting.co.za>
Content
I didn't understand Serhiy's "ls" example. I tried:

$ mkdir unicode
$ cd unicode
$ python3 -c 'open("ab\xe9.txt", "w").close()'
$ python3 -c 'open("euro\u20ac.txt", "w").close()'
$ ls
abé.txt  euro€.txt
$ LANG=C ls
ab??.txt  euro???.txt


Ah yes, I didn't remember that "ls" is aware of the locale encoding.
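
The number of "?" in the output above can be reproduced with a rough Python sketch. It assumes that the file names are stored on disk as UTF-8 bytes and that "ls" in the C locale prints one "?" for each byte outside printable ASCII; the helper name c_locale_view is made up for illustration and is not how "ls" is actually implemented.

```python
def c_locale_view(name: str) -> str:
    """Approximate how a UTF-8 file name is displayed in the C locale.

    Assumption: every byte outside printable ASCII is shown as "?".
    """
    raw = name.encode("utf-8")  # bytes as stored on disk (assumed UTF-8)
    return "".join(chr(b) if 0x20 <= b < 0x7F else "?" for b in raw)


# U+00E9 is 2 bytes in UTF-8, U+20AC is 3 bytes.
print(c_locale_view("ab\xe9.txt"))      # ab??.txt
print(c_locale_view("euro\u20ac.txt"))  # euro???.txt
```

So one non-ASCII character becomes two or three "?", depending on its UTF-8 length.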

printf() and wprintf() behave differently on unencodable/undecodable characters:
http://unicodebook.readthedocs.org/en/latest/programming_languages.html#printf-functions-family

Again, the issue is not specific to Python. So it's time to learn how to configure your locales correctly.
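
To check which encodings Python picked up from the current locale, something like the following sketch may help. The function name describe_encodings is hypothetical; the values depend entirely on the environment (LANG, LC_ALL, LC_CTYPE), so no output is shown.

```python
import locale
import sys


def describe_encodings() -> dict:
    """Collect the encodings Python derived from the environment."""
    return {
        # Encoding of the current locale, without calling setlocale()
        "locale": locale.getpreferredencoding(False),
        # Encoding used for file names
        "filesystem": sys.getfilesystemencoding(),
        # Encoding of standard output (may be missing if stdout is replaced)
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for key, value in describe_encodings().items():
    print(key, value)
```

Running it once normally and once with LANG=C should show whether the locale is configured as expected.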

About the "interoperability" point I mentioned in my first message ("This encoding is the best choice for interoperability with other (python2 or non python) programs."): if you work around the annoying ASCII encoding by forcing the UTF-8 encoding, Python may produce data that is incompatible with other applications that follow POSIX and therefore use the ASCII encoding.
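
A minimal sketch of the incompatibility, assuming a consumer that strictly decodes its input as ASCII (the function ascii_consumer_accepts is a made-up stand-in for such a program):

```python
def ascii_consumer_accepts(data: bytes) -> bool:
    """Stand-in for a program that only accepts ASCII input."""
    try:
        data.decode("ascii")  # strict decoding, as an ASCII-only tool might do
    except UnicodeDecodeError:
        return False
    return True


# Data produced by a Python process forced to UTF-8
forced_utf8 = "euro\u20ac.txt".encode("utf-8")

print(ascii_consumer_accepts(b"euro.txt"))   # True
print(ascii_consumer_accepts(forced_utf8))   # False
```

The bytes are valid UTF-8, but a strict ASCII consumer rejects them.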