Message 205670 - Python tracker

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	vstinner
Recipients	Sworddragon, a.badger, bkabrda, larry, lemburg, loewis, ncoghlan, pitrou, r.david.murray, serhiy.storchaka, terry.reedy, vstinner
Date	2013-12-09.10:13:15
Message-id	<1386583995.79.0.249540871674.issue19846@psf.upfronthosting.co.za>
In-reply-to

Content
I didn't understand Serhiy's "ls" example. I tried:

$ mkdir unicode
$ cd unicode
$ python3 -c 'open("ab\xe9.txt", "w").close()'
$ python3 -c 'open("euro\u20ac.txt", "w").close()'
$ ls
abé.txt  euro€.txt
$ LANG=C ls
ab??.txt  euro???.txt


Ah yes, I didn't remember that "ls" is aware of the locale encoding.
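The "??" and "???" in the output above match the number of UTF-8 bytes in "é" (2) and "€" (3). A minimal sketch of that behaviour, assuming the file names are stored as UTF-8 and that the C locale treats only printable ASCII bytes as displayable; ls_like_display() is a hypothetical helper written for illustration, not how "ls" is actually implemented:

```python
def ls_like_display(name: str, fs_encoding: str = "utf-8") -> str:
    """Roughly mimic how `ls` might render a file name in the C locale."""
    # Assumption: the name is stored on disk as bytes in fs_encoding.
    raw = name.encode(fs_encoding)
    # Keep printable ASCII bytes; show every other byte as '?'.
    return "".join(chr(b) if 0x20 <= b < 0x7F else "?" for b in raw)


print(ls_like_display("ab\xe9.txt"))      # ab??.txt
print(ls_like_display("euro\u20ac.txt"))  # euro???.txt
```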

printf() and wprintf() behave differently on unencodable/undecodable characters:
http://unicodebook.readthedocs.org/en/latest/programming_languages.html#printf-functions-family

Again, the issue is not specific to Python. So it's time to learn how to configure your locales correctly.
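To check what a given locale configuration actually gives Python, a quick sketch like the following can help. It only reports the encodings Python picked up; describe_encodings() is a hypothetical helper name, and the values depend entirely on the environment (LANG, LC_ALL, LC_CTYPE) in which it runs:

```python
import locale
import sys


def describe_encodings() -> dict:
    """Collect the encodings Python derived from the current environment."""
    return {
        # Encoding of the current locale, without changing the locale.
        "locale": locale.getpreferredencoding(False),
        # Encoding used to convert file names between str and bytes.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding of standard output (may be None if stdout was replaced).
        "stdout": getattr(sys.stdout, "encoding", None),
    }


info = describe_encodings()
for key, value in info.items():
    print(key, "=", value)
```

Running it once normally and once with LANG=C set in the environment shows how the reported encodings follow the locale.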

About the "interoperability" point I mentioned in my first message ("This encoding is the best choice for interopability with other (python2 or non python) programs."): if you work around the annoying ASCII encoding by forcing the UTF-8 encoding, Python may produce data that is incompatible with other applications which follow POSIX and therefore use the ASCII encoding.
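A small sketch of the incompatibility, assuming the "other application" strictly decodes its input as ASCII; decodes_as_ascii() is a hypothetical stand-in for such a consumer:

```python
def decodes_as_ascii(data: bytes) -> bool:
    """Return True if a strict ASCII consumer could decode these bytes."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


# Data produced by a Python that was forced to use UTF-8.
utf8_bytes = "ab\xe9.txt".encode("utf-8")

print(utf8_bytes)                     # b'ab\xc3\xa9.txt'
print(decodes_as_ascii(utf8_bytes))   # False
print(decodes_as_ascii(b"ab.txt"))    # True
```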

History
Date	User	Action	Args
2013-12-09 10:13:15	vstinner	set	recipients: + vstinner, lemburg, loewis, terry.reedy, ncoghlan, pitrou, larry, a.badger, r.david.murray, Sworddragon, serhiy.storchaka, bkabrda
2013-12-09 10:13:15	vstinner	set	messageid: <1386583995.79.0.249540871674.issue19846@psf.upfronthosting.co.za>
2013-12-09 10:13:15	vstinner	link	issue19846 messages
2013-12-09 10:13:15	vstinner	create