
Author vstinner
Recipients benjamin.peterson, gz, pitrou, poolie, r.david.murray, vila, vstinner
Date 2011-12-22.00:21:26
Message-id <4EF27885.6030204@haypocalc.com>
In-reply-to <CAA9uavDwz1NR1NRiviJBUdS6d+N7YyrnkKUYg_6-9oVeX-K06g@mail.gmail.com>
Content
This discussion is becoming very long, and I no longer remember its
original purpose. You want to use UTF-8 instead of ASCII, but to what
end? What do you want to do with your nicely decoded filenames? You
cannot print them to your terminal or pass them to a subprocess, because
your terminal uses ASCII, and so does subprocess. I don't see how it
would help you.

Thanks to PEP 383, Python 3 "just works" with an ASCII locale
encoding. You can list the content of a directory and display a filename
on your terminal: it will be displayed correctly (even when the terminal
uses the correct encoding, UTF-8, while Python runs with an empty
environment and uses ASCII); you can also pass the filename to a
subprocess: the other program will be able to open the file.
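
A minimal sketch of the PEP 383 round trip described above. The helper
names decode_filename() and encode_filename() are hypothetical and only
illustrate the "surrogateescape" error handler; on POSIX systems,
os.fsdecode() and os.fsencode() apply the same handler with the
filesystem encoding.

```python
def decode_filename(raw: bytes, encoding: str = "ascii") -> str:
    """Hypothetical helper: decode bytes the way PEP 383 does.

    Bytes that are invalid in `encoding` become lone surrogates
    (U+DC80..U+DCFF) instead of raising UnicodeDecodeError.
    """
    return raw.decode(encoding, "surrogateescape")


def encode_filename(name: str, encoding: str = "ascii") -> bytes:
    """Hypothetical helper: turn the escaped surrogates back into bytes."""
    return name.encode(encoding, "surrogateescape")


# A UTF-8 encoded filename, handled by a program that believes the
# locale encoding is ASCII.
raw = "caf\u00e9.txt".encode("utf-8")      # b'caf\xc3\xa9.txt'
name = decode_filename(raw)                # no exception is raised
restored = encode_filename(name)           # byte-for-byte identical

print(ascii(name))
print(restored == raw)
```

Because the original bytes are restored exactly, the name can be written
to a UTF-8 terminal or handed to another program even though Python
itself decoded it with ASCII.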

I don't understand what problem you are trying to solve.

On 22/12/2011 00:02, Martin Pool wrote:
> It is a de facto, not de jure standard: UTF-8 is how things are
> typically stored.

For your information, on FreeBSD, Solaris and Mac OS X, the "C" locale
encoding is ISO-8859-1, whereas on Linux it is ASCII. There is no such
"de facto standard". Each platform uses a different encoding and handles
codecs differently.
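
To illustrate why the platform's choice matters, here is a small sketch
that decodes the same filename bytes with the three encodings mentioned
in this thread. try_decode() is a hypothetical helper, not part of any
library.

```python
from typing import Optional


def try_decode(raw: bytes, encoding: str) -> Optional[str]:
    """Hypothetical helper: strict decode, returning None on failure."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


raw = "\u00e9".encode("utf-8")             # b'\xc3\xa9'

as_ascii = try_decode(raw, "ascii")        # fails: bytes >= 0x80
as_latin1 = try_decode(raw, "iso-8859-1")  # succeeds, but gives 2 chars
as_utf8 = try_decode(raw, "utf-8")         # the intended character

for label, value in (("ascii", as_ascii),
                     ("iso-8859-1", as_latin1),
                     ("utf-8", as_utf8)):
    print(label, ascii(value))
```

ISO-8859-1 never fails because every byte maps to a character, so a
wrong guess goes unnoticed and produces mojibake, whereas strict ASCII
rejects the name outright.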

> Other software (eg gnome file handling utilities)
> makes this assumption.  See eg
> <http://www.cl.cam.ac.uk/~mgk25/unicode.html#linux>.

The Qt library (and so KDE) and the glib library (and so Gtk and Gnome)
also use the locale encoding to encode and decode filenames.

glib has a useful g_get_filename_charsets() function that tries other
encodings to format a filename correctly.

> I'm not sure what you expect a technical solution at the OS level
> would look like.  The api is 8-bit strings and that's not likely to
> change.

Mac OS X kept the legacy bytes API, but the kernel enforces valid
UTF-8 for filenames. This is a good first step towards Unicode. On
such a system, we can make some assumptions. On Linux, we cannot make
such assumptions today.
History
Date User Action Args
2011-12-22 00:21:27  vstinner  set  recipients: + vstinner, pitrou, vila, benjamin.peterson, r.david.murray, gz, poolie
2011-12-22 00:21:26  vstinner  link  issue13643 messages
2011-12-22 00:21:26  vstinner  create