
Author vstinner
Recipients benjamin.peterson, gz, poolie, r.david.murray, vstinner
Date 2011-12-21.00:41:45
Message-id <1324428148.62.0.365731349442.issue13643@psf.upfronthosting.co.za>
In-reply-to
Content
> The main problem I see being discussed is that
> changing the encoding after Python starts would
> be dangerous, which I agree with, but we're not
> proposing to do that.

Not after Python starts. Using two encodings at the same time would just add new problems. On UNIX (at least on Linux?), it is mandatory to use the same encoding for:

 - command line arguments
 - environment variables
 - filenames
 - and more generally, all data exchanged with the system and other programs

Let's take an example: you use UTF-8 for filenames and ISO-8859-1 for all other data. You want to check if a specific filename is present in your home directory: encode the filename to UTF-8 and read the home directory from the HOME environment variable. But environment variables are decoded from ISO-8859-1, so you have to encode them back to ISO-8859-1 to avoid mojibake (and real bugs, like file not found).
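A minimal sketch of the mismatch described above. The filename is hypothetical, and the two codecs are simply the ones named in the example; this only illustrates what happens when bytes written with one encoding are decoded with another.

```python
# Hypothetical filename, written with an escape so the source stays ASCII.
name = "caf\u00e9.txt"

# Bytes as they would be stored on a UTF-8 filesystem.
raw = name.encode("utf-8")

# The same bytes decoded with the "other" encoding: this is mojibake.
wrong = raw.decode("iso-8859-1")

# Repair is only possible if you know which encoding was wrongly applied:
# encode back to ISO-8859-1, then decode as UTF-8.
repaired = wrong.encode("iso-8859-1").decode("utf-8")

# ascii() keeps the output printable whatever the terminal encoding is.
print(ascii(name), ascii(wrong), ascii(repaired))
```

The repair step is the "encode them back" work mentioned above: every consumer would have to know which of the two encodings applies to each piece of data.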

OK, let's say that filenames and environment variables are UTF-8 and that other data is ISO-8859-1. You would like to play an MP3 using mplayer: you pass the filename, encoded to UTF-8, as an argument on the mplayer command line. But mplayer uses ISO-8859-1 to decode its command line (it's not exactly like that, but imagine that it's the case): mplayer will be unable to find your MP3.

etc.

That's why on UNIX there is a single encoding, the locale encoding, and why Python uses that same encoding (called "the filesystem encoding", a name I don't like; see sys.getfilesystemencoding()).
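To inspect this on a given machine, something like the following sketch can be used. The sample filename is hypothetical, and the two reported encodings may differ on some Python versions and platforms, so treat the output as informational only.

```python
import locale
import os
import sys

# The encoding Python uses for filenames, command line arguments
# and environment variables.
fs_encoding = sys.getfilesystemencoding()

# The encoding of the current locale, without changing the locale.
locale_encoding = locale.getpreferredencoding(False)

print("filesystem encoding:", fs_encoding)
print("locale encoding:", locale_encoding)

# os.fsdecode() and os.fsencode() apply the filesystem encoding,
# so a filename read as bytes can be round-tripped through str.
raw = "caf\u00e9.txt".encode("utf-8")  # hypothetical filename bytes
roundtrip = os.fsencode(os.fsdecode(raw))
```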

--

It is no longer possible to change the Python filesystem encoding at runtime (I removed sys.setfilesystemencoding()) because I would like to avoid inconsistency. If you decode a filename before changing the encoding, and then decode the same filename after changing the encoding, you will get two different names, and encoding the filenames back will give you two different byte sequences (or, more likely, a Unicode encode error).
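The inconsistency can be reproduced by hand with explicit codecs. This is only a sketch: the filenames are hypothetical, and the "old" and "new" encodings stand in for the filesystem encoding before and after an imagined runtime change.

```python
OLD, NEW = "utf-8", "iso-8859-1"  # stand-ins for the encoding before/after

raw = "caf\u00e9.txt".encode(OLD)  # hypothetical filename bytes on disk

# Same bytes decoded before and after the change: two different names.
name_before = raw.decode(OLD)
name_after = raw.decode(NEW)

# Encoding both names back with the new encoding: two different byte strings.
bytes_before = name_before.encode(NEW)
bytes_after = name_after.encode(NEW)
print(bytes_before, bytes_after)

# A name decoded earlier may not be encodable at all with the new encoding.
cjk_name = "\u65e5\u672c.txt".encode(OLD).decode(OLD)
try:
    cjk_name.encode(NEW)
    failed = False
except UnicodeEncodeError:
    failed = True
```

Only the name decoded after the change maps back to the original bytes; the one decoded earlier now points at a different (probably nonexistent) file or cannot be encoded.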

It was possible to override the filesystem encoding using a PYTHONFSENCODING environment variable, but it introduced all the inconsistencies listed before (especially with external programs).

Now the only right way to change the Python (filesystem) encoding is the UNIX way of doing that: set the LC_ALL, LC_CTYPE or LANG environment variable (configure your locale).
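As a rough illustration, the locale can be set for a child interpreter and the resulting filesystem encoding read back. The helper name fs_encoding_under is hypothetical, the locale passed in must exist on the system, and the reported encoding depends on the Python version and platform, so no particular output is assumed here.

```python
import os
import subprocess
import sys


def fs_encoding_under(locale_name):
    """Start a child Python with LC_ALL set and report its filesystem encoding."""
    env = dict(os.environ)
    env["LC_ALL"] = locale_name  # LC_ALL takes precedence over LC_CTYPE and LANG
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.getfilesystemencoding())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# The "C" locale is available on POSIX systems; other names are system-specific.
child_encoding = fs_encoding_under("C")
print(child_encoding)
```

The encoding is chosen when the child process starts, which matches the point above: it is configured through the environment, not changed inside a running interpreter.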