Author lemburg
Recipients Arfrever, ezio.melotti, gregory.p.smith, lemburg, loewis, vstinner
Date 2010-04-30.13:58:23
Message-id <4BDAE1FD.6020307@egenix.com>
In-reply-to <201004261400.08237.victor.stinner@haypocalc.com>
Content
STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
> On Monday, 26 April 2010 at 13:06:48, you wrote:
>> I don't see what environment variables have to do with the file
>> system.
> 
> A POSIX system offers only *one* function for the encoding:
> nl_langinfo(CODESET), and Python 3 uses it for filenames, environment
> variables and command-line arguments.
> 
> Are you suggesting that Python 3 should support a different encoding for
> environment variables than for the file system? How would the user configure it?
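
A minimal sketch, assuming a POSIX system and Python 3.6 or later, that compares the codeset reported by nl_langinfo(CODESET) with the codec Python actually uses for filenames, os.environ and sys.argv. On current Python versions the two values can differ. Two cases where this happens are UTF-8 mode (PEP 540) and macOS, where Python always uses UTF-8 for filenames.

```python
import locale
import sys

# POSIX only: locale.nl_langinfo() does not exist on Windows.
codeset = locale.nl_langinfo(locale.CODESET)

# Codec and error handler Python 3 uses for filenames, os.environ and sys.argv.
fs_encoding = sys.getfilesystemencoding()
fs_errors = sys.getfilesystemencodeerrors()  # available since Python 3.6

print("nl_langinfo(CODESET):", codeset)
print("filesystem encoding: ", fs_encoding, "/", fs_errors)
```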

It's better to let the application decide how to solve this problem,
and to allow for this, the encodings must be adjustable.

By using fsencode() and fsdecode() in stdlib functions, you basically
prevent this kind of adjustment, since they hardcode the use of
a single encoding which is guessed by looking at nl_langinfo(CODESET).
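
A short sketch of this point using the real os.fsdecode() and os.fsencode() functions. Neither function accepts an encoding argument, so the codec is always the one returned by sys.getfilesystemencoding(), combined with surrogateescape on POSIX. The byte string is an arbitrary example value.

```python
import os
import sys

raw = b"caf\xe9"  # "café" encoded as Latin-1

# fsdecode() always uses the filesystem encoding; the caller cannot pick a codec.
text = os.fsdecode(raw)
print(sys.getfilesystemencoding(), repr(text))  # with UTF-8: 'caf\udce9'

# The round trip back to bytes is lossless thanks to surrogateescape ...
round_trip = os.fsencode(text)

# ... but getting "café" requires the application to decode explicitly.
print(raw.decode("latin-1"))
```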

Note that applications may well use completely different encodings
in the environment and for things like pipes than the one the user
set up for her GUI environment.
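
As a sketch of a per-stream choice that does not depend on the locale: when an application opens each end of a pipe as a text stream, it chooses that stream's encoding explicitly. Latin-1 is an arbitrary example encoding here.

```python
import os

read_fd, write_fd = os.pipe()

# Each end of the pipe gets its own, explicitly chosen encoding,
# independent of the locale's nl_langinfo(CODESET).
with open(write_fd, "w", encoding="latin-1") as writer:
    writer.write("café\n")

with open(read_fd, "r", encoding="latin-1") as reader:
    line = reader.readline()

print(repr(line))
```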

In the end, this will only lead to the same kind of mess we've
had with sys.setdefaultencoding() in Python 2.x, only this
time with sys.setfilesystemencoding() and I'd like to avoid that.

> Since Python 3 chose to store environment variables as Unicode strings on
> Windows and POSIX, in this specific case you should convert the value back to
> a byte string using fsencode() and then manipulate byte strings. Because
> Python 3 uses surrogateescape, you will get the original byte string values.
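
A minimal sketch of the quoted workaround, assuming a POSIX system where os.supports_bytes_environ is true. DEMO_VAR is a hypothetical variable name. The sketch stores a value as raw bytes through os.environb, then reads it back twice: once as the surrogate-escaped str, and once as the original bytes.

```python
import os

if not os.supports_bytes_environ:
    raise SystemExit("os.environb is only available on POSIX")

# Store raw (non-UTF-8) bytes in a hypothetical variable.
os.environb[b"DEMO_VAR"] = b"caf\xe9"

# os.environ decodes with the filesystem encoding and surrogateescape.
as_text = os.environ["DEMO_VAR"]

# os.fsencode() restores the original bytes, as does reading os.environb directly.
as_bytes = os.fsencode(as_text)
print(repr(as_text), as_bytes, os.environb[b"DEMO_VAR"])
```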

Well, yes, but that's a kludge, isn't it?

If you know that, e.g., your environment variables are going to contain
Latin-1 data (say, some content-type variable carries this information),
but the user's default LANG setting is UTF-8, Python will fetch the
data as broken Unicode data. You then have to convert it back to bytes
and decode it again using the correct Latin-1 encoding.
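
The extra round trip can be sketched as follows. To keep the example independent of the machine's locale, it assumes a UTF-8 locale and performs Python's decoding step explicitly with the surrogateescape error handler. The value is a hypothetical Latin-1 encoded environment value.

```python
raw = b"text/plain; name=caf\xe9"  # hypothetical Latin-1 data in the environment

# Step 1: what Python 3 does at startup when the locale says UTF-8.
broken = raw.decode("utf-8", "surrogateescape")
print(repr(broken))  # 'text/plain; name=caf\udce9'

# Step 2: the application converts the value back to the original bytes ...
original = broken.encode("utf-8", "surrogateescape")

# Step 3: ... and decodes again with the encoding it knows is correct.
fixed = original.decode("latin-1")
print(fixed)  # text/plain; name=café
```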

It would be a lot better to have the application provide the
encoding to the os.getenv() function and have Python do the
correct decoding right from the start.
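
os.getenv() does not take an encoding argument, but a helper in the spirit of this proposal could be built on os.environb. The sketch below is hypothetical:

- getenv_decoded() is not part of the standard library.
- It assumes a POSIX system with a bytes environment.
- DEMO_CONTENT_TYPE is an illustrative variable name.

```python
import os
from typing import Optional


def getenv_decoded(name: str, encoding: str, default: Optional[str] = None,
                   errors: str = "strict") -> Optional[str]:
    """Hypothetical helper: read an environment variable with a caller-chosen codec."""
    if not os.supports_bytes_environ:
        # Windows stores the environment as Unicode, so there are no bytes to decode.
        return os.environ.get(name, default)
    value = os.environb.get(os.fsencode(name))
    if value is None:
        return default
    return value.decode(encoding, errors)


# POSIX only: store Latin-1 bytes, then decode them correctly from the start.
os.environb[b"DEMO_CONTENT_TYPE"] = b"caf\xe9"
print(getenv_decoded("DEMO_CONTENT_TYPE", "latin-1"))  # café
```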
History
Date                 User     Action  Args
2010-04-30 13:58:28  lemburg  set     recipients: + lemburg, loewis, gregory.p.smith, vstinner, ezio.melotti, Arfrever
2010-04-30 13:58:24  lemburg  link    issue8514 messages
2010-04-30 13:58:23  lemburg  create