
Author vstinner
Recipients Arfrever, ezio.melotti, gregory.p.smith, lemburg, loewis, vstinner
Date 2010-04-26.12:00:15
On Monday 26 April 2010 13:06:48, you wrote:
> I don't see what environment variables have to do with the file
> system.

A POSIX system offers only *one* function to query the encoding: 
nl_langinfo(CODESET). Python3 uses it for filenames, environment 
variables and command line arguments.
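
For reference, a minimal sketch of how to inspect that value from Python. It assumes a POSIX platform where the locale module exposes nl_langinfo(); the helper name locale_codeset is only illustrative. On other Python versions or configurations the two printed values may differ.

```python
import locale
import sys

def locale_codeset():
    """Return the codeset reported by nl_langinfo(CODESET), or None if unavailable."""
    # Hypothetical helper name; nl_langinfo() is missing on some platforms (e.g. Windows).
    if not hasattr(locale, "nl_langinfo"):
        return None
    try:
        # Adopt the user's locale (LANG / LC_*) so CODESET reflects it.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # Unsupported locale settings: keep whatever locale is currently active.
        pass
    return locale.nl_langinfo(locale.CODESET)

codeset = locale_codeset()
fs_encoding = sys.getfilesystemencoding()
print("locale codeset:", codeset)
print("filesystem encoding:", fs_encoding)
```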

Are you suggesting that Python3 should support different encodings for 
environment variables and the file system? How would the user configure that?

For filenames, Python3 chooses the encoding from the locale, but the user 
cannot change it: sys.setfilesystemencoding() is removed by the site module.

> Also note that "mbcs" on Windows is a meta-encoding. The
> implementation of that encoding depends on the locale used by
> the Windows user. It's just a coincidence that this may actually
> work for the environment variables on Windows as well, but there's
> no guarantee.

os.getenv() should raise a TypeError on Windows if key is a byte string.
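
A rough sketch of such a type check, for illustration only. The helper name check_env_key and its platform parameter are hypothetical and are not taken from any patch.

```python
import sys

def check_env_key(key, platform=sys.platform):
    """Hypothetical helper: validate the type of an environment variable name."""
    if isinstance(key, str):
        return key
    if isinstance(key, bytes):
        if platform == "win32":
            # Assumption of this sketch: bytes keys are refused on Windows.
            raise TypeError("bytes keys are not supported on Windows")
        return key
    raise TypeError("key must be str or bytes, not %s" % type(key).__name__)
```

The platform argument exists only so that the Windows branch can be exercised on any system.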

os.getenv() didn't support byte strings. I patched it to accept them 
(issue #8391, r80421), but I don't like my fix because we should reject 
byte strings *on Windows*. I would like to factor out the type check for 
all operations on the file system and environment variables in 
> On Unix, you often have the case that the environment variables
> use mixed encodings, e.g. the CGI interface is a good example
> where this happens by definition. The CGI environment can
> include file system paths, data encoded in Latin-1 (or some
> other encoding), etc.

Since Python3 chose to store environment variables as unicode strings on 
Windows and POSIX, in this specific case you should convert the value back to 
a byte string using fsencode() and then manipulate byte strings. Because 
Python3 uses surrogateescape, you will get the original byte string values.
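
A small sketch of that round trip. It uses an explicit UTF-8 codec so the result does not depend on the locale; this is an assumption made for the example, since a real fsencode() would use the file system encoding instead.

```python
# Latin-1 encoded bytes, as a CGI variable might carry; not valid UTF-8.
raw = b"caf\xe9"

# Decoding with surrogateescape maps the undecodable byte 0xE9 to U+DCE9.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same error handler restores the original byte.
restored = text.encode("utf-8", "surrogateescape")

print(repr(text))
print(restored == raw)
```

The decoded string is not printable as-is because it contains a lone surrogate, but it can be stored, passed around and converted back without losing data.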

My patch should help both cases: people using unicode objects and people using 
the native OS type (bytes on POSIX). As written in my previous message, you 
can still use byte strings if you want. My patch doesn't change that (on POSIX 

Date                 User      Action  Args
2010-04-26 12:00:17  vstinner  set     recipients: + vstinner, lemburg, loewis, gregory.p.smith, ezio.melotti, Arfrever
2010-04-26 12:00:15  vstinner  link    issue8514 messages
2010-04-26 12:00:15  vstinner  create