Message104648
On Friday 30 April 2010 15:58:28, you wrote:
> It's better to let the application decide how to solve this problem
> and in order to allow for this, the encodings must be adjustable.
On POSIX, use byte strings to avoid encoding issues. Examples:
    subprocess.call(['env'], env={b'TEST': b'a\xff-'}) # env
    subprocess.call(['echo', b'a\xff-']) # command line
    open(b'a\xff-') # filename
    os.getenv(b'a\xff-') # get env (result as unicode)
Are you talking about issues on Windows?
> By using fsencode() and fsdecode() in stdlib functions, you basically
> prevent this kind of adjustment, ...
Not if you use byte strings. On POSIX, a unicode string is always converted
at the end for the system call (using sys.getfilesystemencoding()).
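As a rough illustration of that conversion, the sketch below round-trips an undecodable byte through the surrogateescape error handler. It hard-codes UTF-8 instead of calling sys.getfilesystemencoding() so that the result does not depend on the locale; that choice is an assumption made for the example.

```python
# Illustration only: UTF-8 stands in for sys.getfilesystemencoding().
FS_ENCODING = "utf-8"

raw = b"a\xff-"  # 0xff is not valid UTF-8

# Decoding maps the undecodable byte to a lone surrogate (U+DCFF).
text = raw.decode(FS_ENCODING, "surrogateescape")

# Encoding with the same handler restores the original bytes.
restored = text.encode(FS_ENCODING, "surrogateescape")
```

Because the round trip is lossless, a unicode string obtained this way can still be handed back to the system unchanged.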
> If you know that e.g. your environment variables are going to have
> Latin-1 data (say some content-type variable has this information),
> but the user's default LANG setting is UTF-8, Python will fetch the
> data as broken Unicode data, you then have to convert it back to bytes
> and then back to Unicode using the correct Latin-1 encoding.
>
> It would be a lot better to have the application provide the
> encoding to the os.getenv() function and have Python do the
> correct decoding right from the start.
You mean that os.getenv() should have an optional argument? Something like:
    def getenv(key, default=None, encoding=None):
        value = environ.get(key, default)
        if encoding and value is not None:
            value = value.encode(sys.getfilesystemencoding(), 'surrogateescape')
            value = value.decode(encoding, 'surrogateescape')
        return value
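To make the effect of such an argument concrete, here is a minimal sketch of the re-decoding step on its own. The helper name `redecode` and the hard-coded UTF-8 default are assumptions for the example, not an existing API.

```python
def redecode(value, encoding, fs_encoding="utf-8"):
    """Re-interpret a str that was decoded with fs_encoding as `encoding`.

    Hypothetical helper. fs_encoding defaults to UTF-8 to keep the example
    locale-independent; real code would pass sys.getfilesystemencoding().
    """
    # Recover the original bytes, including any escaped undecodable ones.
    raw = value.encode(fs_encoding, "surrogateescape")
    # Decode them again with the encoding the application knows is correct.
    return raw.decode(encoding, "surrogateescape")


# Latin-1 bytes for "café", as Python would see them under a UTF-8 locale.
broken = b"caf\xe9".decode("utf-8", "surrogateescape")
fixed = redecode(broken, "latin-1")
```

The intermediate value `broken` carries a lone surrogate in place of the 0xE9 byte, which is why the extra encode step is needed before the Latin-1 decode.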
There are many indirect calls to os.getenv() (e.g. through os.environ.get()):
- curses uses TERM
- webbrowser uses PROGRAMFILES (path)
- distutils.msvc9compiler uses "VS%0.f0COMNTOOLS" % version (path)
- wsgiref.util uses HTTP_HOST, SERVER_NAME, SCRIPT_NAME, ... (url)
- platform uses PROCESSOR_ARCHITEW6432
- sysconfig uses PYTHONUSERBASE, APPDATA, ... (path)
- idlelib.PyShell uses IDLESTARTUP and PYTHONSTARTUP (path)
- ...
How would you specify the correct encoding in indirect calls?
If your application gets variables in *mixed* encodings, I think that your
program should start by re-encoding the variables:
    for name, encoding in (('PATH', 'latin1'), ...):
        value = os.getenv(name)
        value = value.encode(sys.getfilesystemencoding(), 'surrogateescape')
        value = value.decode(encoding, 'surrogateescape')
        os.environ[name] = value

Date | User | Action | Args
2010-04-30 16:05:29 | vstinner | set | recipients: + vstinner, lemburg, loewis, gregory.p.smith, ezio.melotti, Arfrever
2010-04-30 16:05:27 | vstinner | link | issue8514 messages
2010-04-30 16:05:26 | vstinner | create |