Message118339
> > What? No. We have problems because we don't use the same encoding to
> > decode and to encode the same data type. It's not a problem to use a
> > different encoding for each data type (stdout, filenames, environment
> > variables, ...).
>
> This is exactly the very problem that we face. In particular, the
> question is what encoding to use if something is *both* a filename
> and an environment variable value, or both a filename and a command
> line argument.
The question is: what is the best default encoding for a specific data type?
There is no perfect answer (well, except maybe using byte strings :-)). Each
solution has its own use cases and disadvantages.
If an application knows the exact encoding of some data, and it is not the
default encoding, it can still re-decode the data. With os.environb, it's a
little bit better: the application just has to decode (it doesn't have to
encode first, or know which encoding was used for the initial decoding). For
sys.argv, I still want to create sys.argvb (bytes version) ;-)
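To make the difference concrete, here is a minimal sketch. It assumes a POSIX system where Python decodes os.environ and sys.argv with the filesystem encoding and the surrogateescape error handler, so that os.fsencode() gives back the original bytes. The helper names (redecode, decode_bytes, argv_as_bytes) are made up for illustration; sys.argvb does not exist.

```python
import os
import sys


def redecode(value: str, encoding: str) -> str:
    """Re-decode a str that Python already decoded with its default encoding."""
    # Two steps: undo the default decoding to recover the original bytes,
    # then decode them with the encoding the application knows is right.
    return os.fsencode(value).decode(encoding)


def decode_bytes(raw: bytes, encoding: str) -> str:
    """With a bytes source such as os.environb, a single decode is enough."""
    return raw.decode(encoding)


def argv_as_bytes() -> list:
    """Hypothetical stand-in for sys.argvb: rebuild the bytes from sys.argv."""
    return [os.fsencode(arg) for arg in sys.argv]
```

With os.environ the application would call something like redecode(os.environ["NAME"], "latin-1"), whereas with os.environb it is just decode_bytes(os.environb[b"NAME"], "latin-1"). NAME is a placeholder variable here.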
For the command line arguments and environment variables, we don't have a lot
of choices: the locale encoding or the filesystem encoding. So Antoine and
Martin: which encoding do you prefer? We should maybe try to find some use cases.
Here is a dummy script bla.py:
---
import sys
print(sys.argv)
try:
    open(sys.argv[1]).close()
except Exception as err:
    print("open error: %s" % err)
else:
    print("open ok")
---
Locale encoding = FS encoding = utf-8:
$ ./python bla.py xxxé.txt
['bla.py', 'xxxé.txt']
open ok
Locale encoding = utf8, FS encoding = ascii:
$ PYTHONFSENCODING=ascii ./python bla.py xxxé.txt
['bla.py', 'xxxé.txt']
open error: 'ascii' codec can't encode character '\xe9' ...
The filename is displayed correctly, but we are unable to open the file if
PYTHONFSENCODING is used :-/ Should the filename be displayed differently if
PYTHONFSENCODING is used?
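The mismatch can be reproduced without touching the filesystem. This is only a sketch of the underlying cause, not of what open() does internally: the same str encodes fine with one codec and fails with the other. can_encode is a made-up helper name.

```python
def can_encode(name: str, encoding: str) -> bool:
    """Return True if `name` can be encoded with `encoding` without error."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


name = "xxx\xe9.txt"  # the file name from the example, with é written as \xe9
print(can_encode(name, "utf-8"))  # True
print(can_encode(name, "ascii"))  # False
```

So the argument can be decoded and displayed with one encoding, and still be impossible to encode with the other one when it is used as a filename.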
> I think these problems are sufficiently resolved now: either by
> PEP 3333, PEP 444, PEP 383, or os.environb.
Ok, cool :-)
> I think you misunderstood MAL's comment, though: the environment
> variables are not encoded in *any* specific encoding. Instead,
> they are copied literally from the HTTP request, using whatever
> bytes the browser originally put in there - which may or may
> not have followed a particular encoding. HTTP is silent on
> this most of the time, and HTML is out of scope.
Ah yes, thanks for your explanation. I was unable to find his comment.
Date | User | Action | Args
2010-10-10 17:59:25 | vstinner | set | recipients: + vstinner, lemburg, loewis, ixokai, ronaldoussoren, pitrou, pjenvey
2010-10-10 17:59:23 | vstinner | link | issue9992 messages
2010-10-10 17:59:22 | vstinner | create |