
Author vstinner
Recipients ixokai, lemburg, loewis, pitrou, pjenvey, ronaldoussoren, vstinner
Date 2010-10-10.17:59:22
Message-id <201010101959.15671.victor.stinner@haypocalc.com>
In-reply-to <4CB1E83E.9050003@v.loewis.de>
Content
> > What? No. We have problems because we don't use the same encoding to
> > decode and to encode the same data type. It's not a problem to use a
> > different encoding for each data type (stdout, filenames, environment
> > variables, ...).
> 
> This is exactly the very problem that we face. In particular, the
> question is what encoding to use if something is *both* a filename
> and an environment variable value, or both a filename and a command
> line argument.

The question is: what is the best default encoding for a specific data type? 
There is no perfect answer (well, except maybe using byte strings :-)). Each 
solution has its own use cases and disadvantages.

If an application knows the exact encoding of some data, and it is not the 
default encoding, it can still re-decode the data. With os.environb, it's a 
little bit better: the application only has to decode (it doesn't have to 
encode first, or to know which encoding was used for the initial decoding). 
For sys.argv, I still want to create sys.argvb (bytes version) ;-)
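
A minimal sketch of the two routes described above, assuming a POSIX system 
where os.environb is available, and assuming os.fsencode() (found in later 
Python 3 releases) reverses the default decoding applied to os.environ. The 
variable DEMO_VAR and both helper names are hypothetical, for illustration 
only:

```python
import os


def getenv_decoded(name, encoding):
    # Bytes route: read the raw value and decode it once with the
    # encoding the application knows to be correct.
    return os.environb[os.fsencode(name)].decode(encoding)


def redecode(value, encoding):
    # str route: first undo the default decoding (this step needs to
    # know how the value was decoded initially), then decode again.
    raw = os.fsencode(value)
    return raw.decode(encoding)


if __name__ == "__main__":
    # Hypothetical variable holding a Latin-1 encoded filename.
    os.environb[b"DEMO_VAR"] = "xxx\xe9.txt".encode("latin-1")
    print(ascii(getenv_decoded("DEMO_VAR", "latin-1")))
    print(ascii(redecode(os.environ["DEMO_VAR"], "latin-1")))
```

The bytes route needs one step and no knowledge of the default encoding; the 
str route only works because the default decoding can be reversed.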

For the command line arguments and environment variables, we don't have a lot 
of choices: locale or filesystem encodings. So Antoine and Martin: which 
encoding do you prefer? We should maybe try to find some use cases.

Here is a dummy script bla.py:
---
import sys
print(sys.argv)
try:
    open(sys.argv[1]).close()
except Exception as err:
    print("open error: %s" % err)
else:
    print("open ok")
---

Locale encoding = FS encoding = utf-8:

$ ./python bla.py xxxé.txt 
['bla.py', 'xxxé.txt']
open ok

Locale encoding = utf-8, FS encoding = ascii:

$ PYTHONFSENCODING=ascii ./python bla.py xxxé.txt 
['bla.py', 'xxxé.txt']
open error: 'ascii' codec can't encode character '\xe9' ...

The filename is displayed correctly, but we are unable to open the file if 
PYTHONFSENCODING is used :-/ Should the filename be displayed differently if 
PYTHONFSENCODING is used?
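
The failure above can be reproduced without touching the locale. This is only 
a sketch: the helper name roundtrip is hypothetical, and the use of the 
surrogateescape error handler (PEP 383) for both steps is an assumption about 
how the argument is decoded and how the filename is encoded:

```python
def roundtrip(raw, decode_encoding, encode_encoding):
    """Decode raw bytes as a command line argument would be decoded,
    then encode the result as a filename would be encoded for open().

    Returns (text, encoded bytes or None, error message or None).
    """
    text = raw.decode(decode_encoding, "surrogateescape")
    try:
        return text, text.encode(encode_encoding, "surrogateescape"), None
    except UnicodeEncodeError as err:
        return text, None, str(err)


if __name__ == "__main__":
    raw = "xxx\xe9.txt".encode("utf-8")
    # Same encoding in both directions: the bytes survive.
    print(roundtrip(raw, "utf-8", "utf-8")[1] == raw)
    print(roundtrip(raw, "ascii", "ascii")[1] == raw)
    # Decoded as utf-8 but encoded as ascii: encoding fails.
    print(roundtrip(raw, "utf-8", "ascii")[2])
```

Using the same encoding in both directions round-trips the bytes, even with 
ascii; the error only appears when the two encodings differ.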

> I think these problems are sufficiently resolved now: either by
> PEP 3333, PEP 444, PEP 383, or os.environb.

Ok, cool :-)

> I think you misunderstood MAL's comment, though: the environment
> variables are not encoded in *any* specific encoding. Instead,
> they are copied literally from the HTTP request, using whatever
> bytes the browser originally put in there - which may or may
> not have followed a particular encoding. HTTP is silent on
> this most of the time, and HTML is out of scope.

Ah yes, thanks for your explanation. I was unable to find his comment.
History
Date                 User      Action  Args
2010-10-10 17:59:25  vstinner  set     recipients: + vstinner, lemburg, loewis, ixokai, ronaldoussoren, pitrou, pjenvey
2010-10-10 17:59:23  vstinner  link    issue9992 messages
2010-10-10 17:59:22  vstinner  create