Author vstinner
Recipients Arfrever, lemburg, pitrou, vstinner
Date 2010-08-18.11:52:06
Message-id <1282132333.54.0.429002372066.issue8622@psf.upfronthosting.co.za>
In-reply-to
Content
> The command line -h explanation is missing from the patch.

done

> The documentation should mention that the env var is only
> read once; subsequent changes to the env var are not seen
> by Python

I copied the PYTHONIOENCODING doc which doesn't mention that. Does Python re-read other environment variables at runtime? Anyway, I changed the doc to:

+   If this is set before running the interpreter, it overrides the encoding used
+   for the filesystem encoding (see :func:`sys.getfilesystemencoding`).

I also changed PYTHONIOENCODING doc. Is it better?
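To illustrate the "read once" point, here is a minimal sketch, not part of the patch. It assumes only that the interpreter picks its filesystem encoding at startup; the helper name encoding_after_env_change is hypothetical. It sets the variable in os.environ at runtime and reports sys.getfilesystemencoding() before and after:

```python
import os
import sys


def encoding_after_env_change(value):
    """Set PYTHONFSENCODING at runtime and return the filesystem encoding
    seen before and after the change (hypothetical helper)."""
    before = sys.getfilesystemencoding()
    saved = os.environ.get("PYTHONFSENCODING")
    os.environ["PYTHONFSENCODING"] = value
    try:
        # The running interpreter does not re-read the variable here.
        after = sys.getfilesystemencoding()
    finally:
        # Restore the environment so the sketch has no side effects.
        if saved is None:
            del os.environ["PYTHONFSENCODING"]
        else:
            os.environ["PYTHONFSENCODING"] = saved
    return before, after


before, after = encoding_after_env_change("ascii")
print(before, after)
```

Both values should be identical: changing os.environ only affects child processes started afterwards, not the encoding already chosen by the running interpreter.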

> If the codec lookup fails, Python should either issue a warning

OK, done. I also patched get_codeset() and get_codec_name() to always set a Python error.

> ... and then ignore the env var (using the get_codeset() API).

Good idea, done.

> Unrelated to the env var, but still important: if get_codeset()
> does not return a known codec, Python should issue a warning
> before falling back to the default setting. Otherwise, a
> Python user will never know that there's an issue and this
> make debugging a lot harder.

It already writes a message to stderr, but the message doesn't explain why it failed.

I changed initfsencoding() to display two messages on a get_codeset() error: the first explains why get_codeset() failed (with the Python error), and the second says that we fall back to utf-8.

Full example (PYTHONFSENCODING error and simulated get_codeset() error):
---
PYTHONFSENCODING is not a valid encoding:
LookupError: unknown encoding: xxx
Unable to get the locale encoding:
ValueError: CODESET is not set or empty
Unable to get the filesystem encoding: fallback to utf-8
---
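The fallback order above can be sketched in Python. This is only an approximation of the logic, not the C code of initfsencoding(): the function pick_fs_encoding and its two parameters are hypothetical, and the locale value is passed in as an argument instead of being read through get_codeset().

```python
import codecs


def pick_fs_encoding(env_value, locale_codeset):
    """Return (encoding, messages), trying the env var first, then the
    locale codeset, then utf-8 (hypothetical sketch of the fallback)."""
    messages = []
    if env_value:
        try:
            return codecs.lookup(env_value).name, messages
        except LookupError as exc:
            # Warn, then ignore the env var and continue with the locale.
            messages.append("PYTHONFSENCODING is not a valid encoding:")
            messages.append("LookupError: %s" % exc)
    try:
        if not locale_codeset:
            raise ValueError("CODESET is not set or empty")
        return codecs.lookup(locale_codeset).name, messages
    except (LookupError, ValueError) as exc:
        # First message explains why the locale lookup failed.
        messages.append("Unable to get the locale encoding:")
        messages.append("%s: %s" % (type(exc).__name__, exc))
    # Second message announces the fallback.
    messages.append("Unable to get the filesystem encoding: fallback to utf-8")
    return "utf-8", messages


encoding, messages = pick_fs_encoding("xxx", "")
print("\n".join(messages))
print(encoding)
```

Returning the messages instead of writing them to stderr directly keeps the sketch easy to check; the real code writes to stderr.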

> We should also add a new sys.setfilesystemencoding() ...

No, I plan to REMOVE this function. sys.setfilesystemencoding() is dangerous because it introduces a lot of inconsistencies: it cannot re-encode all filenames in all objects (e.g. Python cannot find filenames held in user objects or third-party libraries). For example, if you change the filesystem encoding from utf8 to ascii, existing non-ASCII (unicode) filenames can no longer be used: they will raise UnicodeEncodeError. Like sys.setdefaultencoding() in Python 2, I think that sys.setfilesystemencoding() is the root of evil :-)
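A small sketch of the utf8-to-ascii problem, assuming filenames are encoded with the surrogateescape error handler from PEP 383. The helper encode_filename is hypothetical and only stands in for what the interpreter does internally:

```python
def encode_filename(name, encoding):
    """Encode a unicode filename to bytes for the given filesystem
    encoding, using the PEP 383 error handler (hypothetical helper)."""
    return name.encode(encoding, "surrogateescape")


name = "caf\u00e9.txt"  # a filename that already exists as a str object

# With utf-8 as the filesystem encoding, the name encodes fine.
utf8_bytes = encode_filename(name, "utf-8")

# After switching to ascii, the same str object can no longer be encoded.
try:
    encode_filename(name, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(utf8_bytes, ascii_failed)
```

surrogateescape only round-trips bytes that were undecodable in the first place, so it does not help with a real non-ASCII character such as U+00E9.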

At startup, initfsencoding() sets the filesystem encoding using the locale encoding. Even for the startup process (with very few objects), it's very hard to find all filenames:
 - sys.path
 - sys.meta_path
 - sys.modules
 - sys.executable
 - all code objects
 - and I'm not sure that the list is complete

See #9630 for the details.
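To give an idea of how scattered these filenames are, here is a rough sketch that collects some of them from a running interpreter. The function collect_startup_filenames is hypothetical, it only looks at module-level functions for code objects, and, like the list above, it is certainly incomplete:

```python
import sys
import types


def collect_startup_filenames():
    """Gather filenames reachable from sys.path, sys.executable,
    sys.modules and module-level functions (hypothetical, incomplete)."""
    names = set()
    names.update(entry for entry in sys.path if isinstance(entry, str))
    if sys.executable:
        names.add(sys.executable)
    for module in list(sys.modules.values()):
        if not isinstance(module, types.ModuleType):
            continue
        filename = getattr(module, "__file__", None)
        if isinstance(filename, str):
            names.add(filename)
        # Code objects carry their own copy of the filename.
        for value in list(vars(module).values()):
            if isinstance(value, types.FunctionType):
                names.add(value.__code__.co_filename)
    return names


filenames = collect_startup_filenames()
print(len(filenames))
```

Every one of these strings would have to be re-encoded consistently if the filesystem encoding changed after startup, and this sketch does not even reach methods, nested code objects or filenames stored in third-party objects.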

To remove sys.setfilesystemencoding(), I have already patched the PEP 383 tests (r84170), and I will open a new issue. But it may be better to commit both changes (removing the function and adding PYTHONFSENCODING) at the same time.