
Author lemburg
Recipients Arfrever, amaury.forgeotdarc, brett.cannon, lemburg, loewis, pitrou, vstinner
Date 2010-09-24.12:35:26
Message-id <4C9C9B0D.9060407@egenix.com>
In-reply-to <1285329179.71.0.256662382673.issue9630@psf.upfronthosting.co.za>
Content
STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
>> Why is this needed ?
> 
> Short answer: to support filesystem encodings other than utf-8. See #8611 for a longer explanation.
> 
> Example:
> 
> $ pwd
> /home/SHARE/SVN/py3ké
> $ PYTHONFSENCODING=ascii ./python test_fs_encoding.py 
> Fatal Python error: Py_Initialize: can't initialize sys standard streams
> UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 20: ordinal not in range(128)
> Abandon
> 
> My patch fixes this specific case and prepares the ground for the complete fix (support for different *locale* encodings, see #8611 and #9425).
> 
> --
> 
> Longer answer: Py_FilesystemDefaultEncoding is changed too late. Some modules are already loaded, sys.executable is already set, etc. Py_FilesystemDefaultEncoding is changed, but module filenames have already been decoded with utf-8 and should be "redecoded".
> 
> It is not possible to set Py_FilesystemDefaultEncoding before loading the first module. initfsencoding() loads codecs and encodings modules to check the codec name. sys.executable is also set before initfsencoding().
> 
> Read my other messages of this issue to get other reasons why the patch is needed. I explained other possibilities (but they don't work).

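For reference, the encoding failure in the quoted example can be reproduced at the Python level. This is only a sketch: the helper name first_unencodable() is made up for illustration, and the real error is raised from C code during Py_Initialize(), not from a function like this.

```python
def first_unencodable(text, encoding):
    """Return the index of the first character `encoding` cannot encode, or None."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.start is the offset of the offending character in `text`.
        return exc.start
    return None


# Working directory from the quoted example; the last character is U+00E9.
path = "/home/SHARE/SVN/py3ké"

print(first_unencodable(path, "ascii"))  # 20, the position named in the traceback
print(first_unencodable(path, "utf-8"))  # None, utf-8 can encode the whole path
```
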
Thanks for the explanation. So the only reasons you have to jump through
all those hoops are to

 * allow the complete set of encoding names supported by Python
   for PYTHONFSENCODING

 * make sure that Py_FilesystemDefaultEncoding is set to
   the actual name of the codec as used by the system

Given that the redecoding of the filenames is fragile, I'd suggest
dropping the encoding name check and setting the variable right
at the start of Py_Initialize().

If the encoding named in PYTHONFSENCODING turns out not
to exist, the module loader will complain later on during
startup.
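
A rough Python-level illustration of that behaviour, not the actual C startup code: read_fs_encoding() and encode_filename() are hypothetical helpers, and the environment is passed in as a plain dict so the sketch does not depend on the real process environment. The name is stored unchecked, and the bad value only surfaces when a codec is first needed.

```python
def read_fs_encoding(environ, default="utf-8"):
    """Take the name from PYTHONFSENCODING as-is, without any codec lookup."""
    return environ.get("PYTHONFSENCODING") or default


def encode_filename(name, fs_encoding):
    # The codec registry is consulted only here, on first use.
    return name.encode(fs_encoding)


# A misspelled or unknown name is accepted at this point...
fs_encoding = read_fs_encoding({"PYTHONFSENCODING": "no-such-codec"})

# ...and only fails once something actually needs to encode a filename.
try:
    encode_filename("module.py", fs_encoding)
except LookupError as exc:
    print("failure surfaces on first use:", exc)
```
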

To play it extra safe, you might run get_codec_name() at the same
point in startup where you have initfsencoding() now. If something
failed to load, you won't even get there. If everything loaded
fine, you have a chance to safely double-check at that point.
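
At the Python level, such a late double-check could look roughly like the sketch below. It assumes codecs.lookup() as a stand-in for the C-level get_codec_name(); canonical_codec_name() is a made-up helper name.

```python
import codecs


def canonical_codec_name(name):
    """Return the codec registry's canonical name for `name`, or None if unknown."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


# An alias is mapped to the name the codec registry actually uses.
print(canonical_codec_name("latin1"))

# An unknown name is reported only at this late, safe point.
print(canonical_codec_name("no-such-codec"))  # None
```

If the lookup succeeds, the canonical name can replace whatever was taken from PYTHONFSENCODING; if it returns None, startup can report the problem in a controlled way.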

History
Date                 User     Action         Args
2010-09-24 12:35:29  lemburg  setrecipients  + lemburg, loewis, brett.cannon, amaury.forgeotdarc, pitrou, vstinner, Arfrever
2010-09-24 12:35:27  lemburg  link           issue9630 messages
2010-09-24 12:35:26  lemburg  create