
Author vstinner
Recipients Arfrever, amaury.forgeotdarc, brett.cannon, lemburg, loewis, pitrou, vstinner
Date 2010-09-24.12:58:40
Message-id <201009241458.33354.victor.stinner@haypocalc.com>
In-reply-to <4C9C9B0D.9060407@egenix.com>
Content
On Friday 24 September 2010 14:35:29, Marc-Andre Lemburg wrote:
> Thanks for the explanation. So the only reason why you have to go through
> all those hoops is to
> 
>  * allow the complete set of Python supported encoding names
>    for the PYTHONFSENCODING
> 
>  * make sure that the Py_FileSystemDefaultEncoding is set to
>    the actual name of the codec as used by the system

Yes, the problem is the get_codec_name() function: it calls _PyCodec_Lookup(),
which loads the codecs module and then the "encodings.xxx" module.
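At the Python level, the lookup that get_codec_name() relies on can be observed with codecs.lookup(). It imports the matching encodings module on first use and returns a CodecInfo whose name attribute is the codec's canonical name. This is a minimal sketch. The helper name canonical_codec_name is hypothetical and is not the C code from the patch:

```python
import codecs

def canonical_codec_name(encoding: str) -> str:
    """Hypothetical helper: resolve an encoding name (or alias) to the
    canonical codec name, as get_codec_name() does on the C side."""
    # codecs.lookup() goes through the codec registry, which imports the
    # matching encodings.<module> on first use; unknown names raise LookupError.
    return codecs.lookup(encoding).name

print(canonical_codec_name("UTF8"))    # utf-8
print(canonical_codec_name("latin1"))  # iso8859-1
```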

> Given that the redecoding of the filenames is fragile, I'd suggest
> dropping the encoding name check and then setting the variable right
> at the start of Py_Initialize().

Yes, it is fragile. If the import machinery is changed (e.g. a new cache is added),
if the code object is changed, or if anything else that uses filenames is changed,
reencode_filenames() has to be updated as well.
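Every filename that was decoded before the real filesystem encoding is known needs a round trip like the one below. This is why the approach breaks whenever a new place that stores filenames appears. It is a minimal sketch, assuming undecodable bytes were preserved with the surrogateescape error handler (PEP 383). redecode_filename is a hypothetical helper, not reencode_filenames() from the patch:

```python
def redecode_filename(name: str, old_encoding: str, new_encoding: str) -> str:
    """Hypothetical helper: undo a decode done with old_encoding and
    redo it with new_encoding."""
    # Recover the original bytes; lone surrogates produced by
    # 'surrogateescape' turn back into the bytes they stand for.
    raw = name.encode(old_encoding, "surrogateescape")
    return raw.decode(new_encoding, "surrogateescape")

on_disk = "café".encode("iso8859-1")                     # b'caf\xe9'
early = on_disk.decode("utf-8", "surrogateescape")       # 'caf\udce9' (wrong guess)
fixed = redecode_filename(early, "utf-8", "iso8859-1")   # 'café'
```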

Checking the encoding name is very important. If I remember correctly, I added
the check to avoid an infinite recursion loop. Or was it related to
sys.setfilesystemencoding()? I don't remember :-)

I agree that my patch is not the simplest or safest way to fix the problem.
I will try your solution.

But we have to be careful about the fallback to utf-8 when the encoding name is
invalid.
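The fallback in question can be sketched like this, assuming the requested name comes from PYTHONFSENCODING. choose_fs_encoding is a hypothetical helper, not CPython code. The point is that an unknown name is silently replaced by utf-8, so the encoding Python ends up using is not the one the user requested:

```python
import codecs
import os

def choose_fs_encoding(requested, default="utf-8"):
    """Hypothetical sketch: use the requested encoding if Python knows it,
    otherwise fall back to `default`."""
    if requested:
        try:
            return codecs.lookup(requested).name
        except LookupError:
            pass  # unknown name: silently fall back (the risky part)
    return default

print(choose_fs_encoding(os.environ.get("PYTHONFSENCODING")))
```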

> If the encoding defined in PYTHONFSENCODING turns out not
> to be defined, the module loader will complain later on during
> startup.

Yes. But I hope that nothing fills a cache or otherwise keeps a trace of
a filename encoded with the wrong encoding.

> To play extra safe, you might run get_codec_name() at the same
> point in startup as you have initfsencoding() now. If something
> failed to load, you won't even get there. If things loaded
> fine, then you have a chance to safely double-check at that point.

Exactly.
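The order suggested above (store the name first, verify it later) can be sketched as a two-step object. This is a hypothetical illustration in Python, not CPython's initialization code. The comments only indicate where each step would happen in the proposal:

```python
import codecs

class FsEncodingState:
    """Hypothetical sketch: accept the encoding name early, check it later."""

    def __init__(self, requested: str):
        # Step 1 (start of Py_Initialize() in the proposal): keep the name
        # as given; no codec lookup, so no encodings module is imported yet.
        self.name = requested
        self.verified = False

    def verify(self) -> None:
        # Step 2 (where initfsencoding() runs today): imports work by now,
        # so the lookup is safe. An unknown name raises LookupError here.
        self.name = codecs.lookup(self.name).name
        self.verified = True

state = FsEncodingState("latin1")
state.verify()
print(state.name)  # iso8859-1
```

If verify() raises LookupError, startup can report the bad PYTHONFSENCODING value at that point instead of falling back silently.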

As I wrote before, I don't like my reencode* patch, but I didn't find a better
solution. I will work on a patch implementing your solution and check whether it
works ;-)
History
Date | User | Action | Args
2010-09-24 12:58:42 | vstinner | set | recipients: + vstinner, lemburg, loewis, brett.cannon, amaury.forgeotdarc, pitrou, Arfrever
2010-09-24 12:58:41 | vstinner | link | issue9630 messages
2010-09-24 12:58:40 | vstinner | create |