
classification
Title: Reencode filenames when setting the filesystem encoding
Type:
Stage:
Components: Interpreter Core, Unicode
Versions: Python 3.2
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Arfrever, amaury.forgeotdarc, brett.cannon, lemburg, loewis, pitrou, vstinner
Priority: normal
Keywords: patch

Created on 2010-08-17 23:47 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
redecode_modules_path-4.patch vstinner, 2010-09-29 10:50
Messages (24)
msg114191 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-17 23:47
Python 3 has a very important variable: the filesystem encoding, sys.getfilesystemencoding(). It is used to encode and decode filenames to access the filesystem, to encode program arguments in subprocess, etc.
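
To make the role of this variable concrete, here is a minimal sketch using os.fsencode() and os.fsdecode(), which both rely on the filesystem encoding. The file name "spam.txt" is only an illustrative value, chosen because it is plain ASCII and so round-trips on any platform.

```python
import os
import sys

# The encoding Python uses to convert between str and bytes filenames.
encoding = sys.getfilesystemencoding()

# Illustrative ASCII name (an assumption, not taken from the issue).
name = "spam.txt"

# str -> bytes, as done before calling into the operating system.
raw = os.fsencode(name)

# bytes -> str, as done when the operating system returns a filename.
roundtrip = os.fsdecode(raw)

print(encoding, raw, roundtrip)
```

The exact value of `encoding` depends on the platform and locale, so no output is shown.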

The encoding is hardcoded to "mbcs" on Windows and "utf-8" on Mac OS X. On other OSes, Python gets the encoding from the locale. The problem is that the code getting the locale encoding loads Python modules (eg. locale) and Python uses a default encoding before the locale encoding is known. As a result, modules and code objects created before Python sets the locale encoding are encoded with the old encoding.

The default encoding is "utf-8". If the locale encoding is also "utf-8", there is no problem because the filenames are correctly encoded. If the locale encoding is different, we keep filenames encoded with the wrong encoding.

It becomes worse when the locale encoding is unable to encode the filenames, e.g. the ASCII encoding.

--

A solution would be to avoid loading any Python module, but I don't think that it is possible. The locale encoding can be something other than ascii, latin-1, utf-8 or mbcs. The locale encoding can be an alias like 'utf8' (instead of 'utf-8'), 'iso-8859-1' (Python uses 'latin_1') or 'ANSI_x3.4_1968' (for 'ascii'), and encoding aliases are implemented in Lib/encodings/aliases.py which is... a Python module.
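
The alias resolution described above can be observed from Python with codecs.lookup(). This is only a sketch: the helper name `canonical` is hypothetical, and the spelling 'ANSI_X3.4-1968' is the one reported by nl_langinfo() later in this issue.

```python
import codecs


def canonical(name):
    """Return the codec name Python resolves an encoding alias to."""
    # codecs.lookup() goes through the encodings package, which consults
    # Lib/encodings/aliases.py: this is the step that needs Python modules.
    return codecs.lookup(name).name


for alias in ("utf8", "iso-8859-1", "ANSI_X3.4-1968"):
    print(alias, "->", canonical(alias))
```

Each alias resolves to the same codec as its official spelling, which is why the locale encoding cannot simply be compared against a short hardcoded list of names.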

--

I wrote a patch to reencode filenames of all module and code objects in initfsencoding() when the locale encoding is known.

I tested my patch on my import_unicode branch (branch to fix #8611, see also #9425: issue to merge the branch to py3k). I would like one or more reviews of the patch because it is long and complex. Please check for refleaks :-)

--

About the patch.

I don't know how to list *all* code objects and so I created a list to store weak references to all code objects, list filled by the code object constructor. The list is destroyed at initfsencoding() exit (early in Python initialization).

There is a FIXME: I don't know if sys.path_importer_cache keys should also be reencoded.

I tried to apply all remarks made on the first patch (posted on Rietveld for #9425). The patch now stores weak references instead of strong references to code objects in the code object list.

(r84168 creates PyModule_GetFilenameObject, function needed by this patch)
msg114193 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-18 00:49
While working on #8622, I realized that it's not enough: sys.path and sys.executable (and sys.meta_path) should also be reencoded. New patch does that.
msg114348 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-08-19 11:58
I would rename the feature to something like "redecode-modules": the filenames were decoded with the wrong encoding, and must be decoded again.
msg114349 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-08-19 12:01
Some thoughts: since the modules were successfully imported, surely it means that their filenames were correctly computed and encoded? So why is the __filename__ attribute wrong?
msg114352 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-08-19 12:16
> since the modules were successfully imported, surely it means that
> their filenames were correctly computed and encoded? So why is the
> __filename__ attribute wrong?

Python starts with the 'utf-8' encoding. If the new encoding is "smaller" (unable to encode as many characters as utf-8), PyUnicode_EncodeFS() and os.fsencode() will raise UnicodeEncodeError.

E.g. your Python setup is installed in a directory called b'py3k\xc3\xa9' and your locale is C (ascii encoding). At startup, the directory name is decoded to 'py3ké' (using the default encoding, utf-8). initfsencoding() sets the encoding to ascii: 'py3ké' cannot be encoded to the filesystem encoding (ascii) anymore.
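
The failure in this example can be reproduced with plain str/bytes methods. This is a sketch assuming the surrogateescape error handler that Python 3 uses for filenames; `can_encode` is a hypothetical helper.

```python
# Directory name as stored on disk.
raw = b"py3k\xc3\xa9"

# At startup the name is decoded with the default encoding, utf-8.
name = raw.decode("utf-8", "surrogateescape")


def can_encode(text, encoding):
    """Tell whether text can be encoded back to a filename in encoding."""
    try:
        text.encode(encoding, "surrogateescape")
    except UnicodeEncodeError:
        return False
    return True


print(can_encode(name, "utf-8"))  # encoding used at startup
print(can_encode(name, "ascii"))  # encoding set later by initfsencoding()
```

The second check fails because U+00E9 is a real character, not an escaped byte, so surrogateescape cannot map it back to ASCII.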

--

If we set the default filesystem encoding to ascii (#8725), it will work but the filenames will be full of surrogate characters. E.g. your Python setup is installed in b'py3k\xc3\xa9' and your locale encoding is utf-8: b'py3k\xc3\xa9' will be decoded to 'py3k\udcc3\udca9' and left unchanged by initfsencoding(). Surrogate characters are not practical: you have to escape them to display them. Printing a filename with surrogates in a terminal raises a UnicodeEncodeError (even with the utf-8 encoding).
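
A short sketch of why surrogate characters are awkward, using the same byte string as above. The use of the backslashreplace error handler for display is an illustrative choice, not something proposed in this issue.

```python
raw = b"py3k\xc3\xa9"

# Decoding as ASCII with surrogateescape turns each undecodable byte
# 0xNN into the lone surrogate U+DCNN.
escaped = raw.decode("ascii", "surrogateescape")

# The original bytes can still be recovered losslessly.
roundtrip = escaped.encode("ascii", "surrogateescape")

# A strict encoder, as used when printing to a terminal, rejects them.
try:
    escaped.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# To display the name, the surrogates have to be escaped explicitly.
displayable = escaped.encode("utf-8", "backslashreplace")
```

The name round-trips to the filesystem, but it cannot be shown to the user without an extra escaping step.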
msg115546 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-09-04 00:10
Another possibility is to use _Py_char2wchar() + PyUnicode_FromWideChar() / _Py_wchar2char() + PyUnicode_AsWideChar() to decode / encode filenames. These functions use the locale encoding. This solution was possible in Python 3.1, but no longer in Python 3.2 because of the PYTHONFSENCODING environment variable.

Even if I don't like my own solution, I don't see a better one.
msg115610 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-09-04 19:31
> Python is installed in a directory called b'py3k\xc3\xa9'
> and your locale is C
Do we really want to support this kind of configuration?
msg115668 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-09-05 19:40
> Do we really want to support this kind of configuration?

There is also a problem if the directory name is b'py3k\xe9': at startup (utf-8 encoding), the name is decoded to 'py3k\udce9'. When the locale encoding is set to iso-8859-1, the name should be redecoded to 'py3k\xe9' to avoid the surrogate character.
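
The conversion described here amounts to encoding the string back to bytes with the old encoding and decoding those bytes with the new one. A minimal sketch, assuming the surrogateescape error handler; `redecode` is a hypothetical helper, not a function from the patch.

```python
def redecode(name, old_encoding, new_encoding):
    """Undo a decoding done with old_encoding and redo it with new_encoding."""
    # surrogateescape restores the exact bytes that were on disk.
    raw = name.encode(old_encoding, "surrogateescape")
    return raw.decode(new_encoding, "surrogateescape")


# Name as decoded at startup with utf-8: the byte 0xE9 became U+DCE9.
startup_name = b"py3k\xe9".decode("utf-8", "surrogateescape")

# Once the locale encoding (iso-8859-1) is known, redecode the name.
fixed_name = redecode(startup_name, "utf-8", "iso-8859-1")
```

After the call, `fixed_name` contains the real character U+00E9 instead of a surrogate.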
msg117268 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-09-24 11:34
New version of the patch:
 - reencode sys.path_importer_cache (and remove the last FIXME)
 - fix different reference leaks
 - catch PyIter_Next() failures
 - create a subfunction to reencode sys.modules: it's easier to review and manage errors in shorter functions
 - add some comments
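
For orientation, the places mentioned so far that hold filenames (sys.modules, sys.path, sys.path_importer_cache, sys.executable) can be enumerated from Python. This is only an illustrative sketch with a hypothetical helper name; the patch does this work in C and also handles code objects and sys.meta_path.

```python
import sys


def startup_filenames():
    """Collect str filenames stored in the objects named in this issue."""
    names = []
    # Filenames of the modules already imported.
    for module in list(sys.modules.values()):
        filename = getattr(module, "__file__", None)
        if isinstance(filename, str):
            names.append(filename)
    # Import path entries and the keys of the path importer cache.
    names.extend(p for p in sys.path if isinstance(p, str))
    names.extend(k for k in sys.path_importer_cache if isinstance(k, str))
    # Path of the interpreter binary (may be empty or None).
    if sys.executable:
        names.append(sys.executable)
    return names


print(len(startup_filenames()))
```

Every string collected this way was decoded with whatever encoding was active when it was created, which is why each container needs its own redecoding step.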
msg117269 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-09-24 11:38
> I would rename the feature to something like "redecode-modules"

Yes, right. I will rename the functions before committing the patch.
msg117270 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-09-24 11:43
STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
> New version of the patch:
>  - reencode sys.path_importer_cache (and remove the last FIXME)
>  - fix different reference leaks
>  - catch PyIter_Next() failures
>  - create a subfunction to reencode sys.modules: it's easier to review and manage errors in shorter functions
>  - add some comments

Why is this needed ?

With PYTHONFSENCODING there should be no need to change the FS
encoding after startup.
msg117271 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-09-24 11:52
> Why is this needed ?

Short answer: to support filesystem encodings other than utf-8. See #8611 for a longer explanation.

Example:

$ pwd
/home/SHARE/SVN/py3ké
$ PYTHONFSENCODING=ascii ./python test_fs_encoding.py 
Fatal Python error: Py_Initialize: can't initialize sys standard streams
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 20: ordinal not in range(128)
Abandon

My patch fixes this specific case and prepares the work for the complete fix (support different *locale* encodings, see #8611 and #9425).

--

Longer answer: Py_FilesystemDefaultEncoding is changed too late. Some modules are already loaded, sys.executable is already set, etc. Py_FilesystemDefaultEncoding is changed but module filenames were decoded with utf-8 and should be "redecoded".

It is not possible to set Py_FilesystemDefaultEncoding before loading the first module. initfsencoding() loads codecs and encodings modules to check the codec name. sys.executable is also set before initfsencoding().

Read my other messages of this issue to get other reasons why the patch is needed. I explained other possibilities (but they don't work).
msg117272 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-24 12:08
>It is not possible to set Py_FilesystemDefaultEncoding before loading 
>the first module. initfsencoding() loads codecs and encodings modules to 
>check the codec name.

Not sure it's related, but there seems to be a bug:

$ ./python -c "import sys; print(sys.getfilesystemencoding())"
utf-8
$ LC_CTYPE=latin1 ./python -c "import sys; print(sys.getfilesystemencoding())"
ascii
$ LC_CTYPE=fr_FR:iso8859-1 ./python -c "import sys; print(sys.getfilesystemencoding())"
ascii
msg117273 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-24 12:13
Some things about your patch:
- as Amaury said, functions should be named "redecode*" rather than "reencode*"
- please use -1 for error return, not 1
- have you tried to measure if it made Python startup slower?
msg117274 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-09-24 12:15
> Not sure it's related, but there seems to be a bug:

It's not a bug, it's a feature :-) If you specify a non-existent locale, the
GNU libc falls back to ascii.

$ locale -a
C
français
french
fr_FR
fr_FR@euro
fr_FR.iso88591
fr_FR.iso885915@euro
fr_FR.utf8

$ LC_CTYPE=fr_FR.iso88591 ./python -c "import locale; print(locale.nl_langinfo(locale.CODESET))"
ISO-8859-1

$ LC_CTYPE=xxx ./python -c "import locale; print(locale.nl_langinfo(locale.CODESET))"
ANSI_X3.4-1968
msg117277 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-09-24 12:35
Thanks for the explanation. So the only reason why you have to go through
all those hoops is to

 * allow the complete set of Python supported encoding names
   for the PYTHONFSENCODING

 * make sure that the Py_FilesystemDefaultEncoding is set to
   the actual name of the codec as used by the system

Given that the redecoding of the filenames is fragile, I'd suggest
to drop the encoding name check and then setting the variable right
at the start of Py_Initialize().

If the encoding defined in PYTHONFSENCODING turns out not
to be defined, the module loader will complain later on during
startup.

To play extra safe, you might run get_codec_name() at the same
point in startup as you have initfsencoding() now. If something
failed to load, you won't even get there. If things loaded
fine, then you have a chance to safely double-check at that point.
msg117278 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-09-24 12:40
> Some things about your patch:
> - as Amaury said, functions should be named "redecode*"
> rather than "reencode*" 

Yes, as written before (msg117269), I will do it in my next patch.

> - please use -1 for error return, not 1

Ok.

> - have you tried to measure if it made Python startup slower?

(Spoiler: the overhead is around 3%)

First, my patch doesn't concern Windows or Mac OS X, because the filesystem
encoding is hardcoded on these platforms. Then, it only concerns systems with
a filesystem encoding other than utf-8. utf-8 is now the default encoding of
all Linux distributions. I suppose that BSD systems also use it by default.

Let's try a dummy benchmark with py3k r84990. 5 runs, I kept the smallest 
time.

-- pydebug mode (gcc -O0) with the patch ---

$ unset PYTHONFSENCODING; time ./python  -c "pass"
real    0m0.084s
user    0m0.080s
sys     0m0.010s

$ export PYTHONFSENCODING=ascii; time ./python  -c "pass"
real    0m0.100s
user    0m0.100s
sys     0m0.000s

The startup time overhead is around 20%.

-- default mode (gcc -O3) without the patch ---

$ unset PYTHONFSENCODING; time ./python  -c "pass"

real    0m0.033s
user    0m0.030s
sys     0m0.000s

-- default mode (gcc -O3) with the patch ---

$ export PYTHONFSENCODING=utf-8; time ./python  -c "pass"

real    0m0.032s
user    0m0.030s
sys     0m0.000s

$ export PYTHONFSENCODING=ascii; time ./python  -c "pass"

real    0m0.033s
user    0m0.020s
sys     0m0.020s

Here the overhead is around 3%.
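
The same kind of measurement can be scripted. This is a rough sketch, not the method used above: it times the running interpreter (sys.executable) with subprocess instead of the shell's time builtin, and the helper names are hypothetical. The overhead helper reproduces the arithmetic behind the percentages quoted in this message.

```python
import subprocess
import sys
import time


def min_startup_time(runs=5, env=None):
    """Run `python -c pass` several times and keep the smallest time."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", "pass"], check=True, env=env)
        best = min(best, time.perf_counter() - start)
    return best


def overhead_percent(baseline, patched):
    """Relative slowdown of `patched` compared to `baseline`, in percent."""
    return (patched - baseline) / baseline * 100.0


# "real" times reported above.
print(round(overhead_percent(0.084, 0.100)))  # pydebug build
print(round(overhead_percent(0.032, 0.033)))  # optimized build
```

Passing a modified copy of os.environ as `env` would allow comparing two configurations, in the same way the runs above differ only by an environment variable.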
msg117281 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-09-24 12:58
On Friday 24 September 2010 14:35:29, Marc-Andre Lemburg wrote:
> Thanks for the explanation. So the only reason why you have to go through
> all those hoops is to
> 
>  * allow the complete set of Python supported encoding names
>    for the PYTHONFSENCODING
> 
>  * make sure that the Py_FilesystemDefaultEncoding is set to
>    the actual name of the codec as used by the system

Yes, the problem is the get_codec_name() function: it calls _PyCodec_Lookup() 
which loads codecs module and then the "encodings.xxx" module.

> Given that the redecoding of the filenames is fragile, I'd suggest
> to drop the encoding name check and then setting the variable right
> at the start of Py_Initialize().

Yes, it is fragile. If the import machinery is changed (e.g. a new cache is added),
if the code object is changed, or if something else using filenames is changed,
reencode_filenames() should also be changed.

Checking the encoding name is very important. If I remember correctly, I added it
to avoid an unlimited recursion loop. Or was it related to
sys.setfilesystemencoding()? I don't remember :-)

I agree that my patch is not the simplest or safest way to fix the problem.
I will try your solution.

But we have to be careful of the fallback to utf-8 if the encoding name is 
invalid.

> If the encoding defined in PYTHONFSENCODING turns out not
> to be defined, the module loader will complain later on during
> startup.

Yes. But I hope that it doesn't fill any cache or something else keeping a 
trace of the filename encoded to the wrong encoding.

> To play extra safe, you might run get_codec_name() at the same
> point in startup as you have initfsencoding() now. If something
> failed to load, you won't even get there. If things loaded
> fine, then you have a chance to safely double-check at that point.

Exactly.

As I wrote before, I don't like my reencode* patch, but I didn't find a better
solution. I will work on a patch implementing your solution and check if it
works or not ;-)
msg117593 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-09-29 10:48
Forget my previous message, I forgot important points.

> So the only reason why you have to go through
> all those hoops is to
>
> * allow the complete set of Python supported encoding
>   names for the PYTHONFSENCODING
>
> * make sure that the Py_FilesystemDefaultEncoding is set
>   to the actual name of the codec as used by the system

Not only. As I wrote in my first message (msg114191), there are two
other good reasons to keep the current code but redecode filenames:

 * Encoding aliases: locale encoding is not always written as the
   official Python encoding name. E.g. utf8 vs UTF-8, iso8859-1 vs
   latin_1, etc. We have to be able to load Lib/encodings/aliases.py
   to get the Python codec.

 * Codecs implemented in Python: only ascii, latin1, utf8 and mbcs
   codecs are builtin. All other encodings are implemented in Python. If
   your filesystem encoding is ShiftJIS, you have to load
   Lib/encodings/shift_jis.py to load the codec.

For these two reasons, we have to import Python modules before being
able to set the filesystem encoding. So we have to redecode filenames
after setting the filesystem encoding.
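
The second point can be checked from Python: looking up a codec such as ShiftJIS imports a module from the encodings package. A small sketch with a hypothetical helper name:

```python
import codecs
import sys


def lookup_imports(encoding, module_name):
    """Look up a codec and report whether its module is now imported."""
    codecs.lookup(encoding)
    return module_name in sys.modules


print(lookup_imports("shift_jis", "encodings.shift_jis"))
```

Importing that module requires opening a file on disk, and therefore a working filesystem encoding, which is the ordering problem described above.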

> the redecoding of the filenames is fragile

We can set up a buildbot installed in a non-ascii path. Antoine had such a
buildbot, which already helped to find many bugs related to non-ascii paths.

--

We can choose to only support ascii, latin1, utf8 and mbcs for the
filesystem encoding, but users will complain that we break compatibility
with old systems. Python3 already "breaks" the language, I don't think
that it is a good idea to choose to become incompatible with old systems
just to simplify (too much) the code.

--

Another solution would be to unload all modules, clear all caches,
delete all code objects, etc. after setting the filesystem encoding. But
I think that it is inefficient and nobody wants a slower Python startup.
msg117595 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-09-29 10:50
Patch version 4:
 - Rename "reencode" to "redecode"
 - Return -1 (instead of 1) on error
msg117605 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-09-29 11:45
> For these two reasons, we have to import Python modules before being
> able to set the filesystem encoding. So we have to redecode filenames
> after setting the filesystem encodings.

No, that's not needed ! Please see my earlier message: you can still
do all this at a later time during startup and double-check that
the encoding is indeed valid.

The main point is that you don't need to apply all those checks
before setting the file system encoding in the interpreter.
Early on you just assume that the env vars are setup correctly
and head on into starting up the interpreter.

If the decoding fails during startup due to a wrong encoding of
file or path names, the interpreter will signal this. If you have
a case where everything imports fine, you can then still double
check at the time the file system encoding is set now to e.g.
detect cases where the encoding was set to ascii, but in reality
the interpreter was just lucky and the file system encoding
should be utf-8.
msg117607 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-09-29 12:18
On Wednesday 29 September 2010 13:45:15, you wrote:
> No, that's not needed ! Please see my earlier message: you can still
> do all this at a later time during startup and double-check that
> the encoding is indeed valid.

I don't understand how. E.g. if you set Py_FileSystemDefaultEncoding to
"cp1252" before loading the first module, importing a module will have to load the
codec. Loading the codec requires importing a module. But how can you open the cp1252
module when you are unable to encode paths to the filesystem encoding (because
the cp1252 codec is not available yet)?

> If the decoding fails during startup due to a wrong encoding of
> file or path names, ...

It is not the problem described in my previous message. How do you load
non-builtin codecs?

Can you write a patch implementing your ideas? I tried to write such a patch
(set Py_FileSystemDefaultEncoding before loading the first module), but it
doesn't work for different reasons (all described in this issue). Maybe I
misunderstood your proposal.
msg117629 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-09-29 16:37
I committed redecode_modules_path-4.patch as r85115 in Python 3.2.
msg117634 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-09-29 17:39
> I don't understand how. E.g. if you set Py_FileSystemDefaultEncoding to
> "cp1252" before loading the first module, importing a module will have to load the
> codec. Loading the codec requires importing a module. But how can you open the cp1252
> module when you are unable to encode paths to the filesystem encoding (because
> the cp1252 codec is not available yet)?

Ah, sorry, I forgot about that important circular reference :-)

You're right: there's no way to guarantee that file and path
decoding will work without first setting the file system encoding
to one of the builtin codec names (latin-1 would be a good choice).

The other option would be to import everything using relative
paths (since Python itself only uses ASCII path names to the modules),
until the codec is loaded and then add the absolute paths to
these relative ones, once the codec has been loaded successfully.

A third option is the one you mentioned earlier on: we simply
don't allow Python to be installed on paths that are not
decodable using one of the builtin codecs.

>> If the decoding fails during startup due to a wrong encoding of
>> file or path names, ...
> 
> It is not the problem described in my previous message. How do you load
> non-builtin codecs?
> 
> Can you write a patch implementing your ideas? I tried to write such patch 
> (set Py_FileSystemDefaultEncoding before loading the first module), but it 
> doesn't work for different reasons (all described in this issue). Maybe I 
> misunderstood your proposition.

No, I wasn't thinking of the situation where you want to use a
codec that requires a Python module.
History
Date User Action Args
2022-04-11 14:57:05 admin set github: 53839
2010-09-29 17:39:03 lemburg set messages: + msg117634
2010-09-29 16:37:07 vstinner set status: open -> closed
    resolution: fixed
    messages: + msg117629
2010-09-29 12:18:06 vstinner set messages: + msg117607
2010-09-29 11:45:13 lemburg set messages: + msg117605
    title: Redecode filenames when setting the filesystem encoding -> Reencode filenames when setting the filesystem encoding
2010-09-29 10:50:51 vstinner set files: - reencode_modules_path-3.patch
2010-09-29 10:50:28 vstinner set files: + redecode_modules_path-4.patch
    messages: + msg117595
    title: Reencode filenames when setting the filesystem encoding -> Redecode filenames when setting the filesystem encoding
2010-09-29 10:48:41 vstinner set messages: + msg117593
2010-09-24 12:58:41 vstinner set messages: + msg117281
2010-09-24 12:40:39 vstinner set messages: + msg117278
2010-09-24 12:35:27 lemburg set messages: + msg117277
2010-09-24 12:22:10 pitrou set nosy: + loewis, brett.cannon
2010-09-24 12:15:31 vstinner set messages: + msg117274
2010-09-24 12:14:00 pitrou set messages: + msg117273
2010-09-24 12:08:28 pitrou set nosy: + pitrou
    messages: + msg117272
2010-09-24 11:52:58 vstinner set messages: + msg117271
2010-09-24 11:43:52 lemburg set nosy: + lemburg
    messages: + msg117270
2010-09-24 11:38:07 vstinner set messages: + msg117269
2010-09-24 11:34:29 vstinner set files: - reencode_modules_path-2.patch
2010-09-24 11:34:22 vstinner set files: + reencode_modules_path-3.patch
    messages: + msg117268
2010-09-05 19:40:09 vstinner set messages: + msg115668
2010-09-04 19:31:21 amaury.forgeotdarc set messages: + msg115610
2010-09-04 19:08:39 Arfrever set nosy: + Arfrever
2010-09-04 00:10:44 vstinner set messages: + msg115546
2010-08-19 12:16:47 vstinner set messages: + msg114352
2010-08-19 12:01:17 amaury.forgeotdarc set messages: + msg114349
2010-08-19 11:58:49 amaury.forgeotdarc set nosy: + amaury.forgeotdarc
    messages: + msg114348
2010-08-18 01:06:27 vstinner set title: Reencode filenames of all module and code objects when setting the filesystem encoding -> Reencode filenames when setting the filesystem encoding
2010-08-18 00:50:15 vstinner set files: - reencode_modules_path.patch
2010-08-18 00:49:49 vstinner set files: + reencode_modules_path-2.patch
    messages: + msg114193
2010-08-17 23:47:04 vstinner create