classification
Title: Remove sys.setfilesystemencoding()
Type: enhancement Stage: patch review
Components: Library (Lib), Unicode Versions: Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, eric.araujo, lemburg, pitrou, vstinner
Priority: normal Keywords: patch

Created on 2010-08-18 11:56 by vstinner, last changed 2010-09-11 10:50 by lemburg. This issue is now closed.

Files
File name Uploaded Description
remove_sys_setfilesystemencoding-2.patch vstinner, 2010-08-19 11:18
Messages (14)
msg114211 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-08-18 11:56
The sys.setfilesystemencoding() function is dangerous because it introduces many inconsistencies: it cannot re-encode every filename held by every object (e.g. Python cannot find filenames stored in user objects or third-party libraries). For example, if you change the filesystem encoding from UTF-8 to ASCII, existing non-ASCII filenames can no longer be used: encoding them raises UnicodeEncodeError.
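A minimal sketch of that failure mode, assuming a filename that was decoded while the filesystem encoding was UTF-8. Because sys.setfilesystemencoding() is gone, the switch to ASCII is only simulated here by calling str.encode() directly with strict error handling; encode_filename() is a hypothetical helper, not a CPython API.

```python
def encode_filename(name: str, fs_encoding: str) -> bytes:
    """Hypothetical helper: simulate the str -> bytes conversion done
    before a system call, using strict error handling."""
    return name.encode(fs_encoding)


# A filename obtained while the filesystem encoding was UTF-8.
name = "caf\u00e9.txt"
print(encode_filename(name, "utf-8"))  # b'caf\xc3\xa9.txt'

# After a (simulated) switch to ASCII, the very same str object can no
# longer be turned into bytes for open(), stat(), etc.
try:
    encode_filename(name, "ascii")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)  # cannot encode: ordinal not in range(128)
```

Objects that already hold such str filenames, in user code or in third-party libraries, have no way of being told that the encoding changed.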

Like sys.setdefaultencoding() in Python 2, I think sys.setfilesystemencoding() is the root of all evil :-) The PYTHONFSENCODING environment variable (issue #8622) is the right way to set the filesystem encoding.

Attached patch removes sys.setfilesystemencoding().
msg114342 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-08-19 11:18
New version of the patch: it also removes _Py_SetFileSystemEncoding().
msg114409 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2010-08-19 20:54
While you're right that adjusting the FS encoding long after Python has already started is probably not such a good idea, I do think that we need to provide a way to set the FS encoding from within Python without having to rely on external settings.

Think of e.g. embedded Python interpreters or py2exe-style applications running on Linux or other systems that don't use Unicode APIs for FS-interaction or have fixed FS-encodings.
msg114855 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-08-25 00:32
> Think of e.g. embedded Python interpreters or py2exe-style applications
> running on Linux or other systems that don't use Unicode APIs 
> for FS-interaction or have fixed FS-encodings.

What is the problem here? Python guesses the filesystem encoding. If the guess is "wrong" (not the value the user expects), filenames are displayed incorrectly (mojibake), but everything still works. Anyway, why is it not possible to use PYTHONFSENCODING here? Are you talking about Python modules loaded from a non-ASCII path?

Sorry, but I do not understand.
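A minimal sketch of the "mojibake, but it works" point, assuming a UTF-8 filename on a POSIX system whose encoding Python guessed wrongly. The wrong guess is simulated with explicit codecs: Latin-1 decodes any byte sequence, and for an ASCII guess Python 3 relies on the surrogateescape error handler (PEP 383). In both cases the original bytes come back unchanged.

```python
# Raw bytes of a filename written by a UTF-8 system: "café.txt".
raw = "caf\u00e9.txt".encode("utf-8")  # b'caf\xc3\xa9.txt'

# Wrong guess #1: Latin-1. Decoding succeeds, but the name is mojibake.
shown = raw.decode("latin-1")
print(ascii(shown))  # 'caf\xc3\xa9.txt', i.e. "cafÃ©.txt" on screen

# Re-encoding with the same (wrong) codec restores the exact bytes,
# so a system call using this name would still reach the right file.
assert shown.encode("latin-1") == raw

# Wrong guess #2: ASCII. surrogateescape maps each undecodable byte to a
# lone surrogate (U+DC80 + byte) and maps it back when encoding.
escaped = raw.decode("ascii", "surrogateescape")
print(ascii(escaped))  # 'caf\udcc3\udca9.txt'
assert escaped.encode("ascii", "surrogateescape") == raw
```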
msg114856 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-08-25 00:34
About the patch: it should also update the "Filenames and unicode" section of Doc/whatsnew/3.2.rst to explain that sys.setfilesystemencoding() is replaced by the PYTHONFSENCODING environment variable.
msg115024 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2010-08-26 20:17
STINNER Victor wrote:
>> Think of e.g. embedded Python interpreters or py2exe-style applications
>> running on Linux or other systems that don't use Unicode APIs 
>> for FS-interaction or have fixed FS-encodings.
> 
> What is the problem here? Python guesses the filesystem encoding. If the guess is "wrong" (not the value the user expects), filenames are displayed incorrectly (mojibake), but everything still works. Anyway, why is it not possible to use PYTHONFSENCODING here? Are you talking about Python modules loaded from a non-ASCII path?
> 
> Sorry, but I do not understand.

In such environments you cannot expect the user to configure the
system properly (i.e. set an environment variable). Instead, the
application has to provide an educated guess to the Python
interpreter in some way, hence the idea to use a configuration
file or perhaps provide a C API that can be used to set the
variable before initializing the interpreter.
msg115105 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2010-08-27 17:48
> >> Think of e.g. embedded Python interpreters or py2exe-style applications
> >> running on Linux or other systems that don't use Unicode APIs 
> >> for FS-interaction or have fixed FS-encodings.
> > 
> > What is the problem here? Python guesses the filesystem encoding. If the guess is "wrong" (not the value the user expects), filenames are displayed incorrectly (mojibake), but everything still works. Anyway, why is it not possible to use PYTHONFSENCODING here? Are you talking about Python modules loaded from a non-ASCII path?
> > 
> > Sorry, but I do not understand.
> 
> In such environments you cannot expect the user to configure the
> system properly (i.e. set an environment variable). Instead, the
> application has to provide an educated guess to the Python
> interpreter in some way, hence the idea to use a configuration
> file or perhaps provide a C API that can be used to set the
> variable before initializing the interpreter.

Why wouldn't the embedding application just set the environment var
before initializing the Python interpreter?
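A minimal sketch of this suggestion, assuming the host application can start the interpreter as a child process: copy the environment, add the variable, and only then launch Python, so the setting is already in place when the interpreter picks its filesystem encoding at startup. The sketch uses PYTHONUTF8=1 (Python 3.7 and later) as a stand-in for the PYTHONFSENCODING variable discussed in this thread; child_fs_encoding() is a hypothetical helper.

```python
import os
import subprocess
import sys


def child_fs_encoding(extra_env: dict) -> str:
    """Hypothetical helper: start a fresh interpreter with extra
    environment variables and report the filesystem encoding it chose."""
    env = dict(os.environ)  # copy, so the parent's environment is untouched
    env.update(extra_env)   # must be set before the child interpreter starts
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.getfilesystemencoding())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(child_fs_encoding({"PYTHONUTF8": "1"}))  # utf-8
```

An application that links the interpreter in-process would instead have to set the variable in its own environment before calling Py_Initialize(), which is the platform-dependent step objected to in the reply.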
msg115127 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2010-08-27 20:08
Antoine Pitrou wrote:
>>>> Think of e.g. embedded Python interpreters or py2exe-style applications
>>>> running on Linux or other systems that don't use Unicode APIs 
>>>> for FS-interaction or have fixed FS-encodings.
>>>
>>> What is the problem here? Python guesses the filesystem encoding. If the guess is "wrong" (not the value the user expects), filenames are displayed incorrectly (mojibake), but everything still works. Anyway, why is it not possible to use PYTHONFSENCODING here? Are you talking about Python modules loaded from a non-ASCII path?
>>>
>>> Sorry, but I do not understand.
>>
>> In such environments you cannot expect the user to configure the
>> system properly (i.e. set an environment variable). Instead, the
>> application has to provide an educated guess to the Python
>> interpreter in some way, hence the idea to use a configuration
>> file or perhaps provide a C API that can be used to set the
>> variable before initializing the interpreter.
> 
> Why wouldn't the embedding application just set the environment var
> before initializing the Python interpreter?

Because that's not easy to do in a platform-independent way.
OTOH, it's very easy to do via a C API function in Python and
since this env var is essential for the operation of Python,
adding such an API is warranted.
msg115547 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-09-04 00:23
> In such environments you cannot expect the user to configure the
> system properly (i.e. set an environment variable).

Why would it be different for embedded Python?

> Instead, the application has to provide an educated guess 
> to the Python interpreter in some way, ...

How can the application guess the encoding better than Python? If the user doesn't configure their environment correctly, I don't see how the application could obtain the real (correct) configuration.

If Python is unable to start because of the filesystem encoding, it is a bug (see #8611). If Python starts but displays filenames incorrectly, it is the user's fault: the user has to set up their environment.
msg115821 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-09-07 23:27
About "embedded Python interpreters or py2exe-style applications": do you mean that the application calls a C function to set the encoding before starting the interpreter? Or do you mean the Python function, sys.setfilesystemencoding()?

I would like to remove the Python function simply because it doesn't work (it doesn't re-encode the filenames held by all Python objects). But we might keep the C function if you really want to :-)
msg115822 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-09-07 23:30
"keep the C function"

Hmm, currently Python 3 only has a *private* function, _Py_SetFileSystemEncoding(), which can only be called after _Py_InitializeEx() (because it relies on the codecs API). If you think there is a real use case, we should create a function to set the filesystem encoding, one that should (must?) be called before Py_InitializeEx().

I still think that Python knows better than the application how to set the encoding (when, how to choose it, etc.).
msg115854 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2010-09-08 07:54
STINNER Victor wrote:
> "keep the C function"
> 
> Hmm, currently Python 3 only has a *private* function, _Py_SetFileSystemEncoding(), which can only be called after _Py_InitializeEx() (because it relies on the codecs API). If you think there is a real use case, we should create a function to set the filesystem encoding, one that should (must?) be called before Py_InitializeEx().
> 
> I still think that Python knows better than the application how to set the encoding (when, how to choose it, etc.).

If you embed Python into another application, say as scripting
language for that application, that other application may have
completely different requirements for the user setup than Python
expects, e.g. for a Windows GUI application it's not feasible to
ask the user to change the environment variables via the registry
in order for Python to pick up the right encoding information.

What we'd need is a way for the embedding application to provide this
information in a way that doesn't require setting up the environment
in some special way. The application will likely have its own way
of configuring things like file system or I/O stream encodings. Think
of e.g. GTK or Qt applications as example.

Using Py_InitializeEx() to pass information about such important
extra parameters to Python sounds like a good idea. It could take
arguments for setting the file system encoding as well as the I/O
encoding. The arguments would then override the environment
variable settings.

So you can remove the function, but have to keep a backdoor open
for use cases like the one I described above.

The Py_InitializeEx() approach would also avoid all the issues
that you have with calling _Py_SetFileSystemEncoding() after the
interpreter has been initialized.
msg116047 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-09-10 21:59
I didn't propose adding a new parameter to Py_InitializeEx() (which would mean creating a new function so as not to break the API); I just wrote that _Py_SetFileSystemEncoding() doesn't work for your use case.

> If you embed Python into another application, say as scripting
> language for that application, that other application may have
> completely different requirements for the user setup than Python
> expects, e.g. for a Windows GUI application it's not feasible to
> ask the user to change the environment variables via the registry
> in order for Python to pick up the right encoding information.

Is this use case really realistic? Apart from you, nobody has asked for this feature.

> The application will likely have its own way
> of configuring things like file system or I/O stream encodings.
> Think of e.g. GTK or Qt applications as example.

Qt uses the Unicode API on Windows: nativeOpen() calls CreateFile() in wide-character mode (see src/corelib/io/qfsengine_win.cpp).

Gtk+ (glib) also uses the Unicode API on Windows: g_fopen() calls _wfopen() (see glib/gstdio.c).

Python 3 currently doesn't support your use case (it doesn't work). If you consider it important, please open a new issue.

--

I committed my patch to 3.2 (r84687).
msg116089 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2010-09-11 10:50
STINNER Victor wrote:
> I didn't propose adding a new parameter to Py_InitializeEx() (which would mean creating a new function so as not to break the API); I just wrote that _Py_SetFileSystemEncoding() doesn't work for your use case.

Yes, it would be a new function. I was under the impression that
you wanted to use this approach to resolve the problem of not being
able to set the encoding before any file objects get opened in
Python.

>> If you embed Python into another application, say as scripting
>> language for that application, that other application may have
>> completely different requirements for the user setup than Python
>> expects, e.g. for a Windows GUI application it's not feasible to
>> ask the user to change the environment variables via the registry
>> in order for Python to pick up the right encoding information.
> 
> Is this use case really realistic? Apart from you, nobody has asked for this feature.

That's more likely due to the fact that no one is embedding
Python 3.x into their apps yet...

>> The application will likely have its own way
>> of configuring things like file system or I/O stream encodings.
>> Think of e.g. GTK or Qt applications as example.
> 
> Qt uses the Unicode API on Windows: nativeOpen() calls CreateFile() in wide-character mode (see src/corelib/io/qfsengine_win.cpp).
> 
> Gtk+ (glib) also uses the Unicode API on Windows: g_fopen() calls _wfopen() (see glib/gstdio.c).

That's not the point: the applications will have their own way
of configuring themselves, and in GUI apps you most likely do not
use environment variables to set up your application. As a result,
the application has to tell the embedded Python how it was configured
in a way that overrides Python's encoding-detection magic.

With your patch, the only way to do this is for the embedding
application to change the OS environment. That's not exactly a very
Pythonic way of interfacing.

> Python 3 currently doesn't support your use case (it doesn't work). If you consider it important, please open a new issue.
>
> I committed my patch to 3.2 (r84687).

Since you are removing a function that has been around since 3.0,
please make sure that you add proper warnings to 3.1.
History
Date                 User         Action  Args
2010-09-11 10:50:09  lemburg      set     messages: + msg116089
2010-09-10 21:59:24  vstinner     set     status: open -> closed
                                          resolution: fixed
                                          messages: + msg116047
2010-09-08 07:54:12  lemburg      set     messages: + msg115854
2010-09-07 23:30:38  vstinner     set     messages: + msg115822
2010-09-07 23:27:03  vstinner     set     messages: + msg115821
2010-09-04 00:23:35  vstinner     set     messages: + msg115547
2010-08-27 20:08:55  lemburg      set     messages: + msg115127
2010-08-27 17:48:58  pitrou       set     messages: + msg115105
2010-08-26 20:17:20  lemburg      set     messages: + msg115024
2010-08-25 00:35:28  eric.araujo  set     nosy: + eric.araujo
2010-08-25 00:34:24  vstinner     set     messages: + msg114856
2010-08-25 00:32:35  vstinner     set     messages: + msg114855
2010-08-20 18:21:17  terry.reedy  set     type: enhancement
                                          stage: patch review
2010-08-19 20:54:23  lemburg      set     messages: + msg114409
2010-08-19 11:18:34  vstinner     set     files: - remove_sys_setfilesystemencoding.patch
2010-08-19 11:18:28  vstinner     set     files: + remove_sys_setfilesystemencoding-2.patch
                                          messages: + msg114342
2010-08-18 11:57:20  vstinner     set     files: + remove_sys_setfilesystemencoding.patch
                                          nosy: + lemburg, pitrou, Arfrever
                                          keywords: + patch
2010-08-18 11:56:02  vstinner     create