Issue9632
Created on 2010-08-18 11:56 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
remove_sys_setfilesystemencoding-2.patch | vstinner, 2010-08-19 11:18 |
Messages (14) | |||
---|---|---|---|
msg114211 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-08-18 11:56 | |
The sys.setfilesystemencoding() function is dangerous because it introduces a lot of inconsistencies: it cannot re-encode all filenames in all objects (e.g. Python is unable to find filenames held in user objects or 3rd-party libraries). For example, if you change the filesystem encoding from utf8 to ascii, it will no longer be possible to use existing non-ASCII (unicode) filenames: they will raise UnicodeEncodeError. Like sys.setdefaultencoding() in Python 2, I think that sys.setfilesystemencoding() is the root of evil :-) PYTHONFSENCODING (issue #8622) is the right solution to set the filesystem encoding. The attached patch removes sys.setfilesystemencoding(). |
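The utf8-to-ascii failure described in msg114211 can be reproduced without touching the interpreter's own setting. This is a minimal sketch, not the code path Python uses internally: the filename and the helper `encodes_as` are hypothetical, and plain `str.encode` stands in for the filesystem encoding step.

```python
# Hypothetical non-ASCII filename, created while the encoding was UTF-8.
name = "caf\u00e9.txt"

# Under UTF-8 the name encodes to bytes without any problem.
utf8_bytes = name.encode("utf-8")


def encodes_as(filename, encoding):
    """Return True if filename can be encoded with the given codec."""
    try:
        filename.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# After a switch to ASCII the same str object can no longer be encoded,
# which is the UnicodeEncodeError the message warns about.
print(encodes_as(name, "utf-8"))
print(encodes_as(name, "ascii"))
```

Any object that already holds such a name (user objects, 3rd-party libraries) would hit the same error, since changing the encoding does not rewrite the strings they store.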
|||
msg114342 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-08-19 11:18 | |
New version of the patch: it also removes _Py_SetFileSystemEncoding(). |
|||
msg114409 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2010-08-19 20:54 | |
While you're right that adjusting the FS encoding long after Python has already started is probably not such a good idea, I do think that we need to provide a way to set the FS encoding from within Python without having to rely on external settings. Think of e.g. embedded Python interpreters or py2exe-style applications running on Linux or other systems that don't use Unicode APIs for FS-interaction or have fixed FS-encodings. |
|||
msg114855 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-08-25 00:32 | |
> Think of e.g. embedded Python interpreters or py2exe-style applications
> running on Linux or other systems that don't use Unicode APIs
> for FS-interaction or have fixed FS-encodings.

What is the problem here? Python does guess the filesystem encoding. If the encoding is "wrong" (not the value expected by the user), filenames are not displayed correctly (mojibake) but it does just work.

Anyway, why is it not possible to use PYTHONFSENCODING here? Are you talking about Python modules loaded from a non-ASCII path?

Sorry, but I do not understand. |
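The "mojibake but it does just work" claim in msg114855 can be illustrated with plain codecs. This is only a sketch under assumptions: the filename is hypothetical, the on-disk bytes are assumed to be UTF-8, and Latin-1 and ASCII stand in for a wrongly guessed filesystem encoding.

```python
# Bytes of a hypothetical filename as stored on disk, assumed here to be UTF-8.
raw = "r\u00e9sum\u00e9.txt".encode("utf-8")

# Decoding with a "wrong" codec: Latin-1 accepts every byte, so nothing fails,
# but the accented letters come out garbled (mojibake).
shown = raw.decode("latin-1")

# The garbled name still maps back to the original bytes.
roundtrip_latin1 = shown.encode("latin-1") == raw

# ASCII cannot decode these bytes directly; the surrogateescape error handler
# (PEP 383) keeps the undecodable bytes so the round trip is still lossless.
escaped = raw.decode("ascii", "surrogateescape")
roundtrip_ascii = escaped.encode("ascii", "surrogateescape") == raw

print(repr(shown), roundtrip_latin1, roundtrip_ascii)
```

In both cases the displayed name is wrong, but the bytes handed back to the operating system are unchanged, so the file can still be found.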
|||
msg114856 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-08-25 00:34 | |
About the patch: it should patch "Filenames and unicode" section of Doc/whatsnew/3.2.rst (to explain that sys.setfilesystemencoding() is replaced by the PYTHONFSENCODING env var). |
|||
msg115024 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2010-08-26 20:17 | |
STINNER Victor wrote:
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
>
>> Think of e.g. embedded Python interpreters or py2exe-style applications
>> running on Linux or other systems that don't use Unicode APIs
>> for FS-interaction or have fixed FS-encodings.
>
> What is the problem here? Python does guess the filesystem encoding. If the encoding is "wrong" (not the value expected by the user), filenames are not displayed correctly (mojibake) but it does just work. Anyway, why is it not possible to use PYTHONFSENCODING here? Are you talking to Python modules loaded from a non-ascii path?
>
> Sorry, but I do not understand.

In such environments you cannot expect the user to configure the system properly (i.e. set an environment variable). Instead, the application has to provide an educated guess to the Python interpreter in some way, hence the idea to use a configuration file or perhaps provide a C API that can be used to set the variable before initializing the interpreter. |
|||
msg115105 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2010-08-27 17:48 | |
>>> Think of e.g. embedded Python interpreters or py2exe-style applications
>>> running on Linux or other systems that don't use Unicode APIs
>>> for FS-interaction or have fixed FS-encodings.
>>
>> What is the problem here? Python does guess the filesystem encoding. If the encoding is "wrong" (not the value expected by the user), filenames are not displayed correctly (mojibake) but it does just work. Anyway, why is it not possible to use PYTHONFSENCODING here? Are you talking to Python modules loaded from a non-ascii path?
>>
>> Sorry, but I do not understand.
>
> In such environments you cannot expect the user to configure the
> system properly (i.e. set an environment variable). Instead, the
> application has to provide an educated guess to the Python
> interpreter in some way, hence the idea to use a configuration
> file or perhaps provide a C API that can be used to set the
> variable before initializing the interpreter.

Why wouldn't the embedding application just set the environment var before initializing the Python interpreter? |
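The ordering raised in msg115105, setting the variable before the interpreter starts, can be sketched from Python by launching a child interpreter with a modified environment. This is a stand-in for the C embedding case, not a demonstration of it: the helper `run_child_with_env` is hypothetical, and the interpreter you run may not honor PYTHONFSENCODING at all, so the sketch only shows that the variable is already in place when the child starts.

```python
import os
import subprocess
import sys


def run_child_with_env(name, value):
    """Start a child interpreter with name=value set and report what it sees."""
    env = dict(os.environ)  # copy, so the parent's environment is untouched
    env[name] = value
    # The child only echoes the variable; it does not interpret it.
    code = "import os; print(os.environ.get(%r))" % name
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# The variable is visible to the child from its very first instruction.
seen = run_child_with_env("PYTHONFSENCODING", "utf-8")
print(seen)
```

An embedding application written in C would do the equivalent with its platform's environment API before initializing the interpreter, which is where the portability concern in the following reply comes from.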
|||
msg115127 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2010-08-27 20:08 | |
Antoine Pitrou wrote:
> Antoine Pitrou <pitrou@free.fr> added the comment:
>
>>>> Think of e.g. embedded Python interpreters or py2exe-style applications
>>>> running on Linux or other systems that don't use Unicode APIs
>>>> for FS-interaction or have fixed FS-encodings.
>>>
>>> What is the problem here? Python does guess the filesystem encoding. If the encoding is "wrong" (not the value expected by the user), filenames are not displayed correctly (mojibake) but it does just work. Anyway, why is it not possible to use PYTHONFSENCODING here? Are you talking to Python modules loaded from a non-ascii path?
>>>
>>> Sorry, but I do not understand.
>>
>> In such environments you cannot expect the user to configure the
>> system properly (i.e. set an environment variable). Instead, the
>> application has to provide an educated guess to the Python
>> interpreter in some way, hence the idea to use a configuration
>> file or perhaps provide a C API that can be used to set the
>> variable before initializing the interpreter.
>
> Why wouldn't the embedding application just set the environment var
> before initializing the Python interpreter?

Because that's not easy to do in a platform independent way. OTOH, it's very easy to do via a C API function in Python and since this env var is essential for the operation of Python, adding such an API is warranted. |
|||
msg115547 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-09-04 00:23 | |
> In such environments you cannot expect the user to configure the
> system properly (i.e. set an environment variable).

Why would it be different for embedded Python?

> Instead, the application has to provide an educated guess
> to the Python interpreter in some way, ...

How can the application guess the encoding better than Python? If the user doesn't configure their environment correctly, I don't see how the application can get the real (correct) environment config?!

If Python is unable to start because of the filesystem encoding, it is a bug (see #8611). If Python starts but displays filenames incorrectly, it is the user's fault: the user has to set up their environment. |
|||
msg115821 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-09-07 23:27 | |
About "embedded Python interpreters or py2exe-style applications": do you mean that the application calls a C function to set the encoding before starting the interpreter? Or do you mean the Python function, sys.setfilesystemencoding()? I would like to remove the Python function just because it doesn't work (it doesn't re-encode filenames from all Python objects). But we might keep the C function if you really want to :-) |
|||
msg115822 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-09-07 23:30 | |
"keep the C function" Hum, currently, Python3 only has a *private* function called _Py_SetFileSystemEncoding() which can only be called after _Py_InitializeEx() (because it relies on the codecs API). If you consider that there is a real use case, we should create a function to set the filesystem encoding, function that should (have to?) be called before Py_InitializeEx(). I still think that Python knows better than the application how to set the encoding (when, how to choose it, etc.). |
|||
msg115854 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2010-09-08 07:54 | |
STINNER Victor wrote:
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
>
> "keep the C function"
>
> Hum, currently, Python3 only has a *private* function called _Py_SetFileSystemEncoding() which can only be called after _Py_InitializeEx() (because it relies on the codecs API). If you consider that there is a real use case, we should create a function to set the filesystem encoding, function that should (have to?) be called before Py_InitializeEx().
>
> I still think that Python knows better than the application how to set the encoding (when, how to choose it, etc.).

If you embed Python into another application, say as scripting language for that application, that other application may have completely different requirements for the user setup than Python expects, e.g. for a Windows GUI application it's not feasible to ask the user to change the environment variables via the registry in order for Python to pick up the right encoding information.

What we'd need is a way for the embedding application to provide this information in a way that doesn't require setting up the environment in some special way. The application will likely have its own way of configuring things like file system or I/O stream encodings. Think of e.g. GTK or Qt applications as example.

The Py_InitializeEx() function sounds like a good idea to pass the information about such important extra parameters to Python. This could take arguments for setting the file system encoding as well as the I/O encoding. The arguments would then override the env var settings.

So you can remove the function, but have to keep a backdoor open for use cases like the one I described above. The Py_InitializeEx() function approach would also avoid all the issues that you have with calling _Py_SetFileSystemEncoding() after the interpreter has been initialized. |
|||
msg116047 - (view) | Author: STINNER Victor (vstinner) * | Date: 2010-09-10 21:59 | |
I didn't propose to add a new parameter to Py_InitializeEx() (which would mean creating a new function so as not to break the API), I just wrote that _Py_SetFileSystemEncoding() doesn't work for your use case.

> If you embed Python into another application, say as scripting
> language for that application, that other application may have
> completely different requirements for the user setup than Python
> expects, e.g. for a Windows GUI application it's not feasible to
> ask the user to change the environment variables via the registry
> in order for Python to pick up the right encoding information.

Is this use case really realistic? Apart from you, nobody has asked for this feature.

> The application will likely have its own way
> of configuring things like file system or I/O stream encodings.
> Think of e.g. GTK or Qt applications as example.

Qt uses the unicode API on Windows: nativeOpen() uses CreateFile() (in wide character mode), see src/corelib/io/qfsengine_win.cpp.

Gtk+ (glib) also uses the unicode API on Windows: g_fopen() uses _wfopen(), see glib/gstdio.c.

Python3 doesn't support your use case currently (it doesn't work). If you consider it important, please open a new issue.

--

I committed my patch to 3.2 (r84687). |
|||
msg116089 - (view) | Author: Marc-Andre Lemburg (lemburg) * | Date: 2010-09-11 10:50 | |
STINNER Victor wrote:
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
>
> I didn't proposed to add a new parameter to Py_InitializeEx() (which means create a new function to not break the API), I just wrote that _Py_SetFileSystemEncoding() doesn't work for your use case.

Yes, it would be a new function. I was under the impression that you wanted to use this approach to resolve the problem of not being able to set the encoding before any file objects get opened in Python.

>> If you embed Python into another application, say as scripting
>> language for that application, that other application may have
>> completely different requirements for the user setup than Python
>> expects, e.g. for a Windows GUI application it's not feasible to
>> ask the user to change the environment variables via the registry
>> in order for Python to pick up the right encoding information.
>
> Is this usecase really realistic? Except you, nobody asked for this feature.

That's more likely due to the fact that no one is embedding Python 3.x into their apps yet...

>> The application will likely have its own way
>> of configuring things like file system or I/O stream encodings.
>> Think of e.g. GTK or Qt applications as example.
>
> Qt uses the unicode API on Windows: nativeOpen() uses CreateFile() (in wide chararacter mode), see src/corelib/io/qfsengine_win.cpp.
>
> Gtk+ (glib) uses also the unicode API on Windows: g_fopen() uses _wfopen(), see glib/gstdio.c.

That's not the point: the applications will have their own way of configuring themselves and in GUI apps you most likely do not use environment variables to set up your application. As a result, the application has to tell the embedded Python how it was configured in a way that overrides Python's encoding finding magic.

With your patch, the only way to do this is by having the embedded application change the OS environment. That's not exactly a very Pythonic way of doing interfacing.

> Python3 doesn't support your usecase currently (it doesn't work). If you consider it important, please open a new issue.
>
> I commited my patch to 3.2 (r84687).

Since you are removing a function that has been around since 3.0, please make sure that you add proper warnings to 3.1. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:05 | admin | set | github: 53841 |
2010-09-11 10:50:09 | lemburg | set | messages: + msg116089 |
2010-09-10 21:59:24 | vstinner | set | status: open -> closed resolution: fixed messages: + msg116047 |
2010-09-08 07:54:12 | lemburg | set | messages: + msg115854 |
2010-09-07 23:30:38 | vstinner | set | messages: + msg115822 |
2010-09-07 23:27:03 | vstinner | set | messages: + msg115821 |
2010-09-04 00:23:35 | vstinner | set | messages: + msg115547 |
2010-08-27 20:08:55 | lemburg | set | messages: + msg115127 |
2010-08-27 17:48:58 | pitrou | set | messages: + msg115105 |
2010-08-26 20:17:20 | lemburg | set | messages: + msg115024 |
2010-08-25 00:35:28 | eric.araujo | set | nosy: + eric.araujo |
2010-08-25 00:34:24 | vstinner | set | messages: + msg114856 |
2010-08-25 00:32:35 | vstinner | set | messages: + msg114855 |
2010-08-20 18:21:17 | terry.reedy | set | type: enhancement stage: patch review |
2010-08-19 20:54:23 | lemburg | set | messages: + msg114409 |
2010-08-19 11:18:34 | vstinner | set | files: - remove_sys_setfilesystemencoding.patch |
2010-08-19 11:18:28 | vstinner | set | files: + remove_sys_setfilesystemencoding-2.patch messages: + msg114342 |
2010-08-18 11:57:20 | vstinner | set | files: + remove_sys_setfilesystemencoding.patch nosy: + lemburg, pitrou, Arfrever keywords: + patch |
2010-08-18 11:56:02 | vstinner | create |