Python3/POSIX: errors if file system encoding is None #52856
Comments
On POSIX (but not on Mac OS X), Python3 calls get_codeset() to get the file system encoding. If this function fails, sys.getfilesystemencoding() returns None. PyUnicode_DecodeFSDefaultAndSize() falls back to utf-8, whereas subprocess fails: ... We have two choices: raise a fatal error if get_codeset() fails, or fall back to utf-8. On Windows and Mac OS X, get_codeset() shouldn't be called because the result is just dropped. We should call _PyCodec_Lookup(Py_FileSystemDefaultEncoding) instead to ensure that the file system encoding can be loaded.
Here is a patch for the first solution: display a fatal error if we are unable to get the locale encoding. It always exits with a fatal error if nl_langinfo(CODESET) is not available (and Py_FileSystemDefaultEncoding is not set). I don't think it's a good idea to display a fatal error at runtime. If nl_langinfo(CODESET) is not available, configure should fail or we should fall back to a hardcoded encoding (ok, but which one?). Extract of the nl_langinfo() manual page (on Linux): CONFORMING TO
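The second option (fall back, then verify that the codec loads) can be sketched at the Python level. This is only an illustration of the idea, not the C patch: `choose_fs_encoding` and its `get_codeset` parameter are hypothetical names standing in for the interpreter's internal logic, and `codecs.lookup()` plays the role of `_PyCodec_Lookup()`.

```python
import codecs


def choose_fs_encoding(get_codeset, fallback="utf-8"):
    """Return a usable file system encoding name, never None.

    get_codeset is a hypothetical callable standing in for the C-level
    get_codeset(); it returns an encoding name, or None / "" on failure.
    """
    try:
        name = get_codeset()
    except Exception:
        name = None
    if not name:
        # The locale gave us nothing: use the fallback instead of None.
        name = fallback
    # Check early that the codec can actually be loaded, so later code
    # never has to deal with a missing or unknown encoding.
    try:
        return codecs.lookup(name).name
    except LookupError:
        return codecs.lookup(fallback).name
```

On a POSIX system one could pass `lambda: locale.nl_langinfo(locale.CODESET)` as `get_codeset`; the point of the sketch is only that the result is never None.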
Patch for the second solution (fallback to utf-8 on get_codeset() failure):
Since I wrote patches for both solutions, I can now properly compare their advantages and disadvantages. I prefer initfsencoding() because it works in all cases and is simpler than no_fsencoding_error.patch.
STINNER Victor wrote:
If nl_langinfo(CODESET) fails, Python should assume the default """
""" As with all locale APIs, it is not thread-safe, which can become There's also another issue: it's possible that nl_langinfo(CODESET) In such a case, it would be best to issue a warning to the Terminating Python with a fatal error would provide the worst of
It's more about get_codeset(). This function can fail for different reasons:
Do you think that we should fall back to ASCII if the nl_langinfo() result is an empty string, and to UTF-8 otherwise? A get_codeset() failure is very unlikely, and I think that falling back to UTF-8 is just fine. A warning is printed to stderr, and the user should try to understand why get_codeset() failed. You can at least reproduce the _PyCodec_Lookup() error with bpo-8611. My problem is also that the file system encoding is required (encoding != None) by the os.environ mapping with my os.environb patch (bpo-8603).
STINNER Victor wrote:
I think that using ASCII is a safer choice in case of errors. I also think that an application should be able to update the
I chose UTF-8 to keep backward compatibility: PyUnicode_DecodeFSDefaultAndSize() uses utf-8 if Py_FileSystemDefaultEncoding==NULL. If the OS has no nl_langinfo(CODESET) function at all, Python3 uses utf-8.
Well, I don't know. Maybe you are right. And which encoding should be used if the nl_langinfo(CODESET) function is missing: ASCII or UTF-8? UTF-8 is also an optimistic choice: I bet that more and more OSes will move to UTF-8.
STINNER Victor wrote:
Ouch, that was a poor choice. In Python we have a tradition to Nothing we can do about now, though.
I think we should also add a new environment variable to override PYTHONFSENCODING: Encoding[:errors] used for file system. (that would need to go on a new ticket, though) |
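As an illustration of the `Encoding[:errors]` format mentioned above, here is a minimal sketch of how such a value could be parsed and validated. The function name `parse_fs_encoding_setting` and the `"strict"` default are assumptions made for this example; the thread does not specify how the variable would actually be handled.

```python
import codecs


def parse_fs_encoding_setting(value, default_errors="strict"):
    """Split a hypothetical 'Encoding[:errors]' setting into (encoding, errors)."""
    encoding, sep, errors = value.partition(":")
    if not encoding:
        raise ValueError("missing encoding name")
    # Reject unknown encodings up front (raises LookupError).
    codecs.lookup(encoding)
    if sep and errors:
        # Reject unknown error handlers up front (raises LookupError).
        codecs.lookup_error(errors)
    else:
        # No ':errors' part given: use the assumed default handler.
        errors = default_errors
    return encoding, errors
```

For example, `parse_fs_encoding_setting("ascii:surrogateescape")` yields the pair `("ascii", "surrogateescape")`, while an unknown encoding name raises `LookupError` before anything else depends on it.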
I've opened bpo-8622 for the env. var idea.
New patch:
NEWS entry: "Issue bpo-8610: Load file system codec at startup, and display a fatal error on failure. Set the file system encoding to ascii if getting the locale encoding failed, or if nl_langinfo(CODESET) function is missing."
"I think that using ASCII is a safer choice in case of errors. (...) Ouch, that was a poor choice." Ok, you convinced me with your PYTHONFSENCODING suggestion (bpo-8622). Can you review my last patch, please?
STINNER Victor wrote:
I don't think we can change the fallback encoding in 3.2. But you The number of Python 3.x users is still small, so perhaps it's still Some comments on the patch: + fprintf(stderr, This would have to read "... to ASCII" + Py_FileSystemDefaultEncoding = "ascii"; + codec = _PyCodec_Lookup(Py_FileSystemDefaultEncoding); It's better to use the same approach as above for this situation Fatal errors are just not user friendly and will likely cause E.g.
You also need to change this line in pythonrun.c:

    /* reset file system default encoding */
    if (!Py_HasFileSystemDefaultEncoding) {
        free((char*)Py_FileSystemDefaultEncoding);
        Py_FileSystemDefaultEncoding = NULL;
    }

I'm not sure what the purpose of Py_HasFileSystemDefaultEncoding In any case, initfsencoding() would always have to set that
On Friday 07 May 2010 11:19:52, you wrote:
Ok, I will ask on python-dev.
Fixed.
I chose to display a fatal error here to give a more relevant error message The fatal error only occurs in critical situations: no more memory, import About nl_langinfo(CODESET): get_codeset() does already reject unknown
Fixed. This test only matches if get_codeset() is used: I chose to set the
Its name doesn't help. It's just a flag to tell if free() should be called or
initfsencoding() is a static function and it's only called by
STINNER Victor wrote:
Ok, please add a comment to that part explaining why it can only
Interesting... I would associate a completely different meaning Scratch my comment on that flag then.
Well, it shouldn't be called multiple times, but then you never
Version 4: I forgot #include <langinfo.h> in bltinmodule.c.
I realized that falling back to ASCII instead of UTF-8 is not possible yet because of bpo-8611: if it falls back to ASCII, it's no longer possible to run Python in a non-ASCII directory. I have a patch set fixing bpo-8611, but it's huge and complex. It will not be fixed quickly (if it is ever possible to fix it). My new patch falls back to utf-8 instead of ascii, even if I agree that it would be better to fall back to ascii. Improving Unicode, surrogates & friends is complex, and I prefer to fix bugs step by step. I propose to first ensure that Py_FileSystemDefaultEncoding is always set, and later write a new patch to fall back to ASCII instead of UTF-8. Patch version 5:
I committed the last patch (fall back to UTF-8): r81190 (3.x), blocked in 3.1 (r81191). I opened a new issue for the UTF-8/ASCII fallback: bpo-8725, because the ASCII fallback is a different issue.
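The concern that deferred the ASCII fallback (a Python installed in a non-ASCII directory) can be illustrated with a small sketch. It is not part of any patch in this issue; `decode_path` and the example prefix are hypothetical, and the sketch assumes the path is decoded strictly, without an error handler such as surrogateescape.

```python
def decode_path(raw, encoding):
    """Decode a path strictly; return None if the encoding cannot decode it."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Hypothetical install prefix containing a non-ASCII directory name,
# as the OS would hand it over (UTF-8 encoded bytes).
prefix = "/opt/caf\u00e9/python3".encode("utf-8")

ascii_result = decode_path(prefix, "ascii")   # cannot be decoded
utf8_result = decode_path(prefix, "utf-8")    # decodes cleanly
```

With a strict ASCII decode the prefix is unusable, whereas the UTF-8 fallback handles it, which matches the reasoning for committing the UTF-8 fallback first.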