Message101815
If the full path to the python3 binary contains a non-ASCII character and the file system encoding is ASCII, Python fails with:
---
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: Py_Initialize: can't initialize sys standard streams
ImportError: No module named encodings.utf_8
Abandon
---
The file system encoding is set to ASCII if there is no locale (e.g. LANG=C).
The problem is that the command line arguments, especially argv[0], are stored as wchar_t* strings that use surrogates to represent undecodable bytes.
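For illustration, here is a minimal Python-level sketch of how undecodable bytes end up as surrogates. It uses the built-in surrogateescape error handler (PEP 383). The path is hypothetical, and this only models the behaviour: the patch itself works on wchar_t* strings in C.

```python
# Hypothetical path containing the UTF-8 bytes for "é" (0xC3 0xA9).
raw = b"/opt/caf\xc3\xa9/bin/python3"

# With an ASCII file system encoding, bytes >= 0x80 cannot be decoded.
# surrogateescape maps each such byte 0xNN to the lone surrogate U+DCNN.
decoded = raw.decode("ascii", "surrogateescape")
escaped = [ch for ch in decoded if "\udc80" <= ch <= "\udcff"]

# Encoding with the same handler restores the original bytes exactly.
restored = decoded.encode("ascii", "surrogateescape")

# A strict encoder rejects the lone surrogates.
try:
    decoded.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

Any code that encodes such a string without the surrogateescape handler cannot get the original path bytes back.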
Attached patch fixes calculate_path() and import functions to support surrogates. Details:
* Initialize Py_FileSystemDefaultEncoding earlier in Py_InitializeEx(), because its value is required to encode Unicode strings containing surrogates to bytes
* Rename char2wchar() to _Py_char2wchar(), since the function is no longer static; and create the function _Py_wchar2char()
* Escape surrogates (reimplement surrogateescape decoder) in calculate_path() subfunctions (_wstat, _wgetcwd, _Py_wreadlink)
* Use surrogateescape error handler in find_module(), NullImporter_init() and zipimporter_init()
* Write a "fastpath" (I don't know the right term: is it a hack?) for utf-8 encoding with surrogateescape error handler in PyUnicode_AsEncodedObject() and PyUnicode_AsEncodedString(): required because these functions are called before the codecs module is initialized
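As a rough model of what reimplementing the surrogateescape decoder means, the ASCII case can be sketched in Python as below. The function names are hypothetical and are not the ones used in the patch, which does this in C without relying on the codecs machinery.

```python
def ascii_decode_surrogateescape(data: bytes) -> str:
    """Decode ASCII bytes, mapping each byte >= 0x80 to U+DC80..U+DCFF."""
    chars = []
    for byte in data:
        if byte < 0x80:
            chars.append(chr(byte))
        else:
            # Undecodable byte 0xNN becomes the lone surrogate U+DCNN.
            chars.append(chr(0xDC00 + byte))
    return "".join(chars)


def ascii_encode_surrogateescape(text: str) -> bytes:
    """Inverse operation: turn U+DC80..U+DCFF back into the raw bytes."""
    out = bytearray()
    for index, ch in enumerate(text):
        code = ord(ch)
        if code < 0x80:
            out.append(code)
        elif 0xDC80 <= code <= 0xDCFF:
            out.append(code - 0xDC00)
        else:
            raise UnicodeEncodeError(
                "ascii", text, index, index + 1, "ordinal not in range(128)"
            )
    return bytes(out)
```

Both helpers should agree with the built-in handler, e.g. `data.decode("ascii", "surrogateescape")`, which makes them easy to check against it.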
The patch is a work in progress: there are some FIXMEs (I don't know if the string should be encoded/decoded using surrogates or not).
I only tested ASCII and UTF-8 file system encodings. I don't know if we can support more encodings. Python has only a few builtin encodings. Other encodings are implemented in Python: we have to import them, but we need the codec to import a module, so...
I don't think that Windows is affected by this issue because it has a better API for unicode filenames and command line arguments, and most patched functions are surrounded by #ifndef WINDOWS ... #endif
Date | User | Action | Args
2010-03-27 01:12:37 | vstinner | set | recipients: + vstinner
2010-03-27 01:12:37 | vstinner | set | messageid: <1269652357.44.0.254995855854.issue8242@psf.upfronthosting.co.za>
2010-03-27 01:12:36 | vstinner | link | issue8242 messages
2010-03-27 01:12:35 | vstinner | create |