
Author vstinner
Recipients vstinner
Date 2010-03-27.01:12:32
Message-id <1269652357.44.0.254995855854.issue8242@psf.upfronthosting.co.za>
Content
If the full path to the python3 binary contains a non-ASCII character and the file system encoding is ASCII, Python fails with:
---
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: Py_Initialize: can't initialize sys standard streams
ImportError: No module named encodings.utf_8
Abandon
---

The file system encoding is set to ASCII if there is no locale (e.g. LANG=C).

The problem is that the command line arguments, especially argv[0], are decoded to wchar_t* strings in which undecodable bytes are stored as surrogates.
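
To illustrate the surrogate storage described above, here is a small Python-level sketch. It is not the C code from the patch: the path is a made-up example, and the `surrogateescape` error handler is used here only to show how a byte that ASCII cannot decode ends up as a lone surrogate and can later be turned back into the original byte.

```python
# Hypothetical path containing the non-ASCII byte 0xE9 (illustration only).
raw_path = b"/home/caf\xe9/bin/python3"

# With surrogateescape, each undecodable byte 0xNN becomes U+DCNN.
decoded = raw_path.decode("ascii", errors="surrogateescape")
escaped_char = decoded[9]  # the character that stands for byte 0xE9

# Encoding with the same error handler restores the original bytes.
roundtrip = decoded.encode("ascii", errors="surrogateescape")

# A strict encoder rejects the lone surrogate, which is the kind of
# failure that code unaware of surrogates would run into.
try:
    decoded.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

Any function that receives such a string and needs a byte path again (for example before calling into the operating system) has to apply the reverse mapping, otherwise the file cannot be found.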

Attached patch fixes calculate_path() and import functions to support surrogates. Details:

 * Initialize Py_FileSystemDefaultEncoding earlier in Py_InitializeEx(), because its value is required to encode Unicode strings containing surrogates to bytes
 * Rename char2wchar() to _Py_char2wchar(): the function is no longer static; and create the function _Py_wchar2char()
 * Escape surrogates (reimplement surrogateescape decoder) in calculate_path() subfunctions (_wstat, _wgetcwd, _Py_wreadlink)
 * Use surrogateescape error handler in find_module(), NullImporter_init() and zipimporter_init()
 * Write a "fast path" (I don't know the right term: is it a hack?) for the utf-8 encoding with the surrogateescape error handler in PyUnicode_AsEncodedObject() and PyUnicode_AsEncodedString(): required because these functions are called before the codecs module is initialized
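
The idea of escaping surrogates by hand, as in the list above, can be sketched in Python. This is only an illustration under stated assumptions: the function name `encode_surrogateescape` is hypothetical, the real patch does this in C on wchar_t* strings, and only the UTF-8 case is exercised here.

```python
def encode_surrogateescape(text, encoding="utf-8"):
    """Encode text to bytes, mapping U+DC80..U+DCFF back to bytes 0x80..0xFF.

    Hypothetical helper: a hand-written equivalent of the surrogateescape
    error handler that does not need the codecs error registry.
    """
    out = bytearray()
    for ch in text:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            # Escaped byte: recover the original undecodable byte.
            out.append(code - 0xDC00)
        else:
            # Regular character: encode it strictly.
            out += ch.encode(encoding)
    return bytes(out)


# Made-up path with one escaped byte (0xE9) and one regular non-ASCII character.
path_bytes = encode_surrogateescape("/home/caf\udce9/\u00e9t\u00e9/python3")
```

The result should match what `str.encode("utf-8", "surrogateescape")` produces once the codec machinery is available, which is a convenient way to check such a reimplementation.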

The patch is a work in progress: some FIXMEs remain (I don't know if the string should be encoded/decoded using surrogates or not).

I only tested the ASCII and UTF-8 file system encodings. I don't know if we can support more encodings. Python has only a few built-in encodings. Other encodings are implemented in Python: we have to import them, but we need the codec to import a module, so...

I don't think that Windows is affected by this issue because it has a better API for unicode filenames and command line arguments, and most patched functions are surrounded by #ifndef WINDOWS ... #endif
History
Date                 User      Action  Args
2010-03-27 01:12:37  vstinner  set     recipients: + vstinner
2010-03-27 01:12:37  vstinner  set     messageid: <1269652357.44.0.254995855854.issue8242@psf.upfronthosting.co.za>
2010-03-27 01:12:36  vstinner  link    issue8242 messages
2010-03-27 01:12:35  vstinner  create