
Author vstinner
Recipients vstinner
Date 2010-07-30.00:13:28
Message-id <1280448814.73.0.312881779361.issue9425@psf.upfronthosting.co.za>
In-reply-to
Content
Python (2 and 3) cannot load a module installed in a directory whose name contains characters that cannot be encoded to the locale encoding. Python also doesn't work if it's installed in a non-ASCII directory on Windows, or with a locale encoding other than UTF-8. On Windows, the locale encoding is "mbcs", a small charset that cannot mix different languages, whereas the file system is fully Unicode compatible (it uses UTF-16). Python should work with Unicode strings (wchar_t*, Py_UNICODE* or PyUnicodeObject) instead of byte strings (char* or PyBytesObject), especially while loading a Python module.
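
To illustrate the encoding problem described above, here is a minimal sketch. The helper name `encodable` and the sample path are hypothetical, and "cp1252" is only a stand-in for a Windows ANSI code page, since the "mbcs" codec is available on Windows only:

```python
def encodable(path, encoding):
    """Return True if every character of `path` can be encoded to `encoding`."""
    try:
        path.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical path mixing French and Japanese characters.
mixed_path = "C:\\Users\\\u00e9t\u00e9\\\u65e5\u672c\u8a9e\\module.py"

for encoding in ("ascii", "cp1252", "utf-16-le"):
    print(encoding, encodable(mixed_path, encoding))
```

A single-language code page can represent part of the path but not all of it, whereas UTF-16 can represent every character. This is why a byte string in the locale encoding may be unable to name a file that the file system itself stores without trouble.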

It's not an easy task because it requires changing a lot of code, especially in Python/import.c. I have been working on this topic for some months and I now have a working patch. It's now possible to run Python from a source tree whose path contains a non-ASCII character in the C locale (ASCII encoding). Except for a minor bug in test_gdb, all tests of the test suite pass.
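
The behaviour this patch aims for can be checked with a small script along the following lines. This is only a sketch, not part of the patch: the function name, directory name and module name are made up for the example, and it assumes a Python 3 build whose filesystem encoding can represent the directory name.

```python
import importlib
import os
import sys
import tempfile


def import_from_non_ascii_dir(dirname="caf\u00e9", modname="demo_unicode_mod"):
    """Create a module inside a non-ASCII directory and import it."""
    with tempfile.TemporaryDirectory() as root:
        module_dir = os.path.join(root, dirname)
        os.mkdir(module_dir)
        source = os.path.join(module_dir, modname + ".py")
        with open(source, "w", encoding="utf-8") as f:
            f.write("VALUE = 42\n")

        sys.path.insert(0, module_dir)
        try:
            # Make sure the import system notices the new directory.
            importlib.invalidate_caches()
            module = importlib.import_module(modname)
            return module.VALUE, module.__file__
        finally:
            # Undo the changes to the interpreter state.
            sys.path.remove(module_dir)
            sys.modules.pop(modname, None)


value, filename = import_from_non_ascii_dir()
print(value, ascii(filename))
```

On an interpreter affected by this issue, the import step is where the failure would be expected to show up.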

I posted the whole patch on Rietveld for review:
http://codereview.appspot.com/1874048

The patch is huge because it fixes different things:

 a) import machinery (import.c, getpath.c, importdl.c, ...)
 b) many error handlers using filenames (compile.c, errors.c, _warnings.c, sysmodule.c, ...)
 c) functions using filenames, especially the full path of Python: filenames that are logged (e.g. Lib/distutils/file_util.py) or written to a program's output (e.g. Lib/platform.py)
 d) tests (Lib/test/test_*.py)

(b), (c) and (d) can be fixed before, or without, (a). But (a) requires the other parts to work correctly.
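
As an illustration of the kind of problem behind (b) and (c), the sketch below formats a filename for an output stream whose encoding cannot represent it. It is not how the patch does it: the helper name `printable_filename` is hypothetical, and the "backslashreplace" error handler is just one possible choice.

```python
def printable_filename(filename, encoding="ascii"):
    """Return `filename` with unencodable characters replaced by escapes."""
    # Encode with escapes for unencodable characters, then decode back
    # so the result is a str that is safe to write to the stream.
    return filename.encode(encoding, "backslashreplace").decode(encoding)


print(printable_filename("caf\u00e9/mod.py"))
print(printable_filename("\u65e5\u672c\u8a9e/mod.py"))
```

With such a helper, an error message or log line that mentions a filename does not itself raise UnicodeEncodeError when the output encoding is narrower than the filename.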

If it's not possible to review the patch as a whole, I can try to split it into smaller parts.

--

Related issues:

 #3080: Full unicode import system
 #4352: imp.find_module() fails with a UnicodeDecodeError 
        when called with non-ASCII search paths
 #8611: Python3 doesn't support locale different than utf8 
        and an non-ASCII path (POSIX)
 #8988: import + coding = failure (3.1.2/win32)

--

See also my email sent to python-dev for more information:
http://mail.python.org/pipermail/python-dev/2010-July/101619.html
History
Date                 User      Action  Args
2010-07-30 00:13:35  vstinner  set     recipients: + vstinner
2010-07-30 00:13:34  vstinner  set     messageid: <1280448814.73.0.312881779361.issue9425@psf.upfronthosting.co.za>
2010-07-30 00:13:32  vstinner  link    issue9425 messages
2010-07-30 00:13:28  vstinner  create