Author vstinner
Recipients ingemar, r.david.murray, terry.reedy, vstinner
Date 2011-01-09.02:40:20
Message-id <1294540825.33.0.360294796512.issue10828@psf.upfronthosting.co.za>
Content
> ANSI code page: cp1252 ...os.fsencode('ä') => b'\xe4'

Hmm, I ran your example under a debugger, and OK, now I remember the whole story.

I fixed Python to support non-ASCII characters in the *search path* (on Windows, only non-ASCII characters encodable to the ANSI code page), but not in the module name.

The import machinery encodes each search path to the filesystem encoding, but it encodes the module name to UTF-8. Concatenating two byte strings produced by different encodings doesn't work: it leads to mojibake.
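A minimal sketch of that failure, assuming cp1252 as the ANSI code page and using a hypothetical search path and module name (`SEARCH_PATH`, `MODULE_NAME` and `build_filename_mixed` are illustrative names, not part of the import machinery):

```python
# Assumed example values: cp1252 stands in for the ANSI code page.
SEARCH_PATH = "C:\\lib\\dir_ä"
MODULE_NAME = "ä"


def build_filename_mixed(search_path: str, module_name: str) -> bytes:
    """Hypothetical helper mimicking the bug: the path is encoded to the
    ANSI code page, the module name to UTF-8, then the bytes are joined."""
    path_bytes = search_path.encode("cp1252")   # 'ä' -> b'\xe4'
    name_bytes = module_name.encode("utf-8")    # 'ä' -> b'\xc3\xa4'
    return path_bytes + b"\\" + name_bytes + b".py"


filename = build_filename_mixed(SEARCH_PATH, MODULE_NAME)
# Reading the result back with the filesystem encoding exposes the mojibake.
print(filename.decode("cp1252"))  # C:\lib\dir_ä\Ã¤.py
```

The directory part round-trips correctly, while the UTF-8 bytes of the module name are misread as two cp1252 characters, so the file that actually exists on disk is never found.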

To fix this problem, there are two solutions:

 a) encode the module name to the filesystem encoding
 b) manipulate paths as Unicode strings; to access the filesystem, use the wide-character (Unicode) API on Windows, and encode paths to the filesystem encoding on UNIX/BSD

(a) is easier to implement than (b), but (a) only supports paths and module names encodable to the ANSI code page.
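The two options can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual import code: `candidate_a` and `candidate_b` are hypothetical helpers, and cp1252 is assumed as the filesystem encoding for option (a).

```python
import os


def candidate_a(search_path: str, module_name: str, fs_encoding: str = "cp1252") -> bytes:
    """Option (a), hypothetical: encode the path *and* the module name with
    the same filesystem encoding, then join the resulting bytes."""
    return os.path.join(search_path.encode(fs_encoding),
                        (module_name + ".py").encode(fs_encoding))


def candidate_b(search_path: str, module_name: str) -> str:
    """Option (b), hypothetical: keep everything as str; any conversion is
    left to the OS layer (e.g. the wide-character API on Windows)."""
    return os.path.join(search_path, module_name + ".py")


print(candidate_a("dir_ä", "ä"))    # works: 'ä' exists in cp1252
print(candidate_b("dir_日", "日"))  # works: no encoding step at all
try:
    candidate_a("dir_日", "日")     # '日' is not in cp1252
except UnicodeEncodeError as exc:
    print("option (a) failed:", exc.reason)
```

Both parts of the name now share one encoding, so (a) removes the mojibake, but it still inherits the code page's limits; (b) sidesteps them by never encoding on Windows.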

(b) gives full Unicode support because it never *encodes* paths to the filesystem encoding, although it may *decode* paths from the filesystem encoding. Encoding a path raises a UnicodeEncodeError on the first character not encodable to the ANSI code page, whereas decoding a path never fails (unless the user manually changed the code page to an unusual ANSI code page such as UTF-8).
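The encode/decode asymmetry can be illustrated with a small sketch, again assuming cp1252 as the ANSI code page; `first_unencodable` is a hypothetical helper and the byte string stands in for a name returned by the filesystem.

```python
def first_unencodable(path: str, encoding: str = "cp1252"):
    """Hypothetical helper: return the index of the first character that
    cannot be encoded, or None if the whole path encodes cleanly."""
    try:
        path.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc.start  # position of the first offending character
    return None


print(first_unencodable("dir_ä"))   # None
print(first_unencodable("dir_日"))  # 4

# Decoding direction: bytes handed back by the filesystem (assumed here to
# be cp1252-encoded) simply become a str.
raw = b"dir_\xe4"
print(raw.decode("cp1252"))         # dir_ä
```

With (b), only the decoding direction is ever exercised on Windows, which is why it does not hit the UnicodeEncodeError shown in the first half.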

I implemented (b) in my import_unicode SVN branch, but as I wrote, I still have some work to do before merging this branch into py3k, and in any case I will wait for Python 3.3.
History
Date User Action Args
2011-01-09 02:40:25 vstinner set recipients: + vstinner, terry.reedy, r.david.murray, ingemar
2011-01-09 02:40:25 vstinner set messageid: <1294540825.33.0.360294796512.issue10828@psf.upfronthosting.co.za>
2011-01-09 02:40:20 vstinner link issue10828 messages
2011-01-09 02:40:20 vstinner create