Message126656
> A packaging mechanism that prepares code developed on a Latin-1
> filesystem for distribution, would have to NFKC-normalize
> filenames before encoding them using UTF-8.
It causes portability issues: if you copy a non-ASCII module on a new
host, the program will work or not depending on the filesystem encoding.
Having to transform the filename when you copy a file, just to fix a
corner case, is a pain.
> One possible solution to this problem is to define a 'compat' error
> handler that would detect unencodable strings with encodable
> compatibility equivalents and produce encoding of an NFKC equivalent
> string instead of raising an error.
Only few people use non-ASCII module names and most operating systems
are able to store all Unicode characters, so I don't think that we need
to support U+00B5 in a module name with Latin1 filesystem at all. If you
use an old system using Latin1 filesystem, you have to limit your
expectation on Python unicode support :-)
os.fsencode() and os.fsdecode() already use a custom error handler:
surrogateescape. compat will conflict with surrogateescape. Loading a
module concatenates two parts: a path from sys.path (decoded from the
filesystem encoding and surrogateescape error handler) and a module
name. If custom is used to encode the filename, the module name will be
encoded correctly, but not the path. |
|
Date |
User |
Action |
Args |
2011-01-20 22:23:22 | vstinner | set | recipients:
+ vstinner, loewis, belopolsky |
2011-01-20 22:23:20 | vstinner | link | issue10952 messages |
2011-01-20 22:23:20 | vstinner | create | |
|