Author vstinner
Recipients belopolsky, loewis, vstinner
Date 2011-01-20.22:23:20
Message-id <1295562193.29835.15.camel@marge>
In-reply-to <AANLkTim0POiAYAH8e0gxTSqAmFuPNvhf7j8P2P3+dufm@mail.gmail.com>
Content
> A packaging mechanism that prepares code developed on a Latin-1
> filesystem for distribution, would have to NFKC-normalize 
> filenames before encoding them using UTF-8.

This causes portability issues: if you copy a non-ASCII module to a new
host, the program may or may not work, depending on the filesystem
encoding. Having to transform the filename when you copy a file, just to
fix a corner case, is a pain.

> One possible solution to this problem is to define a 'compat' error
> handler that would detect unencodable strings with encodable
> compatibility equivalents and produce encoding of an NFKC equivalent
> string instead of raising an error.

Only a few people use non-ASCII module names, and most operating systems
are able to store all Unicode characters, so I don't think that we need
to support U+00B5 in a module name on a Latin-1 filesystem at all. If you
use an old system with a Latin-1 filesystem, you have to limit your
expectations of Python's Unicode support :-)
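
For reference, here is a minimal sketch of why U+00B5 is the corner case. It
assumes only the standard unicodedata module. The variable names are
illustrative and not taken from any patch on this issue.

```python
import unicodedata

micro = "\u00b5"                            # MICRO SIGN, as typed in the source
mu = unicodedata.normalize("NFKC", micro)   # identifiers are NFKC-normalized

print(f"U+{ord(mu):04X}", unicodedata.name(mu))

# The original character fits in Latin-1 ...
micro_bytes = micro.encode("latin-1")

# ... but its NFKC form does not, so the normalized module name
# cannot be turned into a Latin-1 filename.
try:
    mu.encode("latin-1")
    mu_encodable = True
except UnicodeEncodeError:
    mu_encodable = False

print(micro_bytes, mu_encodable)
```

So the file can be stored under the name the author typed, but the name that
the import machinery looks up after normalization is not encodable.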

os.fsencode() and os.fsdecode() already use a custom error handler:
surrogateescape. compat would conflict with surrogateescape. Loading a
module concatenates two parts: a path from sys.path (decoded using the
filesystem encoding with the surrogateescape error handler) and a module
name. If compat is used to encode the filename, the module name will be
encoded correctly, but not the path.
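
A rough sketch of the conflict, using explicit codecs so that it does not
depend on the local filesystem encoding. The directory and module names are
made up, and "strict" stands in for any handler that does not understand the
surrogateescape escapes (the proposed compat handler does not exist, so
treating it this way is an assumption).

```python
import posixpath

# Hypothetical sys.path entry whose bytes are not valid UTF-8.
raw_dir = b"/srv/\xff/lib"

# Decoding with surrogateescape maps the bad byte to a lone surrogate.
directory = raw_dir.decode("utf-8", "surrogateescape")

# The import machinery joins the decoded path and the module filename.
filename = posixpath.join(directory, "mod.py")

# Encoding with the same handler restores the original bytes.
round_trip = filename.encode("utf-8", "surrogateescape")

# An encode call accepts only one error handler. Any other handler sees
# the lone surrogate from the path part as an unencodable character.
try:
    filename.encode("utf-8", "strict")
    other_handler_ok = True
except UnicodeEncodeError:
    other_handler_ok = False

print(round_trip, other_handler_ok)
```

Since the whole filename is encoded in a single call, the path part and the
module name part cannot each get their own error handler.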
History
Date User Action Args
2011-01-20 22:23:22 vstinner set recipients: + vstinner, loewis, belopolsky
2011-01-20 22:23:20 vstinner link issue10952 messages
2011-01-20 22:23:20 vstinner create