Message 126592 - Python tracker


Author	belopolsky
Recipients	belopolsky, loewis, vstinner
Date	2011-01-20.06:40:20
Message-id	<AANLkTi=Q5ZK4iG2nMgic50fMLCqWDEiDVQgKTripr+bX@mail.gmail.com>
In-reply-to	<1295504345.52.0.340096637702.issue10952@psf.upfronthosting.co.za>

Content
On Thu, Jan 20, 2011 at 1:19 AM, Martin v. Löwis <report@bugs.python.org> wrote:
..
> I'd like to request that PEP 3131 is followed as it stands: identifier lookup uses NFKC,
> period. This gives two issues: a) how can users make sure that they name the files
> correctly? and b) what if the file system implementation mangles file names.
>

There is also issue c): what if the filesystem encoding can represent
only a compatibility character, say U+00B5 (MICRO SIGN), but not its
NFKC equivalent, U+03BC (GREEK SMALL LETTER MU)?  Suppose you have a
system where both the locale and FS encodings are Latin-1.  You can
write Python code using Latin-1, and the following is a valid byte stream:

b'# encoding: latin-1\nimport \xB5Torrent\n'

However, this code will always fail because '\xB5Torrent' will be
normalized into '\u03BCTorrent' and a file named '\u03BCTorrent.py'
cannot be created on a filesystem with Latin-1 encoding.
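
The mismatch can be checked without touching a real filesystem. The
following is only a minimal sketch: it assumes the import machinery
looks up the NFKC-normalized identifier, as PEP 3131 describes, and the
helper can_encode() is a hypothetical name introduced for illustration.

```python
import unicodedata

# The source bytes from the example above, decoded as Latin-1.
source = b'# encoding: latin-1\nimport \xB5Torrent\n'
text = source.decode('latin-1')

# The identifier as written (U+00B5 MICRO SIGN) and as NFKC-normalized.
written = '\xB5Torrent'
normalized = unicodedata.normalize('NFKC', written)


def can_encode(name, encoding):
    """Hypothetical helper: True if `name` is representable in `encoding`."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(ascii(normalized))
print(can_encode(written + '.py', 'latin-1'))
print(can_encode(normalized + '.py', 'latin-1'))
```

Under these assumptions, the name as typed is representable in Latin-1
while the normalized name is not, which is the situation described in
issue c).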

History
Date	User	Action	Args
2011-01-20 06:40:27	belopolsky	set	recipients: + belopolsky, loewis, vstinner
2011-01-20 06:40:20	belopolsky	link	issue10952 messages
2011-01-20 06:40:20	belopolsky	create