
Author loewis
Recipients MrJean1, amaury.forgeotdarc, db3l, flox, ixokai, loewis, mark.dickinson, michael.foord, ned.deily, piro, pitrou, ronaldoussoren, rpetrov, skip.montanaro, slmnhq, vstinner
Date 2010-10-28.16:11:40
Message-id <4CC9A0BA.209@v.loewis.de>
In-reply-to <1288278116.28.0.203822885088.issue10209@psf.upfronthosting.co.za>
Content
> Yes, but not exactly... Mac OS X NFD normalization is a little bit
> different than Python's normalization: see msg105669 and 
> http://developer.apple.com/library/mac/#qa/qa2001/qa1173.html

I see. This is one more reason not to convert strings into NFD, no?
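To make the NFC/NFD distinction concrete, here is a minimal sketch using Python's `unicodedata` module. It covers only the standard Unicode forms. It does not model the slightly different Apple variant described in QA1173, and the variable names are purely illustrative.

```python
import unicodedata

precomposed = "\u00e9"                                   # "é" as one code point (NFC form)
decomposed = unicodedata.normalize("NFD", precomposed)   # "e" + U+0301 combining acute accent

print(len(precomposed), len(decomposed))                 # 1 2
print(precomposed == decomposed)                         # False: different str values
print(precomposed.encode("utf-8"))                       # b'\xc3\xa9'
print(decomposed.encode("utf-8"))                        # b'e\xcc\x81'
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

The two strings display identically but compare unequal and encode to different bytes. This is why it matters which normal form, if any, is applied to filenames.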

> I don't understand why test_pep277 passes on the issue10209 branch, but it
> works. I suppose that normalizing the filename to NFD in Python avoids
> some Mac OS X normalization bugs?

My question is rather why it failed in the first place, when issue8207
had supposedly fixed it.

> I propose to normalize to NFC because Qt does that.

Hmm. I find that a weak argument, in particular given that the
system will then normalize in turn anyway, and to a slightly different
normal form. So what is Qt's motivation for normalizing?

> On Linux, the keyboard uses NFC.

I think this is technically incorrect. When you press é, some
scan code is generated. That goes through various mapping layers,
and the outcome will depend on how these layers are specifically
configured.

> Which norm is used on Mac OS X, e.g. for the keyboard?

Same reasoning: pressing a key initially does not generate any Unicode
at all. My guess is that when a character is eventually generated
(e.g. on the terminal), no normal form is used; instead, the system will
most likely always strive to generate a single character (even if the
result is not normalized). See

http://developer.apple.com/library/mac/#qa/qa2001/qa1235.html

which says "Macintosh keyboards generally produce precomposed Unicode"

> Anyway, I think that os.fsencode(os.fsdecode(name)) should be equal
> to name.

I agree, and that is currently already the case.
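As an illustration of the current round-trip behaviour, the following sketch decodes and re-encodes two byte names that render identically (NFC and NFD spellings of "café"). It assumes a UTF-8 filesystem encoding, as on typical Linux and Mac OS X setups. `os.fsdecode`/`os.fsencode` apply the filesystem encoding and error handler without normalizing, so each name's original bytes come back unchanged.

```python
import os

# Two byte strings that look the same when displayed but differ on disk.
nfc_name = "caf\u00e9".encode("utf-8")    # b'caf\xc3\xa9'
nfd_name = "cafe\u0301".encode("utf-8")   # b'cafe\xcc\x81'

for name in (nfc_name, nfd_name):
    # No normalization happens in either direction, so the round trip is exact.
    assert os.fsencode(os.fsdecode(name)) == name

print(nfc_name == nfd_name)  # False: the two spellings remain distinct names
```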

> If it's different, "open(name, 'w').close(); name in
> listdir()" is False (on systems storing filenames as bytes). So if
> you change fsdecode(), fsencode() should also be changed.

I'm saying that fsdecode shouldn't change either, primarily for
backwards compatibility.
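The failure mode quoted above can be sketched in memory without creating any files. `fsdecode_nfc` is a hypothetical decoder that normalizes to NFC; it is not Python's behaviour and only shows what such a change would break. The sketch assumes a UTF-8 filesystem encoding and a filesystem that stores names as raw bytes (e.g. a typical Linux filesystem). It does not model Mac OS X, whose filesystem normalizes names itself.

```python
import os
import unicodedata

def fsdecode_nfc(raw: bytes) -> str:
    """Hypothetical fsdecode variant that also normalizes to NFC (NOT Python's behaviour)."""
    return unicodedata.normalize("NFC", os.fsdecode(raw))

# A name stored on a byte-oriented filesystem in decomposed (NFD) form.
on_disk = "cafe\u0301".encode("utf-8")

decoded = fsdecode_nfc(on_disk)
print(os.fsencode(decoded) == on_disk)  # False: fsencode(fsdecode(name)) != name

# The str a program passed to open(), versus a listing decoded the hypothetical way.
created = os.fsdecode(on_disk)
listing = [fsdecode_nfc(n) for n in [on_disk]]
print(created in listing)               # False: "name in listdir()" would fail
```

Changing only the decoder breaks both the round trip and the membership test, which is why fsdecode and fsencode could not be changed independently.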