Message 119798 - Python tracker



Author	loewis
Recipients	MrJean1, amaury.forgeotdarc, db3l, flox, ixokai, loewis, mark.dickinson, michael.foord, ned.deily, piro, pitrou, ronaldoussoren, rpetrov, skip.montanaro, slmnhq, vstinner
Date	2010-10-28.16:11:40
Message-id	<4CC9A0BA.209@v.loewis.de>
In-reply-to	<1288278116.28.0.203822885088.issue10209@psf.upfronthosting.co.za>

Content

> Yes, but not exactly... Mac OS X NFD normalization is a little bit
> different than Python's normalization: see msg105669 and 
> http://developer.apple.com/library/mac/#qa/qa2001/qa1173.html

I see. This is one more reason not to convert strings into NFD, no?

> I don't understand why test_pep277 pass on issue10209 branch, but it
> works. I suppose that normalize the filename to NFD in Python avoids
> some Mac OS X normalization bugs?

My question is rather why it failed in the first place, when issue8207
had supposedly fixed it.

> I propose to normalize to NFC because Qt does that.

Hmm. I find that a weak argument, in particular given that the
system will normalize them in turn anyway, and to a slightly different
normal form. So what is Qt's motivation to normalize?
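
For reference, the difference between the two normal forms under discussion can be sketched with Python's own `unicodedata` module. This only shows Python's NFC/NFD; it does not reproduce the slightly different decomposition that Mac OS X applies to file names, and the variable names are illustrative.

```python
import unicodedata

# "é" as one precomposed code point (what NFC produces)
composed = "\u00e9"
# "e" followed by COMBINING ACUTE ACCENT (what NFD produces)
decomposed = "e\u0301"

nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)

# The two spellings render alike but compare unequal as Python strings.
same_without_normalizing = composed == decomposed
```

So a name normalized to NFC by an application and then decomposed by the file system would come back as a different string, which is the concern raised above.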

> On Linux, the keyboard uses NFC.

I think this is technically incorrect. When you press é, some
scan code is generated, which then goes through various mapping layers.
The outcome depends on how specifically these layers are
configured.

> Which norm is used on Mac OS X, eg. for the keyboard?

Same reasoning: pressing a key initially does not generate any Unicode
at all. My guess is that when a character is eventually generated
(e.g. on the terminal), no normal form is applied; instead, the system
most likely always strives to generate a single character (even if that
is not normalized). See

http://developer.apple.com/library/mac/#qa/qa2001/qa1235.html

which says "Macintosh keyboards generally produce precomposed Unicode"

> Anyway, I think that os.fsencode(os.fsdecode(name)) should be equal
> to name.

I agree, and that is already the case.
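
A minimal sketch of that round trip, assuming the file system encoding can represent the UTF-8 byte strings used here (the sample names and the helper `roundtrips` are hypothetical, chosen for illustration):

```python
import os

# The same visible name, spelled in NFC and in NFD, encoded as UTF-8 bytes.
nfc_name = "\u00e9.txt".encode("utf-8")
nfd_name = "e\u0301.txt".encode("utf-8")

def roundtrips(name: bytes) -> bool:
    """Return True if decoding and re-encoding gives back the same bytes."""
    return os.fsencode(os.fsdecode(name)) == name

nfc_ok = roundtrips(nfc_name)
nfd_ok = roundtrips(nfd_name)
```

Both spellings should survive unchanged, because neither function normalizes; that property is what would be lost if only one of them started to normalize.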

> If it's different, "open(name, 'w').close(); name in
> listdir()" is False (on systems storing filenames as bytes). So if
> you change fsdecode(), fsencode() should also be changed.

I'm saying that fsdecode shouldn't change either, the primary reason
being backwards compatibility.
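
The check quoted above can be tried out with a small sketch like the following. The helper `created_name_is_listed` is hypothetical, and the exact-match result depends on the file system (one that stores decomposed names may list a different spelling), so only the normalization-insensitive comparison is expected to hold everywhere.

```python
import os
import tempfile
import unicodedata

def created_name_is_listed(name):
    """Create `name` in a fresh temporary directory and report whether
    it appears in listdir() exactly, and after NFC-normalizing both sides."""
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, name), "w").close()
        listed = os.listdir(tmp)
    exact = name in listed
    wanted = unicodedata.normalize("NFC", name)
    normalized = wanted in {unicodedata.normalize("NFC", e) for e in listed}
    return exact, normalized

exact, normalized = created_name_is_listed("caf\u00e9.txt")
```

Comparing after normalization is something the caller can always do, which is one way to cope with file-system behaviour without changing fsdecode or fsencode.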

History
Date	User	Action	Args
2010-10-28 16:11:42	loewis	set	recipients: + loewis, skip.montanaro, ixokai, db3l, ronaldoussoren, amaury.forgeotdarc, mark.dickinson, pitrou, vstinner, piro, MrJean1, ned.deily, rpetrov, michael.foord, flox, slmnhq
2010-10-28 16:11:40	loewis	link	issue10209 messages
2010-10-28 16:11:40	loewis	create