
Author loewis
Recipients PeterL, loewis
Date 2009-02-10.20:15:00
Message-id <4991E042.8050103@v.loewis.de>
In-reply-to <200902102104.52469.peter.talken@telia.com>
Content
> Should not the Danish letter "Ø" be normalized as "O"? I get "Ø" for all NFC/NFD/NFKC/NFKD 
> normalizations?

I think you have a fundamental misunderstanding of what a "decomposition"
is. "Ø" should *not* be decomposed as "O", because "Ø" and "O" are
clearly different letters. If anything, it would be decomposed as
"O" + SOME COMBINING MARK

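To check the behaviour described in the question, the sketch below runs "Ø" and, for comparison, "Õ" through all four normalization forms with the standard `unicodedata` module and lists the resulting code points. It assumes Python 3.9 or later; `FORMS` and `show_forms` are illustrative names, not part of the standard library.

```python
import unicodedata

# The four normalization forms accepted by unicodedata.normalize().
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def show_forms(s: str) -> dict[str, list[str]]:
    """Map each normalization form to the code points it produces for s."""
    return {
        form: [f"U+{ord(ch):04X}" for ch in unicodedata.normalize(form, s)]
        for form in FORMS
    }


if __name__ == "__main__":
    for text in ("\u00d8", "\u00d5"):  # "Ø" and "Õ"
        print(text, show_forms(text))
```

"Ø" comes back as U+00D8 in every form, while NFD and NFKD split "Õ" into U+004F U+0303.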
Now, in the specific case of

00D8;LATIN CAPITAL LETTER O WITH STROKE;Lu;0;L;;;;;N;LATIN CAPITAL LETTER O SLASH;;;00F8;

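the decomposition field (the sixth field, between "L" and "N") is empty,
so no decomposition, canonical or compatibility, is specified. That is
why all four normalization forms leave "Ø" unchanged. Compare this to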

00D5;LATIN CAPITAL LETTER O WITH TILDE;Lu;0;L;004F 0303;;;;N;LATIN CAPITAL LETTER O TILDE;;;00F5;

which decomposes to U+004F followed by U+0303, i.e.
LATIN CAPITAL LETTER O followed by COMBINING TILDE.
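The records quoted above are lines from UnicodeData.txt, whose fields are separated by semicolons; the sixth field (index 5) holds the decomposition mapping. The following sketch extracts that field and cross-checks it against `unicodedata.decomposition()`, which reads the same data from the Unicode database bundled with Python. It assumes Python 3.9 or later, and `decomposition_field` is a hypothetical helper written for illustration.

```python
import unicodedata

# The two records quoted above, each on a single line as in UnicodeData.txt.
RECORDS = [
    "00D8;LATIN CAPITAL LETTER O WITH STROKE;Lu;0;L;;;;;N;LATIN CAPITAL LETTER O SLASH;;;00F8;",
    "00D5;LATIN CAPITAL LETTER O WITH TILDE;Lu;0;L;004F 0303;;;;N;LATIN CAPITAL LETTER O TILDE;;;00F5;",
]


def decomposition_field(record: str) -> tuple[str, str, str]:
    """Return (code point, name, decomposition mapping) from one record."""
    fields = record.split(";")
    # Field 0 is the code point, field 1 the name, field 5 the decomposition.
    # A mapping that starts with a <tag> is a compatibility decomposition;
    # an untagged mapping is canonical; an empty field means none at all.
    return fields[0], fields[1], fields[5]


if __name__ == "__main__":
    for record in RECORDS:
        cp, name, mapping = decomposition_field(record)
        # Cross-check against the database that ships with Python.
        builtin = unicodedata.decomposition(chr(int(cp, 16)))
        print(f"U+{cp} {name}: {mapping or '(none)'} "
              f"(unicodedata: {builtin or '(none)'})")
```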

If "Ø" were to be decomposed, it would need a mark COMBINING STROKE,
but no such combining mark exists in Unicode. I don't know why that
is; you would have to ask the Unicode Consortium. In any case, Unicode
guarantees the stability of decompositions, so even if such a combining
mark were added later on, the existing (empty) decomposition of "Ø"
would not change.
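A common way to fold accented letters to their base letters is to decompose with NFD and then discard the combining marks. The sketch below shows that this turns "Õ" into "O" but leaves "Ø" untouched, because there is no mark to discard; any folding of "Ø" to "O" has to come from somewhere other than Unicode normalization. `strip_marks` is a hypothetical helper, not part of `unicodedata`.

```python
import unicodedata


def strip_marks(s: str) -> str:
    """Hypothetical helper: NFD-decompose s, then drop all combining marks (category M*)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(
        ch for ch in decomposed if not unicodedata.category(ch).startswith("M")
    )


if __name__ == "__main__":
    print(strip_marks("\u00d5"))  # "Õ" -> "O": the tilde is a separate mark after NFD
    print(strip_marks("\u00d8"))  # "Ø" -> "Ø": no decomposition, so nothing is removed
```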
History
Date                 User    Action  Args
2009-02-10 20:15:01  loewis  set     recipients: + loewis, PeterL
2009-02-10 20:15:00  loewis  link    issue5200 messages
2009-02-10 20:15:00  loewis  create