Message81596
> Should not the Danish letter "Ø" be normalized as "O"? I get "Ø" for all NFC/NFD/NFKC/NFKD
> normalizations?
I think you have a fundamental misunderstanding of what a "decomposition"
is. "Ø" should *not* be decomposed as "O", because "Ø" and "O" are
clearly different letters. If anything, it would be decomposed as
"O" + SOME COMBINING MARK
Now, in the specific case of
00D8;LATIN CAPITAL LETTER O WITH STROKE;Lu;0;L;;;;;N;LATIN CAPITAL LETTER O SLASH;;;00F8;
no canonical decomposition is specified. Compare this to
00D5;LATIN CAPITAL LETTER O WITH TILDE;Lu;0;L;004F 0303;;;;N;LATIN CAPITAL LETTER O TILDE;;;00F5;
which decomposes to U+004F followed by U+0303, i.e.
LATIN CAPITAL LETTER O followed by COMBINING TILDE.
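The two UnicodeData.txt records above can be checked from Python. This is a minimal sketch, assuming the standard `unicodedata` module; the helper `decomposition_names` is a hypothetical name introduced only for illustration. It reads the decomposition field of each character and resolves the code points to their names.

```python
import unicodedata

O_STROKE = "\u00d8"  # LATIN CAPITAL LETTER O WITH STROKE
O_TILDE = "\u00d5"   # LATIN CAPITAL LETTER O WITH TILDE


def decomposition_names(ch):
    """Hypothetical helper: names of the code points in ch's decomposition field."""
    mapping = unicodedata.decomposition(ch)  # empty string if no decomposition
    return [
        unicodedata.name(chr(int(code_point, 16)))
        for code_point in mapping.split()
        if not code_point.startswith("<")  # skip compatibility tags such as <compat>
    ]


print(repr(unicodedata.decomposition(O_STROKE)))  # field is empty for U+00D8
print(repr(unicodedata.decomposition(O_TILDE)))   # field lists U+004F and U+0303
print(decomposition_names(O_TILDE))
```

An empty decomposition field is what leaves "Ø" untouched by the normalization forms.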
If "Ø" were to be decomposed, it would need a mark such as COMBINING STROKE,
but no such combining mark exists in Unicode. I don't know why that
is; you would have to ask the Unicode Consortium. In any case, Unicode
guarantees stability with respect to decompositions, so even if such a
combining mark gets added later on, the existing decompositions remain stable.
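The behaviour reported in the original question can be reproduced with a short check. This is only a sketch, assuming Python's standard `unicodedata.normalize`; the function name `normalized_code_points` is made up for this example. It applies all four normalization forms to "Ø" and, for comparison, to "Õ".

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalized_code_points(ch):
    """Hypothetical helper: map each normalization form to the resulting code points."""
    return {
        form: [f"U+{ord(c):04X}" for c in unicodedata.normalize(form, ch)]
        for form in FORMS
    }


stroke = normalized_code_points("\u00d8")  # "Ø": unchanged in every form
tilde = normalized_code_points("\u00d5")   # "Õ": split into two code points by NFD/NFKD

for form in FORMS:
    print(form, stroke[form], tilde[form])
```

"Ø" comes back as the single code point U+00D8 in every form, while "Õ" is split into U+004F and U+0303 under NFD and NFKD and recomposed under NFC and NFKC.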
Date | User | Action | Args
2009-02-10 20:15:01 | loewis | set | recipients: + loewis, PeterL
2009-02-10 20:15:00 | loewis | link | issue5200 messages
2009-02-10 20:15:00 | loewis | create |