> Martin v. Löwis <martin@v.loewis.de> added the comment:

> > The same applies to "Å" and "A", "Ä" and "A" and "Ö" and "O",
> > which are also different letters, as "Ø" and "O" are.

>

> Sure. And rightfully, "Å" is *not* (I repeat: not)

> normalized as "A", under NFD:

>

> py> unicodedata.normalize("NFD", u"Å")

> u'A\u030a'

>

> > Maybe not in the Unicode world but in real life.

>

> They are different letters also in the Unicode world.

>

> > That's why I'm a little confused.

>

> I think the confusion comes from your assumption that

> normalizing "Å" produces "A". It does not. Really not.

Yes, you are right.

However, the problem shows up when normalize is used in the
application to build an alphabet index and to group, for example,
all versions of E (É, È, Ë, Ê) together under E. The first character
of the result of normalize is used to build the alphabet labels for
surnames:

    letter = normalize("NFD", surname)[0].upper()
    if letter != last_letter:
        last_letter = letter
        ....

This is why I get "A" when the surname begins with "Å".

This approach works for grouping all variations of E under "E", but
it fails for "Å": those surnames are shown under a label "A", and not
the "A" at the beginning of the alphabet but a second one after "Z",
where "Å", "Ä" and "Ö" belong. So the preceding sorting of the
surnames works correctly; only the label is wrong.

(The Swedish alphabet has 29 letters: A, B, C ... X, Y, Z, Å, Ä, Ö)

Can you think of any solution to this conflict?
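One workaround on the application side might look like the sketch below. It is only an illustration, assuming Python 3; the helper name alphabet_label and the set OWN_LETTERS are hypothetical and not part of GRAMPS. The idea is to compose the name first (NFC), keep the letters that are separate entries in the alphabet as they are, and fall back to the NFD base letter for everything else.

```python
import unicodedata

# Hypothetical: letters that are separate entries in the (Swedish) alphabet.
OWN_LETTERS = {"Å", "Ä", "Ö"}


def alphabet_label(surname, own_letters=OWN_LETTERS):
    """Return the index label for a surname (illustrative helper only)."""
    # Compose first, so "A" + U+030A is seen as the single letter "Å".
    first = unicodedata.normalize("NFC", surname)[:1].upper()
    if first in own_letters:
        return first
    # Otherwise drop the accent: the first code point of the NFD form
    # is the base letter (or the letter itself if it does not decompose).
    return unicodedata.normalize("NFD", first)[:1]


for name in ("Åberg", "Éric", "Ørsted", "Andersson"):
    print(name, "->", alphabet_label(name))
```

The set of letters would have to depend on the language of the user, since what counts as a separate letter differs between alphabets.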

I still think "Å" or "Ä" or "Ö" should behave as "Ø":

>>> unicodedata.normalize("NFD",u"Ø")

u'\xd8'

Now, as you said:

>>> unicodedata.normalize("NFD",u"Å")

u'A\u030a'

But it should be (in my opinion):

>>> unicodedata.normalize("NFD",u"Å")

u'\xc5'

This is obviously a result of how the Unicode spec is written: it
treats "Å" as a variation of "A", which it is not.
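The difference between the two cases can be inspected with unicodedata.decomposition(), which reports the decomposition mapping stored in the Unicode database for a character. A small check, assuming Python 3 (the helper name has_decomposition is hypothetical):

```python
import unicodedata


def has_decomposition(ch):
    """True if the Unicode database lists a decomposition mapping for ch."""
    # decomposition() returns the mapping as hex code points,
    # or an empty string if the character has none.
    return unicodedata.decomposition(ch) != ""


for ch in ("Å", "Ä", "Ö", "Ø"):
    print(ch, repr(unicodedata.decomposition(ch)), has_decomposition(ch))
```

"Å", "Ä" and "Ö" each have a mapping to a base letter plus a combining mark, while "Ø" has none, which is why NFD leaves "Ø" unchanged.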

I have asked the Unicode people, but have not received an answer yet.

The application is GRAMPS: http://gramps-project.org/

Once again, thanks for making some of the Unicode stuff clear!

Regards,

Peter Landgren