
Author PeterL
Recipients PeterL, loewis
Date 2009-02-11.08:24:04
Message-id <200902110925.24953.peter.talken@telia.com>
In-reply-to <4991F25F.8080205@v.loewis.de>
Content
> Martin v. Löwis <martin@v.loewis.de> added the comment:
> > The same applies to "Å" and "A", "Ä" and "A", and "Ö" and "O",
> > which are also different letters, as "Ø" and "O" are.
>
> Sure. And rightfully, "Å" is *not* (I repeat: not)
> normalized as "A", under NFD:
>
> py> unicodedata.normalize("NFD", u"Å")
> u'A\u030a'
>
> > Maybe not in the Unicode world but in real life.
>
> They are different letters also in the Unicode world.
>
> > That's why I'm a little confused.
>
> I think the confusion comes from your assumption that
> normalizing "Å" produces "A". It does not. Really not.

Yes, you are right.

However, the confusion/problem shows up when normalize is used in the application to
build an alphabet and group, for example, all versions of E (É, È, Ë, Ê)
together under E. The first character of the result of normalize is
used to build alphabet labels for surnames:

letter = normalize("NFD", surname)[0].upper()
if letter != last_letter:
    last_letter = letter
....
and this is why I get "A" when the surname begins with "Å".
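
To make the behaviour concrete, here is a minimal sketch of the same labelling step in isolation. It is written for Python 3, and the helper name first_letter_label and the sample surnames are made up for illustration; they are not taken from the GRAMPS source.

```python
import unicodedata


def first_letter_label(surname):
    """Label as computed by the snippet above: first character of the NFD form."""
    return unicodedata.normalize("NFD", surname)[0].upper()


# Hypothetical sample surnames, written with escapes so the input is
# unambiguously precomposed (NFC) regardless of how this text is stored.
surnames = [
    "Ek",
    "\u00c9lan",    # Élan
    "\u00c5berg",   # Åberg
    "\u00d6stman",  # Östman
    "\u00d8ster",   # Øster
]

for name in surnames:
    print(name, "->", first_letter_label(name))
```

"É", "Å" and "Ö" each have a canonical decomposition into a base letter plus a combining mark, so they end up labelled "E", "A" and "O". "Ø" has no such decomposition and keeps its own label, which is why it behaves differently from "Å".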

This way all variations of E are grouped under "E",
but it fails for "Å", which is shown under the label "A". This is not the "A" at the
beginning of the alphabet but one after "Z", where "ÅÄÖ" come,
so the earlier sorting of the surnames works correctly.
(The Swedish alphabet has 29 letters: A, B, C, ... X, Y, Z, Å, Ä, Ö.)

Can you think of any solution to this conflict? 
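
One direction I can imagine, only as a sketch and not as anything the application does today: keep an explicit set of letters that must not be folded onto their base letter, and strip accents only for everything else. The names SWEDISH_LETTERS and alphabet_label below are hypothetical, the set would have to depend on the user's language, and the code assumes Python 3 and a non-empty surname.

```python
import unicodedata

# Assumption: letters that are separate letters of the Swedish alphabet
# and therefore must keep their own label.
SWEDISH_LETTERS = frozenset("\u00c5\u00c4\u00d6")  # Å, Ä, Ö


def alphabet_label(surname, own_letters=SWEDISH_LETTERS):
    """Return an alphabet label, keeping the letters in own_letters intact."""
    # Compose first, so that "A" + combining ring is seen as a single "Å".
    first = unicodedata.normalize("NFC", surname)[0].upper()
    if first in own_letters:
        return first
    # Everything else is folded onto its base letter, as before.
    return unicodedata.normalize("NFD", first)[0]


print(alphabet_label("\u00c5berg"))  # surname starting with Å
print(alphabet_label("\u00c9lan"))   # surname starting with É
```

With this, surnames starting with "Å", "Ä" or "Ö" get their own labels, while the variations of E are still grouped under "E".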

u'\xd8'

u'A\u030a'

u'\xc5'

This is obviously the result of how the Unicode spec is written,
interpreting "Å" as a variation of "A", which it is not.

I have asked the Unicode people, but have not got any answer yet.

The application is GRAMPS: http://gramps-project.org/

Once again, thanks for making some of the Unicode stuff clear!
Regards,
Peter Landgren
Files
File name Uploaded
unnamed PeterL, 2009-02-11.08:24:02
History
Date                 User    Action  Args
2009-02-11 08:24:06  PeterL  set     recipients: + PeterL, loewis
2009-02-11 08:24:05  PeterL  link    issue5200 messages
2009-02-11 08:24:05  PeterL  create