classification
Title: title information of unicodedata is wrong in some cases
Type: behavior Stage:
Components: Interpreter Core Versions: Python 3.0, Python 2.4, Python 2.6, Python 2.5
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Incorrect title case (issue4971)
Assigned To: Nosy List: Carl.Friedrich.Bolz, loewis
Priority: normal Keywords:

Created on 2009-04-19 10:57 by Carl.Friedrich.Bolz, last changed 2009-04-19 11:05 by loewis. This issue is now closed.

Messages (2)
msg86163 - (view) Author: Carl Friedrich Bolz-Tereick (Carl.Friedrich.Bolz) * Date: 2009-04-19 10:57
There seems to be a problem with the titlecase information of some Unicode characters:

$ python2.6
Python 2.6.2c1 (release26-maint, Apr 14 2009, 08:02:48)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> unichr(453)
u'\u01c5'
>>> unichr(453).title()
u'\u01c4'

But .title() should return the same character, according to this:

http://www.fileformat.info/info/unicode/char/01c5/index.htm

(I also checked the files that unicode.org provides.) I tried to trace
the problem, and it seems to come from _PyUnicode_ToTitlecase in
unicodetype.c. The Unicode type record stores the offset from the
character to its titlecase version. If the character is its own
titlecase version, the offset is zero. But zero is also used when no
titlecase information is available, and in that case the offset to the
uppercase version of the character is used instead. If the uppercase
version is a different character (as in the example above), the result
of .title() is wrong.
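
The ambiguity described above can be sketched in a few lines of Python. This is only an illustrative model, not the actual code in unicodetype.c: the Record type, the RECORDS table and both function names are hypothetical, and the offsets for U+01C5 are taken from its neighbours U+01C4 (uppercase) and U+01C6 (lowercase).

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for a per-character type record.
# Each field is an offset (delta) from the code point to its mapped form.
Record = namedtuple("Record", "upper lower title")

# U+01C5 is its own titlecase form, so its title offset is 0.
# Its uppercase form is U+01C4 (offset -1), its lowercase form U+01C6 (+1).
RECORDS = {0x01C5: Record(upper=-1, lower=1, title=0)}


def to_titlecase_ambiguous(cp):
    """Treat a title offset of 0 as 'no information' and fall back to upper."""
    rec = RECORDS.get(cp, Record(0, 0, 0))
    delta = rec.title if rec.title != 0 else rec.upper
    return cp + delta


def to_titlecase_explicit(cp, has_title_info):
    """Assumed alternative: the caller states whether a title mapping exists."""
    rec = RECORDS.get(cp, Record(0, 0, 0))
    delta = rec.title if has_title_info else rec.upper
    return cp + delta


if __name__ == "__main__":
    print(hex(to_titlecase_ambiguous(0x01C5)))       # falls back to uppercase
    print(hex(to_titlecase_explicit(0x01C5, True)))  # keeps the character
```

In this model the first function maps U+01C5 to U+01C4, which matches the session above, while the second keeps U+01C5 because "offset is zero" and "no titlecase information" are no longer conflated. How the real record should distinguish the two cases is not stated in this report.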
msg86164 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-04-19 11:05
This is a duplicate of issue4971.
History
Date                 User                 Action  Args
2009-04-19 11:05:08  loewis               set     status: open -> closed; nosy: + loewis; messages: + msg86164; superseder: Incorrect title case; resolution: duplicate
2009-04-19 10:57:47  Carl.Friedrich.Bolz  create