Message80061
Martin:"""Considering this note, the simple titlecase of U+01C5 *is*
U+01C4: the titlecase value is omitted, hence it is the same as
uppercase, hence it is U+01C4."""
Perhaps we are looking at different files; in the Unicode 5.1
UnicodeData.txt that I downloaded
(http://www.unicode.org/Public/UNIDATA/UnicodeData.txt), the title field
for U+01C5 is *NOT* omitted; it is set to 01C5. AFAICT the intention is
that the four characters in question are their own titlecase, which is
not altogether unexpected given their visual representation.
Here's the record for U+01C5:
01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;01C5
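To make the field positions explicit, here is a minimal sketch that splits the record above on ";" and picks out the last three fields (upper, lower, title, at indexes 12 to 14). The helper name case_fields is hypothetical and is not part of makeunicodedata.py.

```python
# The U+01C5 record quoted above, joined back into a single line.
RECORD = (
    "01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;"
    "<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;"
    "01C4;01C6;01C5"
)


def case_fields(record):
    """Return the code point and the three simple case mapping fields.

    Hypothetical helper: UnicodeData.txt records have 15 fields separated
    by ';', and the last three are uppercase, lowercase and titlecase.
    """
    fields = record.strip().split(";")
    return {
        "code": fields[0],
        "upper": fields[12],
        "lower": fields[13],
        "title": fields[14],
    }


print(case_fields(RECORD))
```

The title field comes back as the non-empty string "01C5", i.e. the character is listed as its own titlecase.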
The note (which I hadn't noticed, and which explains the mention of
ctype->upper in the _PyUnicode_ToTitlecase function) says that the
titlecase value may be omitted if it is the same as the uppercase. FWIW
there are *no* examples in the current (5.1) file where the title field
is empty and the upper field is not empty.
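The check described above can be sketched roughly as follows. The function name empty_title_with_upper is hypothetical, and the third sample line is a made-up record (a private-use code point with invented values) included only so the filter has something to find; it is NOT real UnicodeData.txt content.

```python
def empty_title_with_upper(lines):
    """Yield code points whose title field is empty but whose upper field is not."""
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 15:
            continue  # skip blank or malformed lines
        upper, title = fields[12], fields[14]
        if upper and not title:
            yield fields[0]


SAMPLE = [
    # Real-format records: title field present.
    "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
    "01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;"
    "<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;01C5",
    # SYNTHETIC record, invented for illustration: upper set, title omitted.
    "E000;HYPOTHETICAL TEST LETTER;Ll;0;L;;;;;N;;;E001;;",
]

print(list(empty_title_with_upper(SAMPLE)))
```

To run it against a real file, pass an open file object for a local copy of UnicodeData.txt instead of SAMPLE (the local path is up to you). Per the observation above, the 5.1 file should yield nothing.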
ISTM the problem is that the default-to-uppercase rule was not
implemented in Tools/unicode/makeunicodedata.py, where full information is
available. This leaves _PyUnicode_ToTitlecase no way of resolving the
ambiguity of a zero value for ctype->title: is it "no titlecase
supplied, so use uppercase" or is it "titlecase supplied, delta == 0,
meaning ch.title() -> ch"?
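A simplified Python model of the ambiguity, and of resolving it at generation time, might look like this. It is a sketch only: all three function names are hypothetical, the code is neither the actual C source nor the actual makeunicodedata.py logic, and the E000/E001 values are invented to stand in for a record that omits its title field.

```python
def stored_title_delta(code, title):
    """Model of a generator that looks only at the title field.

    An omitted title and a title equal to the character itself both
    end up stored as 0, so the two cases become indistinguishable.
    """
    return int(title, 16) - code if title else 0


def runtime_title(code, upper_delta, title_delta):
    """Model of a runtime lookup that treats a zero title delta as
    'not supplied' and falls back to the uppercase delta."""
    delta = title_delta if title_delta else upper_delta
    return code + delta


def resolved_title_delta(code, upper, title):
    """Apply the default-to-uppercase rule while both fields are still
    available, so the stored delta needs no runtime fallback."""
    effective = title or upper
    return int(effective, 16) - code if effective else 0


# U+01C5: upper is 01C4 (delta -1), title is 01C5 (delta 0).
ambiguous = stored_title_delta(0x01C5, "01C5")
print(hex(runtime_title(0x01C5, -1, ambiguous)))  # fallback picks the uppercase

# With the default resolved up front, the stored delta is used as-is.
print(hex(0x01C5 + resolved_title_delta(0x01C5, "01C4", "01C5")))

# Invented record with title omitted: the delta now points at the uppercase.
print(resolved_title_delta(0xE000, "E001", ""))
```

With the rule applied in the generator, a stored zero can only mean "the character is its own titlecase", and the runtime function can add the delta unconditionally.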
Date | User | Action | Args
2009-01-17 23:46:24 | sjmachin | set | recipients: + sjmachin, loewis, mrabarnett
2009-01-17 23:46:23 | sjmachin | set | messageid: <1232235983.84.0.135501490923.issue4971@psf.upfronthosting.co.za>
2009-01-17 23:46:23 | sjmachin | link | issue4971 messages
2009-01-17 23:46:22 | sjmachin | create |