Message 314150 - Python tracker


Author	steven.daprano
Recipients	Kiril Dimitrov, ezio.melotti, methane, steven.daprano, vstinner
Date	2018-03-20.16:12:41
Message-id	<1521562361.54.0.467229070634.issue33108@psf.upfronthosting.co.za>

Content
It has never been the case that upper() or lower() are guaranteed to preserve string length in Unicode. For example, some characters decompose into a base plus combining characters. Ligatures are another example. See here for more details:

https://unicode.org/faq/casemap_charprop.html
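
As a rough illustration of the point above, here is a minimal Python 3 sketch showing case conversion changing the length of a string. The helper name describe_upper is made up for this example, and the three sample characters (a sharp s, a ligature, and a letter whose uppercase form is a base plus a combining character) are my own picks rather than anything taken from the FAQ:

```python
import unicodedata

def describe_upper(ch):
    """Return the Unicode names of the code points produced by ch.upper()."""
    # Hypothetical helper for illustration only.
    return [unicodedata.name(c) for c in ch.upper()]

# U+00DF sharp s, U+FB01 fi ligature, U+01F0 j with caron
for ch in ("\u00df", "\ufb01", "\u01f0"):
    print(unicodedata.name(ch), "->", describe_upper(ch))
```

Each of these single characters should come back as two code points after upper(), so len(s.upper()) == len(s) cannot be relied on.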


However, this example surprises me. In Python 2, I get the result I expected:

py> import unicodedata
py> c = unichr(304)
py> unicodedata.name(c)
'LATIN CAPITAL LETTER I WITH DOT ABOVE'
py> unicodedata.name(c.lower())
'LATIN SMALL LETTER I'


If I am reading the UnicodeData.txt file correctly, I think that the right behaviour is for LATIN CAPITAL LETTER I WITH DOT ABOVE to lowercase to LATIN SMALL LETTER I, as it did in Python 2.

ftp://ftp.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
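
For comparison, the same check can be sketched in Python 3, where chr() replaces unichr(). As far as I know, Python 3.3 and later apply the full case mappings from SpecialCasing.txt in str.lower() rather than only the simple mappings listed in UnicodeData.txt, which would explain a different result. Treat that explanation as an assumption, not something established in this message:

```python
import unicodedata

c = chr(304)  # U+0130; chr() is the Python 3 spelling of unichr()
lowered = c.lower()

# Name every code point in the lowercased result, since it may be
# more than one character long.
names = [unicodedata.name(ch) for ch in lowered]

print(unicodedata.name(c))
print(len(lowered), names)
```

If the result has length 2, the second code point should be COMBINING DOT ABOVE (U+0307), which is the unconditional full lowercase mapping that SpecialCasing.txt gives for this character.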

History
Date	User	Action	Args
2018-03-20 16:12:41	steven.daprano	set	recipients: + steven.daprano, vstinner, ezio.melotti, methane, Kiril Dimitrov
2018-03-20 16:12:41	steven.daprano	set	messageid: <1521562361.54.0.467229070634.issue33108@psf.upfronthosting.co.za>
2018-03-20 16:12:41	steven.daprano	link	issue33108 messages
2018-03-20 16:12:41	steven.daprano	create