
Author vstinner
Recipients ezio.melotti, vstinner, zamsalak
Date 2018-09-18.14:15:42
Message-id <1537280143.02.0.956365154283.issue34723@psf.upfronthosting.co.za>
In-reply-to
Content
> Should it not simply return “i”?

Python implements the Unicode standard.

>>> "U+%04x" % ord("İ")
'U+0130'
>>> ["U+%04x" % ord(ch) for ch in "İ".lower()]
['U+0069', 'U+0307']

>>> import unicodedata
>>> unicodedata.name("İ")
'LATIN CAPITAL LETTER I WITH DOT ABOVE'
>>> [unicodedata.name(ch) for ch in "İ".lower()]
['LATIN SMALL LETTER I', 'COMBINING DOT ABOVE']

At the C level, lower_ucs4() calls _PyUnicode_ToLowerFull(), which looks the character up in Python's internal Unicode database.

The U+0130 character takes the EXTENDED_CASE_MASK branch, which uses the secondary _PyUnicode_ExtendedCase database for "extended case" mappings.
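
The two-step lookup described above can be pictured with a toy model in Python. This is only a sketch of the idea: the table layout, the flag value, and the names EXTENDED_CASE_FLAG, PRIMARY, EXTENDED and to_lower_full are hypothetical and do not mirror CPython's real data structures.

```python
# Toy model only: names and table layout are hypothetical and do not
# reproduce CPython's actual Unicode database.
EXTENDED_CASE_FLAG = 0x1  # stand-in for a flag such as EXTENDED_CASE_MASK

# Primary records: code point -> (flags, value).
# Without the flag, value is a delta added to reach the lowercase code point.
# With the flag, value is a (start, length) slice into the secondary table.
PRIMARY = {
    0x0041: (0, 32),                       # 'A' -> 'a' through a simple delta
    0x0130: (EXTENDED_CASE_FLAG, (0, 2)),  # U+0130 maps to two code points
}

# Secondary table holding mappings that need more than one code point.
EXTENDED = [0x0069, 0x0307]


def to_lower_full(cp):
    """Return the lowercase mapping of cp as a list of code points."""
    flags, value = PRIMARY.get(cp, (0, 0))
    if flags & EXTENDED_CASE_FLAG:
        start, length = value
        return EXTENDED[start:start + length]
    return [cp + value]


print(["U+%04x" % c for c in to_lower_full(0x0130)])
```

In this model a one-to-one mapping costs a single lookup, and only the rare characters with multi-code-point mappings touch the secondary table.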

In the end, Python uses the following data file from the Unicode standard:

https://www.unicode.org/Public/9.0.0/ucd/SpecialCasing.txt

Extract:
"""
# Preserve canonical equivalence for I with dot. Turkic is handled below.

0130; 0069 0307; 0130; 0130; # LATIN CAPITAL LETTER I WITH DOT ABOVE
"""
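
The "canonical equivalence" mentioned in the extract can be checked from Python. A minimal sketch, using only the standard unicodedata module: U+0130 decomposes canonically to U+0049 U+0307, and lowercasing either form gives the same two code points. The variable names are illustrative.

```python
import unicodedata

composed = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
decomposed = unicodedata.normalize("NFD", composed)  # 'I' + COMBINING DOT ABOVE

# Code points of the decomposed form.
decomposed_points = ["U+%04x" % ord(ch) for ch in decomposed]
print(decomposed_points)

# Lowercasing the composed and the decomposed form yields the same string,
# so the two canonically equivalent inputs stay equivalent after lower().
same_after_lower = composed.lower() == decomposed.lower()
print(same_after_lower)
```

If U+0130 were mapped to a bare "i", the composed and decomposed spellings of the same text would lowercase to different strings.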


If you want to convert strings differently for the special case of Turkish, you need to use a different standard than Unicode...
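
For code that knows its input is Turkish, one possible workaround is to apply the two language-specific mappings by hand before calling str.lower(). This is a minimal sketch, not something str.lower() offers: the helper name turkish_lower is hypothetical, and it assumes the only Turkish-specific rules needed are U+0130 to "i" and "I" to U+0131.

```python
import unicodedata


def turkish_lower(text):
    """Hypothetical helper: lowercase text using Turkish I/i rules."""
    # Compose first so that 'I' + COMBINING DOT ABOVE becomes U+0130.
    text = unicodedata.normalize("NFC", text)
    # Dotted capital I (U+0130) becomes plain 'i'.
    text = text.replace("\u0130", "i")
    # Dotless capital I becomes dotless small i (U+0131).
    text = text.replace("I", "\u0131")
    # Everything else follows the default Unicode mapping.
    return text.lower()


print(turkish_lower("\u0130STANBUL"))
print(turkish_lower("I\u015eIK"))
```

The helper must only be used on text known to be Turkish (or another Turkic language with the same rule), since it would turn an English "I" into a dotless "ı".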

I close the issue as NOT A BUG.