Title: Lowercasing Unicode Characters
Type: behavior Stage: resolved
Components: Versions: Python 3.6
Status: closed Resolution: duplicate
Dependencies: Superseder: Latin Capital Letter I with Dot Above (issue 17252)
Assigned To: Nosy List: kingofsevens, malin, steven.daprano
Priority: normal Keywords:

Created on 2019-01-02 12:03 by kingofsevens, last changed 2019-01-04 21:31 by terry.reedy. This issue is now closed.

Messages (2)
msg332865 - Author: Erdem Uney (kingofsevens) Date: 2019-01-02 12:03
assert 'ŞİŞLİ'.lower() == 'şişli'

Lowercasing the capital İ (I with a dot above, \u0130) inserts the combining character \u0307 after the 'i', and if another character follows, that dot (\u0307) is displayed over that character. The behavior differs in Python 2.7.10, where the dot appears on top of the 'i'.

According to the Unicode specification, character \u0130 should be converted to character \u0069.
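A quick way to see what the report describes is to inspect the code points that str.lower() actually returns. This is a minimal sketch assuming Python 3.3 or later; the extra \u0307 is a separate combining code point, so where the dot is drawn depends on the font and terminal.

```python
import unicodedata

word = "ŞİŞLİ"
lowered = word.lower()

# Python 3.3+ str.lower() uses the full Unicode case mapping, so each U+0130
# becomes two code points: U+0069 followed by U+0307 (a combining mark).
print(ascii(lowered))         # escapes non-ASCII so the extra U+0307 is visible
print(len(word), len(lowered))
print(lowered == "şişli")     # the assertion from the report does not hold

# Name each code point produced by lowercasing a single U+0130.
for ch in "\u0130".lower():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The lowered string is 7 code points long versus 5 for the original, and the loop lists U+0069 LATIN SMALL LETTER I followed by U+0307 COMBINING DOT ABOVE.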
msg332875 - Author: Ma Lin (malin) * Date: 2019-01-02 13:28
Please read this discussion.

The behavior in Python 3.2 and earlier is correct for Turkish users.
The behavior in Python 3.3 and later is correct for non-Turkish users.
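For text known to be Turkish, one workaround is to apply the Turkish dotted/dotless I rules before calling str.lower(). The helper below, lower_tr, is a hypothetical name, not a standard library API, and it is only a simplified sketch of the tr/az conditional mappings in Unicode's SpecialCasing.txt: it handles \u0307 directly after 'I' but not other combining marks in between.

```python
def lower_tr(text: str) -> str:
    """Hypothetical helper: lowercase text using Turkish dotted/dotless I rules.

    Simplified sketch of the tr/az conditional mappings in SpecialCasing.txt;
    it only handles U+0307 directly after 'I', not intervening combining marks.
    """
    text = text.replace("I\u0307", "i")  # decomposed İ (I + combining dot) -> i
    text = text.replace("\u0130", "i")   # precomposed İ (U+0130) -> i, no U+0307
    text = text.replace("I", "\u0131")   # plain I -> dotless ı (U+0131)
    return text.lower()                  # all other characters: default mapping


print(lower_tr("ŞİŞLİ") == "şişli")
print(lower_tr("ISPARTA"))
```

The replacements run before lower() so the default full mapping never sees U+0130 or a Turkish capital I. Because the helper ignores locale, it would mis-lowercase English text such as "INFO", so it should only be applied to strings known to be Turkish or Azerbaijani.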
Date                 User            Action  Args
2019-01-04 21:31:09  terry.reedy     set     status: open -> closed
                                             superseder: Latin Capital Letter I with Dot Above
                                             resolution: duplicate
                                             stage: resolved
2019-01-03 15:16:28  steven.daprano  set     nosy: + steven.daprano
2019-01-02 13:28:52  malin           set     nosy: + malin
                                             messages: + msg332875
2019-01-02 12:03:43  kingofsevens    create