
classification
Title: str.lower() loses character information when working with UTF-8
Type: behavior Stage: resolved
Components: Unicode Versions: Python 3.6
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Kadam Parikh, SilentGhost, ezio.melotti, vstinner
Priority: normal Keywords:

Created on 2019-04-20 07:02 by Kadam Parikh, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (2)
msg340563 - Author: Kadam Parikh Date: 2019-04-20 07:02
When converting the character "İ" (U+0130) to lowercase, str.lower() doesn't behave as I expect. It returns a string of two characters instead of one. This is not as desired.

Code:

>>> print("\u0130")
İ
>>> print("\u0130".lower())
i̇
>>>
msg340567 - Author: SilentGhost (Python triager) Date: 2019-04-20 07:48
This is the behaviour specified by the Unicode standard, version 11. It is not an oversight on the part of the CPython implementation: this character (among others) lowercases to two characters.
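
To see which two characters come back, the result can be inspected code point by code point. This is only a minimal sketch using the standard unicodedata module; the helper name describe() is made up for illustration and is not part of CPython.

```python
import unicodedata


def describe(s):
    """Return (code point, Unicode name) pairs for each character in s.

    Hypothetical helper, used here only to inspect the strings.
    """
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in s]


upper = "\u0130"       # the character from the report
lower = upper.lower()  # full Unicode case mapping, as applied by str.lower()

print(describe(upper))
print(describe(lower))
print(len(lower))
```

On a Python 3 interpreter the lowercased string should consist of U+0069 (LATIN SMALL LETTER I) followed by U+0307 (COMBINING DOT ABOVE), which is why it renders as a single dotted "i" while len() reports 2.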
History
Date                 User          Action  Args
2022-04-11 14:59:14  admin         set     github: 80852
2019-04-20 07:48:26  SilentGhost   set     status: open -> closed; nosy: + SilentGhost; messages: + msg340567; resolution: not a bug; stage: resolved
2019-04-20 07:02:42  Kadam Parikh  create