Issue 36671: str.lower() loses character information when working with UTF-8

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/80852

classification

Title:	str.lower() loses character information when working with UTF-8
Type:	behavior	Stage:	resolved
Components:	Unicode	Versions:	Python 3.6

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	Kadam Parikh, SilentGhost, ezio.melotti, vstinner
Priority:	normal	Keywords:

Created on 2019-04-20 07:02 by Kadam Parikh, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (2)
msg340563	Author: Kadam Parikh (Kadam Parikh)	Date: 2019-04-20 07:02
When converting the Unicode character "İ" (U+0130) to lowercase, str.lower() does not behave as I expect. It returns two characters instead of a single lowercase character.

Code:
>>> print("\u0130")
İ
>>> print("\u0130".lower())
i̇
msg340567	Author: SilentGhost (SilentGhost) *	Date: 2019-04-20 07:48
This is the behaviour specified by version 11 of the Unicode standard. It is not an oversight on the part of the CPython implementation: this character (among others) lowercases to two characters.
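For readers who want to see which two characters are produced, the following is a minimal sketch using only the standard library. The helper name `describe` is made up for this illustration and is not part of any Python API. It lists the code points in the lowercased string, and it also shows one other character whose case mapping changes the string length.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text.

    Hypothetical helper for illustration only.
    """
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# U+0130 is LATIN CAPITAL LETTER I WITH DOT ABOVE.
lowered = "\u0130".lower()

# The result is two code points: a plain "i" followed by a combining dot.
for code_point, name in describe(lowered):
    print(code_point, name)
# U+0069 LATIN SMALL LETTER I
# U+0307 COMBINING DOT ABOVE

print(len(lowered))  # 2

# Another case mapping that changes length: sharp s uppercases to "SS".
print("\u00df".upper())  # SS
```

Code that assumes `len(s) == len(s.lower())` can therefore break on such input, so it is safer to compare or index the converted string rather than reuse offsets from the original.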

History
Date	User	Action	Args
2022-04-11 14:59:14	admin	set	github: 80852
2019-04-20 07:48:26	SilentGhost	set	status: open -> closed nosy: + SilentGhost messages: + msg340567 resolution: not a bug stage: resolved
2019-04-20 07:02:42	Kadam Parikh	create