Issue 43020: str.lower method with "İ" character

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/87186

classification

Title:	str.lower method with "İ" character
Type:	behavior	Stage:	resolved
Components:	Build	Versions:	Python 3.9

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	emiryegnidemir7, steven.daprano
Priority:	normal	Keywords:

Created on 2021-01-25 06:59 by emiryegnidemir7, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (3)
msg385603	Author: Emir (emiryegnidemir7)	Date: 2021-01-25 06:59
In Turkish, the capital form of "i" is written as "İ". When I call the str.lower method on it, the result looks correct because it prints the character I expected ("i"). The problem appears when I compare that result to a plain "i" (one that was never lowercased): the lowercased "İ" has a length of 2, so words that should be the same no longer compare equal. For example, the word "issue" and the word "İssue" lowercased to "issue" are not equal when compared. This is a big problem for word-counting software.
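The behavior described above can be reproduced with a short sketch. The variable names are illustrative only, and the escape `\u0130` is used in place of a literal "İ" so that the code point is unambiguous:

```python
import unicodedata

dotted_capital_i = "\u0130"  # İ, LATIN CAPITAL LETTER I WITH DOT ABOVE
lowered = dotted_capital_i.lower()

# The lowercase form is two code points: a plain "i" plus a combining dot.
names = [unicodedata.name(ch) for ch in lowered]

print(len(lowered))  # 2
print(names)         # ['LATIN SMALL LETTER I', 'COMBINING DOT ABOVE']

# The extra combining character is why the comparison fails.
print("\u0130ssue".lower() == "issue")  # False
```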
msg385614	Author: Steven D'Aprano (steven.daprano)	Date: 2021-01-25 10:22
This is not a bug, but an issue with the way the Unicode standard defines the lowercase of dotted I. See #34723. Fortunately, Unicode will (hopefully!) fix this in revision 14.0, which is scheduled to be included in Python 3.10. Until then, if you are processing Turkish text, perhaps the simplest solution is to change your call from .lower() to .replace('İ', 'I').lower()
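A minimal sketch of the suggested workaround follows. The helper name `lower_dotted_i` is hypothetical and not part of Python or of the original message:

```python
def lower_dotted_i(text: str) -> str:
    """Lowercase text after mapping U+0130 (İ) to ASCII 'I'.

    Hypothetical helper wrapping the .replace(...).lower() workaround.
    """
    return text.replace("\u0130", "I").lower()


result = lower_dotted_i("\u0130ssue")
print(result == "issue")  # True
print(len(result))        # 5
```

One caveat with this sketch: str.lower() is not locale-aware, so a plain "I" still becomes "i" rather than the Turkish dotless "ı". That may or may not be what a Turkish word counter wants.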
msg385906	Author: Emir (emiryegnidemir7)	Date: 2021-01-29 12:08
Thanks for the answer. I will keep that in mind next time and post it as an "issue" instead of calling it a "bug".

History
Date	User	Action	Args
2022-04-11 14:59:40	admin	set	github: 87186
2021-01-29 12:08:34	emiryegnidemir7	set	messages: + msg385906
2021-01-25 10:22:55	steven.daprano	set	status: open -> closed nosy: + steven.daprano messages: + msg385614 resolution: not a bug stage: resolved
2021-01-25 06:59:06	emiryegnidemir7	create