This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: str.lower method with "İ" character
Type: behavior
Stage: resolved
Components: Build
Versions: Python 3.9
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: emiryegnidemir7, steven.daprano
Priority: normal
Keywords:

Created on 2021-01-25 06:59 by emiryegnidemir7, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (3)
msg385603 - (view) Author: Emir (emiryegnidemir7) Date: 2021-01-25 06:59
In Turkish, the capital form of "i" is written "İ". When I call str.lower() on it, the result looks correct, because it prints the character I expected ("i"). The problem appears when I compare that result with a plain "i" (one that was never lowercased): the lowercased "İ" has a length of 2, so words that should be equal no longer are. For example, "issue" and "İssue".lower() (which displays as "issue") compare as different. This is a big problem for word-counting software.
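A minimal sketch reproducing the behavior described in this report, assuming a CPython 3 interpreter such as the 3.9 version listed above. The variable names are illustrative only and do not come from the report.

```python
import unicodedata

# U+0130 is "İ", LATIN CAPITAL LETTER I WITH DOT ABOVE.
dotted_capital_i = "\u0130"
lowered = dotted_capital_i.lower()

# The result displays like "i" but consists of two code points.
print(len(lowered))                            # 2
print([unicodedata.name(c) for c in lowered])
# ['LATIN SMALL LETTER I', 'COMBINING DOT ABOVE']

# So a lowercased word starting with "İ" does not equal the plain ASCII word.
print("\u0130ssue".lower() == "issue")         # False
```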
msg385614 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2021-01-25 10:22
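This is not a bug, but a consequence of the way the Unicode standard defines the lowercase of dotted capital I ("İ", U+0130): it maps to "i" followed by U+0307 COMBINING DOT ABOVE, which is two code points.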

See #34723

Fortunately, Unicode will (hopefully!) fix this in revision 14.0, which is scheduled to be included in Python 3.10.

Until then, if you are processing Turkish text, perhaps the simplest workaround is to change your call from .lower() to .replace('İ', 'I').lower()
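A short sketch of that workaround applied to the word-counting case from the report. The helper name `lower_dotted_i` and the sample word list are hypothetical, chosen only for illustration.

```python
from collections import Counter


def lower_dotted_i(text: str) -> str:
    """Lowercase text, first mapping "İ" (U+0130) to ASCII "I".

    Hypothetical helper wrapping the .replace('İ', 'I').lower() workaround.
    """
    return text.replace("\u0130", "I").lower()


# Sample input: the same word written three ways, one with a leading "İ".
words = ["issue", "\u0130ssue", "ISSUE"]
counts = Counter(lower_dotted_i(word) for word in words)

print(lower_dotted_i("\u0130ssue") == "issue")  # True
print(counts)                                   # Counter({'issue': 3})
```

Note that this only removes the extra combining character. It does not implement Turkish casing rules in general: a plain "I" still lowercases to "i", not to the dotless "ı".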
msg385906 - (view) Author: Emir (emiryegnidemir7) Date: 2021-01-29 12:08
Thanks for the answer. I will keep that in mind and post future reports as an "issue" instead of calling them a "bug".
History
Date User Action Args
2022-04-11 14:59:40 admin set github: 87186
2021-01-29 12:08:34 emiryegnidemir7 set messages: + msg385906
2021-01-25 10:22:55 steven.daprano set status: open -> closed; nosy: + steven.daprano; messages: + msg385614; resolution: not a bug; stage: resolved
2021-01-25 06:59:06 emiryegnidemir7 create