
Author brian.gallagher
Recipients brian.gallagher, lemburg, tim.peters
Date 2020-03-08.12:57:23
Message-id <1583672243.36.0.884165031963.issue39891@roundup.psfhosted.org>
In-reply-to
Content
I agree that there is an appeal to leaving any normalization to the application and that trying to guess what people want is a rabbit hole -- I hadn't even considered what casing would mean in a general sense for Unicode.

I'm not entirely convinced that this should be pursued either, but I'll refine my proposal, provide a little context in which I thought it could be a problem and see what you guys think.

1. Some code is written that assumes get_close_matches() will match on a case-insensitive basis. Only a small bit of testing is done because the functionality is provided by the standard library, not the application code, so we try a few examples like 'apple' and 'ape' and decide it is okay. Later on, we discover we have a bug because we actually need to match against 'AppLE' too.

2. The extension I had in mind was to match on a case-insensitive basis for alphabetic characters only. I don't know much about Unicode, but there are definitely gotchas lurking in my previous statement (titlecase vs. uppercase), so copying the behaviour of str.upper()/str.lower() would seem reasonable to me. The function would still match the same strings it matches today, but would now also ignore casing, so we wouldn't be eliminating any existing matches. I guess this still has the potential to be a breaking change, since someone might indirectly be depending on the current case-sensitive behaviour.
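
To make point 2 concrete, here is a rough sketch of what the extended behaviour might look like. The name get_close_matches_icase is hypothetical and nothing like it exists in difflib; it simply scores the str.lower() forms with SequenceMatcher and returns the original strings. It is an illustration under those assumptions, not a proposed patch.

```python
from difflib import SequenceMatcher
from heapq import nlargest


def get_close_matches_icase(word, possibilities, n=3, cutoff=0.6):
    """Hypothetical case-insensitive variant of difflib.get_close_matches().

    Assumption: casing is ignored by comparing str.lower() forms; the
    original, unmodified candidates are what gets returned.
    """
    if n <= 0:
        raise ValueError("n must be > 0: %r" % (n,))
    if not 0.0 <= cutoff <= 1.0:
        raise ValueError("cutoff must be in [0.0, 1.0]: %r" % (cutoff,))

    matcher = SequenceMatcher()
    matcher.set_seq2(word.lower())

    scored = []
    for candidate in possibilities:
        # Compare lowercased text, but remember the original candidate.
        matcher.set_seq1(candidate.lower())
        score = matcher.ratio()
        if score >= cutoff:
            scored.append((score, candidate))

    # Best matches first, at most n of them.
    return [candidate for _, candidate in nlargest(n, scored)]


print(get_close_matches_icase('apple', ['ape', 'AppLE']))
```

With this sketch, 'AppLE' is scored as if it were 'apple', so it ranks ahead of 'ape' for the query 'apple'.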

For 1., not testing that your code handles mixed-case comparisons the way you assume it will is probably your own fault. On the other hand, I think it is a reasonable assumption that get_close_matches() will match an uppercase/lowercase counterpart, since the function's intent is to provide intuitive matches that "look right" to a human.
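
For reference, the scenario in 1. can be reproduced with the current, case-sensitive behaviour. This is just a small demonstration using the default n and cutoff values:

```python
from difflib import get_close_matches

# 'ape' is close enough to 'apple' to pass the default cutoff of 0.6,
# but 'AppLE' only shares the two lowercase 'p' characters with it.
matches = get_close_matches('apple', ['ape', 'AppLE'])
print(matches)

# An all-uppercase counterpart shares no characters at all.
upper_matches = get_close_matches('apple', ['APPLE'])
print(upper_matches)
```

So a test suite that only tries lowercase inputs like 'apple' and 'ape' would pass, while 'AppLE' and 'APPLE' are silently dropped.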

Maybe this is more of a documentation issue than something that needs to be addressed in the code. If a caveat about the case sensitivity of the function is added to the documentation, then a developer can be aware of the limitation in order to provide any normalization they want in the application code.
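
If it does end up being a documentation-only change, the normalization an application would have to do itself could look something like the sketch below. The helper name close_matches_normalized and its normalize parameter are made up for illustration; str.casefold is only one possible choice, and an application might prefer str.lower or something locale-aware instead.

```python
from difflib import get_close_matches


def close_matches_normalized(word, possibilities, normalize=str.casefold, **kwargs):
    """Hypothetical application-side helper around get_close_matches().

    Candidates are grouped by their normalized form, the stock function
    is run on the normalized keys, and the originals are returned.
    """
    by_key = {}
    for candidate in possibilities:
        by_key.setdefault(normalize(candidate), []).append(candidate)

    keys = get_close_matches(normalize(word), list(by_key), **kwargs)

    # Expand each matched key back into the original spellings.
    return [original for key in keys for original in by_key[key]]


print(close_matches_normalized('apple', ['ape', 'AppLE', 'APPLE']))
```

Note that n limits the number of normalized keys here, so the result can contain more than n strings when several candidates differ only by case.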

Let me know what you guys think.
History
Date User Action Args
2020-03-08 12:57:23  brian.gallagher  set  recipients: + brian.gallagher, lemburg, tim.peters
2020-03-08 12:57:23  brian.gallagher  set  messageid: <1583672243.36.0.884165031963.issue39891@roundup.psfhosted.org>
2020-03-08 12:57:23  brian.gallagher  link  issue39891 messages
2020-03-08 12:57:23  brian.gallagher  create