
Author brian.gallagher
Recipients brian.gallagher, lemburg, python-dev, rhettinger, tim.peters
Date 2020-04-08.22:44:00
Message-id <1586385840.65.0.713814737141.issue39891@roundup.psfhosted.org>
In-reply-to
Content
Just giving this a bump, in case it has been forgotten about.

I've posted a patch at https://github.com/python/cpython/pull/18983.

It adds a new parameter "ignorecase" to get_close_matches() that, if set to True, makes the SequenceMatcher compare characters case-insensitively (as determined by str.lower()).

The benefit of using this keyword, as opposed to letting the application handle the normalization, is that it saves memory. If the application has to normalize the strings and supply a separate list to get_close_matches(), it ends up maintaining a mapping between each original string and its normalized form. As an example:

>>> from difflib import get_close_matches
>>> word = 'apple'
>>> possibilities = ['apPLE', 'APPLE', 'APE', 'Banana', 'Fruit', 'PEAR', 'CoCoNuT']
>>> # Map each lowercased string back to its original spelling.
>>> # Strings that lowercase to the same key collide: 'APPLE' replaces 'apPLE'.
>>> normalized_possibilities = {p.lower(): p for p in possibilities}
>>> result = get_close_matches(word, normalized_possibilities.keys())
>>> result
['apple', 'ape']
>>> # Translate the lowercased matches back to the original strings.
>>> original_result = [normalized_possibilities[r] for r in result]
>>> original_result
['APPLE', 'APE']

By letting the SequenceMatcher handle the casing on the fly, we avoid building that second, normalized copy of the input, which could save a large amount of memory when someone passes a huge list to get_close_matches().
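
To illustrate what "on the fly" could look like, here is a rough sketch written outside of difflib. It is not the code from the pull request: the helper name close_matches_ignorecase is made up for this example, and it only mirrors the general scoring loop of get_close_matches() using the public SequenceMatcher methods. Each candidate is lowercased just long enough to be scored, so no lowercased copy of the whole list and no mapping back to the originals is kept.

```python
from difflib import SequenceMatcher
from heapq import nlargest


def close_matches_ignorecase(word, possibilities, n=3, cutoff=0.6):
    """Hypothetical helper (not part of difflib, not the PR's code).

    Scores candidates case-insensitively but returns them in their
    original spelling.
    """
    matcher = SequenceMatcher()
    matcher.set_seq2(word.lower())
    scored = []
    for candidate in possibilities:
        # The lowercased string is a temporary; only the original is kept.
        matcher.set_seq1(candidate.lower())
        # Cheap upper bounds first, the full ratio only when needed.
        if (matcher.real_quick_ratio() >= cutoff
                and matcher.quick_ratio() >= cutoff
                and matcher.ratio() >= cutoff):
            scored.append((matcher.ratio(), candidate))
    # Best n matches, highest score first.
    return [candidate for score, candidate in nlargest(n, scored)]


possibilities = ['apPLE', 'APPLE', 'APE', 'Banana', 'Fruit', 'PEAR', 'CoCoNuT']
print(close_matches_ignorecase('apple', possibilities))
```

Unlike the dictionary approach above, this version also keeps both 'apPLE' and 'APPLE', since nothing is keyed on the lowercased string.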