Message 244840 - Python tracker

Author	floyd
Recipients	floyd
Date	2015-06-04.20:12:52
Message-id	<1433448775.15.0.914528193485.issue24384@psf.upfronthosting.co.za>
In-reply-to

Content
I guess a lot of users of difflib call the SequenceMatcher in the following way (where a and b often have different lengths):

if difflib.SequenceMatcher(None, a, b).quick_ratio() >= threshold:

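However, for this use case the current quick_ratio does more work than necessary: it always computes the full value, even though the caller only needs to know whether it reaches the threshold. I therefore propose adding an optimized variant, quick_ratio_ge, which would be called like this: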

if difflib.SequenceMatcher.quick_ratio_ge(None, a, b, threshold):

Since an upper bound on the ratio can be calculated from the lengths of a and b alone, this function could return early, and therefore much faster, whenever that bound is already below threshold.

An example of how quick_ratio_ge could be implemented is attached.
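
The attachment itself is not reproduced in this message. As a rough sketch of the idea only, and not necessarily how the attached version works, the length-based early exit could look like the following. It is written here as a hypothetical standalone function rather than a SequenceMatcher method, and it assumes the bound 2 * min(len(a), len(b)) / (len(a) + len(b)), which is the same bound real_quick_ratio() returns:

```python
import difflib


def quick_ratio_ge(a, b, threshold):
    """Return True if SequenceMatcher(None, a, b).quick_ratio() >= threshold.

    Hypothetical helper for illustration; not part of difflib.
    """
    la, lb = len(a), len(b)
    total = la + lb
    # At most min(la, lb) elements can match, so the ratio can never
    # exceed 2 * min(la, lb) / total. If even that bound is below the
    # threshold, skip the more expensive quick_ratio() call entirely.
    if total and 2.0 * min(la, lb) / total < threshold:
        return False
    # Otherwise fall back to the existing computation.
    return difflib.SequenceMatcher(None, a, b).quick_ratio() >= threshold
```

The early exit helps most when a and b differ a lot in length and the threshold is high; for sequences of similar length the call falls through to the existing quick_ratio.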

History
Date	User	Action	Args
2015-06-04 20:12:55	floyd	set	recipients: + floyd
2015-06-04 20:12:55	floyd	set	messageid: <1433448775.15.0.914528193485.issue24384@psf.upfronthosting.co.za>
2015-06-04 20:12:55	floyd	link	issue24384 messages
2015-06-04 20:12:54	floyd	create