classification
Title: SequenceMatcher.ratio() noncommutativity not well-documented
Type: Stage: resolved
Components: Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Dennis Sweeney, docs@python, miss-islington, terry.reedy, tim.peters
Priority: normal Keywords: patch

Created on 2019-05-22 02:09 by Dennis Sweeney, last changed 2019-08-07 16:13 by tim.peters. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 13482 merged python-dev, 2019-05-22 02:21
PR 15157 merged miss-islington, 2019-08-07 04:37
PR 15158 merged miss-islington, 2019-08-07 04:37
PR 15159 closed miss-islington, 2019-08-07 04:37
Messages (4)
msg343138 - (view) Author: Dennis Sweeney (Dennis Sweeney) * Date: 2019-05-22 02:09
I understand that SequenceMatcher's ratio() method does not guarantee that SequenceMatcher(None, a, b).ratio() == SequenceMatcher(None, b, a).ratio(). Below is a counterexample:

    # Example from https://mail.python.org/pipermail/python-list/2010-November/593063.html
    >>> from difflib import SequenceMatcher
    >>> SequenceMatcher(None, 'BRADY', 'BYRD').ratio()
    0.6666666666666666
    >>> SequenceMatcher(None, 'BYRD', 'BRADY').ratio()
    0.4444444444444444
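
One way to see where the two ratios come from is to list the matching blocks for each argument order. The sketch below is only illustrative: `matched_pieces` is a hypothetical helper, not part of difflib. It relies on `get_matching_blocks()`, whose last entry is a zero-length sentinel.

```python
from difflib import SequenceMatcher


def matched_pieces(a, b):
    """Return the substrings of `a` that SequenceMatcher pairs with `b`.

    Hypothetical helper for illustration only; not part of difflib.
    """
    sm = SequenceMatcher(None, a, b)
    # The final block returned is a zero-length sentinel, so skip empty blocks.
    return [a[m.a:m.a + m.size] for m in sm.get_matching_blocks() if m.size]


forward = matched_pieces('BRADY', 'BYRD')   # blocks found with 'BRADY' first
backward = matched_pieces('BYRD', 'BRADY')  # blocks found with 'BYRD' first
print(forward, backward)
```

Since ratio() is computed as 2.0*M/T, where M is the number of matched elements and T is the combined length of both sequences (9 here), three matched characters in one order and two in the other account for 6/9 versus 4/9.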

I was recently solving a problem that required a textual similarity ratio function, and I wrongly assumed that SequenceMatcher treats both input strings symmetrically. The resulting bug was extremely difficult to find, especially because the ratio IS symmetric for many simple tests:

    >>> SequenceMatcher(None, 'apple', 'banana').ratio()
    0.18181818181818182
    >>> SequenceMatcher(None, 'banana', 'apple').ratio()
    0.18181818181818182
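
Because hand-picked examples often happen to be symmetric, a systematic check over a set of test strings is more reliable than spot checks. The following is a minimal sketch; `find_asymmetric_pairs` is a hypothetical helper name, not a difflib function.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_asymmetric_pairs(words):
    """Return the pairs (a, b) whose ratio changes when the arguments swap.

    Hypothetical helper for illustration only; not part of difflib.
    """
    asymmetric = []
    for a, b in combinations(words, 2):
        forward = SequenceMatcher(None, a, b).ratio()
        backward = SequenceMatcher(None, b, a).ratio()
        if forward != backward:
            asymmetric.append((a, b))
    return asymmetric


print(find_asymmetric_pairs(['apple', 'banana', 'BRADY', 'BYRD']))
```

With the four strings used in this report, only the 'BRADY'/'BYRD' pair is flagged.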

I would like to see a clearer warning about this asymmetry in the documentation for the difflib module. Perhaps something like:

      .. note::

         Caution: The result of a :meth:`ratio` call is *NOT* symmetric with 
         respect to the order of the arguments. For instance::
            
            >>> SequenceMatcher(None, 'brady', 'byrd').ratio()
            0.6666666666666666
            >>> SequenceMatcher(None, 'byrd', 'brady').ratio()
            0.4444444444444444

Without such a note near the documentation of the ratio method, it is far too easy to search for Python stdlib functionality for computing text similarity, skip straight to the ratio method, look at the examples given, try a few simple examples of your own, and accidentally convince yourself that this symmetry exists.
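
For code that needs an order-independent score, one possible workaround is to evaluate both argument orders and combine the results. The sketch below takes the larger of the two ratios; `symmetric_ratio` is a hypothetical name, the choice of `max` is an assumption (a mean or `min` would also be symmetric), and nothing like it is provided by difflib itself.

```python
from difflib import SequenceMatcher


def symmetric_ratio(a, b):
    """Order-independent similarity built on SequenceMatcher.ratio().

    Hypothetical helper for illustration only; not part of difflib.
    Taking max() is one arbitrary way to combine the two orders.
    """
    forward = SequenceMatcher(None, a, b).ratio()
    backward = SequenceMatcher(None, b, a).ratio()
    return max(forward, backward)


# Both calls compare the same two underlying ratios, so they agree.
print(symmetric_ratio('BRADY', 'BYRD') == symmetric_ratio('BYRD', 'BRADY'))
```

This doubles the matching work per comparison, which may matter for long sequences.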
msg349151 - (view) Author: miss-islington (miss-islington) Date: 2019-08-07 04:37
New changeset e9cbcd0018abd2a5f2348c45d5c9c4265c4f42dc by Miss Islington (bot) (sweeneyde) in branch 'master':
bpo-37004: Documented asymmetry of string arguments in difflib.SequenceMatcher for ratio method (GH-13482)
https://github.com/python/cpython/commit/e9cbcd0018abd2a5f2348c45d5c9c4265c4f42dc
msg349169 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-08-07 15:39
New changeset 1a3a40c1cb582e436d568009fae2b06c0b1978ed by Terry Jan Reedy (Miss Islington (bot)) in branch '3.8':
bpo-37004: Documented asymmetry of string arguments in difflib.SequenceMatcher for ratio method (GH-13482) (#15157)
https://github.com/python/cpython/commit/1a3a40c1cb582e436d568009fae2b06c0b1978ed
msg349170 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-08-07 15:39
New changeset 7dafbe81bd0afb8bd67bc3a4c851a6c728fd87fe by Terry Jan Reedy (Miss Islington (bot)) in branch '3.7':
bpo-37004: Documented asymmetry of string arguments in difflib.SequenceMatcher for ratio method (GH-13482) (#15158)
https://github.com/python/cpython/commit/7dafbe81bd0afb8bd67bc3a4c851a6c728fd87fe
History
Date User Action Args
2019-08-07 16:13:49 tim.peters set status: open -> closed
stage: patch review -> resolved
resolution: fixed
versions: + Python 3.7, Python 3.8, Python 3.9
2019-08-07 15:39:52 terry.reedy set messages: + msg349170
2019-08-07 15:39:35 terry.reedy set nosy: + terry.reedy
messages: + msg349169
2019-08-07 04:37:34 miss-islington set pull_requests: + pull_request14893
2019-08-07 04:37:28 miss-islington set pull_requests: + pull_request14892
2019-08-07 04:37:21 miss-islington set pull_requests: + pull_request14891
2019-08-07 04:37:15 miss-islington set nosy: + miss-islington
messages: + msg349151
2019-05-22 02:21:34 python-dev set keywords: + patch
stage: patch review
pull_requests: + pull_request13394
2019-05-22 02:20:03 xtreak set nosy: + tim.peters
2019-05-22 02:09:15 Dennis Sweeney create