Message 343138 - Python tracker


Author	Dennis Sweeney
Recipients	Dennis Sweeney, docs@python
Date	2019-05-22.02:09:15
Message-id	<1558490955.4.0.899055526874.issue37004@roundup.psfhosted.org>

Content

I understand that SequenceMatcher's ratio method does not guarantee that SequenceMatcher(None, a, b).ratio() == SequenceMatcher(None, b, a).ratio(). Here is a counterexample:

    # Example from https://mail.python.org/pipermail/python-list/2010-November/593063.html
    >>> from difflib import SequenceMatcher
    >>> SequenceMatcher(None, 'BRADY', 'BYRD').ratio()
    0.6666666666666666
    >>> SequenceMatcher(None, 'BYRD', 'BRADY').ratio()
    0.4444444444444444
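
To make the source of the difference visible, here is a minimal sketch that counts the matched elements for each argument order. It relies on the documented definition of ratio() as 2.0*M / T, where M is the number of matches and T is the total number of elements in both sequences. The helper name match_summary is hypothetical and only for illustration; it is not part of difflib.

```python
from difflib import SequenceMatcher


def match_summary(a, b):
    """Return (matched, total, ratio) for one argument order.

    Hypothetical helper for illustration; not part of difflib.
    """
    sm = SequenceMatcher(None, a, b)
    # get_matching_blocks() ends with a zero-size sentinel block,
    # which contributes nothing to the sum.
    matched = sum(block.size for block in sm.get_matching_blocks())
    total = len(a) + len(b)
    return matched, total, sm.ratio()


forward = match_summary('BRADY', 'BYRD')
backward = match_summary('BYRD', 'BRADY')
print(forward)
print(backward)
```

With these inputs, the first order finds three matching elements and the second finds only two, while the total length is 9 in both cases. The ratio differs because the set of matching blocks depends on which sequence is passed first.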

I was recently solving a problem that required a textual similarity ratio function, and I wrongly assumed that SequenceMatcher treated both input strings symmetrically. The resulting bug was extremely difficult to find, especially because for many simple tests the ratio IS symmetric:

    >>> SequenceMatcher(None, 'apple', 'banana').ratio()
    0.18181818181818182
    >>> SequenceMatcher(None, 'banana', 'apple').ratio()
    0.18181818181818182
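
For a use case like this one that needs an order-independent score, one possible workaround is to evaluate both argument orders and combine the results. The sketch below is an assumption, not something difflib provides: the name symmetric_ratio is hypothetical, and taking the maximum (rather than, say, the mean or minimum) is an arbitrary choice that should be adapted to the application.

```python
from difflib import SequenceMatcher


def symmetric_ratio(a, b):
    """Order-independent similarity score.

    Hypothetical helper (not part of difflib): computes ratio() in both
    argument orders and returns the larger value.
    """
    forward = SequenceMatcher(None, a, b).ratio()
    backward = SequenceMatcher(None, b, a).ratio()
    # max() is symmetric in its arguments, so swapping a and b
    # cannot change the result.
    return max(forward, backward)


print(symmetric_ratio('BRADY', 'BYRD') == symmetric_ratio('BYRD', 'BRADY'))
```

Note that this does twice the work of a single ratio() call, which may matter when comparing many long sequences.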

I would like to see a clearer warning of this asymmetry in the documentation for the difflib module. Perhaps something like:

      .. note::

         Caution: The result of a :meth:`ratio` call is *NOT* symmetric with 
         respect to the order of the arguments. For instance::
            
            >>> SequenceMatcher(None, 'brady', 'byrd').ratio()
            0.6666666666666666
            >>> SequenceMatcher(None, 'byrd', 'brady').ratio()
            0.4444444444444444

Without such a note near the ratio method's documentation, it is far too easy to search for Python stdlib functionality for computing text similarity, skip straight to the ratio method, look at the examples given, try a few simple examples of one's own, and accidentally convince oneself that this symmetry exists.
