SequenceMatcher.ratio() noncommutativity not well-documented #81185

Closed
sweeneyde opened this issue May 22, 2019 · 4 comments
Labels
3.7 (EOL): end of life · 3.8: only security fixes · 3.9: only security fixes

Comments

@sweeneyde
Member

BPO 37004
Nosy @tim-one, @terryjreedy, @miss-islington, @sweeneyde
PRs
  • bpo-37004: Documented asymmetry of string arguments in difflib.SequenceMatcher for ratio method #13482
  • [3.8] bpo-37004: Documented asymmetry of string arguments in difflib.SequenceMatcher for ratio method (GH-13482) #15157
  • [3.7] bpo-37004: Documented asymmetry of string arguments in difflib.SequenceMatcher for ratio method (GH-13482) #15158
  • [3.6] bpo-37004: Documented asymmetry of string arguments in difflib.SequenceMatcher for ratio method (GH-13482) #15159
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-08-07.16:13:49.300>
    created_at = <Date 2019-05-22.02:09:15.379>
    labels = ['3.7', '3.8', '3.9']
    title = 'SequenceMatcher.ratio() noncommutativity not well-documented'
    updated_at = <Date 2019-08-07.16:13:49.293>
    user = 'https://github.com/sweeneyde'

    bugs.python.org fields:

    activity = <Date 2019-08-07.16:13:49.293>
    actor = 'tim.peters'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-08-07.16:13:49.300>
    closer = 'tim.peters'
    components = []
    creation = <Date 2019-05-22.02:09:15.379>
    creator = 'Dennis Sweeney'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 37004
    keywords = ['patch']
    message_count = 4.0
    messages = ['343138', '349151', '349169', '349170']
    nosy_count = 5.0
    nosy_names = ['tim.peters', 'terry.reedy', 'docs@python', 'miss-islington', 'Dennis Sweeney']
    pr_nums = ['13482', '15157', '15158', '15159']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue37004'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']

    @sweeneyde
    Member Author

    I understand that the SequenceMatcher's ratio method does not guarantee that SequenceMatcher(None, a, b).ratio() == SequenceMatcher(None, b, a).ratio(). Below is a counterexample:

        # Example from https://mail.python.org/pipermail/python-list/2010-November/593063.html
        >>> SequenceMatcher(None, 'BRADY', 'BYRD').ratio()
        0.6666666666666666
        >>> SequenceMatcher(None, 'BYRD', 'BRADY').ratio()
        0.4444444444444444
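    To see where the two numbers come from, it helps to look at which characters get aligned in each direction. The following is a minimal sketch; `matched_pieces` is a hypothetical helper name introduced here for illustration, built only on the documented `get_matching_blocks()` method. As far as I can tell, the difference arises because the matcher greedily anchors on a longest match and, among ties, prefers the one that starts earliest in the first sequence, so swapping the arguments can change which anchor is chosen.

```python
from difflib import SequenceMatcher

def matched_pieces(a, b):
    """Hypothetical helper: substrings of `a` that SequenceMatcher aligns with `b`."""
    sm = SequenceMatcher(None, a, b)
    # get_matching_blocks() ends with a zero-size sentinel block; skip it.
    return [a[m.a:m.a + m.size] for m in sm.get_matching_blocks() if m.size]

print(matched_pieces('BRADY', 'BYRD'))  # ['B', 'R', 'D']
print(matched_pieces('BYRD', 'BRADY'))  # ['B', 'Y']
```

    With 'BYRD' first, the 'Y' is matched right after the 'B', which leaves nothing to the right of it in 'BRADY' for 'R' and 'D' to align with.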

    I was recently solving a problem that required a text-similarity ratio, and I wrongly assumed that SequenceMatcher treats both input strings symmetrically. The resulting bug was extremely difficult to find, especially because the ratio IS symmetric for many simple tests:

        >>> SequenceMatcher(None, 'apple', 'banana').ratio()
        0.18181818181818182
        >>> SequenceMatcher(None, 'banana', 'apple').ratio()
        0.18181818181818182
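    The documented definition of the ratio is 2.0*M / T, where T is the total number of elements in both sequences and M is the number of matches. T never depends on argument order, so any asymmetry has to come from M. A small sketch that recomputes the ratio by hand (`ratio_by_hand` is a hypothetical name used only for this illustration):

```python
from difflib import SequenceMatcher

def ratio_by_hand(a, b):
    """Hypothetical helper: recompute ratio() as 2.0*M / T from the matching blocks."""
    sm = SequenceMatcher(None, a, b)
    matches = sum(block.size for block in sm.get_matching_blocks())  # M
    total = len(a) + len(b)                                          # T
    # ratio() returns 1.0 when both sequences are empty.
    return 2.0 * matches / total if total else 1.0

print(ratio_by_hand('apple', 'banana'))  # one matched 'a' out of 11 characters
print(ratio_by_hand('banana', 'apple'))
```

    For 'apple' and 'banana' only a single 'a' is matched in either order, so M is 1 both times and the two ratios agree.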

    I would like to see a clearer warning about this asymmetry in the documentation for the difflib module, perhaps something like:

      .. note::
    
             Caution: The result of a :meth:`ratio` call is *NOT* symmetric with 
             respect to the order of the arguments. For instance::
                
                >>> SequenceMatcher(None, 'brady', 'byrd').ratio()
                0.6666666666666666
                >>> SequenceMatcher(None, 'byrd', 'brady').ratio()
                0.4444444444444444

    Without such a note near the documentation of the ratio methods, it is far too easy to search for a Python stdlib function that computes text similarity, skip straight to the ratio method, look at the examples given, try a few simple examples of your own, and accidentally convince yourself that the ratio is symmetric.
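    For code that genuinely needs an order-independent score, one possible workaround is to evaluate both orders and combine the results. This is only a sketch under my own assumptions: `symmetric_ratio` is a hypothetical helper, not part of difflib, and taking the maximum is an arbitrary choice (the minimum or the mean would be just as symmetric).

```python
from difflib import SequenceMatcher

def symmetric_ratio(a, b):
    """Hypothetical helper (not part of difflib): order-independent similarity."""
    forward = SequenceMatcher(None, a, b).ratio()
    backward = SequenceMatcher(None, b, a).ratio()
    # max() is one arbitrary way to combine the two; min() or the mean also work.
    return max(forward, backward)

# Both calls give the same score regardless of argument order.
print(symmetric_ratio('BRADY', 'BYRD'))
print(symmetric_ratio('BYRD', 'BRADY'))
```

    This roughly doubles the work per comparison, so it is a trade-off rather than a drop-in replacement.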

    @miss-islington
    Contributor

    New changeset e9cbcd0 by Miss Islington (bot) (sweeneyde) in branch 'master':
    bpo-37004: Documented asymmetry of string arguments in difflib.SequenceMatcher for ratio method (GH-13482)
    e9cbcd0

    @terryjreedy
    Member

    New changeset 1a3a40c by Terry Jan Reedy (Miss Islington (bot)) in branch '3.8':
    bpo-37004: Documented asymmetry of string arguments in difflib.SequenceMatcher for ratio method (GH-13482) (GH-15157)
    1a3a40c

    @terryjreedy
    Member

    New changeset 7dafbe8 by Terry Jan Reedy (Miss Islington (bot)) in branch '3.7':
    bpo-37004: Documented asymmetry of string arguments in difflib.SequenceMatcher for ratio method (GH-13482) (GH-15158)
    7dafbe8

    @tim-one added the "3.7 (EOL): end of life", "3.8: only security fixes", and "3.9: only security fixes" labels Aug 7, 2019
    @tim-one tim-one closed this as completed Aug 7, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022