SequenceMatcher bug with long sequences #48872

Closed
elibendersky mannequin opened this issue Dec 10, 2008 · 5 comments
Labels
stdlib Python modules in the Lib dir

Comments

@elibendersky
Mannequin

elibendersky mannequin commented Dec 10, 2008

BPO 4622
Nosy @terryjreedy
Superseder
  • bpo-2986: difflib.SequenceMatcher not matching long sequences
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2010-06-25.21:55:55.085>
    created_at = <Date 2008-12-10.17:20:54.197>
    labels = ['library']
    title = 'SequenceMatcher bug with long sequences'
    updated_at = <Date 2010-06-25.21:55:55.083>
    user = 'https://bugs.python.org/elibendersky'

    bugs.python.org fields:

    activity = <Date 2010-06-25.21:55:55.083>
    actor = 'terry.reedy'
    assignee = 'none'
    closed = True
    closed_date = <Date 2010-06-25.21:55:55.085>
    closer = 'terry.reedy'
    components = ['Library (Lib)']
    creation = <Date 2008-12-10.17:20:54.197>
    creator = 'eli.bendersky'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 4622
    keywords = []
    message_count = 5.0
    messages = ['77559', '77561', '77565', '77566', '108635']
    nosy_count = 4.0
    nosy_names = ['terry.reedy', 'ggenellina', 'LambertDW', 'eli.bendersky']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = None
    status = 'closed'
    superseder = '2986'
    type = None
    url = 'https://bugs.python.org/issue4622'
    versions = ['Python 2.5', 'Python 3.0']

    @elibendersky
    Mannequin Author

    elibendersky mannequin commented Dec 10, 2008

    Here's a reproduction of the error:

    Python 2.5.2 (r252:60911, Oct 20 2008, 09:11:31)
    [GCC 3.4.6 20060404 (Red Hat 3.4.6-10)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import difflib
    >>>
    >>> difflib.SequenceMatcher(None, [4] + [5] * 200, [5] * 200).ratio()
    0.0

    ratio() should return a value close to 1.0 here, not 0.0. The problem only
    appears for sequences of 200 elements or more. The analogous run for 100:

    >>> difflib.SequenceMatcher(None, [4] + [5] * 100, [5] * 100).ratio()
    0.99502487562189057
    >>>

    I've managed to reproduce it on Linux, Windows (AS 2.5.2) and Try Python
    (http://try-python.mired.org/)
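
A minimal sketch for locating the cutoff, assuming it depends on the length of the second sequence. The helper name `ratio_for` is hypothetical and exists only for this illustration; the calls are the same ones used in the session above.

```python
import difflib


def ratio_for(n):
    """Compare [4] + [5] * n against [5] * n with default SequenceMatcher settings.

    ratio_for is a hypothetical helper for this illustration only.
    """
    return difflib.SequenceMatcher(None, [4] + [5] * n, [5] * n).ratio()


# Probe just below and exactly at the length where the behaviour changes.
for n in (100, 199, 200):
    print(n, ratio_for(n))
```

With 199 elements the ratio stays close to 1.0, and with 200 it drops to 0.0, so the change happens exactly at length 200 rather than somewhere above it.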

    @elibendersky elibendersky mannequin added the stdlib Python modules in the Lib dir label Dec 10, 2008
    @lambertdw
    Mannequin

    lambertdw mannequin commented Dec 10, 2008

    Python 3.0rc1+ behaves similarly.

    @ggenellina
    Mannequin

    ggenellina mannequin commented Dec 10, 2008

    Python 2.3.4 and later have this bug. But release 2.1.3 doesn't:

    Python 2.1.3 (#35, Apr  8 2002, 17:47:50) [MSC 32 bit (Intel)] on win32
    Type "copyright", "credits" or "license" for more information.
    >>> import difflib
    >>> difflib.SequenceMatcher(None, [4] + [5] * 500, [5] * 500).ratio()
    0.99900099900099903
    >>> difflib.SequenceMatcher(None, [4] + [5] * 200, [5] * 200).ratio()
    0.99750623441396513
    >>> difflib.SequenceMatcher(None, [4] + [5] * 100, [5] * 100).ratio()
    0.99502487562189057

    I don't have any 2.2 release to test right now.

    @ggenellina
    Mannequin

    ggenellina mannequin commented Dec 10, 2008

    bpo-2986 may be a duplicate of this; bpo-1528074 is relevant too.

    @terryjreedy
    Member

    This appears to be one of at least three duplicate issues: bpo-1528074, bpo-2986, and bpo-4622. I am closing two, leaving 2986 open, and merging the nearly disjoint nosy lists. (If no longer interested, you can delete yourself from 2986.) bpo-1711800 appears to be slightly different (if not, it could be closed also.)

    Whether or not a new feature is ever added (3.2 is now the earliest possible release), the docs need improvement to at least explain the current behavior. If someone who understands the issue could open a separate doc issue (for 2.6, 2.7, 3.1, and 3.2) with a suggested addition, that would be great.
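
For readers on later versions: the sketch below assumes Python 3.2 or newer, where `SequenceMatcher` accepts an `autojunk` keyword argument. That parameter does not exist in the 2.5 and 3.0 releases discussed in this thread. The variable names are illustrative only.

```python
import difflib

a = [4] + [5] * 200
b = [5] * 200

# Default settings: once b has 200 or more items, a heavily repeated
# element is treated as junk automatically and never matched.
default_ratio = difflib.SequenceMatcher(None, a, b).ratio()

# Assumption: Python 3.2+, where autojunk=False disables that heuristic.
exact_ratio = difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()

print(default_ratio)
print(exact_ratio)
```

Turning the heuristic off brings back the result the original report expected, at the cost of the speedup the heuristic was meant to provide on long inputs.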

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022