classification
Title: SequenceMatcher bug
Type: behavior
Stage: resolved
Components:
Versions: Python 3.5
process
Status: closed
Resolution: duplicate
Dependencies:
Superseder:
Assigned To:
Nosy List: mcft, tim.peters
Priority: normal
Keywords:

Created on 2018-03-20 20:09 by mcft, last changed 2018-03-20 20:35 by tim.peters. This issue is now closed.

Messages (2)
msg314163 - Author: Martin (mcft) Date: 2018-03-20 20:09
difflib.SequenceMatcher fails to produce a proper alignment between two sequences that differ by only three single-letter changes. The result is far off: it reports a similarity ratio of 0.16 instead of the more accurate 0.99.

Here is a snippet to replicate the failure:
>>> aa_ref = 'MTLFTTLLVLIFERLFKLGEHWQLDHRLEAFFRRVKHFSLGRTLGMTIIAMGVTFLLLRALQGVLFNVPTLLVWLLIGLLCIGAGKVRLHYHAYLTAASRNDSHARATMAGELTMIHGVPAGCDEREYLRELQNALLWINFRFYLAPLFWLIVGGTWGPVTLMGYAFLRAWQYWLARYQTPHHRLQSGIDAVLHVLDWVPVRLAGVVYALIGHGEKALPAWFASLGDFHTSQYQVLTRLAQFSLAREPHVDKVETPKAAVSMAKKTSFVVVVVIALLTIYGALV'
>>> aa_seq = 'MTLFTTLLVLIFERLFKLGEHWQLDHRLEAFFRRVKHFSLGRTLCMTIIAMGVTFLLLRALQGVLFNVPTLLVWLLIGLLCIGAGKVRLHYHAYLTAASRNDSHAHATMAGELTMIHGVPAGCDEREYLRELQNALLWINFRFYLAPLFWLIVGGTWGPVTLMGYAFLRAWQYWLARYQTPHHRLQSGIDAVLHALDWVPVRLAGVVYALIGHGEKALPAWFASLGDFHTSQYQVLTRLAQFSLAREPHVDKVETPKAAVSMAKKTSFVVVVVIALLTIYGALV'
>>> sum(a!=b for a, b in zip(aa_ref, aa_seq))
3
>>> from difflib import SequenceMatcher
>>> match = SequenceMatcher(a=aa_ref, b=aa_seq)
>>> match.ratio()
0.1619718309859155
>>> match.get_opcodes()
[('equal', 0, 43, 0, 43), ('delete', 43, 79, 43, 43), ('equal', 79, 81, 43, 45), ('replace', 81, 122, 45, 80), ('equal', 122, 123, 80, 81), ('replace', 123, 284, 81, 284)]
msg314165 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2018-03-20 20:35
Please see the response to issue31889.  Short course:  you need to pass `autojunk=False` to the SequenceMatcher constructor.
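
For readers landing on this report, here is a minimal sketch of the suggested fix, reusing the two sequences from the snippet above. The background is taken from the difflib documentation rather than from this thread: when the second sequence has at least 200 items, `autojunk=True` (the default) treats any item that makes up more than about 1% of it as junk, which discards most letters of a 284-character protein sequence. The variable names `default_ratio` and `fixed_ratio` are illustrative only.

```python
from difflib import SequenceMatcher

# Sequences copied from the report above.
aa_ref = 'MTLFTTLLVLIFERLFKLGEHWQLDHRLEAFFRRVKHFSLGRTLGMTIIAMGVTFLLLRALQGVLFNVPTLLVWLLIGLLCIGAGKVRLHYHAYLTAASRNDSHARATMAGELTMIHGVPAGCDEREYLRELQNALLWINFRFYLAPLFWLIVGGTWGPVTLMGYAFLRAWQYWLARYQTPHHRLQSGIDAVLHVLDWVPVRLAGVVYALIGHGEKALPAWFASLGDFHTSQYQVLTRLAQFSLAREPHVDKVETPKAAVSMAKKTSFVVVVVIALLTIYGALV'
aa_seq = 'MTLFTTLLVLIFERLFKLGEHWQLDHRLEAFFRRVKHFSLGRTLCMTIIAMGVTFLLLRALQGVLFNVPTLLVWLLIGLLCIGAGKVRLHYHAYLTAASRNDSHAHATMAGELTMIHGVPAGCDEREYLRELQNALLWINFRFYLAPLFWLIVGGTWGPVTLMGYAFLRAWQYWLARYQTPHHRLQSGIDAVLHALDWVPVRLAGVVYALIGHGEKALPAWFASLGDFHTSQYQVLTRLAQFSLAREPHVDKVETPKAAVSMAKKTSFVVVVVIALLTIYGALV'

# Default behaviour: autojunk=True, so frequent letters are ignored as
# match anchors because the second sequence is longer than 200 items.
default_ratio = SequenceMatcher(a=aa_ref, b=aa_seq).ratio()

# Suggested fix: disable the automatic junk heuristic so that every
# letter can take part in the alignment.
fixed_ratio = SequenceMatcher(a=aa_ref, b=aa_seq, autojunk=False).ratio()

print(f"autojunk=True:  {default_ratio:.3f}")
print(f"autojunk=False: {fixed_ratio:.3f}")
```

With the heuristic disabled, the ratio should land close to the 0.99 the reporter expected. The trade-off is speed: the heuristic exists to keep matching fast on long inputs, so disabling it may be noticeably slower on much larger sequences.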
History
Date                 User        Action  Args
2018-03-20 20:35:51  tim.peters  set     status: open -> closed
                                         nosy: + tim.peters
                                         messages: + msg314165
                                         resolution: duplicate
                                         stage: resolved
2018-03-20 20:09:08  mcft        create