This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: difflib SequenceMatcher get_matching_blocks returns non-matching blocks in some cases
Type:
Stage: resolved
Components:
Versions: Python 3.6

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Snidhi, tim.peters
Priority: normal
Keywords:

Created on 2020-10-07 07:34 by Snidhi, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (3)
msg378149 - Author: Snidhi Sofpro (Snidhi) Date: 2020-10-07 07:34
---------- Demo case with unexpected results, starting from matching block 3 (output of the code that follows):

sys.version_info(major=3, minor=6, micro=9, releaselevel='final', serial=0) 

Matches between:
<a id="nhix_Rgstr" href="http://local:56067/register/200930162135700">
<a id="nhix_Rgstr" href="http://local:53813/register/20100517282450281">

Match(a=0, b=0, size=39)
same-> <a id="nhix_Rgstr" href="http://local:5
same=> <a id="nhix_Rgstr" href="http://local:5 

Match(a=43, b=43, size=12)
same-> /register/20
same=> /register/20 

Match(a=59, b=55, size=1)
same-> 1
same=> 0 

Match(a=66, b=56, size=2)
same-> 00
same=> 93 

Match(a=68, b=70, size=2)
same-> ">
same=>  


# ---------- code that results in the above:

def get_mblk(dpiy_Frst, dpiy_Scnd):
    import difflib;
    sqmn_o = difflib.SequenceMatcher(None, dpiy_Frst, dpiy_Scnd);
    mblk_ls = [ block for block in sqmn_o.get_matching_blocks()];
    for mblk in mblk_ls[:-1]: #exclude the last dummy block
        print(mblk);
        mtch_a = dpiy_Frst[mblk.a : mblk.a + mblk.size];
        mtch_b = dpiy_Frst[mblk.b : mblk.b + mblk.size];
        print('same->', mtch_a);
        print('same=>', mtch_b, '\n');
    #endfor
#endef get_mblk

# --- main --

s1='<a id="nhix_Rgstr" href="http://local:56067/register/200930162135700">'
s2='<a id="nhix_Rgstr" href="http://local:53813/register/20100517282450281">'

import sys; print(sys.version_info, '\n');
print("Matches between:"); print(s1); print(s2); print('\n');
get_mblk(s1, s2);

msg378167 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2020-10-07 15:41
I believe your testing code is in error, perhaps because it's so overly elaborate you've lost track of what it's doing.  Here's a straightforward test program:

    import difflib
    s1='<a id="nhix_Rgstr" href="http://local:56067/register/200930162135700">'
    s2='<a id="nhix_Rgstr" href="http://local:53813/register/20100517282450281">'
    d = difflib.SequenceMatcher(None, s1, s2)
    for m in d.get_matching_blocks():
        print(m, repr(s1[m.a : m.a + m.size]),
                 repr(s2[m.b : m.b + m.size]))

and its output under 3.9.0:

    Match(a=0, b=0, size=39) '<a id="nhix_Rgstr" href="http://local:5' '<a id="nhix_Rgstr" href="http://local:5'
    Match(a=43, b=43, size=12) '/register/20' '/register/20'
    Match(a=59, b=55, size=1) '1' '1'
    Match(a=66, b=56, size=2) '00' '00'
    Match(a=68, b=70, size=2) '">' '">'
    Match(a=70, b=72, size=0) '' ''
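
The difflib documentation spells out what get_matching_blocks() promises: each triple (i, j, n) satisfies a[i:i+n] == b[j:j+n], the triples increase monotonically in i and in j, and the last one is a dummy (len(a), len(b), 0). A minimal sketch that checks those properties directly; the helper name check_matching_blocks is hypothetical and not part of difflib:

```python
import difflib

def check_matching_blocks(a, b):
    """Hypothetical helper: return get_matching_blocks() after checking its invariants."""
    blocks = difflib.SequenceMatcher(None, a, b).get_matching_blocks()
    # The final triple is always the dummy (len(a), len(b), 0).
    assert blocks[-1] == (len(a), len(b), 0)
    end_a = end_b = 0
    for m in blocks[:-1]:
        # Every real block covers identical text in both inputs.
        assert a[m.a : m.a + m.size] == b[m.b : m.b + m.size]
        # Each block starts at or after the end of the previous one in both inputs.
        assert m.a >= end_a and m.b >= end_b
        end_a, end_b = m.a + m.size, m.b + m.size
    return blocks

s1 = '<a id="nhix_Rgstr" href="http://local:56067/register/200930162135700">'
s2 = '<a id="nhix_Rgstr" href="http://local:53813/register/20100517282450281">'
for m in check_matching_blocks(s1, s2):
    print(m)
```

Every assertion passes for the two strings in this report, which is consistent with the output above.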

Your test program is obtaining the substrings to display from these two lines:

        mtch_a = dpiy_Frst[mblk.a : mblk.a + mblk.size];
        mtch_b = dpiy_Frst[mblk.b : mblk.b + mblk.size];

But BOTH of those are extracting substrings from `dpiy_Frst`. Looks like you intended to use the `dpiy_Scnd` argument in the second line instead.
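
Applying that one-line fix to the original helper gives roughly the following sketch. It keeps the reporter's names and drops the semicolons. As an assumption for easier checking (not something in the original), it also returns the (a-side, b-side) substring pairs in addition to printing them:

```python
import difflib

def get_mblk(dpiy_Frst, dpiy_Scnd):
    """Print each matching block with the substring it covers in each input."""
    sqmn_o = difflib.SequenceMatcher(None, dpiy_Frst, dpiy_Scnd)
    pairs = []
    for mblk in sqmn_o.get_matching_blocks()[:-1]:  # exclude the last dummy block
        mtch_a = dpiy_Frst[mblk.a : mblk.a + mblk.size]
        mtch_b = dpiy_Scnd[mblk.b : mblk.b + mblk.size]  # second string, not the first
        print(mblk)
        print('same->', mtch_a)
        print('same=>', mtch_b, '\n')
        pairs.append((mtch_a, mtch_b))
    return pairs  # added for checking; the original helper returned None

s1 = '<a id="nhix_Rgstr" href="http://local:56067/register/200930162135700">'
s2 = '<a id="nhix_Rgstr" href="http://local:53813/register/20100517282450281">'
get_mblk(s1, s2)
```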

If I change my test program to use `s1` for both substrings, then it matches the output you gave. But that doesn't make sense ;-)
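
The mismatch can be reproduced by slicing s1 with the b offsets, which is exactly what the original helper did. A short sketch; buggy_b_side and correct_b_side are illustrative names only:

```python
import difflib

s1 = '<a id="nhix_Rgstr" href="http://local:56067/register/200930162135700">'
s2 = '<a id="nhix_Rgstr" href="http://local:53813/register/20100517282450281">'
blocks = difflib.SequenceMatcher(None, s1, s2).get_matching_blocks()[:-1]

# What the original helper did: slice s1 with the b offsets.
buggy_b_side = [s1[m.b : m.b + m.size] for m in blocks]
# What it meant to do: slice s2 with the b offsets.
correct_b_side = [s2[m.b : m.b + m.size] for m in blocks]

for m, wrong, right in zip(blocks, buggy_b_side, correct_b_side):
    print(m, repr(wrong), repr(right))
```

For the last three blocks, the buggy side yields '0', '93' and '', matching the original report. Slicing s2 instead yields '1', '00' and '">'.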

So I'm closing this as not-a-bug. If I missed something relevant, feel free to re-open it.

msg378474 - Author: Snidhi Sofpro (Snidhi) Date: 2020-10-12 07:33
Hello Tim Peters,

As you pointed out:
(1) The test code clearly has a mistake.  
(2) With the mistake corrected, difflib.SequenceMatcher.get_matching_blocks() returns results as expected for the given test data.

I can't readily recreate the issue where it was first observed, because the application code from which the test code was derived has since been changed to use get_opcodes().
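
For reference, get_opcodes() is built on the same matching blocks. Each 'equal' opcode corresponds to one non-dummy block, and the 'replace', 'delete' and 'insert' opcodes describe the gaps between them. A minimal sketch, assuming the same two strings from the report, that prints the opcodes and uses them to rebuild s2 from s1:

```python
import difflib

s1 = '<a id="nhix_Rgstr" href="http://local:56067/register/200930162135700">'
s2 = '<a id="nhix_Rgstr" href="http://local:53813/register/20100517282450281">'
sm = difflib.SequenceMatcher(None, s1, s2)

rebuilt = []
for tag, i1, i2, j1, j2 in sm.get_opcodes():
    print(tag, (i1, i2), (j1, j2), repr(s1[i1:i2]), repr(s2[j1:j2]))
    if tag == 'equal':
        rebuilt.append(s1[i1:i2])      # text shared by both strings
    elif tag in ('replace', 'insert'):
        rebuilt.append(s2[j1:j2])      # text that only s2 has
    # 'delete' contributes nothing to the target string

# Applying the opcodes to s1 reproduces s2 exactly.
assert ''.join(rebuilt) == s2
```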

I will update this ticket if I find that this is an issue at all.

Thank you for looking into this so quickly & for the feedback.  
Contributors like you make working in the python ecosystem a professional pleasure.
Thanks again.

History
Date                 User        Action  Args
2022-04-11 14:59:36  admin       set     github: 86130
2020-10-12 07:33:16  Snidhi      set     messages: + msg378474
2020-10-07 15:41:43  tim.peters  set     status: open -> closed
                                         resolution: not a bug
                                         messages: + msg378167
                                         stage: resolved
2020-10-07 08:27:35  xtreak      set     nosy: + tim.peters
2020-10-07 07:34:46  Snidhi      create