Issue41964
Created on 2020-10-07 07:34 by Snidhi, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Messages (3) | |||
---|---|---|---|
msg378149 - (view) | Author: Snidhi Sofpro (Snidhi) | Date: 2020-10-07 07:34 | |
---------- Demo case with unexpected results starting from matching block 3 (result of code that follows):

sys.version_info(major=3, minor=6, micro=9, releaselevel='final', serial=0)

Matches between:
<a id="nhix_Rgstr" href="http://local:56067/register/200930162135700">
<a id="nhix_Rgstr" href="http://local:53813/register/20100517282450281">

Match(a=0, b=0, size=39)
same-> <a id="nhix_Rgstr" href="http://local:5
same=> <a id="nhix_Rgstr" href="http://local:5

Match(a=43, b=43, size=12)
same-> /register/20
same=> /register/20

Match(a=59, b=55, size=1)
same-> 1
same=> 0

Match(a=66, b=56, size=2)
same-> 00
same=> 93

Match(a=68, b=70, size=2)
same-> ">
same=>

# ---------- code that results in the above:

def get_mblk(dpiy_Frst, dpiy_Scnd):
    import difflib;
    sqmn_o = difflib.SequenceMatcher(None, dpiy_Frst, dpiy_Scnd);
    mblk_ls = [ block for block in sqmn_o.get_matching_blocks()];
    for mblk in mblk_ls[:-1]: #exclude the last dummy block
        print(mblk);
        mtch_a = dpiy_Frst[mblk.a : mblk.a + mblk.size];
        mtch_b = dpiy_Frst[mblk.b : mblk.b + mblk.size];
        print('same->', mtch_a);
        print('same=>', mtch_b, '\n');
    #endfor
#endef get_mblk

# --- main --
s1='<a id="nhix_Rgstr" href="http://local:56067/register/200930162135700">'
s2='<a id="nhix_Rgstr" href="http://local:53813/register/20100517282450281">'
import sys; print(sys.version_info, '\n');
print("Matches between:"); print(s1); print(s2); print('\n');
get_mblk(s1, s2); |
|||
msg378167 - (view) | Author: Tim Peters (tim.peters) * | Date: 2020-10-07 15:41 | |
I believe your testing code is in error, perhaps because it's so overly elaborate you've lost track of what it's doing. Here's a straightforward test program:

import difflib
s1='<a id="nhix_Rgstr" href="http://local:56067/register/200930162135700">'
s2='<a id="nhix_Rgstr" href="http://local:53813/register/20100517282450281">'
d = difflib.SequenceMatcher(None, s1, s2)
for m in d.get_matching_blocks():
    print(m, repr(s1[m.a : m.a + m.size]),
          repr(s2[m.b : m.b + m.size]))

and its output under 3.9.0:

Match(a=0, b=0, size=39) '<a id="nhix_Rgstr" href="http://local:5' '<a id="nhix_Rgstr" href="http://local:5'
Match(a=43, b=43, size=12) '/register/20' '/register/20'
Match(a=59, b=55, size=1) '1' '1'
Match(a=66, b=56, size=2) '00' '00'
Match(a=68, b=70, size=2) '">' '">'
Match(a=70, b=72, size=0) '' ''

Your test program is obtaining the substrings to display from these two lines:

mtch_a = dpiy_Frst[mblk.a : mblk.a + mblk.size];
mtch_b = dpiy_Frst[mblk.b : mblk.b + mblk.size];

But BOTH of those are extracting substrings from `dpiy_Frst`. Looks like you intended to use the `dpiy_Scnd` argument in the second line instead. If I change my test program to use `s1` for both substrings, then it matches the output you gave. But that doesn't make sense ;-)

So I'm closing this as not-a-bug. If I missed something relevant, feel free to re-open it. |
|||
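As a minimal sketch of how the slicing mix-up diagnosed above could be guarded against, the helper below slices each input with its own offset (`block.a` for the first sequence, `block.b` for the second) and asserts that the two slices agree. The name `matched_pairs` is hypothetical and is not part of `difflib`; only `SequenceMatcher` and `get_matching_blocks()` come from the standard library.

```python
import difflib


def matched_pairs(first, second):
    """Yield (block, text_from_first, text_from_second) for each real match.

    Hypothetical helper for illustration; not part of difflib.
    """
    matcher = difflib.SequenceMatcher(None, first, second)
    for block in matcher.get_matching_blocks():
        if block.size == 0:
            # The final block is a zero-length sentinel; skip it.
            continue
        # block.a indexes into `first`, block.b indexes into `second`.
        from_first = first[block.a:block.a + block.size]
        from_second = second[block.b:block.b + block.size]
        yield block, from_first, from_second


s1 = '<a id="nhix_Rgstr" href="http://local:56067/register/200930162135700">'
s2 = '<a id="nhix_Rgstr" href="http://local:53813/register/20100517282450281">'

for block, from_first, from_second in matched_pairs(s1, s2):
    # A matching block must describe identical text in both inputs.
    assert from_first == from_second, (block, from_first, from_second)
    print(block, repr(from_first))
```

If the second slice were taken from `first` by mistake, as in the original report, the assertion would be expected to fail on the first block where `block.a` and `block.b` differ.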
msg378474 - (view) | Author: Snidhi Sofpro (Snidhi) | Date: 2020-10-12 07:33 | |
Hello Tim Peters,

Like you pointed out:
(1) The test code clearly has a mistake.
(2) With the mistake corrected, difflib.SequenceMatcher.get_matching_blocks() returns results as expected for the given test data.

I'm unable to readily recreate the issue where it was first observed because the application code from which the test code was derived has meanwhile gotten changed to make use of get_opcodes(). Will update this ticket if I find this to be an issue at all.

Thank you for looking into this so quickly & for the feedback. Contributors like you make working in the python ecosystem a professional pleasure. Thanks again. |
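For readers unfamiliar with the `get_opcodes()` alternative mentioned in this message, here is a rough sketch of how it could be applied to the same two test strings. The application code referred to above is not shown in this issue, so the helper name `describe_opcodes` and the output format are assumptions made purely for illustration.

```python
import difflib


def describe_opcodes(first, second):
    """Return (tag, text_from_first, text_from_second) for each opcode.

    Hypothetical helper for illustration; not part of difflib.
    """
    matcher = difflib.SequenceMatcher(None, first, second)
    rows = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # i1:i2 is a range in `first`; j1:j2 is a range in `second`.
        rows.append((tag, first[i1:i2], second[j1:j2]))
    return rows


s1 = '<a id="nhix_Rgstr" href="http://local:56067/register/200930162135700">'
s2 = '<a id="nhix_Rgstr" href="http://local:53813/register/20100517282450281">'

for tag, from_first, from_second in describe_opcodes(s1, s2):
    print(f"{tag:8} {from_first!r} -> {from_second!r}")
```

Unlike `get_matching_blocks()`, the opcodes also describe the non-matching stretches (`replace`, `delete`, `insert`), so concatenating the slices for each input should reproduce that input in full.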
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:36 | admin | set | github: 86130 |
2020-10-12 07:33:16 | Snidhi | set | messages: + msg378474 |
2020-10-07 15:41:43 | tim.peters | set | status: open -> closed resolution: not a bug messages: + msg378167 stage: resolved |
2020-10-07 08:27:35 | xtreak | set | nosy: + tim.peters |
2020-10-07 07:34:46 | Snidhi | create |