Issue 43473: Junks in difflib

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/87639

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:	docs@python	Nosy List:	docs@python, hubertbdlb, terry.reedy, tim.peters
Priority:	normal	Keywords:

Created on 2021-03-11 09:37 by hubertbdlb, last changed 2022-04-11 14:59 by admin.

Messages (2)
msg388491	Author: Hubert Bonnisseur-De-La-Bathe (hubertbdlb)	Date: 2021-03-11 09:37
When I first read the difflib documentation, I thought that using junk would produce this result:

    s = SequenceMatcher(lambda x: x == " ", "abcd efgh", "abcdefgh")
    s.get_matching_blocks()
    >>> [Match(a=0, b=0, size=8)]

On a second reading, it is clear that this call in fact returns two matches of length 4. Would it be nicer to have get_matching_blocks return the length-8 match? I don't know if that is in the spirit of the library; I'm just asking.
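
For reference, a minimal sketch of what the call from the report returns on a current CPython. The variable name `blocks` is only illustrative; the final zero-length triple is the sentinel that get_matching_blocks always appends.

```python
from difflib import SequenceMatcher

# Same call as in the report: a space is declared junk.
s = SequenceMatcher(lambda x: x == " ", "abcd efgh", "abcdefgh")
blocks = s.get_matching_blocks()

for block in blocks:
    print(block)
# Match(a=0, b=0, size=4)
# Match(a=5, b=4, size=4)
# Match(a=9, b=8, size=0)   <- sentinel, always last
```

The space at index 4 of the first string is skipped, not bridged, so "abcd" and "efgh" are reported as two separate blocks.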
msg388595	Author: Terry J. Reedy (terry.reedy)	Date: 2021-03-13 07:27
Currently, each returned triple (i, j, n) means that a[i:i+n] == b[j:j+n], so both matching blocks have the same length.
https://docs.python.org/3/library/difflib.html#difflib.SequenceMatcher.get_matching_blocks
This would not be the case if a has an ignored space and b does not. Changing the current definition would break existing code, and it would require returning quadruples to carry two different lengths. That in turn would require either a new parameter for the function to select the behavior or a new function with a new name. Either option would need to be justified by actual use cases, and I cannot see what they might be. One way to have junk characters completely ignored is to strip them from both strings before calling SequenceMatcher.
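
The workaround suggested above could be sketched as follows. The helper `strip_junk` is hypothetical (it is not part of difflib), and it assumes the junk is a fixed set of characters. The first loop just checks the documented invariant on the unstripped strings.

```python
from difflib import SequenceMatcher

a, b = "abcd efgh", "abcdefgh"

# Documented invariant: every triple (i, j, n) satisfies a[i:i+n] == b[j:j+n].
for i, j, n in SequenceMatcher(lambda x: x == " ", a, b).get_matching_blocks():
    assert a[i:i + n] == b[j:j + n]


def strip_junk(text, junk=" "):
    """Hypothetical helper: drop every character listed in `junk`."""
    return "".join(ch for ch in text if ch not in junk)


# Strip junk from both inputs first, then compare without an isjunk function.
stripped = SequenceMatcher(None, strip_junk(a), strip_junk(b))
blocks = stripped.get_matching_blocks()
print(blocks)
# [Match(a=0, b=0, size=8), Match(a=8, b=8, size=0)]
```

Note that the indices in the result refer to the stripped strings, not to the original inputs, so a caller who needs positions in the originals would have to map them back separately.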

History
Date	User	Action	Args
2022-04-11 14:59:42	admin	set	github: 87639
2021-03-13 07:27:23	terry.reedy	set	nosy: + terry.reedy messages: + msg388595
2021-03-11 10:11:20	xtreak	set	nosy: + tim.peters
2021-03-11 09:37:51	hubertbdlb	create