Message 401920 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	tim.peters
Recipients	nalza001, tim.peters
Date	2021-09-16.05:02:22
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1631768543.07.0.00636362481393.issue45180@roundup.psfhosted.org>
In-reply-to

Content

Please stop re-opening this. The issue tracker is not a "help desk", and your confusions aren't necessarily Python bugs ;-) If you post something that looks like an actual bug, I'll re-open the report.

SequenceMatcher works on sequences.

HtmlDiff works on sequences OF sequences (typically lists of lines). Very different things. For example,

import difflib
h = difflib.HtmlDiff()
h.make_file(['aaabbbbbbbbb'], ['aaacccccccc'])

finds nothing at all in common. It finds that the two lines don't match, and then finds that the lines aren't "similar enough" to bother looking any deeper. But, obviously, they do share 'aaa' as a common prefix, and calling SequenceMatcher directly on the two strings will find their common prefix.
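A minimal sketch of that contrast, using only documented difflib calls. HtmlDiff emits a whole HTML page, so this uses `difflib.ndiff` as a stand-in for its line-level view. That HtmlDiff builds on the same ndiff machinery is my reading of CPython's difflib source and should be treated as an assumption. ndiff only adds a `?` intraline hint line when two lines are "similar enough", so its absence here mirrors HtmlDiff not looking inside the lines.

```python
import difflib

old, new = ['aaabbbbbbbbb'], ['aaacccccccc']

# Line-level view: the single lines differ and are not similar enough for
# ndiff to emit a '?' intraline hint, so it reports a plain delete + insert.
line_diff = list(difflib.ndiff(old, new))
print(line_diff)

# Character-level view: SequenceMatcher on the two strings themselves
# does find the shared 'aaa' prefix.
sm = difflib.SequenceMatcher(None, old[0], new[0])
print(sm.get_matching_blocks()[0])  # Match(a=0, b=0, size=3)
```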

There's no reason to imagine they'll produce the same results - they're not doing the same things. SequenceMatcher is used, in several places, as a building block to do the more complicated things HtmlDiff does. But HtmlDiff works on _lines_ first; SequenceMatcher has no concept of "line".
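To make the "no concept of line" point concrete, here is a small sketch (the variable names are illustrative only). The same two strings go to SequenceMatcher twice: once wrapped in one-element lists, where each whole string is a single element that either equals the other element or doesn't, and once as plain strings, where the elements are characters.

```python
import difflib

a_str, b_str = 'aaabbbbbbbbb', 'aaacccccccc'

# As lists of "lines", each string is one indivisible element.
# The two elements are unequal, so nothing at all matches.
by_line = difflib.SequenceMatcher(None, [a_str], [b_str])
print(by_line.ratio())        # 0.0
print(by_line.get_opcodes())  # [('replace', 0, 1, 0, 1)]

# As strings, the elements are characters, so the common prefix shows up.
by_char = difflib.SequenceMatcher(None, a_str, b_str)
print(by_char.get_opcodes())  # [('equal', 0, 3, 0, 3), ('replace', 3, 12, 3, 11)]
```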

As to 1-(delta/totalSize), I have no idea where that came from. What SequenceMatcher.ratio() returns is documented:

"""
Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T. Note that this is 1.0 if the sequences are identical, and 0.0 if they have nothing in common.
"""
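A sketch that recomputes the documented 2.0*M / T directly, to check it against `SequenceMatcher.ratio()`. The helper name `documented_ratio` is hypothetical. M is taken as the total size of the blocks from `get_matching_blocks()`, and T as the combined length of both sequences. For the strings above, M = 3 and T = 12 + 11 = 23, so both values should be 6/23.

```python
import difflib

def documented_ratio(a, b):
    """Hypothetical helper: 2.0*M / T per the ratio() docs.

    Assumes a and b are not both empty (T would be 0).
    """
    sm = difflib.SequenceMatcher(None, a, b)
    # M: number of matched elements; the final sentinel block has size 0.
    m = sum(block.size for block in sm.get_matching_blocks())
    # T: total number of elements in both sequences.
    t = len(a) + len(b)
    return 2.0 * m / t

a, b = 'aaabbbbbbbbb', 'aaacccccccc'
print(documented_ratio(a, b), difflib.SequenceMatcher(None, a, b).ratio())
```

Identical sequences give 1.0 and sequences with nothing in common give 0.0, as the documentation says.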
