classification
Title: difflib.SequenceMatcher and Match: code and doc bugs
Type: behavior Stage: patch review
Components: Documentation, Library (Lib) Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: terry.reedy Nosy List: rhettinger, terry.reedy, tim.peters
Priority: normal Keywords:

Created on 2011-06-21 20:00 by terry.reedy, last changed 2014-12-31 16:25 by akuchling.

Files
File name Uploaded Description
12384-patch.txt akuchling, 2014-03-18 22:49
Messages (5)
msg138799 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-06-21 20:00
The basic problem: in 2.6, a namedtuple was introduced to difflib

from collections import namedtuple as _namedtuple
Match = _namedtuple('Match', 'a b size')

and used for the return values of SequenceMatcher.find_longest_match and .get_matching_blocks. Code, docstrings, and docs were only partially updated to match.

Code:

    def get_matching_blocks(self):
        """Return list of triples describing matching subsequences.
        Each triple is of the form (i, j, n), and means that
        ..."""
        if self.matching_blocks is not None:
            return self.matching_blocks
        ...
        self.matching_blocks = non_adjacent
        return map(Match._make, self.matching_blocks)

The two returns are different because only the second was changed.
The obvious fix is to change the first to match. Or perhaps self.matching_blocks (an undocumented cache) should be the map object.
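
A minimal sketch of the mismatch, assuming Python 3 (where map() returns an iterator). BuggyCache and FixedCache are hypothetical toy classes that copy only the caching logic quoted above; they are not difflib code, and the fix shown is just one possible reading of "change the first to match".

```python
from collections import namedtuple

Match = namedtuple('Match', 'a b size')


class BuggyCache:
    """Toy stand-in (hypothetical) mimicking the two return paths quoted above."""

    def __init__(self, triples):
        self._triples = list(triples)
        self.matching_blocks = None

    def get_matching_blocks(self):
        if self.matching_blocks is not None:
            # Later calls: the cached plain tuples come back unchanged.
            return self.matching_blocks
        self.matching_blocks = self._triples
        # First call: the same data wrapped as Match named tuples.
        return map(Match._make, self.matching_blocks)


class FixedCache(BuggyCache):
    """One possible fix: cache the Match objects so both paths agree."""

    def get_matching_blocks(self):
        if self.matching_blocks is None:
            self.matching_blocks = [Match._make(t) for t in self._triples]
        return self.matching_blocks


triples = [(0, 0, 2), (3, 2, 2), (5, 4, 0)]

buggy = BuggyCache(triples)
first = list(buggy.get_matching_blocks())    # Match named tuples
second = list(buggy.get_matching_blocks())   # plain tuples from the cache

fixed = FixedCache(triples)
fixed_first = list(fixed.get_matching_blocks())
fixed_second = list(fixed.get_matching_blocks())
```

Because Match is a tuple subclass, first == second still holds, which is probably why the difference is easy to miss; only attribute access such as .size fails on the second result.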

Docstring and doc for .find_longest_match():

Both start
 "Find longest matching block ... returns (i, j, k) such that ... "
Doc (but not docstring) explicitly says at the *bottom* of the entry "This method returns a named tuple Match(a, b, size)."
which is different from (i,j,n). For 2.7, the note is preceded by "Changed in version 2.6:"
The examples show the change before it is described.

I think that the current return should be accurately described at the *top* of the entry, not the bottom. 2.7 would then end with "Changed in version 2.6: return Match instead of tuple."
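
For illustration, a short sketch of why the (i, j, k) wording and the Match(a, b, size) wording describe the same object. The input strings are the ones used in the difflib doc example; the variable names are only illustrative.

```python
from difflib import SequenceMatcher

s = SequenceMatcher(None, " abcd", "abcd abcd")
m = s.find_longest_match(0, 5, 0, 9)   # Match(a=0, b=4, size=5)

# Positional unpacking still works because Match is a tuple subclass.
i, j, k = m

# Field access by name, per Match = namedtuple('Match', 'a b size').
fields = (m.a, m.b, m.size)
```

So the text at the top of the entry could name the fields a, b, and size while still noting that the result unpacks as a triple.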

Docstring and doc for .get_matching_blocks():

See code snippet above for beginning of text. Unlike .find_longest_match, there is no mention of the changed return.

In 2.7, it is a list of Match triples.
In 3.x, it is an iterable (a map object) of Match triples, because of the change in what map() returns.

For the latter reason, the example in the 3.x doc must be changed to
>>> list(s.get_matching_blocks())

The docstring was already changed to pass doctest. The untested doc was not.
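
A sketch of the adjusted example as plain code rather than a doctest. Wrapping the call in list() is harmless where a list is already returned and necessary where a map object is, so it should behave the same across the versions discussed here (an assumption, not something checked against every release).

```python
from difflib import SequenceMatcher

s = SequenceMatcher(None, "abxcd", "abcd")

# list() materializes the result whether it is a list or a map object.
blocks = list(s.get_matching_blocks())

for i, j, n in blocks:
    # Each triple means a[i:i+n] == b[j:j+n]; the last one is a dummy with n == 0.
    assert s.a[i:i + n] == s.b[j:j + n]
```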

I am not sure how to properly document the use of a namedtuple in the stdlib. Raymond, what do you think?
msg138804 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-06-21 22:15
I'll take a look at this when I get a chance (est. two weeks).
msg139179 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-06-26 14:17
Go ahead and patch the first self.matching_blocks to also return a named tuple.  Also, correct any doctests or examples using these.

The docs could also mention that a list of named tuples with fields a, b, and size is returned by get_matching_blocks().
msg184616 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-03-19 08:06
More doc bugs: unified_diff and context_diff say 'lists of strings' when 'sequences of strings' is correct. Docstrings do say 'sequences'.
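
A quick check of the 'sequences' wording, passing tuples instead of lists. The names old, new, and diff_lines are illustrative; unified_diff is a generator, so the result is collected with list().

```python
import difflib

# Tuples of strings, not lists, to show that any sequence is accepted.
old = ('one\n', 'two\n', 'three\n')
new = ('one\n', 'tree\n', 'three\n')

diff_lines = list(difflib.unified_diff(old, new, fromfile='old', tofile='new'))
```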

In 3.x, SequenceMatcher.get_matching_blocks returns a map object rather than a list. In spite of my original post, the items are tuples in 3.3, at least in one test. Named tuples in 3.4 would be good. I might call it an iterable of (named) triples instead of 'map object' as that is more relevant for the user. I think I should check all arg and return types in code, docstring, and doc for consistency and separately check on and, as necessary, upgrade namedtuple usage.
msg214035 - (view) Author: A.M. Kuchling (akuchling) * (Python committer) Date: 2014-03-18 22:49
Here's a patch fixing the first return in get_matching_blocks() and updating the docs.

I didn't change get_matching_blocks() to return a list again, assuming that we didn't want to do that.  (Raymond doesn't say to do so, at least.)
History
Date User Action Args
2014-12-31 16:25:04 akuchling set nosy: - akuchling
2014-03-20 11:59:35 rhettinger set nosy: + tim.peters
2014-03-18 22:49:21 akuchling set files: + 12384-patch.txt; nosy: + akuchling; messages: + msg214035; stage: needs patch -> patch review
2013-03-19 08:06:19 terry.reedy set priority: low -> normal; messages: + msg184616; stage: needs patch
2011-06-26 14:17:16 rhettinger set priority: normal -> low; assignee: rhettinger -> terry.reedy; messages: + msg139179
2011-06-21 22:15:25 rhettinger set messages: + msg138804
2011-06-21 20:00:41 terry.reedy create