diff --git a/Doc/library/difflib.rst b/Doc/library/difflib.rst --- a/Doc/library/difflib.rst +++ b/Doc/library/difflib.rst @@ -27,7 +27,9 @@ little fancier than, an algorithm published in the late 1980's by Ratcliff and Obershelp under the hyperbolic name "gestalt pattern matching." The idea is to find the longest contiguous matching subsequence that contains no "junk" - elements (the Ratcliff and Obershelp algorithm doesn't address junk). The same + elements; these "junk" elements are ones that are uninteresting in some + sense, such as blank lines or whitespace. (Handling junk is an + extension to the Ratcliff and Obershelp algorithm.) The same idea is then applied recursively to the pieces of the sequences to the left and to the right of the matching subsequence. This does not yield minimal edit sequences, but does tend to yield matches that "look right" to people. @@ -210,7 +212,7 @@ Compare *a* and *b* (lists of strings); return a :class:`Differ`\ -style delta (a :term:`generator` generating the delta lines). - Optional keyword parameters *linejunk* and *charjunk* are for filter functions + Optional keyword parameters *linejunk* and *charjunk* are filtering functions (or ``None``): *linejunk*: A function that accepts a single string argument, and returns @@ -224,7 +226,7 @@ *charjunk*: A function that accepts a character (a string of length 1), and returns if the character is junk, or false if not. The default is module-level function :func:`IS_CHARACTER_JUNK`, which filters out whitespace characters (a - blank or tab; note: bad idea to include newline in this!). + blank or tab; it's a bad idea to include newline in this!). :file:`Tools/scripts/ndiff.py` is a command-line front-end to this function. @@ -624,6 +626,12 @@ length 1), and returns true if the character is junk. The default is ``None``, meaning that no character is considered junk. + These junk-filtering functions speed up matching to find + differences, not to mask differences, and they do not cause any + lines or characters to be ignored. Read the description of the + :meth:`~SequenceMatcher.find_longest_match` method's *isjunk* parameter for an + explanation. + :class:`Differ` objects are used (deltas generated) via a single method: diff --git a/Lib/difflib.py b/Lib/difflib.py --- a/Lib/difflib.py +++ b/Lib/difflib.py @@ -854,10 +854,9 @@ and return true iff the string is junk. The module-level function `IS_LINE_JUNK` may be used to filter out lines without visible characters, except for at most one splat ('#'). It is recommended - to leave linejunk None; as of Python 2.3, the underlying - SequenceMatcher class has grown an adaptive notion of "noise" lines - that's better than any static definition the author has ever been - able to craft. + to leave linejunk None; the underlying SequenceMatcher class has + an adaptive notion of "noise" lines that's better than any static + definition the author has ever been able to craft. - `charjunk`: A function that should accept a string of length 1. The module-level function `IS_CHARACTER_JUNK` may be used to filter out @@ -1300,17 +1299,18 @@ Compare `a` and `b` (lists of strings); return a `Differ`-style delta. Optional keyword parameters `linejunk` and `charjunk` are for filter - functions (or None): + functions, or can be None: - - linejunk: A function that should accept a single string argument, and + - linejunk: A function that should accept a single string argument and return true iff the string is junk. The default is None, and is - recommended; as of Python 2.3, an adaptive notion of "noise" lines is - used that does a good job on its own. + recommended; the underlying SequenceMatcher class has an adaptive + notion of "noise" lines. - - charjunk: A function that should accept a string of length 1. The - default is module-level function IS_CHARACTER_JUNK, which filters out - whitespace characters (a blank or tab; note: bad idea to include newline - in this!). + - charjunk: A function that accepts a character (string of length + 1), and returns true iff the character is junk. The default is + the module-level function IS_CHARACTER_JUNK, which filters out + whitespace characters (a blank or tab; note: it's a bad idea to + include newline in this!). Tools/scripts/ndiff.py is a command-line front-end to this function. @@ -1681,7 +1681,7 @@ tabsize -- tab stop spacing, defaults to 8. wrapcolumn -- column number where lines are broken and wrapped, defaults to None where lines are not wrapped. - linejunk,charjunk -- keyword arguments passed into ndiff() (used to by + linejunk,charjunk -- keyword arguments passed into ndiff() (used by HtmlDiff() to generate the side by side HTML differences). See ndiff() documentation for argument default values and descriptions. """