
Author tim.peters
Recipients chris.jerdonek, jaraco, tim.peters, xtreak
Date 2019-02-11.18:46:15
Message-id <1549910775.37.0.578936082231.issue35955@roundup.psfhosted.org>
In-reply-to
Content
difflib generally synchs on the longest contiguous matching subsequence that doesn't contain a "junk" element.  By default, `ndiff()`'s optional `charjunk` argument considers blanks and tabs to be junk characters.

In the strings:

"drwxrwxr-x 2 2000  2000\n"
"drwxr-xr-x 2 2000  2000\n"

the longest matching substring not containing whitespace is "rwxr-x", of length 6, starting at index 4 in the first string and at index 1 in the second.  So it's aligning the strings like so:

"drwxrwxr-x 2 2000  2000\n"
   "drwxr-xr-x 2 2000  2000\n"
     123456
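
The alignment above can be checked directly with `SequenceMatcher`. This is only a small sketch: it assumes that passing `difflib.IS_CHARACTER_JUNK` as the `isjunk` argument mirrors what `ndiff()` does by default when comparing within a line, and the variable names are just for illustration.

```python
from difflib import SequenceMatcher, IS_CHARACTER_JUNK

a = "drwxrwxr-x 2 2000  2000\n"
b = "drwxr-xr-x 2 2000  2000\n"

# Blanks and tabs count as junk, as with ndiff()'s default charjunk.
sm = SequenceMatcher(IS_CHARACTER_JUNK, a, b)

# Explicit bounds are passed so this also works on older Python versions.
m = sm.find_longest_match(0, len(a), 0, len(b))
print(m)                            # Match(a=4, b=1, size=6)
print(repr(a[m.a:m.a + m.size]))    # 'rwxr-x'
```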

That's why it wants to delete the 1:4 slice ("rwx") in the first string and insert "r-x" right after the longest matching substring.
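
The delete and insert can be seen in the opcodes too. Again a sketch under the same assumption, namely that a `SequenceMatcher` built with `IS_CHARACTER_JUNK` stands in for what `ndiff()` does internally by default:

```python
from difflib import SequenceMatcher, IS_CHARACTER_JUNK

a = "drwxrwxr-x 2 2000  2000\n"
b = "drwxr-xr-x 2 2000  2000\n"

sm = SequenceMatcher(IS_CHARACTER_JUNK, a, b)
opcodes = sm.get_opcodes()

# Each opcode says how to turn a[i1:i2] into b[j1:j2].
for tag, i1, i2, j1, j2 in opcodes:
    print(tag, (i1, i2), repr(a[i1:i2]), (j1, j2), repr(b[j1:j2]))
```

Among the opcodes there should be a `delete` covering `a[1:4]` and an `insert` of `b[7:10]`, with `equal` runs around them.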

The default is aimed at improving results for human-readable text, like prose and Python code, where stuff between whitespace is often read "as a whole" (words, keywords, identifiers, ...).

For cases like this one, where character-by-character differences are important, it's often better to pass `charjunk=None`.  Then the longest matching substring is "xr-x 2 2000  2000" at the tail end of both strings, and you get the output you're expecting.
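
A quick way to compare the two behaviors side by side, as a sketch (the list and variable names here are made up for the example):

```python
from difflib import SequenceMatcher, ndiff

a = ["drwxrwxr-x 2 2000  2000\n"]
b = ["drwxr-xr-x 2 2000  2000\n"]

# Default charjunk: blanks and tabs are treated as junk.
default_diff = list(ndiff(a, b))

# charjunk=None: no character is junk, so matching is purely char-by-char.
char_diff = list(ndiff(a, b, charjunk=None))

print("".join(default_diff))
print("".join(char_diff))

# With no junk at all, the longest match is the common tail.
m = SequenceMatcher(None, a[0], b[0]).find_longest_match(
    0, len(a[0]), 0, len(b[0]))
print(repr(a[0][m.a:m.a + m.size]))   # 'xr-x 2 2000  2000\n'
```

With `charjunk=None` the "?" guide lines should mark a single changed character with "^", instead of the "---" and "+++" runs produced by the default.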