Message335252
difflib generally synchs on the longest contiguous matching subsequence that doesn't contain a "junk" element. By default, `ndiff()`'s optional `charjunk` argument considers blanks and tabs to be junk characters.
In the strings:
"drwxrwxr-x 2 2000 2000\n"
"drwxr-xr-x 2 2000 2000\n"
the longest matching substring not containing whitespace is "rwxr-x", of length 6, starting at index 4 in the first string and at index 1 in the second. So it's aligning the strings like so:
"drwxrwxr-x 2 2000 2000\n"
   "drwxr-xr-x 2 2000 2000\n"
     123456
That's why it wants to delete the 1:4 slice in the first string and insert "r-x" after the longest matching substring.
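To see that synch point directly, here is a minimal sketch that runs `difflib.SequenceMatcher` on the two strings with `difflib.IS_CHARACTER_JUNK`, the predicate `ndiff()` uses for `charjunk` by default. The variable names are only for illustration, and this mimics just the character-level step, not everything `ndiff()` does around it.

```python
from difflib import IS_CHARACTER_JUNK, SequenceMatcher

a = "drwxrwxr-x 2 2000 2000\n"
b = "drwxr-xr-x 2 2000 2000\n"

# Same junk predicate ndiff() applies by default: blanks and tabs are junk.
sm = SequenceMatcher(IS_CHARACTER_JUNK, a, b)

# Longest matching block, searched over the full extent of both strings.
m = sm.find_longest_match(0, len(a), 0, len(b))
print(m, repr(a[m.a:m.a + m.size]))  # "rwxr-x" at a=4, b=1, size=6

# The edit script built around that block: the a[1:4] slice is deleted
# and b[7:10] ("r-x") is inserted after the match.
for tag, i1, i2, j1, j2 in sm.get_opcodes():
    print(f"{tag:7} a[{i1}:{i2}]={a[i1:i2]!r} b[{j1}:{j2}]={b[j1:j2]!r}")
```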
The default is aimed at improving results for human-readable text, like prose and Python code, where stuff between whitespace is often read "as a whole" (words, keywords, identifiers, ...).
For cases like this one, where character-by-character differences are important, it's often better to pass `charjunk=None`. Then the longest matching substring is "xr-x 2 2000 2000" at the tail end of both strings, and you get the output you're expecting.
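A small sketch of the two calls side by side, in case it helps. The names `old`, `new`, `default_diff` and `exact_diff` are only for illustration; the exact layout of the "?" hint lines is left for you to inspect when you run it.

```python
from difflib import ndiff

old = ["drwxrwxr-x 2 2000 2000\n"]
new = ["drwxr-xr-x 2 2000 2000\n"]

# Default: charjunk=IS_CHARACTER_JUNK, so blanks and tabs are junk and the
# intraline match is anchored on a block that contains no whitespace.
default_diff = list(ndiff(old, new))

# charjunk=None: no character is junk, so the long common tail can serve
# as the synch point.
exact_diff = list(ndiff(old, new, charjunk=None))

print("".join(default_diff))
print("".join(exact_diff))
```

The "- " and "+ " lines are the same in both outputs. Only the "? " hint lines should differ: the default marks a deletion and an insertion, while `charjunk=None` should flag a single changed character.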
Date | User | Action | Args
2019-02-11 18:46:17 | tim.peters | set | recipients: + tim.peters, jaraco, chris.jerdonek, xtreak
2019-02-11 18:46:15 | tim.peters | set | messageid: <1549910775.37.0.578936082231.issue35955@roundup.psfhosted.org>
2019-02-11 18:46:15 | tim.peters | link | issue35955 messages
2019-02-11 18:46:15 | tim.peters | create |