Issue35955
Created on 2019-02-10 14:11 by jaraco, last changed 2022-04-11 14:59 by admin.
Messages (13) | |||
---|---|---|---|
msg335154 | Author: Jason R. Coombs (jaraco) |
Date: 2019-02-10 14:11 | |
In [this job](https://travis-ci.org/jaraco/cmdix/jobs/491246158), a project is using assertEqual to compare two directory listings that don't match in the group. But the hint markers pointing to the mismatch are pointing at positions that match:

    E   AssertionError: '--w-[50 chars]drwxrwxr-x 2 2000 2000 4096 2019-02-10 14:[58 chars]oo\n' != '--w-[50 chars]drwxr-xr-x 2 2000 2000 4096 2019-02-10 14:[58 chars]oo\n'
    E     --w-r---wx 1 2000 2000 999999 2019-02-10 14:02 bar
    E   - drwxrwxr-x 2 2000 2000 4096 2019-02-10 14:02 biz
    E   ?  ---
    E   + drwxr-xr-x 2 2000 2000 4096 2019-02-10 14:02 biz
    E   ?        +++
    E   - -rw-rw-r-- 1 2000 2000 100 2019-02-10 14:02 foo
    E   ?  ---
    E   + -rw-r--r-- 1 2000 2000 100 2019-02-10 14:02 foo
    E   ?        +++

As you can see, it's the 'group' section of the flags that differs between the left and right comparison, but the hints point at the 'user' section for the left side and the 'world' section for the right side, even though they match. I observed this on Python 3.7.1. I haven't delved deeper to see if the issue exists on 3.7.2 or 3.8. |
|||
msg335155 | Author: Jason R. Coombs (jaraco) |
Date: 2019-02-10 14:14 | |
I should acknowledge that I'm using pytest here also... and pytest may be the engine that's performing the reporting of the failed assertion. In fact, switching to simple assertions, I see the same behavior, so I now suspect the issue may lie with pytest and not unittest. |
|||
msg335156 | Author: Jason R. Coombs (jaraco) |
Date: 2019-02-10 14:24 | |
I was able to replicate the issue using pytest and not unittest, so I've [reported the issue with that project](https://github.com/pytest-dev/pytest/issues/4765). |
|||
msg335158 | Author: Karthikeyan Singaravelan (xtreak) |
Date: 2019-02-10 14:58 | |
Sorry to comment on a closed issue. I see the following behavior with difflib.ndiff, which is used under the hood by unittest. Strings that differ by '-' and 'w' generate different output compared to 'a' and 'w'. I find the diff output for '-' and 'w' a little confusing. Is this caused by '-' also being used as a marker in difflib?

    $ ./python.exe
    Python 3.8.0a1+ (heads/master:8a03ff2ff4, Feb 9 2019, 10:42:29)
    [Clang 7.0.2 (clang-700.1.81)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import difflib
    >>> print(''.join(difflib.ndiff(["drwxrwxr-x 2 2000 2000\n"], ["drwxr-xr-x 2 2000 2000\n"])))
    - drwxrwxr-x 2 2000 2000
    ?  ---
    + drwxr-xr-x 2 2000 2000
    ?        +++

    >>> print(''.join(difflib.ndiff(["drwxrwxr-x 2 2000 2000\n"], ["drwxraxr-x 2 2000 2000\n"])))
    - drwxrwxr-x 2 2000 2000
    ?      ^
    + drwxraxr-x 2 2000 2000
    ?      ^

|
|||
msg335231 | Author: Jason R. Coombs (jaraco) |
Date: 2019-02-11 15:57 | |
I'm re-opening this issue as it does seem to apply to the stdlib (difflib.ndiff), which is why I encountered it both in unittest and pytest. Thanks xtreak for the distilled example. |
|||
msg335235 | Author: Karthikeyan Singaravelan (xtreak) |
Date: 2019-02-11 16:12 | |
I have tried different positions where only '-' and 'w' differ. They all seemed to produce a correct diff except for this one case, where the diff was confusing. |
|||
msg335246 | Author: Chris Jerdonek (chris.jerdonek) |
Date: 2019-02-11 18:20 | |
Is this a duplicate of issue24780? |
|||
msg335247 | Author: Jason R. Coombs (jaraco) |
Date: 2019-02-11 18:25 | |
I don't think so, because the issue happens on a single line diff... although it's plausible there's a common-mode fix. |
|||
msg335248 | Author: Karthikeyan Singaravelan (xtreak) |
Date: 2019-02-11 18:32 | |
I am not sure this is a duplicate, since the other issue was about a newline at the end of the strings. This one is about the diff being misleading even when the strings end with a newline. Here is a sample program where a change in the character at index 5 gives the reported diff.

    import difflib

    for i in range(7):
        print(f"Change character at {i}")
        a = list("drwxrwxr-x 2 2000 2000\n")
        b = "drwxrwxr-x 2 2000 2000\n"
        a[i] = '-'
        a = ''.join(a)
        print(''.join(difflib.ndiff([a], [b])))

Output:

    Change character at 0
    - -rwxrwxr-x 2 2000 2000
    ? ^
    + drwxrwxr-x 2 2000 2000
    ? ^

    Change character at 1
    - d-wxrwxr-x 2 2000 2000
    ?  ^
    + drwxrwxr-x 2 2000 2000
    ?  ^

    Change character at 2
    - dr-xrwxr-x 2 2000 2000
    ?   ^
    + drwxrwxr-x 2 2000 2000
    ?   ^

    Change character at 3
    - drw-rwxr-x 2 2000 2000
    ?    ^
    + drwxrwxr-x 2 2000 2000
    ?    ^

    Change character at 4
    - drwx-wxr-x 2 2000 2000
    ?     ^
    + drwxrwxr-x 2 2000 2000
    ?     ^

    Change character at 5
    - drwxr-xr-x 2 2000 2000
    ?        ---
    + drwxrwxr-x 2 2000 2000
    ?  +++

    Change character at 6
    - drwxrw-r-x 2 2000 2000
    ?       ^
    + drwxrwxr-x 2 2000 2000
    ?       ^

|
|||
msg335252 | Author: Tim Peters (tim.peters) |
Date: 2019-02-11 18:46 | |
difflib generally synchs on the longest contiguous matching subsequence that doesn't contain a "junk" element. By default, `ndiff()`'s optional `charjunk` argument considers blanks and tabs to be junk characters.

In the strings:

    "drwxrwxr-x 2 2000 2000\n"
    "drwxr-xr-x 2 2000 2000\n"

the longest matching substring not containing whitespace is "rwxr-x", of length 6, starting at index 4 in the first string and at index 1 in the second. So it's aligning the strings like so:

    "drwxrwxr-x 2 2000 2000\n"
       "drwxr-xr-x 2 2000 2000\n"
         123456

That's why it wants to delete the 1:4 slice in the first string and insert "r-x" after the longest matching substring.

The default is aimed at improving results for human-readable text, like prose and Python code, where stuff between whitespace is often read "as a whole" (words, keywords, identifiers, ...).

For cases like this one, where character-by-character differences are important, it's often better to pass `charjunk=None`. Then the longest matching substring is "xr-x 2 2000 2000" at the tail end of both strings, and you get the output you're expecting. |
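The alignment described above can be checked directly. This is a minimal sketch that uses only documented `difflib` names (`SequenceMatcher.find_longest_match`, `IS_CHARACTER_JUNK`, and the `charjunk` parameter of `ndiff`); the variable names are illustrative and not taken from the thread.

```python
import difflib

a = "drwxrwxr-x 2 2000 2000\n"
b = "drwxr-xr-x 2 2000 2000\n"

# ndiff()'s default charjunk filter: blanks and tabs count as junk.
junk_aware = difflib.SequenceMatcher(difflib.IS_CHARACTER_JUNK, a, b)
m = junk_aware.find_longest_match(0, len(a), 0, len(b))
print(m, repr(a[m.a:m.a + m.size]))      # the "rwxr-x" block

# No junk filter: the shared tail of both strings is the longest match.
plain = difflib.SequenceMatcher(None, a, b)
m2 = plain.find_longest_match(0, len(a), 0, len(b))
print(m2, repr(a[m2.a:m2.a + m2.size]))

# The same two settings, seen through ndiff().
default_diff = "".join(difflib.ndiff([a], [b]))
char_diff = "".join(difflib.ndiff([a], [b], charjunk=None))
print(default_diff)
print(char_diff)
```

With the default filter the markers land on the 'user' and 'world' triplets; with `charjunk=None` both `^` markers sit under the single character that actually changed.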
|||
msg335257 | Author: Karthikeyan Singaravelan (xtreak) |
Date: 2019-02-11 19:04 | |
Thanks for the explanation. Passing charjunk=None in the multiline string comparison helper seems to give the desired diff. I am not sure how useful it would be to pass it in the sequence and dict comparisons, which also use ndiff. I can open a PR if it's okay to use the strings from the report as a test case. No existing tests in the unittest test suite fail with the change, so this seems like a safe change to me.

    # With patch charjunk=None

    ./python.exe ../backups/bpo35955_1.py
    F
    ======================================================================
    FAIL: test_foo (__main__.FooTestCase)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "../backups/bpo35955_1.py", line 6, in test_foo
        self.assertEqual("drwxrwxr-x 2 2000 2000\n", "drwxr-xr-x 2 2000 2000\n")
    AssertionError: 'drwxrwxr-x 2 2000 2000\n' != 'drwxr-xr-x 2 2000 2000\n'
    - drwxrwxr-x 2 2000 2000
    ?      ^
    + drwxr-xr-x 2 2000 2000
    ?      ^

    ----------------------------------------------------------------------
    Ran 1 test in 0.003s

    FAILED (failures=1)

    # Without patch

    ➜ cpython git:(master) ✗ python3.7 ../backups/bpo35955_1.py
    F
    ======================================================================
    FAIL: test_foo (__main__.FooTestCase)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "../backups/bpo35955_1.py", line 6, in test_foo
        self.assertEqual("drwxrwxr-x 2 2000 2000\n", "drwxr-xr-x 2 2000 2000\n")
    AssertionError: 'drwxrwxr-x 2 2000 2000\n' != 'drwxr-xr-x 2 2000 2000\n'
    - drwxrwxr-x 2 2000 2000
    ?  ---
    + drwxr-xr-x 2 2000 2000
    ?        +++

    ----------------------------------------------------------------------
    Ran 1 test in 0.002s

    FAILED (failures=1)

|
|||
msg335268 | Author: Jason R. Coombs (jaraco) |
Date: 2019-02-11 20:05 | |
Nice insight Tim. |
|||
msg335269 | Author: Tim Peters (tim.peters) |
Date: 2019-02-11 20:07 | |
It's probably OK, but there's no "pure win" to be had here. There's generally more than one way to convert one string to another, and what "looks right" to humans depends a whole lot on context.

For example, consider these strings:

    "private Thread currentThread;"
    "private volatile Thread currentThread;"

"It's obvious" someone inserted "volatile" into the first string, and that's what ndiff's default says:

    - private Thread currentThread;
    + private volatile Thread currentThread;
    ?         +++++++++

However, pass `charjunk=None` instead, and ndiff claims someone inserted "e volatil" after the "t" in "private":

    - private Thread currentThread;
    + private volatile Thread currentThread;
    ?       +++++++++

Which is also a correct way, but - to human eyes - an insane way ;-) |
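The two renderings above can be reproduced side by side. This is a minimal sketch; the trailing newlines and the variable names are additions for the example, the strings themselves are the ones from the message.

```python
import difflib

old = ["private Thread currentThread;\n"]
new = ["private volatile Thread currentThread;\n"]

# Default: blanks and tabs are junk, so matching favors whole words.
word_friendly = "".join(difflib.ndiff(old, new))

# charjunk=None: pure longest-match, which splits "private" and "volatile".
char_exact = "".join(difflib.ndiff(old, new, charjunk=None))

print(word_friendly)
print(char_exact)
```

Both diffs describe a valid edit of the same length (nine inserted characters); only the position of the `+` run differs, which is why neither setting is right for every kind of input.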
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:11 | admin | set | github: 80136 |
2019-02-11 20:07:41 | tim.peters | set | messages: + msg335269 |
2019-02-11 20:05:52 | jaraco | set | messages: + msg335268 |
2019-02-11 19:04:34 | xtreak | set | messages: + msg335257 |
2019-02-11 18:46:15 | tim.peters | set | messages: + msg335252 |
2019-02-11 18:32:16 | xtreak | set | messages: + msg335248 |
2019-02-11 18:25:41 | jaraco | set | messages: + msg335247 |
2019-02-11 18:20:56 | chris.jerdonek | set | nosy: + chris.jerdonek messages: + msg335246 |
2019-02-11 16:12:19 | xtreak | set | versions: + Python 2.7, Python 3.7, Python 3.8 nosy: + tim.peters messages: + msg335235 type: behavior |
2019-02-11 15:57:44 | jaraco | set | status: closed -> open title: unittest assertEqual reports incorrect location of mismatch -> difflib reports incorrect location of mismatch messages: + msg335231 resolution: third party -> stage: resolved -> |
2019-02-10 14:58:26 | xtreak | set | nosy: + xtreak messages: + msg335158 |
2019-02-10 14:24:56 | jaraco | set | messages: + msg335156 |
2019-02-10 14:14:19 | jaraco | set | status: open -> closed resolution: third party stage: resolved |
2019-02-10 14:14:00 | jaraco | set | messages: + msg335155 |
2019-02-10 14:11:58 | jaraco | create |