classification
Title: Calling assertEquals for moderately long list takes too long
Type: resource usage Stage: patch review
Components: Library (Lib) Versions: Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: ezio.melotti Nosy List: Ankur.Ankan, Chris AtLee, Elena.Oat, Eric Lafontaine, Jacek.Bzdak, Puneeth.Chaganti, ankurankan, ezio.melotti, gregory.p.smith, haypo, levkivskyi, michael.foord, nnja, paul.moore, pitrou, rbcollins, serhiy.storchaka, steve.dower, tim.golden, zach.ware
Priority: high Keywords: easy, needs review, patch

Created on 2013-10-10 13:04 by Jacek.Bzdak, last changed 2017-02-08 08:36 by haypo.

Files
File name Uploaded Description Edit
unittest_scse.py Jacek.Bzdak, 2013-10-10 13:04 SCSE
issue19217.diff ezio.melotti, 2013-10-14 01:30 review
issue19217-1.diff Puneeth.Chaganti, 2014-07-28 21:44 review
issue19217-2.diff Elena.Oat, 2014-08-06 19:34 review
issue19217-3.diff ezio.melotti, 2014-08-07 03:42 review
issue19217-alt.diff serhiy.storchaka, 2014-08-10 08:14 review
issue19217-profile.txt rbcollins, 2014-10-24 03:24 profile output of the test case
cpython-issue19217.diff Chris AtLee, 2016-02-05 16:51 Use difflib.unified_diff instead of ndiff review
unittest_unified_diff.patch haypo, 2017-02-08 08:36 review
unified_diff.py haypo, 2017-02-08 08:36
Messages (23)
msg199383 - (view) Author: Jacek Bzdak (Jacek.Bzdak) Date: 2013-10-10 13:04
Call to assertEquals(list1, list2) does not finish (takes more than couple of minutes), for lists that containt 10000 elements if all list elements are different. The same call in python2.6 finishes instanteneously. 

This occours even if error message is truncated using maxDiff.
msg199384 - (view) Author: Jacek Bzdak (Jacek.Bzdak) Date: 2013-10-10 13:05
I have attached a simple test case.
msg199466 - (view) Author: Michael Foord (michael.foord) * (Python committer) Date: 2013-10-11 12:11
Ouch. Looking.
msg199833 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-10-14 01:30
Here's a proof of concept to fix the issue (still lacks tests and breaks one existing test).

The problem is that assertSequenceEqual computes the diff regardless of the size of the sequences, and then truncates it if it's too long.  Producing the diff takes time, so checking the size of the sequences and skip the diff if they are too long solves the problem.

In the patch I used an hardcoded and arbitrary value of 30 elements -- the other attributes like maxDiff and _diffThreshold can't be used here.
The value is so low because if the sequence doesn't fit in one line, the output will list one element per line, resulting in 30+ lines of output which is not very readable IMHO.  Also note that regardless of the this diff, the error message indicates what is the position of the first element that differs.
The threshold value I picked could be moved to a _seqDiffThreshold (and possibly _diffThreshold can be renamed to _strDiffThreshold).

This issue is related to #11763.  All other methods should be checked for similar issues.
msg199855 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-10-14 09:04
Perhaps a combination of len(pprint.pformat(seq1).splitlines()) and len(pprint.pformat(seq2).splitlines()) (minimum, maximum or production) is better metric?
msg200770 - (view) Author: Jacek Bzdak (Jacek.Bzdak) Date: 2013-10-21 12:50
The patch works for me, thanks Ezio!
msg200827 - (view) Author: Michael Foord (michael.foord) * (Python committer) Date: 2013-10-21 18:51
pprint is also likely to have performance issues. I agree with Ezio that a diff consisting of more than 30(x2) lines is not likely to be directly useful anyway. 

A test for the changed behaviour would be nice.
msg216480 - (view) Author: Nina Zakharenko (nnja) Date: 2014-04-16 15:57
The cause of this has been identified as a bug in libdiff.compare(). See issue 21253 for more information.
msg217426 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-04-28 22:13
Ezio, do you intend to add a test to your patch?
msg217563 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2014-04-30 00:55
I don't think I will have time for this for a while, so if anyone wants to provide tests they are welcomed to do it.
The related #11763 already has some tests, so those might be adapted for this issue.
msg224193 - (view) Author: Puneeth Chaganti (Puneeth.Chaganti) Date: 2014-07-28 21:44
Hi, 

The attached patch is an attempt to write tests for this issue, and get all the tests passing.  

Since a new threshold has been introduced, `maxDiff` doesn't always make a difference and this required changing the `testAssertSequenceEqualMaxDiff` test. Serhiy Storchaka suggestion makes sense, since we want to be showing the diff when it is smaller than the error message being shown, but I'm not sure what's the best way to do this.
msg224965 - (view) Author: Elena Oat (Elena.Oat) * Date: 2014-08-06 19:34
Attached is the patch that tries to solve the issue for the strings, tuples, lists and dictionaries. Tuples, lists and dictionaries use the same value for the threshold. There's a helper method in the tests that is generic to all mentioned types. 
This issue was discussed during the Helsinki Python sprint.
msg224983 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2014-08-07 03:42
Thanks Puneeth for the initial tests and Elena for expanding the fix and the tests to cover assertDictEqual too.
I reworked a bit Elena's patch to make the checks in case.py more consistent and simplified the tests.

With the attached patch, unittest_scse.py works fine, producing this output:
FF
======================================================================
FAIL: test_compare (__main__.SCSE)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest_scse.py", line 14, in test_compare
    self.assertEqual(arr1, arr2)
AssertionError: Lists differ: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,[29953 chars]1, 1] != [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,[29953 chars]2, 2]

First differing element 0:
1
2


======================================================================
FAIL: test_compare_2 (__main__.SCSE)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest_scse.py", line 24, in test_compare_2
    self.assertEqual(arr1, arr2)
AssertionError: Lists differ: [1, 1[29932 chars] 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] != [1, 1[29932 chars] 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2]

First differing element 9999:
1
2


----------------------------------------------------------------------
Ran 2 tests in 0.010s

FAILED (failures=2)
msg225126 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-08-10 08:14
Here is alternative patch. It outputs line diff even for large sequences/dicts/strings.

With the attached patch, unittest_scse.py works fine, producing this output:

FF
======================================================================
FAIL: test_compare (__main__.SCSE)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest_scse.py", line 14, in test_compare
    self.assertEqual(arr1, arr2)
AssertionError: Lists differ: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,[29953 chars]1, 1] != [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,[29953 chars]2, 2]

First differing element 0:
1
2

- [1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
-  1,
+ [2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
+  2,
...
*** Diff is truncated.  Set self.maxDiffItems to None to see it all.

======================================================================
FAIL: test_compare_2 (__main__.SCSE)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unittest_scse.py", line 24, in test_compare_2
    self.assertEqual(arr1, arr2)
AssertionError: Lists differ: [1, 1[29932 chars] 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] != [1, 1[29932 chars] 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2]

First differing element 9999:
1
2

-  1]
+  2]

----------------------------------------------------------------------
Ran 2 tests in 1.813s

FAILED (failures=2)
msg225130 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-08-10 09:17
Thanks for your comments Ezio. Yes, this patch is rather only a demo. Sorry, but I'm not very interesting in completing this patch (at least not right now). I only want to see more detailed error reports.

There are some ideas about improving diffs reporting. First, we can iterative decrease slices size until diff size become less maxDiff. And output truncated diff instead of omitting it at all. Second, we can extend difflib by functions which allow to limit a number of reported differences.
msg225132 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2014-08-10 09:30
I thought some more about this, and I think we can do better.  
Since _diffThreshold only affects strings, the goal of this issue is to extend the check to other types.  Instead of doing this by adding more attributes and behaviors, I would like to keep things as simple and consistent as possible (since they are already quite complicated), and possibly making them even simpler.

I'll try to summarize the current situation and propose some possible solutions to achieve this goal.

Currently we have:
* _diffThreshold (used to avoid computing the diff, only affects strings);
* maxDiff (used to "hide" diffs that are too big, after they have been computed);

These two attributes deal with two different but related problems:
* _diffThreshold tries to avoid /calculating/ expensive diffs;
* maxDiff tries to avoid /printing/ big and unreadable diffs;

These are different because one acts before and the other after, but they are also related because the threshold directly affects the size of the diff.

By picking some ideas from Serhiy comments and patch, I think we should:
1) try to have a single threshold for all types, and use line-based counting for strings (so if the threshold is 32, this means 32 elements in a list, 32 items in a dict, 32 lines in a string);
2) make this threshold public, so that users can control this behavior (even though we should be careful about what we add -- we can always keep it private and expose it later);
3) factor out the diff computation/truncation function (like Serhiy did), and use it consistently, so that we can get rid of _truncateMessage and simplify the formatting.
4) if possible, try to unify the threshold and the maxDiff -- if the threshold is low the diff will be small, if it's high/None the diff will be big;

Now, doing 1) shouldn't be an issue, 2) is also doable if we agree on it, and 3) is also not a problem since these are internal functions.  If we decide to go for 4) as well there are at least two options:
a) repurpose maxDiff to indicate the number of items/lines;
b) introduce a new attribute (e.g. maxDiffItems) and deprecate maxDiff;

Option a) might be doable, and even if it introduces a change in behavior it might be acceptable since it affects the output of the messages in case of failure, and I don't think anyone is relying on an exact output (also because tests shouldn't be failing).  Moreover, the most common usage of maxDiff is setting it to None, and having the threshold to None means that the full diff will be computed and printed, leaving the behavior unchanged.

Thoughts?


> Second, we can extend difflib by functions which allow to limit a number
> of reported differences.

I was thinking about having a lazy, generator-based diffing, so we could generate as many diff lines we need and stop once we reach the threshold.  I'm not sure if difflib already has something similar but anyway this is a separate issue.  Also, if we unify maxDiff and the threshold we won't need this, since all that is computer is also printed.
msg225134 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-08-10 09:53
> 1) try to have a single threshold for all types, and use line-based counting
> for strings (so if the threshold is 32, this means 32 elements in a list,
> 32 items in a dict, 32 lines in a string);

You forgot about strings with few but very long lines. We should hide or 
truncate too long lines, and this is not trivial issue. Actually we should 
more control on difflib's machinery and use something like _common_shorten_repr 
to appropriate truncate similar lines.

> Option a) might be doable, and even if it introduces a change in behavior it
> might be acceptable since it affects the output of the messages in case of
> failure, and I don't think anyone is relying on an exact output (also
> because tests shouldn't be failing).  Moreover, the most common usage of
> maxDiff is setting it to None, and having the threshold to None means that
> the full diff will be computed and printed, leaving the behavior unchanged.

This is too much for bug fix. We should fix this issue (do not calculate diffs 
between too long sequences) and preserve as much details as possible. Omitting 
the diff at all when it is outputted with current code (but very slowly) is a 
regression. It would be better to output truncated diff.

Then we can refactor and improve diffs reporting in other issues.
msg229858 - (view) Author: Robert Collins (rbcollins) * (Python committer) Date: 2014-10-23 07:13
A few thoughts.

Adding a new public symbol seems inappropriate here: this is a performance issue that is well predictable and we should cater for that (given difflibs current performance).

I'll note in passing that both bzr and hg have much higher performance difference algorithms that we could pick up and includes as a replacement SequenceMatcher, which might significantly reduce the threshold at which we need to default-cap things - but such a threshold will still exist.

I totally agree that _diffThreshold should apply to non-string sequences - anything where we're going to hit high-order complexity outputting the difference. That said, I speculate that perhaps we'd be better off outputting both objects in some structured fashion and letting a later process render them (for things like CI systems and test databases, where fidelity of reproduction is more important than having the output fit on one screen. This is a different issue though and something we should revisit later.

That suggests to me though that the largest diff we output should be chosen based on the textual representation of the diff - we're doing it for human readability. Whereas the threshold for calculating a diff at all should be based on performance. It can be very expensive to calculate a diff on large sequences, but the diff might be much much larger than the sequence length indicates [because each item in the sequence may be very large]. Perhaps thats over thinking it?

Anyhow- short term, surely just making the threshold apply to any sequenced type is sufficient to fix the bug?
msg229910 - (view) Author: Robert Collins (rbcollins) * (Python committer) Date: 2014-10-24 03:24
Oh, I got a profile from the test case for my own interest.

6615 seconds ..

some highlights that jumped out at me

    20001    0.127    0.000 6610.025    0.330 difflib.py:868(compare)

which means we're basically using ndiff, which is cubic rather than quadratic - the same performance issue the html module had.

I think we'll probably make a better tradeoff by using unified_diff instead which is quadratic.
msg259674 - (view) Author: Chris AtLee (Chris AtLee) Date: 2016-02-05 16:48
Switching this to unified_diff allows the test case to finish nearly instantly.

The downside is that the output is changed in the case of a difference found:

FF
======================================================================
FAIL: test_compare (__main__.SCSE)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_case.py", line 14, in test_compare
    self.assertEqual(arr1, arr2)
AssertionError: Lists differ: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,[29953 chars]1, 1] != [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,[29953 chars]2, 2]

First differing element 0:
1
2

Diff is 100034 characters long. Set self.maxDiff to None to see it.

======================================================================
FAIL: test_compare_2 (__main__.SCSE)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_case.py", line 24, in test_compare_2
    self.assertEqual(arr1, arr2)
AssertionError: Lists differ: [1, 1[29932 chars] 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] != [1, 1[29932 chars] 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2]

First differing element 9999:
1
2

--- 
+++ 
@@ -9997,4 +9997,4 @@
  1,
  1,
  1,
- 1]
+ 2]

----------------------------------------------------------------------
Ran 2 tests in 0.098s

FAILED (failures=2)
------------------------------------------------------------
msg261712 - (view) Author: Robert Collins (rbcollins) * (Python committer) Date: 2016-03-14 02:17
The new output seems ok to me?
msg287264 - (view) Author: Eric Lafontaine (Eric Lafontaine) * Date: 2017-02-07 23:26
Hi all,

What do we do with this ticket?

Patch were provided, People have agreed (I think).  So what's missing to close it (or pass to the next step)?  It's going to be a year that a high priority ticket has no update and I would like to accelerate it if I can :).  Let me know what I can do.

Regards,
Eric Lafontaine
msg287284 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-02-08 08:36
unittest_unified_diff.patch: Rebased patch for the default branch. My patch updates also unit tests.

The patch changes the test output. If we decide to apply the patch, I propose to only apply it to the default branch (Python 3.7).

The bug report is about a test which fails. I'm not sure that it's a real blocker issue that Python is slow when a test fails. At least, it should be fast when a test pass. I mean that I like the current output, I'm not sure about the new output.

Example with attached unified_diff.py.

Before:
@@@@@@@@@@@@@@@@@
F
======================================================================
FAIL: test_x (__main__.Test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unified_diff.py", line 5, in test_x
    self.assertEqual([], [None])
AssertionError: Lists differ: [] != [None]

Second list contains 1 additional elements.
First extra element 0:
None

- []
+ [None]

----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (failures=1)
@@@@@@@@@@@@@@@@@

With the patch:
@@@@@@@@@@@@@@@@@
haypo@selma$ ./python unified_diff.py 
F
======================================================================
FAIL: test_x (__main__.Test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "unified_diff.py", line 5, in test_x
    self.assertEqual([], [None])
AssertionError: Lists differ: [] != [None]

Second list contains 1 additional elements.
First extra element 0:
None

--- 
+++ 
@@ -1 +1 @@
-[]
+[None]

----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (failures=1)
@@@@@@@@@@@@@@@@@


The patch adds the following header which can be suprising:
@@@@@@@@@@@@@@@@@
--- 
+++ 
@@ -1 +1 @@
@@@@@@@@@@@@@@@@@


Maybe we should pass a "file name" to unified_diff() to get something like:
@@@@@@@@@@@@@@@@@
--- expected
+++ got
@@ -1 +1 @@
@@@@@@@@@@@@@@@@@
History
Date User Action Args
2017-02-08 08:36:59hayposetfiles: + unified_diff.py
2017-02-08 08:36:47hayposetfiles: + unittest_unified_diff.patch

messages: + msg287284
2017-02-08 00:02:46ezio.melottisetkeywords: + needs review
2017-02-07 23:26:36Eric Lafontainesetnosy: + Eric Lafontaine
messages: + msg287264
2016-04-12 18:54:43zach.waresethgrepos: - hgrepo338
2016-04-12 16:28:39berker.peksagsetmessages: - msg263252
2016-04-12 16:28:29berker.peksagsetnosy: - supriyanto maftuh, supriyantomaftuh

components: - Build, Documentation, Unicode, Windows, XML
versions: - Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5
2016-04-12 15:43:36supriyantomaftuhsetversions: + Python 3.2, Python 3.3, Python 3.6
nosy: + supriyanto maftuh, tim.golden, supriyantomaftuh, zach.ware, steve.dower, paul.moore

messages: + msg263252

hgrepos: + hgrepo338
components: + Build, Documentation, Unicode, Windows, XML
2016-03-14 02:17:46rbcollinssetmessages: + msg261712
2016-02-05 16:51:58Chris AtLeesetfiles: + cpython-issue19217.diff
2016-02-05 16:48:15Chris AtLeesetnosy: + Chris AtLee
messages: + msg259674
2015-11-19 02:41:28gregory.p.smithsetnosy: + gregory.p.smith
2015-06-29 12:10:13levkivskyisetnosy: + levkivskyi
2014-10-24 03:24:29rbcollinssetfiles: + issue19217-profile.txt

messages: + msg229910
2014-10-23 07:13:23rbcollinssetnosy: + rbcollins
messages: + msg229858
2014-08-10 09:53:51serhiy.storchakasetmessages: + msg225134
2014-08-10 09:30:02ezio.melottisetmessages: + msg225132
2014-08-10 09:17:56serhiy.storchakasetmessages: + msg225130
2014-08-10 08:14:47serhiy.storchakasetfiles: + issue19217-alt.diff

stage: commit review -> patch review
messages: + msg225126
versions: + Python 3.5, - Python 3.3
2014-08-07 03:42:09ezio.melottisetfiles: + issue19217-3.diff

messages: + msg224983
stage: test needed -> commit review
2014-08-06 19:34:04Elena.Oatsetfiles: + issue19217-2.diff
nosy: + Elena.Oat
messages: + msg224965

2014-07-28 21:44:05Puneeth.Chagantisetfiles: + issue19217-1.diff
nosy: + Puneeth.Chaganti
messages: + msg224193

2014-05-19 14:33:15Ankur.Ankansetnosy: + Ankur.Ankan
2014-05-19 14:26:29ankurankansetnosy: + ankurankan
2014-04-30 00:55:59ezio.melottisetkeywords: + easy

messages: + msg217563
stage: patch review -> test needed
2014-04-28 22:13:22pitrousetnosy: + pitrou
messages: + msg217426
2014-04-16 15:57:05nnjasetnosy: + nnja
messages: + msg216480
2013-10-21 18:51:10michael.foordsetmessages: + msg200827
2013-10-21 12:50:42Jacek.Bzdaksetmessages: + msg200770
2013-10-21 05:50:17rhettingersetpriority: normal -> high
2013-10-14 09:04:44serhiy.storchakasetmessages: + msg199855
2013-10-14 01:30:31ezio.melottisetfiles: + issue19217.diff
versions: + Python 2.7
messages: + msg199833

assignee: michael.foord -> ezio.melotti
keywords: + patch
stage: needs patch -> patch review
2013-10-11 12:19:09hayposetnosy: + haypo
2013-10-11 12:11:52michael.foordsetmessages: + msg199466
2013-10-11 04:54:43rhettingersetassignee: michael.foord
2013-10-10 14:30:27serhiy.storchakasetnosy: + serhiy.storchaka
2013-10-10 13:29:28ezio.melottisetnosy: + ezio.melotti, michael.foord
stage: needs patch

versions: + Python 3.4, - Python 3.1, Python 3.2
2013-10-10 13:05:33Jacek.Bzdaksetmessages: + msg199384
2013-10-10 13:04:46Jacek.Bzdakcreate