classification
Title: unittest's assertItemsEqual() method makes too many assumptions about its input
Type: Stage: resolved
Components: Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: michael.foord Nosy List: MarkRoddy, ezio.melotti, flox, georg.brandl, gregory.p.smith, michael.foord, python-dev, rhettinger
Priority: low Keywords: patch

Created on 2010-10-30 05:26 by rhettinger, last changed 2013-04-29 08:42 by flox. This issue is now closed.

Files
File name Uploaded Description Edit
py3k.10242.patch MarkRoddy, 2010-11-21 00:38 Patch against py3k with Raymond's suggested fix review
py27.10242.patch MarkRoddy, 2010-11-21 00:41 Patch against release27-maint branch with Raymond's suggested fix review
nice_output.diff rhettinger, 2010-11-27 10:13 Alternate output format review
Messages (21)
msg119960 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-10-30 05:26
class T(unittest.TestCase):
    def test_items_equal(self):
        # this test fails, but should succeed
        a = [{2,4}, {1,2}]
        b = a[::-1]
        self.assertItemsEqual(a, b)

This method has a fast-path using sorted() and a slow-path that doesn't care about cross-type comparisons.  The fast-path is incorrect though and gets the wrong answer in the above test case.
msg120098 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-10-31 23:21
I'll fix this in 2.7.

For 3.2, may remove the method entirely (for the reasons discussed on python-dev).
msg120104 - (view) Author: Michael Foord (michael.foord) * (Python committer) Date: 2010-11-01 02:56
As this has been released in 2.7 (and unittest2) I don't think it can be just removed in 3.2 - it would make porting code from Python 2 to 3 more painful. Very happy for you to fix in Python 2.7. Please let me know when it goes in so that I can keep unittest2 up to date.
msg120199 - (view) Author: Michael Foord (michael.foord) * (Python committer) Date: 2010-11-02 01:40
On Python-dev Nick Coghlan suggests a fix for the fast-path that would allow us to keep it whilst fixing this bug (not tested yet but adding this note not to lose it). The issue of the name of this method is in 10273.


Looking at assertItemsEqual, I'd be inclined to insert a check that falls back to the "unorderable_list_difference" approach in the case where "expected != sorted(reversed(expected))".

As far as I can tell, that check is a valid partial ordering detector
for any sequence that contains one or more pairs of items for which
LT, EQ and GE are all False:

>>> seq = [{1}, {2}]
>>> seq[0] < seq[1]
False
>>> seq[0] == seq[1]
False
>>> seq[0] > seq[1]
False
>>> sorted(seq)
[{1}, {2}]
>>> sorted(reversed(sorted(seq)))
[{2}, {1}]

Obviously, if the sequence doesn't contain any such items (e.g. all
subsets of each other), then it will look like a total ordering and
use the fast path. I see that as an upside :)
msg120352 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-11-03 23:06
Suggestions:

* new name:  assertCountEqual(a, b)
  or:        assertElementCountEqual(a, b)

  this name captures the essential service:

  - unordered comparison where duplicates matter
  - inputs are "elements", 
    not "items" which means key/value pairs


* O(n) implementation with O(n**2) fallback:

   try:
     a_cnt = collections.Counter(a)
     b_cnt = collections.Counter(b)
   except TypeError:
     # do current O(n**2) fallback
   else:
       if a_cnt == b_cnt:
          # test passed
       else:
          in_a_but_not_in_b = a - b
          in_b_but_not_in_a = b - a
          # display nice diff

* documentation should emphasize the new name:

  assertElementCountEqual(a, b)
  obsolete alias:  assertItemsEqual(a, b)
msg120355 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-11-03 23:33
> I'd be inclined to insert a check that falls back 
> to the "unorderable_list_difference" approach in 
> the case where "expected != sorted(reversed(expected))".

Too fragile and subtle.  The method need to be absolutely straight-forward.
msg120356 - (view) Author: Michael Foord (michael.foord) * (Python committer) Date: 2010-11-03 23:36
assertElementCountEqual is good name and I like your implementation suggestion. I'll put this in. I think the implementation fix can go into 2.7 as well but not the rename/aliasing.
msg120373 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-11-04 02:41
One nuance, it may be better to implement as:

     a_cnt = collections.Counter(iter(a))
     b_cnt = collections.Counter(iter(b))

That will bypass the special handling the Counter constructor has if the  argument is a Mapping.
msg121818 - (view) Author: Mark Roddy (MarkRoddy) * Date: 2010-11-21 00:38
Adding patch for py3k which implements Raymond's suggested fix which utilizes collections.Counter.  

Have not changed the name of the assertion method as this seems as though it may be outside the scope of this issue, but I can produce another patch with the name change if desired.  Though how the old name should be deprecated needs to be addressed.

Note that it may make sense to reconsider the current failure message from this test to take into account the fact that the test could be failing due to the number of entries for a single value differing between the two sequences.  In this case the "Expected, bug missing..." or "Unexpected but present..." messages are still utilized but could be miss leading.

Also, holding off on porting to unittest2 until the above issues have been addressed (or at least confirmed to be non-issues).
msg121819 - (view) Author: Mark Roddy (MarkRoddy) * Date: 2010-11-21 00:41
Adding patch for release27-maint branch which implements Raymond's suggested fix which utilizes collections.Counter.  

Has the same issues addressed with the py3k patch.
msg122511 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-11-27 09:34
Applied in r86828.

The output could still be made nicer, perhaps something along the lines of:

expected 6, got 4: 'wand of fireballs'
expected 2, got 7: 'ring of invisibility'
  . . .
msg122514 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-11-27 10:13
Attaching possible code for nicer output.
msg122782 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-11-29 08:51
ISTM that the new name is worse than the old name.  I hadn't followed this issue, heard assertCountEqual the first time today, and couldn't guess what it does.  I'd have assumed that it checks only for equality of the number of items in a sequence, not for equality of the actual items.

I appreciate that it's hard finding good short names, but just because the implementation uses collections.Counter does not mean that Count needs to be in the method name :)
msg123712 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2010-12-10 01:40
I have to agree that the name assertCountEqual does not work well for me as something I can read and really comprehend what it is going to do without searching for the docs or implementation to double check.  (not that assertItemsEqual did either).  'Count' does not strongly imply to me that it is expecting sequences or really tell me what it will be testing.

Brainstorming based on other suggestions i've seen and some i've made up:

  assertCountEqual             [in 3.2beta1]
  assertCountsEqual
  assertElementCountEqual      [michael.foord]
  assertElementCountsEqual
  assertItemCountEqual
  assertItemCountsEqual
  assertItemsEqual             [old, agreed to replace this]

When it comes down to Item vs Element I do like the sound of Element even though it is longer to type.

Should it be singular 'Count' (Dracula?) or plural/possessive 'Counts'?

To me "assertCountEqual" makes me think of the other assertFooEqual methods and wonder what data structure type a "Count" is.  You could argue that calling it assertCounterEqual would make sense in reference to collections.Counter but I do not think that actually ready any more explanatory when reading.

I'm sorry that this is a bikeshed.  But if we're gonna change the paint color, during the beta is a good time.

my problem with assertElementCountEqual is that being singular I could read a statement such as "self.assertElementCountEqual(listA, setB)" and assume that it is the same as "self.assertEqual(len(listA), len(setB))"

assertElementCountsEqual by virtue of the mere 's' implies to me that it is not doing a len(listA) but is instead counting up the individual elementS and comparing those counts.   So after all this rambling, I think that's my vote.

Agree/disagree/indifferent?
msg124360 - (view) Author: Michael Foord (michael.foord) * (Python committer) Date: 2010-12-19 15:53
Improved implementation committed to 2.7 revision 87407. Method name unchanged there.
msg124430 - (view) Author: Michael Foord (michael.foord) * (Python committer) Date: 2010-12-21 12:43
This is committed to 2.7 and 3.2 (using the old name assertItemsEqual in 2.7). As we're well into the beta cycle I don't think we can change the name in 3.2.

The current failure output is very nice for comparing sequences like [1, 2, 3] vs [1, 2, 4]:

AssertionError: Expected, but missing:
    [4]
Unexpected, but present:
    [3]


It is less good for sequences like [2, 2] vs [2, 2, 2]:

AssertionError: Expected, but missing:
    [2]
msg125009 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-01-01 21:52
The improved output format in 3.2 still needs to be backported to 2.7.
msg131028 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2011-03-15 19:41
assertCountEqual has been released in 3.2 as the new name.  close this?
msg131190 - (view) Author: Michael Foord (michael.foord) * (Python committer) Date: 2011-03-16 22:47
I need to check that the implementation has been backported to 2.7 completely / correctly. Raymond made some changes (output format and a bugfix) that may not have been backported yet.
msg131202 - (view) Author: Roundup Robot (python-dev) Date: 2011-03-17 00:34
New changeset d8eaeee1c063 by Michael Foord in branch '2.7':
Issue #10242: backport of more fixes to unittest.TestCase.assertItemsEqual
http://hg.python.org/cpython/rev/d8eaeee1c063
msg131204 - (view) Author: Michael Foord (michael.foord) * (Python committer) Date: 2011-03-17 00:38
Fixes backported to assertItemsEqual in 2.7.
History
Date User Action Args
2013-04-29 08:42:48floxsetnosy: + flox
2011-03-17 05:33:17ezio.melottisetstatus: open -> closed
nosy: georg.brandl, rhettinger, gregory.p.smith, ezio.melotti, michael.foord, MarkRoddy, python-dev
2011-03-17 00:38:54michael.foordsetnosy: georg.brandl, rhettinger, gregory.p.smith, ezio.melotti, michael.foord, MarkRoddy, python-dev
messages: + msg131204
resolution: fixed
stage: resolved
2011-03-17 00:34:08python-devsetnosy: + python-dev
messages: + msg131202
2011-03-16 22:47:36michael.foordsetnosy: georg.brandl, rhettinger, gregory.p.smith, ezio.melotti, michael.foord, MarkRoddy
messages: + msg131190
2011-03-15 19:41:11gregory.p.smithsetnosy: georg.brandl, rhettinger, gregory.p.smith, ezio.melotti, michael.foord, MarkRoddy
messages: + msg131028
2011-01-01 21:52:53rhettingersetpriority: normal -> low

versions: - Python 3.2
messages: + msg125009
nosy: georg.brandl, rhettinger, gregory.p.smith, ezio.melotti, michael.foord, MarkRoddy
2010-12-21 12:43:03michael.foordsetnosy: georg.brandl, rhettinger, gregory.p.smith, ezio.melotti, michael.foord, MarkRoddy
messages: + msg124430
2010-12-19 15:53:47michael.foordsetnosy: georg.brandl, rhettinger, gregory.p.smith, ezio.melotti, michael.foord, MarkRoddy
messages: + msg124360
2010-12-10 01:40:35gregory.p.smithsetnosy: + gregory.p.smith
messages: + msg123712
2010-11-29 08:51:24georg.brandlsetnosy: + georg.brandl
messages: + msg122782
2010-11-27 10:13:37rhettingersetfiles: + nice_output.diff
assignee: rhettinger -> michael.foord
resolution: fixed -> (no value)
messages: + msg122514
2010-11-27 09:34:39rhettingersetpriority: high -> normal
resolution: fixed
messages: + msg122511
2010-11-25 22:13:19rhettingersetassignee: michael.foord -> rhettinger
2010-11-21 00:41:15MarkRoddysetfiles: + py27.10242.patch

messages: + msg121819
2010-11-21 00:38:26MarkRoddysetfiles: + py3k.10242.patch

nosy: + MarkRoddy
messages: + msg121818

keywords: + patch
2010-11-04 02:41:56rhettingersetmessages: + msg120373
2010-11-03 23:36:06michael.foordsetmessages: + msg120356
2010-11-03 23:33:19rhettingersetmessages: + msg120355
2010-11-03 23:06:47rhettingersetassignee: rhettinger -> michael.foord
messages: + msg120352
2010-11-02 01:40:17michael.foordsetmessages: + msg120199
2010-11-01 02:56:55michael.foordsetassignee: michael.foord -> rhettinger
messages: + msg120104
2010-11-01 02:52:17michael.foordsetassignee: rhettinger -> michael.foord
2010-11-01 00:31:07ezio.melottisetnosy: + ezio.melotti
2010-10-31 23:21:53rhettingersetassignee: michael.foord -> rhettinger
messages: + msg120098
2010-10-30 05:26:10rhettingercreate