classification
Title: collections.Counter equality test thrown-off by zero counts
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.2, Python 3.3
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: eric.snow, mark.dickinson, meador.inge, rhettinger, slwebber
Priority: low Keywords:

Created on 2012-03-03 10:10 by rhettinger, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (8)
msg154827 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2012-03-03 10:10
>>> from collections import Counter
>>> x=Counter(a=10,b=0,c=3)
>>> y=Counter(a=10,c=3)
>>> x == y
False
>>> all(x[k]==y[k] for k in set(x) | set(y))
True
msg167533 - (view) Author: Stephen Webber (slwebber) Date: 2012-08-06 02:44
This is the intentional handling of nonexistent keys, and it is not restricted to '==' operations. Looking up a Counter key that has not yet been set returns 0 by default.

See the documentation here:
http://docs.python.org/library/collections.html

"Counter objects have a dictionary interface except that they return a zero count for missing items instead of raising a KeyError:"

Since this is intended behavior, I recommend closing this issue.
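
A minimal sketch of the documented behaviour, reusing the counters from the original report. It compares the stored contents through `dict(...)` rather than `==` on the Counters themselves, because the result of Counter `==` depends on the Python version (this report concerns 3.2 and 3.3; as far as I know, 3.10 and later treat missing elements as zero in equality tests). The variable names are only illustrative.

```python
from collections import Counter

x = Counter(a=10, b=0, c=3)
y = Counter(a=10, c=3)

# Looking up a missing key returns 0 instead of raising KeyError...
missing_count = y['b']

# ...and the lookup does not store the key in the Counter.
key_was_added = 'b' in y

# The stored contents still differ: x holds an explicit zero entry, y does not.
same_contents = dict(x) == dict(y)

print(missing_count, key_was_added, same_contents)  # 0 False False
```

So the zero default applies to lookups only; it does not make the two underlying dictionaries identical.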
msg167595 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2012-08-07 01:44
Raymond, Stephen's analysis seems correct.  Are we missing something or can this issue be closed?
msg167605 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2012-08-07 07:15
> Raymond, Stephen's analysis seems correct.  Are we missing something or
> can this issue be closed?

Well, depending on how you think about Counters, the current behaviour of equality definitely leads to some surprises.  For example:

>>> Counter(a = 3) + Counter(b = 0) == Counter(a = 3, b = 0)
False

OTOH, if we're consistent about regarding a count of 0 as 'equivalent' to a missing element, then __nonzero__ / __bool__ probably needs changing, too.

>>> c = Counter(a = 0)
>>> bool(c)
True
>>> bool(c + c)
False
msg167656 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2012-08-08 04:16
Ah, good examples, Mark.  So, why is it ever useful to keep a key with a value of zero?  In other words, why:

>>> Counter(a=0)
Counter({'a': 0})

instead of:

>>> Counter(a=0)
Counter()

?

The latter seems more consistent to me.
msg167668 - (view) Author: Stephen Webber (slwebber) Date: 2012-08-08 06:18
Hmm, that is odd behavior indeed.

I think having keys that point to zero values is important for iterating over a set. For example:

>>> x = Counter(a=10, b=0)
>>> for k in set(x):
...     x[k] += 1
... 
>>> x
Counter({'a': 11, 'b': 1})

is probably preferable to

>>> x = Counter(a=10, b=0)
>>> for k in set(x):
...     x[k] += 1
... 
>>> x
Counter({'a': 11})

Perhaps to ensure intuitive behavior we could ensure that

>>> Counter(a = 3) + Counter(b = 0) == Counter(a = 3, b = 0)
True

by aggregating all keys into the new Counter object, even those with zero values? I would be happy to make such a patch, as it would be good experience for me to learn. Would this be an acceptable solution, and is there other odd behavior at work here?
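
To make the proposal concrete, here is a rough sketch of an addition that keeps zero-valued keys. `add_keeping_zeros` is a hypothetical helper written for illustration, not an existing Counter method and not an actual patch; it is compared against the built-in `+`, which keeps only positive results.

```python
from collections import Counter

def add_keeping_zeros(a, b):
    """Hypothetical helper: sum counts key by key and keep zero results."""
    result = Counter()
    for key in set(a) | set(b):
        # Missing keys read as 0; assignment stores the key even when the sum is 0.
        result[key] = a[key] + b[key]
    return result

total = add_keeping_zeros(Counter(a=3), Counter(b=0))
print(dict(total) == {'a': 3, 'b': 0})  # True

# The built-in addition drops the zero entry.
builtin_total = Counter(a=3) + Counter(b=0)
print(dict(builtin_total))  # {'a': 3}
```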
msg167674 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2012-08-08 07:40
At its most basic, a Counter is simply a dictionary with a __missing__ method that supplies a default of zero.  It is intentional that everything else behaves as much like a regular dictionary as possible.  You're allowed to store *anything* in the dict values even if those values don't represent numbers.  A consequence is that equality is taken to mean the same as regular dict equality.

The unary-plus is provided as a way to eliminate zeros from a Counter prior to doing a Counter equality test.
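
As a brief illustration of that idiom (requires Python 3.3 or later, where unary plus was added), using the counters from the original report:

```python
from collections import Counter

x = Counter(a=10, b=0, c=3)
y = Counter(a=10, c=3)

# Unary plus returns a new Counter that keeps only the positive counts.
x_stripped = +x
y_stripped = +y

print(sorted(x_stripped.items()))  # [('a', 10), ('c', 3)]
print(x_stripped == y_stripped)    # True
```

Note that unary plus also discards negative counts, so this comparison is only appropriate when negative counts are not meaningful; the original `x` is left unchanged.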

Other designs were possible (such as my Bag class mentioned in the docs).  This one was selected for its versatility, but it does present challenges with respect to zeros, negatives, fractions, etc.  I recognize your concern but find it to be at odds with the basic design of the class.  You might be happier with a Multiset class that restricts itself to positive integer counts.
msg167690 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2012-08-08 14:44
I'd missed that unary + (new in 3.3).  That's pretty cool.
History
Date  User  Action  Args
2022-04-11 14:57:27  admin  set  github: 58390
2012-08-08 14:44:12  eric.snow  set  nosy: + eric.snow; messages: + msg167690
2012-08-08 07:40:26  rhettinger  set  priority: normal -> low; status: open -> closed; resolution: rejected; messages: + msg167674
2012-08-08 06:18:19  slwebber  set  messages: + msg167668
2012-08-08 04:16:23  meador.inge  set  messages: + msg167656
2012-08-07 07:15:25  mark.dickinson  set  nosy: + mark.dickinson; messages: + msg167605
2012-08-07 01:44:14  meador.inge  set  nosy: + meador.inge; messages: + msg167595
2012-08-06 02:44:45  slwebber  set  nosy: + slwebber; messages: + msg167533
2012-03-03 10:10:03  rhettinger  create