Issue 14182: collections.Counter equality test thrown-off by zero counts


This issue tracker has been migrated to GitHub, and is currently read-only.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/58390

classification

Title:	collections.Counter equality test thrown-off by zero counts
Type:	behavior	Stage:
Components:	Library (Lib)	Versions:	Python 3.2, Python 3.3

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:	rhettinger	Nosy List:	eric.snow, mark.dickinson, meador.inge, rhettinger, slwebber
Priority:	low	Keywords:

Created on 2012-03-03 10:10 by rhettinger, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (8)
msg154827 - Author: Raymond Hettinger (rhettinger) *	Date: 2012-03-03 10:10
>>> from collections import Counter
>>> x=Counter(a=10,b=0,c=3)
>>> y=Counter(a=10,c=3)
>>> x == y
False
>>> all(x[k]==y[k] for k in set(x) | set(y))
True
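The check in the last line of the report, which compares counts over the union of both key sets, can be wrapped in a small helper. This is a minimal sketch; `counts_equal` is a hypothetical name, not part of `collections`. Plain dict equality is used for the contrast because it still treats a stored zero as different from a missing key. Note that Python 3.10 later changed `Counter.__eq__` to treat missing elements as having zero counts, so `x == y` from the report evaluates to True on 3.10 and newer.

```python
from collections import Counter


def counts_equal(x: Counter, y: Counter) -> bool:
    """Hypothetical helper: compare two Counters, treating a missing key
    the same as a zero count.

    Looking up an absent key on a Counter returns 0 (via __missing__),
    so iterating over the union of keys compares counts, not key sets.
    """
    return all(x[k] == y[k] for k in set(x) | set(y))


x = Counter(a=10, b=0, c=3)
y = Counter(a=10, c=3)

# Plain dict equality sees the stored zero for 'b' as a difference.
print(dict(x) == dict(y))   # False
# The key-union comparison does not.
print(counts_equal(x, y))   # True
```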
msg167533 - Author: Stephen Webber (slwebber)	Date: 2012-08-06 02:44
This is intentional handling of nonexistent keys, and it is not restricted to '==' operations. Looking up a key that has not been set in a Counter returns 0 by default. See the documentation here: http://docs.python.org/library/collections.html

"Counter objects have a dictionary interface except that they return a zero count for missing items instead of raising a KeyError:"

Since this is intended behavior, I recommend that this bug be closed.
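The documented behavior can be checked directly: looking up a missing key returns 0 without adding the key, while a zero that is explicitly stored is an ordinary entry. The `collections.defaultdict` lines are included only for contrast, since that class does insert the default on lookup.

```python
from collections import Counter, defaultdict

c = Counter(a=10)
print(c['zzz'])        # 0: missing key, no KeyError
print('zzz' in c)      # False: the lookup did not insert the key

d = defaultdict(int, a=10)
print(d['zzz'])        # 0
print('zzz' in d)      # True: defaultdict stores the default on lookup

c['b'] = 0             # an explicitly stored zero is kept as a key
print('b' in c)        # True
```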
msg167595 - Author: Meador Inge (meador.inge) *	Date: 2012-08-07 01:44
Raymond, Stephen's analysis seems correct. Are we missing something or can this issue be closed?
msg167605 - Author: Mark Dickinson (mark.dickinson) *	Date: 2012-08-07 07:15
> Raymond, Stephen's analysis seems correct. Are we missing something or
> can this issue be closed?

Well, depending on how you think about Counters, the current behaviour of equality definitely leads to some surprises. For example:

>>> Counter(a = 3) + Counter(b = 0) == Counter(a = 3, b = 0)
False

OTOH, if we're consistent about regarding a count of 0 as 'equivalent' to a missing element, then __nonzero__ / __bool__ probably needs changing, too.

>>> c = Counter(a = 0)
>>> bool(c)
True
>>> bool(c + c)
False
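Mark's examples rest on two separate rules: binary `+` keeps only positive results, and truthiness comes from the number of keys (as for any dict), not from the counts. The session below shows each step; the comparison against `Counter(a = 3, b = 0)` is left out because its result depends on the Python version.

```python
from collections import Counter

s = Counter(a=3) + Counter(b=0)
print(s)                         # Counter({'a': 3}): '+' keeps only positive counts
print('b' in s)                  # False

c = Counter(a=0)
print(len(c), bool(c))           # 1 True: bool() follows len(), like any dict
print(len(c + c), bool(c + c))   # 0 False: the zero count was dropped by '+'
```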
msg167656 - Author: Meador Inge (meador.inge) *	Date: 2012-08-08 04:16
Ah, good examples, Mark. So, why is it ever useful to keep a key with a value of zero? In other words, why do we get this:

>>> Counter(a=0)
Counter({'a': 0})

rather than this?

>>> Counter(a=0)
Counter()

The latter seems more consistent to me.
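With the class as it stands, getting Meador's `Counter()` result is an explicit pruning step. A minimal sketch, assuming only zero counts should be pruned; `drop_zeros` is a hypothetical helper. It differs from unary `+` (available from Python 3.3), which also discards negative counts.

```python
from collections import Counter


def drop_zeros(c: Counter) -> Counter:
    """Hypothetical helper: return a copy of c without zero-count keys.

    Negative counts are kept, unlike unary +, which keeps only positive counts.
    """
    return Counter({k: v for k, v in c.items() if v != 0})


c = Counter(a=0, b=2, d=-1)
print(c)              # the constructor stores the zero for 'a' as given
print(drop_zeros(c))  # Counter({'b': 2, 'd': -1})
print(+c)             # Counter({'b': 2})
```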
msg167668 - Author: Stephen Webber (slwebber)	Date: 2012-08-08 06:18
Hmm, that is odd behavior indeed. I think keeping keys that map to zero values is important when iterating over set(x). For example:

>>> x = Counter(a=10, b=0)
>>> for k in set(x):
...     x[k] += 1
...
>>> x
Counter({'a': 11, 'b': 1})

is probably preferable to

>>> x = Counter(a=10, b=0)
>>> for k in set(x):
...     x[k] += 1
...
>>> x
Counter({'a': 11})

Perhaps, to keep the behavior intuitive, we could make

>>> Counter(a = 3) + Counter(b = 0) == Counter(a = 3, b = 0)
True

by aggregating all keys into the new Counter object, even those with zero values? I would be happy to make such a patch, as it would be a good learning experience for me. Would this be an acceptable solution, and is there other odd behavior at work here?
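A minimal sketch of the proposed addition, assuming "aggregating all keys" means summing counts over the union of keys and storing every result, zero or not. `add_keep_keys` is a hypothetical name; it is not how `Counter.__add__` behaves. The loop after it reproduces Stephen's iteration example, where the stored zero for `'b'` is what lets the loop reach that key.

```python
from collections import Counter


def add_keep_keys(x: Counter, y: Counter) -> Counter:
    """Hypothetical variant of x + y that keeps every key from both operands,
    including keys whose summed count is zero or negative.

    Counter.__add__ itself keeps only positive results.
    """
    result = Counter()
    for k in set(x) | set(y):
        result[k] = x[k] + y[k]   # missing keys read as 0 via __missing__
    return result


s = add_keep_keys(Counter(a=3), Counter(b=0))
print(sorted(s.items()))   # [('a', 3), ('b', 0)]

# Stephen's loop: the stored zero for 'b' makes it part of set(x).
x = Counter(a=10, b=0)
for k in set(x):
    x[k] += 1
print(sorted(x.items()))   # [('a', 11), ('b', 1)]
```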
msg167674 - Author: Raymond Hettinger (rhettinger) *	Date: 2012-08-08 07:40
At its most basic, a Counter is simply a dictionary with a __missing__ method that supplies a default of zero. It is intentional that everything else behaves as much like a regular dictionary as possible. You're allowed to store anything in the dict values even if those values don't represent numbers. A consequence is that equality is taken to mean the same as regular dict equality. The unary plus operator is provided as a way to eliminate zeros from a Counter prior to doing a Counter equality test. Other designs were possible (such as my Bag class mentioned in the docs). This one was selected for its versatility, but it does present challenges with respect to zeros, negatives, fractions, etc. I recognize your concern but find it to be at odds with the basic design of the class. You might be happier with a Multiset class that restricts itself to positive integer counts.
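To make Raymond's description concrete, here is a minimal sketch of a dict subclass whose `__missing__` supplies 0. `ZeroDefaultDict` is a hypothetical name, and this is a stripped-down model, not the actual `collections.Counter` source. Because equality is inherited from dict, a stored zero still makes two instances unequal; stripping zeros with unary plus before comparing gives the count-based answer, and the same idiom works on a real Counter.

```python
from collections import Counter


class ZeroDefaultDict(dict):
    """Hypothetical model of the design described above: a plain dict
    whose __missing__ supplies 0 for absent keys.

    Equality is inherited unchanged from dict.
    """

    def __missing__(self, key):
        return 0

    def __pos__(self):
        # Mirror Counter's unary plus: keep only positive counts.
        return ZeroDefaultDict({k: v for k, v in self.items() if v > 0})


m = ZeroDefaultDict(a=10, b=0, c=3)
n = ZeroDefaultDict(a=10, c=3)
print(m['missing'])   # 0
print(m == n)         # False: plain dict equality
print(+m == +n)       # True: zeros stripped before comparing

# The same idiom with the real class (unary + on Counter is new in 3.3):
x = Counter(a=10, b=0, c=3)
y = Counter(a=10, c=3)
print(+x == +y)       # True
```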
msg167690 - Author: Eric Snow (eric.snow) *	Date: 2012-08-08 14:44
I'd missed that unary + (new in 3.3). That's pretty cool.

History
Date	User	Action	Args
2022-04-11 14:57:27	admin	set	github: 58390
2012-08-08 14:44:12	eric.snow	set	nosy: + eric.snow messages: + msg167690
2012-08-08 07:40:26	rhettinger	set	priority: normal -> low status: open -> closed resolution: rejected messages: + msg167674
2012-08-08 06:18:19	slwebber	set	messages: + msg167668
2012-08-08 04:16:23	meador.inge	set	messages: + msg167656
2012-08-07 07:15:25	mark.dickinson	set	nosy: + mark.dickinson messages: + msg167605
2012-08-07 01:44:14	meador.inge	set	nosy: + meador.inge messages: + msg167595
2012-08-06 02:44:45	slwebber	set	nosy: + slwebber messages: + msg167533
2012-03-03 10:10:03	rhettinger	create