classification
Title: Implement partial order on Counter
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Aaron.Meurer, Saimadhav.Heblikar, cool-RR, eric.smith, ethan.furman, josh.r, mark.dickinson, pitrou, r.david.murray, rhettinger, scoder, serhiy.storchaka, steven.daprano
Priority: normal Keywords: patch

Created on 2014-09-29 19:33 by cool-RR, last changed 2015-10-20 20:19 by Aaron.Meurer. This issue is now closed.

Files
File name Uploaded Description Edit
patch.patch cool-RR, 2014-10-03 09:08 review
patch2.patch cool-RR, 2014-10-03 20:47 review
issue22515.stoneleaf.patch.01 ethan.furman, 2014-10-03 23:53 review
issue22515_partial_order_counter_v2.diff Saimadhav.Heblikar, 2014-10-04 06:57 review
issue22515_partial_order_counter_v3.diff serhiy.storchaka, 2014-10-04 19:39 review
Messages (66)
msg227819 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-09-29 19:33
I suggest implementing `Counter.__lt__` which will be a partial order, similarly to `set.__lt__`. That is, one counter will be considered smaller-or-equal to another if for any item in the first counter, the second counter has an equal or bigger amount of that item.
msg227824 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2014-09-29 20:15
On Mon, Sep 29, 2014 at 07:33:21PM +0000, Ram Rachum wrote:
> I suggest implementing `Counter.__lt__` which will be a partial order, 
> similarly to `set.__lt__`. 

Since Counter is described as a multiset, this sounds reasonable to me.
msg227827 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-29 20:24
What should be result of following operations?

Counter({'a': 0}) < Counter({})
Counter({}) < Counter({'a': 0})
Counter({'a': -1}) < Counter({})
Counter({}) < Counter({'a': -1})
msg227828 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-09-29 20:26
I suggest they be ignored like in `elements`.
msg227829 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-09-29 20:30
(I mean, the non-positive values should be ignored.)
msg227831 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-29 20:35
There is other definition of the <= operator for sets: "A <= B" is equivalent to "len(A - B) == 0". Extending to Counter this can mean "len(A.subtract(B).elements()) == 0".
msg227842 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-09-29 21:15
What would be the result of

  Counter({'a':1, 'b':2}) < Counter({'a':2, 'b':1})

?
msg227844 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-09-29 21:20
False, like with sets.
msg227845 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-09-29 21:21
Thanks, Ethan.  I had a feeling that this wasn't well defined but I couldn't come up with an example :)
msg227846 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-09-29 21:24
David, there's nothing here that isn't well defined. It's simply a partial order, not a total order. We have the same for sets.
msg227847 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-09-29 22:06
set's don't have values, and you are wanting to implement the partial ordering based on the values.  (side-note: how does partial-ordering work for sets?)

> That is, one counter will be considered smaller-or-equal to another if for any
> item in the first counter, the second counter has an equal or bigger amount of
> that item.

According to your definition, my example should have returned True, which is clearly nonsensical.

Even if you changed the definition to:

  For every item in the first counter, that item's value is less than the
  corresponding item in the second counter.

You have situations like:

  Counter({'a':1, 'b':1}) < Counter({'a':2})

I just don't think there is one interpretation that is going to be correct most of the time.
msg227848 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-09-29 22:30
Ethan, I don't understand what the problem is. I also don't understand your side note question "how does partial-ordering work for sets?" I'm not sure what you're asking. 

>> That is, one counter will be considered smaller-or-equal to another if for any
>> item in the first counter, the second counter has an equal or bigger amount of
>> that item.

> According to your definition, my example should have returned True, which is clearly nonsensical.

In your example the first counter had `b` twice, while the second counter had it only once. So the result is `False`. This comes pretty directly from my definition, I'm not sure where the confusion is. 

Regarding your new example: Since `Counter` works like a `defaultdict`, the second counter has a zero quantity of `b`. So the answer is again `False`.
msg227849 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-09-29 22:32
To put it another way: In Python sets, `a <= b` iff `b` has everything that `a` has, and possibly more. I'm proposing the exact same definition for counters. True, the implementation might be different because sets are not dicts and don't have `.values()`, but that is just the implementation, which is not the important thing here.
msg227850 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-09-29 22:36
That is certainly a better definition.  If you carefully reread your original post you will see that you wrote something different ('any' instead of 'every').  Also, your OP uses '__lt__' but talks about 'less-than-or-equal -- somewhat confusing.  ;)
msg227851 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-09-29 22:44
You're right, sorry. I meant the mathematical "for any" which means "for every": https://en.wikipedia.org/wiki/List_of_mathematical_symbols (See "for any;")
msg227852 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-09-29 22:54
The request sounds ok to me. As in the corner case of negative values, it seems that Counter currently doesn't elide zero values, e.g.:

>>> Counter({'a': 0}) == Counter({})
False

Therefore, we should have:

>>> Counter({'a': -1}) < Counter({'a': 0})
True

but:

>>> Counter({'a': -1}) < Counter({})
False

(I must admit, the current semantics of Counter look a bit ad hoc and not terribly consistent... halfway between a regular mapping and a multiset)
msg227857 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-09-29 23:13
Shall I write a patch?
msg227858 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-09-29 23:17
I think the final call is Raymond's (whether to accept the patch), but go ahead.  Worst case scenario is you'll have your own thoroughly tested PartiallyOrderedCounter you can use in your own code.  :)
msg227860 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-09-29 23:21
If that's the case I'd prefer Raymond to first say whether this feature is generally welcome before spending my time on making a patch.
msg228071 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2014-10-01 14:18
Is there some particular problem you're trying to solve, which this would make easier?

Without a use case, I'm -1.
msg228072 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-10-01 14:23
I needed it for an internal calculation in a combinatorics package I'm writing. Do you want me to take the time to explain the calculation? It's quite complex.
msg228077 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2014-10-01 14:50
No need to explain it. It sounds like it's not generally useful, so I'm still -1.
msg228082 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-10-01 16:04
Even if you don't find it useful, Eric, it doesn't take up the method space. You can very easily ignore it, there's no cognitive burden.
msg228083 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-10-01 16:27
If it is so specialized as to only be needed in complex combinatorial calculations, does it belong in the general-purpose part of the language?  After all, we have the math and cmath modules for more specialized arithmetic operations.
msg228084 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-10-01 16:29
Curiousity question:  What happens if you try to sort a list of partially ordered Counters?
msg228085 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-10-01 16:30
> If it is so specialized as to only be needed in complex combinatorial calculations

How do you know it is "only needed in complex combinatorial calculations"?

> What happens if you try to sort a list of partially ordered Counters?

Try it with partially ordered sets.
msg228086 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-10-01 16:42
I don't see why it's so hard to imagine how this will be used. Say I have a counter signifying how many of each product I have in my warehouse, and I have another counter saying how many of each product a client wants. I may use `counter2 <= counter1` to check whether there are enough products in stock. Sounds straightforward to me.
msg228087 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-10-01 16:57
--> s1 = set([1])
--> s2 = set([1, 2])
--> s3 = set([1, 2, 3])
--> s4 = set([2])
--> s5 = set([2, 3])
--> s6 = set([3])

--> l = [s1, s2, s3, s4, s5, s6]
--> sorted(l)
[{1}, {2}, {1, 2}, {3}, {2, 3}, {1, 2, 3}]

--> s1 < s4
False

--> s4 < s2
True

--> s1 < s2
True

--> s4 < s6
False

--> l = [s1, s4]
--> sorted(l)
[{1}, {2}]

--> sorted(l)
[{1}, {2}]


Looks horribly messy to me.  In the last example we can see that neither s1 nor s4 are smaller, yet s1 is consistently put first.

On the other hand, I suppose it's okay for Counter to have this behavior since it's already in sets.
msg228088 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2014-10-01 16:59
Ethan said:

> If it is so specialized as to only be needed in complex combinatorial 
> calculations, does it belong in the general-purpose part of the 
> language?

It's a multi-set, a general purpose and fairly fundamental data type.

https://en.wikipedia.org/wiki/Set_%28abstract_data_type%29#Multiset

And later:

> Curiousity question:  What happens if you try to sort a list of 
> partially ordered Counters?

The same thing that happens when you sort a list of any partially 
ordered objects, such as sets:

py> sorted([{1, 2, 3, 4}, {2, 4}, {1, 3}, {2, 3, 4}, {1, 2, 3}])
[{2, 4}, {1, 3}, {2, 3, 4}, {1, 2, 3}, {1, 2, 3, 4}]

You get some order, but since sorting assumes a total order, not just 
partial order, the result isn't really meaningful, and will very likely 
depend on the initial order of the items. If that worries you, then 
don't sort items that implement only a partial order.
msg228089 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-10-01 17:10
I'll go with +0.5.  :)

If this goes in, I think a missing key in either Counter should default to 0 for purposes of the ordering.
msg228090 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-10-01 17:13
If/when there's general agreement that this functionality should be merged in (assuming the implementation is acceptable), let me know and I'll be happy to write the code and tests.
msg228094 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-10-01 17:18
> I'll go with +0.5.  :)

I was going to make a joke about Counters only accepting integral values, but the constructor is actually quite laxist:

>>> Counter({'a': 2.5})
Counter({'a': 2.5})
>>> Counter({'a': 2.5 + 1j})
Counter({'a': (2.5+1j)})
>>> Counter({'a': 'b'})
Counter({'a': 'b'})

(!)
msg228096 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-10-01 17:21
That does seem odd -- how can you have 'b' of something?
msg228097 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-10-01 17:24
The real problem is that `Counter` was put into the standard library with such lenient conditions on how it should be used-- basically looking at it as a `dict` subclass rather than a counter, putting emphasis on implementation rather than purpose, which was a mistake that we'll now have to deal with forever.
msg228098 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2014-10-01 17:28
> That does seem odd -- how can you have 'b' of something?

Don't ask this kind of question as long as any math guys are listening.

But seriously, I consider this proposal reasonable. If the problem is that the counter values can be non-integers, then why not just raise an exception if someone tries to test the ordering with illegal (i.e. unordered or uncomparable) values? Data quality issues are best handled on user side.
msg228099 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-10-01 17:31
Stefan: If you'd want to raise an exception, you'll need to define in which cases. Should it raise an exception for decimal numbers? Complex numbers? etc.

Personally I'd enjoy it raising exceptions in as many situations as possible (so we could keep Counter usage focused to actual counter usage and not just wild dicts) but I doubt it'll sit well with people who might use Counters with weird values.
msg228100 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-10-01 17:35
I have to disagree.  The intent is clearly expressed in the docs [1].  However, if I have a need to deal with partial amounts (say, 2.5 apples because I gave half of one to my horse ;), Counter will still work with that:

  --> treats = Counter({'carrots':12, 'apples':3, 'sugar_cubes':100})
  --> treats
  Counter({'sugar_cubes': 100, 'carrots': 12, 'apples': 3})
  --> treats['apples'] -= 0.5
  --> treats
  Counter({'sugar_cubes': 100, 'carrots': 12, 'apples': 2.5})

At least, it will until we fix that bug.  ;)
msg228101 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2014-10-01 17:36
I'd raise an exception whenever the values of the Counter turn out to be uncomparable. If users manage to fill Counters with non-number values that are comparable amongst each other, or even if each key has a value of a different type, why not just support that?

All that is really required for this is that you can compare a) keys for equality and b) values of identical keys for ordering, right?
msg228102 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-10-01 17:38
Ah, now I understand you Stefan. That sounds like a good scheme; let any comparison errors between the values simply propagate up.
msg228105 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-10-01 17:51
[1] https://docs.python.org/3/library/collections.html#collections.Counter
msg228236 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-10-02 16:15
I, myself, wrote:
-----------------
> At least, it will until we fix that bug.  ;)


Happily, reading the whole documentation revealed that non-integer values are allowed, so it's not a bug.  :)
msg228312 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-10-03 09:08
Attached patch of implementation with tests. I haven't actually run this (because I don't know how to run Python's test suite) so there are likely to be bugs. if there's general agreement about this direction I'll figure out how to run the test so I could ensure it works.
msg228374 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-10-03 20:47
Fixed patch attached. Still needs someone to run it and see whether there are any bugs.
msg228391 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-10-03 22:05
Testing the patch...
msg228398 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-10-03 22:24
In going through the tests I have found an edge-case that I am unsure of how to handle:

  Counter(a=1) < Counter(a=1, b=1)

On the one hand, they both have the same value for 'a'; on the other hand, the second counter has more elements...

So should the result be False or True?
msg228401 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-10-03 22:25
>   Counter(a=1) < Counter(a=1, b=1)

Do an analogy with sets and the answer will be obvious.
msg228402 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-10-03 22:29
I am not a mathematician -- feel free to include the answer with your hints.  ;)

If my analogy is correct, this situation is similar to

  {1} < {1, 2}

Which is True.  Can someone verify that I am correct?
msg228403 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-10-03 22:30
Yes, correct.
msg228411 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-10-03 23:53
New patch attached.
msg228429 - (view) Author: Saimadhav Heblikar (Saimadhav.Heblikar) * Date: 2014-10-04 06:57
I mentioned my wording for the comment about lesser than partial ordering using the code review tool.

In the attached patch(based on issue22515.stoneleaf.patch.01), I have added test for the case when the other object is not a mapping.

I have also tried to refactor the __le__ and __ge__ methods.

Please comment on the changes.
Regards
msg228433 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-10-04 07:38
I don't really understand the refactoring. For example why is `len(self) <= len(other)` needed? For which case? Note that `len` would also catch keys with `0` value which shouldn't have an effect.
msg228436 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-04 07:57
There are some properties of set comparison:

(a < b) == (a <= b and a != b)
(a <= b) == (a < b or a == b)
(a <= b and b <= a) == (a == b)
(a < b and b < a) == False
(a <= b) == (a - b == set())
if (a <= b and b <= c) then (a <= c)
(a <= b and a <= c) == (a <= (b & c))
(a <= c and b <= c) == ((a | b) <= c)

Is this true for Counter?
msg228440 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-10-04 08:38
> There are some properties of set comparison:
> 
> (a < b) == (a <= b and a != b)

I don't think so, because counter equality is based on dict equality so it doesn't ignore zero values.

> (a <= b) == (a < b or a == b)

No, ditto.

> (a <= b and b <= a) == (a == b)

No, ditto.

> (a < b and b < a) == False

Yes.

> (a <= b) == (a - b == set())

Yes.

> if (a <= b and b <= c) then (a <= c)

Yes.

> (a <= b and a <= c) == (a <= (b & c))

Yes.

> (a <= c and b <= c) == ((a | b) <= c)

Yes.
msg228441 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2014-10-04 09:58
On Sat, Oct 04, 2014 at 07:57:16AM +0000, Serhiy Storchaka wrote:

> There are some properties of set comparison:
> 
> (a < b) == (a <= b and a != b)
> (a <= b) == (a < b or a == b)
> (a <= b and b <= a) == (a == b)
> (a < b and b < a) == False
> (a <= b) == (a - b == set())
> if (a <= b and b <= c) then (a <= c)
> (a <= b and a <= c) == (a <= (b & c))
> (a <= c and b <= c) == ((a | b) <= c)
> 
> Is this true for Counter?

I believe these are all true for multisets.

But Counter is not *just* a multiset (a.k.a. bag) since the "counts" are 
not limited to non-negative integers (the natural numbers).

py> C = Counter()
py> C['a'] = -3
py> C['b'] = 'spam'
py> C['c'] = 0.5
py> C
Counter({'c': 0.5, 'b': 'spam', 'a': -3})

I don't know of any standard definition for subset and superset for 
multisets like C above. I think that we should raise an exception in 
cases like that: TypeError for cases where any "count" is non-numeric, 
and ValueError for cases where any count is not a non-negative integer.

(Later, if somebody comes up with a sensible meaning for sub/superset of 
such a Counter, we can relax that restriction.)

Another issue: are missing elements treated as if they have count 0? 
E.g. what are these?

Counter({'a': 0, 'b': 1}) <= Counter({'a': 1, 'b': 1})
Counter({'a': 0, 'b': 1}) <= Counter({'b': 1})

According to these pages:

http://singularcontiguity.wordpress.com/2008/05/07/multiset-equality-and-submultisets/
http://singularcontiguity.wordpress.com/2008/04/28/multiset-membership/

I would argue that they should both return True (based on the idea that 
the "support" of a multiset is the set of those elements with non-zero 
count). But I don't know how authoritative that blog is. (The author 
even mentions that he only has a passing familiarity with some aspects 
of multisets.)

Anyone familiar with Haskell? What does this bag class do?

http://hackage.haskell.org/package/uulib-0.9.8/docs/UU-DData-IntBag.html

This may also be helpful. Sets and bags in Z language:

http://www.cs.ox.ac.uk/people/michael.wooldridge/teaching/soft-eng/lect08.pdf
http://www.cs.ox.ac.uk/people/michael.wooldridge/teaching/soft-eng/lect14.pdf
msg228442 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-04 10:08
At least this condition is false for current implementation:

(a < b and b < a) == False

Example: a = Counter({'a': -1}), b = Counter({'b': -1}).
msg228443 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2014-10-04 10:13
On Sat, Oct 04, 2014 at 08:38:17AM +0000, Ram Rachum wrote:
> 
> Ram Rachum added the comment:
> 
> > There are some properties of set comparison:
> > 
> > (a < b) == (a <= b and a != b)
> 
> I don't think so, because counter equality is based on dict equality so it doesn't ignore zero values.

That's a very good point! That suggests that we shouldn't treat Counters 
as multisets (a.k.a. bags) and define subset/superset operators. They're 
*almost* multisets, but not quite the same, since:

Counter({'a': 0}) != Counter({})

but the formal definition of multiset suggests those should be equal:

https://en.wikipedia.org/wiki/Multiset#Formal_definition

Perhaps we should rethink this. I'm starting to wonder whether we need a 
PEP to define exactly how subset and superset should be defined for 
Counters. This is precisely the trouble with rushing to add code before 
we know what the code is supposed to do... :-)
msg228444 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-10-04 10:43
Just because something is discussed doesn't mean it needs a PEP.
Also, see http://bugs.python.org/issue22515#msg227852
msg228445 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-10-04 10:43
I see that in my implementation for __lt__ and __gt__ I forgot to check whether the other counter has elements that this counter doesn't, which could flip `found_strict_difference`.
msg228461 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2014-10-04 15:24
On Sat, Oct 04, 2014 at 10:43:02AM +0000, Antoine Pitrou wrote:

> Just because something is discussed doesn't mean it needs a PEP.

I know that in general discussions do not need a PEP, but we seem to be 
rushing awfully fast into writing code before we even know what the code 
is supposed to do. Perhaps "need a PEP" was too strong, but we surely 
need to do something to decide on constistent and well-defined 
behaviour. As you pointed out earlier, Counters aren't precisely 
multisets, they're more like a hybrid mapping/multiset. It's not clear 
what behaviour subset and superset should have for such a hybrid. By 
rushing into code before deciding on what the code is supposed to do, we 
end up with inconsistent behaviour.

Here are a pair of Counters which compare "is subset", but not "is 
subset or equal" using Ram's patch:

py> Counter(a=1, b=0) < Counter(a=2)
True
py> Counter(a=1, b=0) <= Counter(a=2)
False

On the other hand, here's a pair which compares "is subset or equal", 
but not ("is subset" or "equal"):

py> Counter(a=0) <= Counter(b=0)
True
py> (Counter(a=0) < Counter(b=0)) or (Counter(a=0) == Counter(b=0))
False

Adding elements with count=0 changes whether a set is a subset or not:

py> Counter(a=0) < Counter(b=0)
False
py> Counter(a=0) < Counter(b=0, c=0)
True
py> Counter(a=0, b=0) < Counter(b=0, c=0)
False

I'm not convinced that mixed Counter and regular dict comparisons should 
be supported, but Ram's patch supports any Mapping type. Is that a good 
idea? There's been no discussion on that question.

Can somebody explain to me what this means? How is this meaningfully a 
subset operation?

py> Counter(a="eggs") < Counter(a="spam")
True

I supported adding these operations to Counter, but the last thing I 
want is to enshrine a set of ad-hoc and inconsistent semantics that 
don't match standard multiset behaviour and are defined by the 
implementation, instead of having the implementation defined by the 
desired semantics.
msg228462 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-10-04 15:39
Le 04/10/2014 17:24, Steven D'Aprano a écrit :
> 
> I know that in general discussions do not need a PEP, but we seem to be 
> rushing awfully fast into writing code before we even know what the code 
> is supposed to do.

Writing a patch is useful practice because it's a way to discover latent
problems (especially when writing unit tests).

> py> Counter(a=1, b=0) < Counter(a=2)
> True
> py> Counter(a=1, b=0) <= Counter(a=2)
> False

That looks wrong to me.

> py> Counter(a=0) <= Counter(b=0)
> True
> py> (Counter(a=0) < Counter(b=0)) or (Counter(a=0) == Counter(b=0))
> False

Ditto.

> py> Counter(a=0) < Counter(b=0)
> False
> py> Counter(a=0) < Counter(b=0, c=0)
> True

Yuck.

> I'm not convinced that mixed Counter and regular dict comparisons should 
> be supported, but Ram's patch supports any Mapping type. Is that a good 
> idea? There's been no discussion on that question.

That sounds wrong to me as well.

> Can somebody explain to me what this means? How is this meaningfully a 
> subset operation?
> 
> py> Counter(a="eggs") < Counter(a="spam")
> True

I agree this looks silly. However, the Counter doc says:

"""The multiset methods are designed only for use cases with positive
values. [...] There are no type restrictions, but the value type needs
to support addition, subtraction, and comparison."""

Which really sounds like an excuse to not perform any type checking and
silently allow nonsensical multiset behaviour with non-integer keys. We
can follow that lead :-)
msg228463 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-10-04 15:54
Hi everybody,

I agree we have a mess on our hands, with lots of cases where comparison is weird. To be honest I've given up on `collections.Counter` at this point. The fact that it's a "hybrid mapping/multiset" is the cause of all the problems, so I just made my own class that is defined as a multiset with only positive integer values, and I'm going to use that and define methods on it without having to worry about these weird cases. Unless someone else wants to champion this issue and resolve the edge cases, it can be closed as far as I'm concerned.

I really wish that `collections.Counter` wouldn't have been added to the standard library as a "hybrid mapping/multiset"; I think that it shows emphasis on the implementation rather than the usage, while the usage is more important. But there's nothing I can do about that. (If there was a place where we can make notes for Python 4, I'd put a note there about it.)
msg228477 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-10-04 18:43
We are not "rushing into code", we are getting some code, and tests, written to help define what we are talking about.

So far the feedback has given some more tests to add, and some things to change (such as only comparing with other Counters).

The big question I have is how do we handle <= and >= ?  Normal Counter == treats a key with a zero value as being unequeal to a Counter without that same key, but I don't think that makes sense for partial ordering.
msg228487 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-04 19:39
This implementation obeys more set properties.
msg229065 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-10-11 07:37
> I suggest implementing `Counter.__lt__` which will be a 
> partial order, similarly to `set.__lt__`.

This would be reasonable if the counter were a strict stand-alone multiset or bag; however, the core design is much looser than that (supporting negative counts for example).

It would be more accurate to think about the Counter as a dict that returns 0 for missing keys and has few extra methods that can be handy in the content of counting things.

Accordingly, a partial ordering operator wouldn't be well-defined for some of the intended use cases. 

The addition of subset/superset operations was considered and rejected during the original design phase (the omission was intentional).  It is better to leave this to the user to decide what they want to do and how they intend to do so.  After all, it is as easily to work with as a regular dict.  You can access it directly or subclass it to fit your own needs.

The starting point for this class (when first proposed and endorsed by Guido) was:

   class Counter(dict):
       def __missing__(self, key):
           return 0

That core idea is dirt simple and it is not hard to build your own variants if the one in collections doesn't meet your needs.
msg231365 - (view) Author: Ram Rachum (cool-RR) * Date: 2014-11-19 10:27
Hi everyone,

Just wanted to follow up here that I've created my own dedicated bag class.

Documentation: https://combi.readthedocs.org/en/stable/bags.html

Install using `pip install combi`

It's packed with features. There are a ton of arithmetic operations defined and the comparisons methods we talked about in this issue. There are also frozen and ordered variants.
msg253251 - (view) Author: Aaron Meurer (Aaron.Meurer) Date: 2015-10-20 20:19
I can't believe this issue was closed. Why can't Counter.__lt__(self, other) just be all(self[i] < other[i] for i in self)? 

Just because Counter supports weird stuff shouldn't bill it out. To follow that logic, we should also remove Counter.subtract

>>> Counter(a='b') - Counter(a='c')
Traceback (most recent call last):
  File "<ipython-input-5-31b2df7f8ff1>", line 1, in <module>
    Counter(a='b') - Counter(a='c')
  File "/Users/aaronmeurer/anaconda/lib/python3.5/collections/__init__.py", line 709, in __sub__
    newcount = count - other[elem]
TypeError: unsupported operand type(s) for -: 'str' and 'str'

It's super annoying that Counter supports all the basic set operations *except* subset. A reminder that this *does* work:

>>> Counter(a=2) | Counter(a=1)
Counter({'a': 2})

And also fails on weird values:

>>> Counter(a='b') | Counter(a='c')
Traceback (most recent call last):
  File "<ipython-input-8-f9768ad92117>", line 1, in <module>
    Counter(a='b') | Counter(a='c')
  File "/Users/aaronmeurer/anaconda/lib/python3.5/collections/__init__.py", line 730, in __or__
    if newcount > 0:
TypeError: unorderable types: str() > int()
History
Date User Action Args
2016-09-28 09:18:28serhiy.storchakalinkissue28296 superseder
2015-10-20 20:19:07Aaron.Meurersetnosy: + Aaron.Meurer
messages: + msg253251
2014-11-19 10:27:17cool-RRsetmessages: + msg231365
2014-10-11 07:37:32rhettingersetstatus: open -> closed
resolution: rejected
messages: + msg229065
2014-10-11 06:50:31rhettingersetassignee: rhettinger
2014-10-04 19:39:37serhiy.storchakasetfiles: + issue22515_partial_order_counter_v3.diff

messages: + msg228487
2014-10-04 18:43:58ethan.furmansetmessages: + msg228477
2014-10-04 15:54:18cool-RRsetmessages: + msg228463
2014-10-04 15:39:04pitrousetmessages: + msg228462
2014-10-04 15:24:23steven.dapranosetmessages: + msg228461
2014-10-04 10:43:39cool-RRsetmessages: + msg228445
2014-10-04 10:43:02pitrousetmessages: + msg228444
2014-10-04 10:13:25steven.dapranosetmessages: + msg228443
2014-10-04 10:08:33serhiy.storchakasetmessages: + msg228442
2014-10-04 09:58:52steven.dapranosetmessages: + msg228441
2014-10-04 09:00:35mark.dickinsonsetnosy: + mark.dickinson
2014-10-04 08:38:17cool-RRsetmessages: + msg228440
2014-10-04 07:57:16serhiy.storchakasetmessages: + msg228436
2014-10-04 07:38:04cool-RRsetmessages: + msg228433
2014-10-04 06:57:55Saimadhav.Heblikarsetfiles: + issue22515_partial_order_counter_v2.diff
nosy: + Saimadhav.Heblikar
messages: + msg228429

2014-10-03 23:53:26ethan.furmansetfiles: + issue22515.stoneleaf.patch.01

messages: + msg228411
2014-10-03 22:30:32cool-RRsetmessages: + msg228403
2014-10-03 22:29:52ethan.furmansetstage: test needed -> patch review
2014-10-03 22:29:39ethan.furmansetmessages: + msg228402
2014-10-03 22:25:19pitrousetmessages: + msg228401
2014-10-03 22:24:14ethan.furmansetmessages: + msg228398
2014-10-03 22:05:45ethan.furmansetmessages: + msg228391
2014-10-03 20:47:56cool-RRsetfiles: + patch2.patch

messages: + msg228374
2014-10-03 09:08:19cool-RRsetfiles: + patch.patch
keywords: + patch
messages: + msg228312
2014-10-02 16:15:45ethan.furmansetmessages: + msg228236
2014-10-01 17:51:33ethan.furmansetmessages: + msg228105
2014-10-01 17:38:50cool-RRsetmessages: + msg228102
2014-10-01 17:36:35scodersetmessages: + msg228101
2014-10-01 17:35:49ethan.furmansetmessages: + msg228100
2014-10-01 17:31:52cool-RRsetmessages: + msg228099
2014-10-01 17:28:42scodersetnosy: + scoder
messages: + msg228098
2014-10-01 17:24:13cool-RRsetmessages: + msg228097
2014-10-01 17:21:58ethan.furmansetmessages: + msg228096
2014-10-01 17:18:48pitrousetmessages: + msg228094
2014-10-01 17:13:36cool-RRsetmessages: + msg228090
2014-10-01 17:10:08ethan.furmansetmessages: + msg228089
stage: test needed
2014-10-01 16:59:27steven.dapranosetmessages: + msg228088
2014-10-01 16:57:58ethan.furmansetmessages: + msg228087
2014-10-01 16:42:06cool-RRsetmessages: + msg228086
2014-10-01 16:30:14pitrousetmessages: + msg228085
2014-10-01 16:29:04ethan.furmansetmessages: + msg228084
2014-10-01 16:27:33ethan.furmansetmessages: + msg228083
2014-10-01 16:04:59pitrousetmessages: + msg228082
2014-10-01 14:50:21eric.smithsetmessages: + msg228077
2014-10-01 14:23:47cool-RRsetmessages: + msg228072
2014-10-01 14:18:10eric.smithsetnosy: + eric.smith
messages: + msg228071
2014-09-29 23:21:30cool-RRsetmessages: + msg227860
2014-09-29 23:17:52ethan.furmansetnosy: + rhettinger
type: enhancement
messages: + msg227858
2014-09-29 23:13:24cool-RRsetmessages: + msg227857
2014-09-29 22:54:15pitrousetnosy: + pitrou
messages: + msg227852
2014-09-29 22:44:47cool-RRsetmessages: + msg227851
2014-09-29 22:36:39ethan.furmansetmessages: + msg227850
2014-09-29 22:32:58cool-RRsetmessages: + msg227849
2014-09-29 22:30:54cool-RRsetmessages: + msg227848
2014-09-29 22:16:19josh.rsetnosy: + josh.r
2014-09-29 22:06:02ethan.furmansetmessages: + msg227847
2014-09-29 21:24:53cool-RRsetmessages: + msg227846
2014-09-29 21:21:14r.david.murraysetnosy: + r.david.murray
messages: + msg227845
2014-09-29 21:20:59cool-RRsetmessages: + msg227844
2014-09-29 21:15:34ethan.furmansetnosy: + ethan.furman
messages: + msg227842
2014-09-29 20:35:26serhiy.storchakasetmessages: + msg227831
2014-09-29 20:30:54cool-RRsetmessages: + msg227829
2014-09-29 20:26:45cool-RRsetmessages: + msg227828
2014-09-29 20:24:23serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg227827
2014-09-29 20:15:30steven.dapranosetnosy: + steven.daprano
messages: + msg227824
2014-09-29 19:33:21cool-RRcreate