classification
Title: Add __matmul__ to collections.Counter
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.7

process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To: rhettinger
Nosy List: Jim Fasarakis-Hilliard, Jáchym Barvínek, levkivskyi, rhettinger, terry.reedy
Priority: normal
Keywords:

Created on 2017-04-28 10:19 by Jáchym Barvínek, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (3)
msg292522 - Author: Jáchym Barvínek (Jáchym Barvínek) Date: 2017-04-28 10:19
The class collections.Counter should semantically contain only numbers, so it makes sense to define a dot product of Counters, something like this:

def __matmul__(self, other):
  return sum(self[x] * other[x] for x in self.keys() | other.keys())

I find this useful occasionally.
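
For anyone who wants to try the idea without changing the standard library, here is a minimal sketch as a subclass. The name DotCounter is hypothetical and is not part of collections; the method body is the one proposed above.

```python
from collections import Counter


class DotCounter(Counter):
    """Hypothetical Counter subclass; not part of the standard library."""

    def __matmul__(self, other):
        # Counter returns 0 for a missing key (without inserting it),
        # so iterating over the union of keys is safe here.
        return sum(self[x] * other[x] for x in self.keys() | other.keys())


c1 = DotCounter("aaabbc")   # a: 3, b: 2, c: 1
c2 = DotCounter("acccddd")  # a: 1, c: 3, d: 3
print(c1 @ c2)  # 3*1 (a) + 1*3 (c) = 6
```

Note that this sketch assumes both operands are Counters; a plain dict on the right-hand side would raise KeyError for keys it lacks.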
msg292547 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-04-28 20:11
Periodically, I've looked at possibly adding these kinds of extensions.

The argument in favor is that it is easy to do and is an obvious extension with plausible utility.

The main argument against is that it represents feature creep far removed from the original intended use cases (the tool is primarily about counting and tries not to venture into elementwise arithmetic on sparse vectors).

It is tempting to add a new feature that might sometimes be useful, but we should also worry that usability and learnability are impaired if the class becomes less cohesive, less thematic, and less focused on unified design goals.

The other factor against adding the feature is that since the Counter is just a dict subclass, it is easy for users to just manipulate the data directly.  We're not really adding much that a person can't already easily do themselves.
msg292564 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-04-29 05:04
The '|' should be '&' to avoid useless summing of 0 products.
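
As a quick check of this point, the sketch below compares the two forms. The helper names dot_union and dot_intersection are made up for illustration. Because a Counter returns 0 for a missing key, keys present on only one side contribute zero products, so both forms give the same sum and '&' simply skips the wasted work.

```python
from collections import Counter


def dot_union(c1, c2):
    # Visits every key in either counter; one-sided keys yield 0 products.
    return sum(c1[x] * c2[x] for x in c1.keys() | c2.keys())


def dot_intersection(c1, c2):
    # Visits only the shared keys, skipping the zero products.
    return sum(c1[x] * c2[x] for x in c1.keys() & c2.keys())


c1 = Counter("mississippi")  # m: 1, i: 4, s: 4, p: 2
c2 = Counter("missouri")     # m: 1, i: 2, s: 2, o: 1, u: 1, r: 1
print(dot_union(c1, c2), dot_intersection(c1, c2))  # 17 17
```

With plain dicts instead of Counters, the union form would raise KeyError on a one-sided key, while the intersection form still works.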

I think this should be rejected.  Python's number and collections classes provide basic operations from which users can build application-specific functions.  Or in the case of floats, we provide a separate module of specialized functions.  Or in the case of itertools, a section of recipes that build on the basic iterators in the module.

In this case, it makes little sense to me to provide an 'inner product' of Counters (bags, multisets).

>>> from collections import Counter as C
>>> a, b, c, d = 'a', 'b', 'c', 'd'
>>> c1 = C([a,a,a,b,b,c])
>>> c2 = C([a, c,c,c, d,d,d])
>>> sum(c1[x]*c2[x] for x in c1.keys() & c2.keys())
6

Even if the keys are counts and one thinks of the counters as sparse vectors, they are not really matrices.  Hence @ does not exactly fit.

If one does want sparse vectors implemented as dicts, they do not always have to be Counters.  A function would not require that.

def sparse_inner_prod(v1, v2):
    return sum(v1[x]*v2[x] for x in v1.keys() & v2.keys())

This only requires that v1 and v2 both have keys and __getitem__ methods.
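
As an illustration of that claim, the following sketch calls the same function on plain dicts with float values, and on a mix of a Counter and a dict. The sample data is invented for the example.

```python
from collections import Counter


def sparse_inner_prod(v1, v2):
    # Only keys present in both mappings contribute to the sum.
    return sum(v1[x] * v2[x] for x in v1.keys() & v2.keys())


v1 = {"x": 1.5, "y": 2.0}
v2 = {"y": 4.0, "z": -1.0}
print(sparse_inner_prod(v1, v2))  # only "y" is shared: 2.0 * 4.0 = 8.0

counts = Counter("aab")           # a: 2, b: 1
weights = {"a": 0.5, "c": 3}
print(sparse_inner_prod(counts, weights))  # only "a" is shared: 2 * 0.5 = 1.0
```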
History
Date                 User                    Action  Args
2022-04-11 14:58:45  admin                   set     github: 74382
2017-05-01 03:30:25  rhettinger              set     status: open -> closed; resolution: rejected; stage: test needed -> resolved
2017-04-29 05:04:56  terry.reedy             set     nosy: + terry.reedy; messages: + msg292564; stage: test needed
2017-04-28 21:13:04  levkivskyi              set     nosy: + levkivskyi
2017-04-28 20:11:48  rhettinger              set     messages: + msg292547
2017-04-28 10:48:45  Jim Fasarakis-Hilliard  set     nosy: + Jim Fasarakis-Hilliard
2017-04-28 10:48:27  Jim Fasarakis-Hilliard  set     versions: - Python 3.3, Python 3.4, Python 3.5, Python 3.6
2017-04-28 10:20:35  serhiy.storchaka        set     assignee: rhettinger; nosy: + rhettinger
2017-04-28 10:19:20  Jáchym Barvínek         create