Add collections.counts() #44815

bethard · 2007-04-07T21:38:28Z

BPO	1696199
Nosy	@birkenfeld, @rhettinger, @amauryfa, @pitrou, @ned-deily
Files	collections_counts.diff: Patch adding collections.counts() counter6.diff: Full patch with tests and improved docs counter7.diff

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/rhettinger'
closed_at = <Date 2009-01-12.23:05:13.896>
created_at = <Date 2007-04-07.21:38:28.000>
labels = ['type-feature', 'library']
title = 'Add collections.counts()'
updated_at = <Date 2009-01-12.23:05:13.895>
user = 'https://bugs.python.org/bethard'

bugs.python.org fields:

activity = <Date 2009-01-12.23:05:13.895>
actor = 'rhettinger'
assignee = 'rhettinger'
closed = True
closed_date = <Date 2009-01-12.23:05:13.896>
closer = 'rhettinger'
components = ['Library (Lib)']
creation = <Date 2007-04-07.21:38:28.000>
creator = 'bethard'
dependencies = []
files = ['2345', '12702', '12706']
hgrepos = []
issue_num = 1696199
keywords = ['patch']
message_count = 17.0
messages = ['31726', '31727', '31728', '31729', '31730', '79461', '79467', '79471', '79512', '79650', '79661', '79663', '79690', '79694', '79696', '79698', '79707']
nosy_count = 6.0
nosy_names = ['georg.brandl', 'rhettinger', 'amaury.forgeotdarc', 'pitrou', 'bethard', 'ned.deily']
pr_nums = []
priority = 'normal'
resolution = 'accepted'
stage = 'test needed'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue1696199'
versions = ['Python 3.1', 'Python 2.7']

bethard · 2007-04-07T21:38:28Z

As suggested in http://mail.python.org/pipermail/python-list/2007-April/433986.html this is a patch to add a counts() function to the collections module. Usage looks like:

>>> items = 'acabbacba'
>>> item_counts = counts(items)
>>> for item in 'abcd':
...     print item, item_counts[item]
...
a 4
b 3
c 2
d 0

Yes, it's only a 4-line function, but it's a frequently re-written 4-line function.

gvanrossum · 2007-04-07T23:39:25Z

Does it have to be a defaultdict? I.e. is it important that item_counts['d'] not raise KeyError?

bethard · 2007-04-07T23:48:26Z

I think it's okay if it's not a defaultdict. That was the simplest implementation, but I certainly have no problem calling d.get() when necessary. Should I change the implementation to use a dict()?

gvanrossum · 2007-04-08T00:23:30Z

I guess it's simplicity of implementation vs. simplicity of use. And I'm not even sure which is easier to use. It's just that defaultdicts are a very new thing and still feel "weird" -- even though I pushed for the implementation based on popular demand I'm not a user myself. Perhaps ask around on python-dev?

bethard · 2007-04-19T15:15:06Z

A summary of the python-dev thread (http://mail.python.org/pipermail/python-dev/2007-April/072502.html)

Since the number of times an unseen item was seen is 0, most people felt returning 0 was more natural behavior than raising KeyError.

There was some discussion of alternate names, but most people were fine with counts().

Raymond suggested making it a classmethod of dict, but people were a little concerned about adding to dict's already complex API, and since the result of counts() needed to return 0s instead of raising KeyErrors, it wouldn't really have the same behavior as a plain dict() anyway.

rhettinger · 2009-01-09T11:04:35Z

Attaching an updated patch for Py2.7.

Kept OP's simple constructor call but renamed it from counts() to
Counter():
item_counts = Counter('acabbacba')
Followed Guido's advice and avoided subclassing from defaultdict().
Instead, subclassed from dict, keeping its familiar API.
Avoided KeyError issue by defining __missing__().
Added most_common() method to address a principal use case --
patterned after Smalltalk's sortedByCount() method.
Added elements() method requested by Alex Martelli -- patterned after
C++, Smalltalk, and Knuth's examples.
Made necessary subclass overrides to dict methods: __repr__, update,
fromkeys, and copy.
Wrote docstrings, doctests, and motivating examples.
Verified that instances are copyable, deepcopyable, and picklable.

Working on docs and unittests.

Nice example (most common words in a text file):

>>> import re
>>> words = re.findall('\w+', open('hamlet.txt').read().lower())
>>> Counter(words).most_common(10)
[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
('you', 554), ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]

rhettinger · 2009-01-09T12:13:51Z

Added a few more in-module tests.

pitrou · 2009-01-09T14:36:53Z

Some comments:

Counter sounds like a really strange name for a container. Why not
call it "Bag" or "Multiset" (or "CountingSet"?)
why are the unittests inline rather than in Lib/test? inline tests
don't get executed by the buildbots, nor by people running the test
suite at home

rhettinger · 2009-01-09T21:36:44Z

The counts/counter moniker emerged from the python-dev discussion and
I'm basically happy with it since the typical usage is c=Counter(myseq)
with no other non-dict accesses (mostly just c[elem]+=1 and print
c[elem]). It's a simple counter with a dict interface so the name
shouldn't suggest anything more complicated than that.

To me, MultiSet or CountingSet is too offputtingly computer-sciency and
misleadingly suggests a set-like API instead of a dict interface. I know
several programmers who don't know the terms, bag or multiset, but they
intuitively understand what a counter does. Am open to calling it a Bag
but I rather like the self-descriptiveness and simplicity of Counter.

As noted previously, standalone unittests are forthcoming (and a doc
patch). Wanted to get the API and sample use cases worked-out first.

Thanks for looking at the initial patch.

rhettinger · 2009-01-12T08:01:07Z

Georg, could you give this a once over before I commit? Thanks.

birkenfeld · 2009-01-12T10:40:29Z

Yes, I'll have a look this evening.

rhettinger · 2009-01-12T11:16:10Z

Attaching an update with improved docs. Thanks for looking at this.

amauryfa · 2009-01-12T18:20:57Z

the typical usage is c=Counter(myseq) with no other non-dict accesses
(mostly just c[elem]+=1 and print c[elem])

Isn't collections.defaultdict(lambda:0) enough for this purpose?

bethard · 2009-01-12T18:55:09Z

The whole point was to have a function (or class) that accumulates a
sequence and counts it. collections.defaultdict(lambda: 0) doesn't
achieve this on its own because it only knows how to handle sequences of
(key, value):

>>> d = collections.defaultdict(lambda: 0)
>>> d.update('aaabbac')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: dictionary update sequence element #0 has length 1; 2 is
required

The feature request here was mainly a request to provide an abbreviation
of a very common 4-line function.

ned-deily · 2009-01-12T19:25:09Z

In counter6.diff line 56

"Assigning a count of zero or reducing the count to the zero leaves the"

suggest s/the zero/zero/

birkenfeld · 2009-01-12T19:40:16Z

Attaching new patch with small changes:

Don't describe a class with "Returns ..." as if it was a function.
Indent interposed paragraphs so that the method descriptions still
belong to the .. class directive.
Fixed Ned's typo.
Note that elements() and most_common() order elements arbitrarily.
Reorder the seealso's a bit; the first line is bold.

Questions:

Steven's counter.update('abcdee') wouldn't work -- I guess one is
supposed to use counter.update(Counter('abcdee')). Should that be noted
in the docs?
Is it needed to sort the items for the __repr__?

rhettinger · 2009-01-12T23:05:13Z

Thanks for the review comments. Incorporated all suggested changes and
did some other minor tidying-up. Extended the update example to include
c.update(Counter('abcdee')). Committed as r68559 .

Decided to leave __repr__() with a sort. Though it's not strictly
necessary, it's a nice convenience saving a user from doing the same
thing mentally. Also, it helps with doctests by giving a reliable
ordering when element counts are non-equal. For datasets small enough
that someone would want to display their repr, the time spent sorting is
negligible.

bethard mannequin assigned rhettinger Apr 7, 2007

bethard mannequin added the stdlib Python modules in the Lib dir label Apr 7, 2007

bethard mannequin assigned rhettinger Apr 7, 2007

bethard mannequin added the stdlib Python modules in the Lib dir label Apr 7, 2007

rhettinger added type-feature A feature request or enhancement labels Jan 9, 2009

rhettinger assigned birkenfeld and unassigned rhettinger Jan 12, 2009

rhettinger closed this as completed Jan 12, 2009

rhettinger assigned rhettinger and unassigned birkenfeld Jan 12, 2009

rhettinger closed this as completed Jan 12, 2009

rhettinger assigned rhettinger and unassigned birkenfeld Jan 12, 2009

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add collections.counts() #44815

Add collections.counts() #44815

bethard mannequin commented Apr 7, 2007

bethard mannequin commented Apr 7, 2007

gvanrossum commented Apr 7, 2007

bethard mannequin commented Apr 7, 2007

gvanrossum commented Apr 8, 2007

bethard mannequin commented Apr 19, 2007

rhettinger commented Jan 9, 2009

rhettinger commented Jan 9, 2009

pitrou commented Jan 9, 2009

rhettinger commented Jan 9, 2009

rhettinger commented Jan 12, 2009

birkenfeld commented Jan 12, 2009

rhettinger commented Jan 12, 2009

amauryfa commented Jan 12, 2009

bethard mannequin commented Jan 12, 2009

ned-deily commented Jan 12, 2009

birkenfeld commented Jan 12, 2009

rhettinger commented Jan 12, 2009

Add collections.counts() #44815

Add collections.counts() #44815

Comments

bethard mannequin commented Apr 7, 2007

bethard mannequin commented Apr 7, 2007

gvanrossum commented Apr 7, 2007

bethard mannequin commented Apr 7, 2007

gvanrossum commented Apr 8, 2007

bethard mannequin commented Apr 19, 2007

rhettinger commented Jan 9, 2009

rhettinger commented Jan 9, 2009

pitrou commented Jan 9, 2009

rhettinger commented Jan 9, 2009

rhettinger commented Jan 12, 2009

birkenfeld commented Jan 12, 2009

rhettinger commented Jan 12, 2009

amauryfa commented Jan 12, 2009

bethard mannequin commented Jan 12, 2009

ned-deily commented Jan 12, 2009

birkenfeld commented Jan 12, 2009

rhettinger commented Jan 12, 2009