classification
Title: collections.counter examples are misleading
Type: Stage: resolved
Components: Documentation Versions: Python 3.6, Python 3.4, Python 3.5, Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: anthony-flury, cheryl.sabella, docs@python, rhettinger
Priority: normal Keywords:

Created on 2018-02-05 00:34 by anthony-flury, last changed 2018-02-06 07:20 by anthony-flury. This issue is now closed.

Messages (5)
msg311630 - (view) Author: Anthony Flury (anthony-flury) * Date: 2018-02-05 00:34
The first example given for collections.Counter is misleading - the documentation ideally should show the 'best' (one and only one) way to do something and the example is this : 

>>> # Tally occurrences of words in a list
>>> cnt = Counter()
>>> for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
...     cnt[word] += 1
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})

clearly this could simply be : 

>>> # Tally occurrences of words in a list
>>> cnt = Counter(['red', 'blue', 'red', 'green', 'blue', 'blue'])
>>> cnt
Counter({'blue': 3, 'red': 2, 'green': 1})

(i.e. the iteration through the array is unneeded in this example).

The 2nd example is better in showing the 'entry-level' use of the Counter class.

There possibly does need to be a simple example of when you might manually increment a Counter's entries - but I don't think that the examples given illustrate that in a useful way; and I personally haven't come across a use-case for manually incrementing Counter entries that couldn't be accomplished with a comprehension or generator expression passed directly to the Counter constructor.
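
A minimal sketch of the two styles being compared here, using the word list from the documentation example. The filtered variant and its length threshold are illustrative assumptions, not something taken from the docs:

```python
from collections import Counter

words = ['red', 'blue', 'red', 'green', 'blue', 'blue']

# Explicit loop, as in the first documentation example.
loop_count = Counter()
for word in words:
    loop_count[word] += 1

# Passing the iterable straight to the constructor.
direct_count = Counter(words)

# A generator expression can filter or transform items before counting.
# The length threshold is purely illustrative.
long_word_count = Counter(word for word in words if len(word) > 3)

print(loop_count == direct_count)
print(long_word_count)
```

Both approaches produce equal Counter objects for this input; the generator form only covers cases where the items to count can be expressed as a single iterable.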
msg311643 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-02-05 04:36
Thanks for the suggestion.  I respectfully disagree.  The "core" functionality of Counter is the ability to write c['x'] += 1 without risking a KeyError.  The add-on capability is to process an entire iterable all at once.  This is analogous to the list() builtin, where the core ability is to write s.append(e) and there is a convenience of calling list(iterable).
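
A short sketch of that distinction, assuming nothing beyond the standard library. The variable names are illustrative only:

```python
from collections import Counter

# With a plain dict, += must read the missing key first and fails.
plain = {}
try:
    plain['x'] += 1
except KeyError:
    plain_raised = True
else:
    plain_raised = False

# With a Counter, a missing key reads as 0, so += just works.
c = Counter()
c['x'] += 1

# The "add-on" capability: count a whole iterable in one call.
c.update(['x', 'y'])

print(plain_raised)
print(c)
```

The same split exists for list: s.append(e) adds one element at a time, while list(iterable) builds the whole thing in one call.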

Another reason the first example goes first is that it is simple.  It shows counting in isolation with no other distractions (an in-vitro example).

The second example is in a more complex environment incorporating file access and regular expressions (an in-vivo example).

FWIW, there are plenty of examples of using the += style.  Here's one I use in my Python courses:

    'Scan a log file from a NASA server'

    import collections, re, pprint

    visited = collections.Counter()
    with open('notes/nasa_19950801.log') as f:
        for line in f:
            mo = re.search(r'GET\s+(\S+)\s+200', line)
            if mo is not None:
                url = mo.group(1)
                visited[url] += 1

    pprint.pprint(visited.most_common(20))

I've had good luck with people understanding the docs as-is, so I'm going to decline the suggestion.  I do appreciate you taking the time to share your thoughts.
msg311666 - (view) Author: Anthony Flury (anthony-flury) * Date: 2018-02-05 13:11
Raymond, 
I completely understand your comment but I do disagree.

My view would be that the documentation of the stdlib should document the entry level use cases.
The first example given uses nothing special from the Counter class - you could implement exactly the same with a defaultdict(int) - the only difference would be that the output would read defaultdict(<type 'int'>, {'blue': 3, 'red': 2, 'green': 1}).
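
For comparison, a small sketch running the same loop against both classes. The tallies come out identical, as stated above; the lookups after the loop show where the two classes are known to behave differently. The 'purple' key is a made-up example:

```python
from collections import Counter, defaultdict

words = ['red', 'blue', 'red', 'green', 'blue', 'blue']

dd = defaultdict(int)
cnt = Counter()
for word in words:
    dd[word] += 1
    cnt[word] += 1

# The loop itself produces the same tallies in both containers.
same_tallies = dict(dd) == dict(cnt)

# Reading a missing key: defaultdict inserts it, Counter does not.
dd_value = dd['purple']
cnt_value = cnt['purple']
dd_has_purple = 'purple' in dd
cnt_has_purple = 'purple' in cnt

# Counter-only helper for the most frequent items.
top_two = cnt.most_common(2)

print(same_tallies, dd_has_purple, cnt_has_purple, top_two)
```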

I think the examples in the documentation should at least demonstrate something important on the class being documented - and the first example doesn't.

I am very tempted to re-open - but I won't - no benefit in bouncing the status as we discuss this.
msg311702 - (view) Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2018-02-06 02:21
You know, I'm not sure if I had ever seen that example before.  When you click Counter at the top of the page, it goes right to the class definition, which is past the example.

Having said that, I really like the example.  Until now, I didn't realize what Raymond said above about Counters (that the core ability is to write c['x'] += 1 without a KeyError).  So, thanks to this report, I learned that today!

One thing that did surprise me in the example is that I expected the repr to be in insertion order in 3.7.  The class description says 'It is an unordered collection where elements are stored as dictionary keys' and I was wondering if that was still true since dicts now have a guaranteed order.  I tried it on the example, which still printed Counter({'blue': 3, 'red': 2, 'green': 1})!  Of course it makes sense after looking at the code because it calls `most_common` in the repr, but I hadn't realized that before.  So, two things learned about Counter today.   :-)

Anyway, writing this here to ask about the wording regarding 'unordered collection'.

Thanks!
msg311710 - (view) Author: Anthony Flury (anthony-flury) * Date: 2018-02-06 07:20
Cheryl : 
When you iterate over a Counter instance, it returns keys in the order they were first encountered/inserted - so I agree with you that it is an ordered collection from Python 3.7 onwards (although the iteration and the repr use different orders).
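
A minimal sketch of the two orderings discussed in the last two messages, assuming Python 3.7 or later, where dict insertion order is guaranteed:

```python
from collections import Counter

cnt = Counter(['red', 'blue', 'red', 'green', 'blue', 'blue'])

# Iteration follows first-insertion order (guaranteed from Python 3.7).
iteration_order = list(cnt)

# most_common() sorts by count, highest first; the repr uses this order.
by_count = cnt.most_common()

print(iteration_order)
print(by_count)
print(cnt)
```

Here 'red' is seen first but 'blue' has the highest count, so the iteration order and the repr differ.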
History
Date User Action Args
2018-02-06 07:20:23  anthony-flury  set  messages: + msg311710
2018-02-06 02:21:09  cheryl.sabella  set  nosy: + cheryl.sabella; messages: + msg311702
2018-02-05 13:11:56  anthony-flury  set  messages: + msg311666
2018-02-05 04:36:16  rhettinger  set  status: open -> closed; assignee: docs@python -> rhettinger; nosy: + rhettinger; messages: + msg311643; resolution: not a bug; stage: resolved
2018-02-05 00:34:34  anthony-flury  create