classification
Title: sum() does not return only the sum of a sequence of numbers + PEP8 recommendation
Type: behavior Stage: resolved
Components: Documentation Versions: Python 3.3, Python 3.4, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, ezio.melotti, marco.buttu, python-dev, r.david.murray, ronaldoussoren
Priority: normal Keywords:

Created on 2013-07-10 13:11 by marco.buttu, last changed 2013-07-10 20:24 by r.david.murray. This issue is now closed.

Messages (10)
msg192805 - Author: Marco Buttu (marco.buttu) * Date: 2013-07-10 13:11
The documentation of sum():

    Returns the sum of a sequence of numbers (NOT strings) plus the 
    value of parameter 'start' (which defaults to 0).  
    When the sequence is empty, returns start.

A. According to PEP 8, it should be "Return the sum...", and
   "When the sequence is empty, return start.", as in the other docstrings.
   For instance:

   >>> print(len.__doc__)
   len(object) -> integer

   Return the number of items of a sequence or mapping.

B. When the second argument is a tuple or a list, you can add sequences
   of sequences:

   >>> sum([['a', 'b', 'c'], [4]], [])
   ['a', 'b', 'c', 4]
   >>> sum(((1, 2, 3), (1,)), (1,))
   (1, 1, 2, 3, 1)

C. sum() accepts not just sequences, but any iterable:

   >>> sum({1: 'one', 2: 'two'})
   3
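
For the inputs in example B, sum(iterable, start) gives the same result as a left fold of + that begins at start, and since + on lists and tuples means concatenation, the result is a concatenation. A minimal sketch using functools.reduce and operator.add to check this on the two examples (fold_add is a hypothetical helper, and this shows an equivalence for these cases, not how sum() is implemented in C):

```python
import functools
import operator


def fold_add(iterable, start):
    """Left-fold the items with +, beginning from start."""
    return functools.reduce(operator.add, iterable, start)


lists = [['a', 'b', 'c'], [4]]
tuples = ((1, 2, 3), (1,))

# Both calls concatenate, because + on lists/tuples means concatenation.
assert sum(lists, []) == fold_add(lists, []) == ['a', 'b', 'c', 4]
assert sum(tuples, (1,)) == fold_add(tuples, (1,)) == (1, 1, 2, 3, 1)
```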

Maybe it is not a good idea to give a complete description of sum() in the docstring, but perhaps something "good enough" would do. In any case, I think the docstring's departure from the PEP 8 convention should be fixed.
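
The convention being asked for, a docstring phrased as a command ("Return that", not "Returns that"), is spelled out in PEP 257, which PEP 8 defers to for docstring conventions. A minimal sketch with a hypothetical pure-Python my_sum (an illustration of the style only, not CPython's C implementation or its actual docstring):

```python
def my_sum(iterable, start=0):
    """Return the sum of an iterable of numbers plus the value of 'start'.

    When the iterable is empty, return start.
    """
    total = start
    for item in iterable:
        total = total + item  # left-to-right addition, starting from start
    return total


print(my_sum([1, 2, 3]))              # 6
print(my_sum([], start=5))            # 5
print(my_sum.__doc__.splitlines()[0])
```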
msg192808 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-07-10 13:51
Perhaps we could add something like "Also works, though possibly inefficiently, on any iterable whose elements support addition". Most of the Sphinx documentation for sum() is about what to use instead, and that doesn't really seem appropriate for a docstring. So it may indeed be best to just not mention it in the docstring.
msg192811 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2013-07-10 14:04
There's an annoyingly long discussion about sum() on python-ideas.

IMHO the documentation should mention, as it does now, that sum is intended to be used with a sequence of numbers, even though it does work with most objects that support the + operator (such as by implementing __add__). In particular, using sum with a sequence of lists or tuples is extremely inefficient.
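
To illustrate the point about objects implementing __add__: because start defaults to 0, the first step is 0 + item, so a user-defined class needs either a matching start value or a __radd__ that accepts 0. A minimal sketch with a hypothetical Money class (not from the thread):

```python
class Money:
    """Hypothetical amount-in-cents type used only for illustration."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        if isinstance(other, Money):
            return Money(self.cents + other.cents)
        return NotImplemented

    def __radd__(self, other):
        # sum() starts from 0 by default, so support 0 + Money.
        if other == 0:
            return self
        return NotImplemented

    def __eq__(self, other):
        return isinstance(other, Money) and self.cents == other.cents

    def __repr__(self):
        return "Money({})".format(self.cents)


wallet = [Money(150), Money(275), Money(75)]
print(sum(wallet))              # Money(500)
print(sum(wallet, Money(100)))  # Money(600)
```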

The fact that sum({1:'a', 2: 'b'}) works is a side effect of how Python iterates over dicts and IMHO doesn't have to be documented in every function that accepts an iterable as an argument.
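
The dict case follows from two facts: sum() accepts any iterable, and iterating over a dict yields its keys, so sum({1: 'a', 2: 'b'}) adds 1 and 2 and never touches the string values. A short sketch comparing a few iterables (the set, generator expression and range are chosen only for illustration):

```python
d = {1: 'a', 2: 'b'}

# Iterating a dict yields its keys, so sum(d) adds the keys 1 and 2.
print(sum(d))                        # 3
print(sum(d.keys()))                 # 3
print(sum({1, 2, 3}))                # 6 (a set)
print(sum(x * x for x in range(4)))  # 14 (0 + 1 + 4 + 9)
print(sum(range(5), 10))             # 20 (10 + 0 + 1 + 2 + 3 + 4)
```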
msg192814 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-07-10 14:44
OK, so your vote is to leave the doc string alone (except for the PEP8 changes), right?
msg192815 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2013-07-10 14:50
Yes, the docstring isn't meant to be exhaustive documentation. The manual is more exhaustive and, as you noted, already contains links to alternatives.
msg192816 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-07-10 15:13
Currently sum() is intended to work with numbers, explicitly forbids strings (as noted in the docstring), but also works with other types (even though it's inefficient).
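
The string restriction is enforced at call time: passing a str as start makes sum() raise TypeError (with a message pointing to ''.join()), and leaving start at its default also fails, because the first step is 0 + 'spam'. A minimal check; raises_type_error is a hypothetical helper, and only the exception type is tested since message wording can vary between versions:

```python
words = ['spam', 'and', 'eggs']


def raises_type_error(func, *args):
    """Return True if func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


print(raises_type_error(sum, words, ''))  # True: str start is rejected
print(raises_type_error(sum, words))      # True: 0 + 'spam' is unsupported

# The recommended way to concatenate strings.
print(' '.join(words))  # spam and eggs
```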

If we want to document this, a possible wording might be:
    Returns the sum of a sequence of numbers plus the 
    value of parameter 'start' (which defaults to 0).  
    When the sequence is empty, returns start.
    Using sum() with a sequence of strings is not allowed,
    and might be inefficient with sequences of other types.

We should also consider that the implementation/behavior might change in the future, but we can always update the docstring again.

+1 on the PEP 8 changes.
msg192818 - Author: Marco Buttu (marco.buttu) * Date: 2013-07-10 15:21
After reading Ronald's comment, I realized it is better to keep it simple, so I agree with him.

The "extremely inefficient" reason seems to be less important (Python 3.3):

$ python -m timeit -s "a=['a']*10000; b=['b']*10000; a+b"
100000000 loops, best of 3: 0.00831 usec per loop
$ python -m timeit -s "a=['a']*10000; b=['b']*10000; sum([a, b], [])"
100000000 loops, best of 3: 0.0087 usec per loop
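
One caveat about these two measurements: with python -m timeit, everything passed through -s is setup code that runs once, and when no statement argument follows, the timed statement defaults to pass. Both commands therefore time an empty statement, which explains the near-identical results of a few nanoseconds per loop. A corrected command would pass the expression as a separate argument, e.g. python -m timeit -s "a=['a']*10000; b=['b']*10000" "sum([a, b], [])". Also, with only two lists sum() performs just two concatenations, so repeated copying has no chance to add up. A sketch of the same comparison with the timeit module (no timings are claimed here; they depend on the machine):

```python
import timeit

setup = "a = ['a'] * 10000; b = ['b'] * 10000"

# The statement to time is the first argument; setup runs once per repeat.
plus = min(timeit.repeat("a + b", setup=setup, number=1000, repeat=3))
summed = min(timeit.repeat("sum([a, b], [])", setup=setup, number=1000, repeat=3))
empty = min(timeit.repeat("pass", setup=setup, number=1000, repeat=3))

# Convert seconds per 1000 loops into microseconds per loop.
print("a + b:           %.3f usec per loop" % (plus / 1000 * 1e6))
print("sum([a, b], []): %.3f usec per loop" % (summed / 1000 * 1e6))
print("pass:            %.3f usec per loop" % (empty / 1000 * 1e6))
```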
msg192819 - Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2013-07-10 15:33
Concatenating a sequence of lists with sum is inefficient because it (currently) does a lot of copying, and that becomes noticeable when you sum a larger number of lists.

Note how using sum to add 200 lists takes more than twice as long as adding 100 lists (about 4.5 times as long here):

ronald@gondolin[0]$ python -m timeit -s "lists=[['a']*100 for i in range(100)]" "sum(lists, [])"
100 loops, best of 3: 2.04 msec per loop


ronald@gondolin[0]$ python -m timeit -s "lists=[['a']*100 for i in range(200)]" "sum(lists, [])"
100 loops, best of 3: 9.2 msec per loop
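
A rough model of why the time grows faster than linearly: with a list as start, each step of sum() builds a new list acc + item, copying everything accumulated so far. Summing n lists of m items each therefore copies m*n*(n+1)/2 elements, so doubling n roughly quadruples the work, broadly in line with the jump from 2.04 to 9.2 msec above. The sketch below counts copied elements under that simplified model (it ignores allocation and other constant costs) and compares it with extending a single list in place; both helper names are hypothetical:

```python
def copies_for_repeated_concat(n, m):
    """Count elements copied by acc = acc + item over n lists of length m."""
    acc = []
    copied = 0
    for _ in range(n):
        item = ['a'] * m
        acc = acc + item    # builds a brand-new list each time
        copied += len(acc)  # every element of the new list was copied
    return copied


def copies_for_extend(n, m):
    """Count elements copied when extending one list in place."""
    acc = []
    copied = 0
    for _ in range(n):
        acc.extend(['a'] * m)  # only the new items are copied
        copied += m            # (this count ignores occasional resizing)
    return copied


for n in (100, 200):
    print(n, copies_for_repeated_concat(n, 100), copies_for_extend(n, 100))
# prints "100 505000 10000", then "200 2010000 20000"
```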


Also note how using itertools.chain is both a lot faster and behaves better:

ronald@gondolin[0]$ python -m timeit -s "import itertools; lists=[['a']*100 for i in range(100)]" "list(itertools.chain.from_iterable(lists))"
10000 loops, best of 3: 165 usec per loop


ronald@gondolin[0]$ python -m timeit -s "import itertools; lists=[['a']*100 for i in range(100)]" "list(itertools.chain.from_iterable(lists))"
10000 loops, best of 3: 155 usec per loop

(I used Python 2.7 for this; the same behavior can be seen with Python 3.)
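
Note that the two itertools.chain commands above are identical (both build 100 lists), so they measure the same workload twice rather than showing how chain scales. A minimal sketch checking that itertools.chain.from_iterable, a nested list comprehension and list.extend all produce the same flattened list as sum(lists, []), each in a single pass over the data:

```python
import itertools

lists = [['a'] * 100 for i in range(100)]

via_sum = sum(lists, [])                                 # repeated copying
via_chain = list(itertools.chain.from_iterable(lists))  # single pass
via_comprehension = [x for sub in lists for x in sub]   # single pass

via_extend = []
for sub in lists:
    via_extend.extend(sub)  # in place, no intermediate lists

assert via_sum == via_chain == via_comprehension == via_extend
print(len(via_chain))  # 10000
```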

See also #18305, which proposed a small change to how sum works that would fix the performance problems for summing a sequence of lists (before going too far and proposing to special-case tuples and strings).
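
The kind of change discussed there can be sketched as follows, assuming an approach of copying start once and then adding each item in place with +=; this is an illustration of the idea, not the actual patch from #18305. For lists, += extends the accumulator in place, so total copying becomes linear; for immutable types such as tuples, += still builds a new object each time, so they gain nothing without special-casing. sum_inplace is a hypothetical name:

```python
import copy


def sum_inplace(iterable, start=0):
    """Hypothetical sum() variant: copy start once, then use +=.

    Copying start keeps the caller's object unmodified; after that,
    acc += item extends a list in place instead of rebuilding it.
    """
    acc = copy.copy(start)
    for item in iterable:
        acc += item
    return acc


lists = [['a', 'b', 'c'], [4]]
start = []
print(sum_inplace(lists, start))             # ['a', 'b', 'c', 4]
print(start)                                 # [] (unchanged)
print(sum_inplace(((1, 2, 3), (1,)), (1,)))  # (1, 1, 2, 3, 1)
print(sum_inplace([1, 2, 3]))                # 6
```
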
msg192831 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-07-10 20:23
New changeset 4b3b87719e2c by R David Murray in branch '3.3':
#18424: PEP8ify the tense of the sum docstring.
http://hg.python.org/cpython/rev/4b3b87719e2c

New changeset 38b42ffdf86b by R David Murray in branch 'default':
Merge: #18424: PEP8ify the tense of the sum docstring.
http://hg.python.org/cpython/rev/38b42ffdf86b

New changeset c5f5b5e89a94 by R David Murray in branch '2.7':
#18424: PEP8ify the tense of the sum docstring.
http://hg.python.org/cpython/rev/c5f5b5e89a94
msg192832 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-07-10 20:24
Ok, pep8 changes committed.
History
Date                 User            Action  Args
2013-07-10 20:24:50  r.david.murray  set     status: open -> closed
                                             type: behavior
                                             messages: + msg192832
                                             resolution: fixed
                                             stage: resolved
2013-07-10 20:23:47  python-dev      set     nosy: + python-dev
                                             messages: + msg192831
2013-07-10 15:33:08  ronaldoussoren  set     messages: + msg192819
2013-07-10 15:21:31  marco.buttu     set     messages: + msg192818
2013-07-10 15:13:37  ezio.melotti    set     nosy: + ezio.melotti
                                             messages: + msg192816
2013-07-10 14:50:08  ronaldoussoren  set     messages: + msg192815
2013-07-10 14:44:41  r.david.murray  set     messages: + msg192814
2013-07-10 14:04:42  ronaldoussoren  set     nosy: + ronaldoussoren
                                             messages: + msg192811
2013-07-10 13:52:11  r.david.murray  set     versions: + Python 3.4, - Python 3.1, Python 3.2
2013-07-10 13:51:38  r.david.murray  set     nosy: + r.david.murray
                                             messages: + msg192808
2013-07-10 13:11:34  marco.buttu     create