This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Classification
Title: Improve error message for string concatenation via `sum`
Type: behavior
Stage:
Components: Interpreter Core
Versions: Python 3.8

Process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Phillip.M.Feldman@gmail.com, mpaolini, steven.daprano, veky
Priority: normal
Keywords:

Created on 2020-09-07 19:21 by Phillip.M.Feldman@gmail.com, last changed 2022-04-11 14:59 by admin.

Messages (8)
msg376526 - (view) Author: Phillip M. Feldman (Phillip.M.Feldman@gmail.com) Date: 2020-09-07 19:21
I'm not sure whether this is a bug or a feature request, but it seems as though the following should produce the same result:

In [1]: 'a' + 'b' + 'c'
Out[1]: 'abc'

In [2]: sum(('a', 'b', 'c'))
TypeError Traceback (most recent call last)
----> 1 sum(('a', 'b', 'c'))

TypeError: unsupported operand type(s) for +: 'int' and 'str'

The error message is confusing (there is no integer).
msg376533 - (view) Author: Marco Paolini (mpaolini) * Date: 2020-09-07 21:33
This happens because the default value of the start argument is zero, so the first operation is `0 + 'a'`.
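Here is a rough Python model of what `sum` does with its default start value, to make the point concrete. The helper name `naive_sum` is hypothetical. The real C implementation also has fast paths for int and float, but on this input it ends up making the same `0 + 'a'` call.

```python
def naive_sum(iterable, start=0):
    """Hypothetical pure-Python model of sum(): fold with + starting from `start`."""
    result = start
    for item in iterable:
        result = result + item  # with start=0 the first step is 0 + 'a'
    return result


for func in (naive_sum, sum):
    try:
        func(('a', 'b', 'c'))
    except TypeError as exc:
        # Both report: unsupported operand type(s) for +: 'int' and 'str'
        print(func.__name__, '->', exc)
```

The `int` named in the message is the start value. It is not one of the items.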
msg376534 - (view) Author: Marco Paolini (mpaolini) * Date: 2020-09-07 21:49
Also worth noting: the start argument is type-checked instead. Maybe we could apply the same checks to the items of the iterable?

python3 -c "print(sum(('a', 'b', 'c'), start='d'))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
TypeError: sum() can't sum strings [use ''.join(seq) instead]


see https://github.com/python/cpython/blob/c96d00e88ead8f99bb6aa1357928ac4545d9287c/Python/bltinmodule.c#L2310
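You can observe the start-value check from Python. `start_rejection_message` is a hypothetical helper. It passes an empty iterable, so only the up-front check on `start` can fail. The exact wording of the messages comes from CPython and may vary between versions.

```python
def start_rejection_message(start):
    """Return the TypeError message sum() raises for this start value, or None."""
    try:
        sum((), start)  # empty iterable: only the start-value check can fail
    except TypeError as exc:
        return str(exc)
    return None


# str, bytes and bytearray starts are rejected; int and list starts are accepted.
for start in ('d', b'd', bytearray(b'd'), 0, []):
    print(repr(start), '->', start_rejection_message(start))
```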
msg376536 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2020-09-07 22:16
As Marco says, the exception arises because the default value for start is 0, and you can't concatenate strings to the integer 0.

You get the same error if you try to concatenate lists:

    py> sum([[], []])
    TypeError: unsupported operand type(s) for +: 'int' and 'list'
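Unlike strings, lists are not rejected as a start value. Passing an explicit `start=[]` therefore makes the list case work. The sketch below compares that call with `itertools.chain.from_iterable`, the usual linear-time way to flatten one level of nesting.

```python
import itertools

nested = [[1, 2], [3], [4, 5]]

# An empty-list start value avoids the 0 + [] failure...
flat = sum(nested, [])
print(flat)  # [1, 2, 3, 4, 5]

# ...but each + builds a new, longer list, so the total cost grows
# quadratically. chain.from_iterable yields each element exactly once.
flat_chain = list(itertools.chain.from_iterable(nested))
print(flat_chain)  # [1, 2, 3, 4, 5]
```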


However, even if you pass the empty string, "", as the start value, sum will still reject string arguments. This is intentional, as repeatedly concatenating strings may be extremely inefficient and slow, depending on the specific circumstances.


The default of 0 is documented, as is the intention that sum be used only for numeric addition. See `help(sum)` or the docs on the website.
msg376537 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2020-09-07 22:24
Marco, sum should be as fast as possible, so we don't want to type check every single element. But if it is easy enough, it might be worth checking the first element, and if it fails, report:

    cannot add 'type' to start value

where 'type' is the type of the first element. If that is str, then concatenate

    (use ''.join(iterable) instead)

to the error message.
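Steven's suggestion could be prototyped in pure Python before anyone touches C. `checked_sum` below is a hypothetical wrapper, not an existing API. It adds no per-item type checks. If the very first addition fails, it rewrites the error using the type of the first element, and for strings it appends the `''.join` hint. Failures on later elements propagate unchanged.

```python
_EMPTY = object()  # sentinel: distinguishes "no first element" from any real value


def checked_sum(iterable, start=0):
    """Hypothetical sum() wrapper that explains a failure on the first element.

    Only the first addition is wrapped; remaining items use plain addition.
    """
    iterator = iter(iterable)
    first = next(iterator, _EMPTY)
    if first is _EMPTY:
        return start
    try:
        result = start + first
    except TypeError as exc:
        message = f"cannot add {type(first).__name__!r} to start value"
        if isinstance(first, str):
            message += " (use ''.join(iterable) instead)"
        raise TypeError(message) from exc
    for item in iterator:
        result = result + item  # no extra checks after the first element
    return result


try:
    checked_sum(('a', 'b', 'c'))
except TypeError as exc:
    print(exc)  # cannot add 'str' to start value (use ''.join(iterable) instead)
```

This sketch does not model the built-in's separate rejection of a string start value.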
msg376538 - (view) Author: Marco Paolini (mpaolini) * Date: 2020-09-07 22:32
I was thinking of just clarifying the error message that results from PyNumber_Add. This won't make the "hot" path any slower.

Something like this (not compile-tested, sorry):

--- a/Python/bltinmodule.c
+++ b/Python/bltinmodule.c
@@ -2450,9 +2450,17 @@ builtin_sum_impl(PyObject *module, PyObject *iterable, PyObject *start)
         temp = PyNumber_Add(result, item);
+        /* Inspect item before it is released; only replace a TypeError. */
+        if (temp == NULL && PyErr_ExceptionMatches(PyExc_TypeError) &&
+            (PyUnicode_Check(item) || PyBytes_Check(item) ||
+             PyByteArray_Check(item))) {
+            PyErr_SetString(PyExc_TypeError,
+                "sum() can't sum bytes, strings or byte-arrays "
+                "[use .join(seq) instead]");
+        }
         Py_DECREF(result);
         Py_DECREF(item);
         result = temp;
         if (result == NULL)
             break;
     }
     Py_DECREF(iter);
     return result;
msg376539 - (view) Author: Phillip M. Feldman (Phillip.M.Feldman@gmail.com) Date: 2020-09-07 23:27
I'd forgotten about ''.join; this is a good solution. I withdraw my comment.
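For reference, here is the idiom Phillip settles on, alongside its bytes counterpart. Each call builds the result in a single pass instead of through repeated `+`.

```python
parts = ('a', 'b', 'c')
joined = ''.join(parts)
print(joined)  # abc

# bytes.join accepts any bytes-like pieces (including bytearray) and returns bytes.
joined_bytes = b''.join((b'a', b'b', bytearray(b'c')))
print(joined_bytes)  # b'abc'
```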

msg376542 - (view) Author: Vedran Čačić (veky) * Date: 2020-09-08 03:50
The fact that you've forgotten about it is exactly why sum tries to educate you (despite Python being "the language of consenting adults" in most other aspects). The problem (why it doesn't do a good job in that aspect) is that people usually expect sum to act like a 2-arg form of functools.reduce, while in fact it acts like a 3-arg form, with 0 as the initializer.

I doubt that Python will change regarding that, but you can sharpen your intuition by asking yourself: what do you expect sum([]) to be? If 0, then you're inconsistent. :-)
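Vedran's analogy can be checked directly with `functools.reduce` and `operator.add`. This is only an illustration of the analogy. It does not describe how `sum` is implemented, since the C version has its own loop and fast paths.

```python
import functools
import operator

words = ('a', 'b', 'c')

# 2-argument reduce: the first element seeds the fold, so strings concatenate.
print(functools.reduce(operator.add, words))  # abc

# 3-argument reduce with 0 as the initializer behaves like sum(words):
# the first step is 0 + 'a', which raises TypeError.
try:
    functools.reduce(operator.add, words, 0)
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for +: 'int' and 'str'

# Empty input: sum() returns its start value, while the 2-argument reduce
# has nothing to return and raises TypeError instead.
print(sum([]))  # 0
try:
    functools.reduce(operator.add, [])
except TypeError as exc:
    print(exc)
```

A `sum` that behaved like the 2-argument form could not return 0 for an empty list. That is the inconsistency Vedran points to.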
History
Date | User | Action | Args
2022-04-11 14:59:35 | admin | set | github: 85906
2020-09-08 03:50:18 | veky | set | nosy: + veky; messages: + msg376542
2020-09-07 23:27:44 | Phillip.M.Feldman@gmail.com | set | messages: + msg376539
2020-09-07 22:32:09 | mpaolini | set | messages: + msg376538
2020-09-07 22:24:45 | steven.daprano | set | messages: + msg376537
2020-09-07 22:16:23 | steven.daprano | set | nosy: + steven.daprano; messages: + msg376536; title: string concatenation via `sum` -> Improve error message for string concatenation via `sum`
2020-09-07 21:49:37 | mpaolini | set | messages: + msg376534
2020-09-07 21:33:49 | mpaolini | set | nosy: + mpaolini; messages: + msg376533
2020-09-07 19:21:29 | Phillip.M.Feldman@gmail.com | create |