classification
Title: Make general function sum() use Numpy's sum when obviously possible
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.8, Python 3.7, Python 3.6, Python 3.4, Python 3.5, Python 2.7
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: rhettinger, serhiy.storchaka, urielias
Priority: normal Keywords:

Created on 2018-02-21 14:06 by urielias, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (3)
msg312495 - Author: Uri Elias (urielias) Date: 2018-02-21 14:06
This holds at least for Python 2.7 and 3.5: given a NumPy array x, say np.random.rand(int(1e6)), sum(x) is much slower than x.sum() (about two orders of magnitude for 1e6 elements).
Now, while this is understandable behaviour, I wonder how hard it would be to add a condition so that, if the argument is a NumPy object, sum() uses the object's own sum.
I think many programmers aren't aware of that, so all in all it can improve the performance of a lot of existing code.
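
A rough way to reproduce the comparison is sketched below. It assumes NumPy is installed; the repeat count of 3 and the variable names are arbitrary choices for illustration, and the measured ratio will vary with machine, Python version, and NumPy version.

```python
import timeit

import numpy as np

# Same array shape as in the report: one million random doubles.
x = np.random.rand(int(1e6))

# Average a few runs of each approach; 3 repeats is an arbitrary choice.
builtin_time = timeit.timeit(lambda: sum(x), number=3) / 3
method_time = timeit.timeit(lambda: x.sum(), number=3) / 3

print(f"sum(x):  {builtin_time:.4f} s per call")
print(f"x.sum(): {method_time:.6f} s per call")
print(f"ratio:   {builtin_time / method_time:.0f}x")
```

Both calls return the same total up to floating-point rounding; only the time taken differs.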
msg312497 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2018-02-21 17:54
> I think many programmers aren't aware of that, so all in all 
> it can improve the performance of a lot of existing code.

We could create a __sum__ dunder method to allow classes to override the normal behavior of sum, but that isn't worth it because the speed issue has almost nothing to do with summation.  For example, the timings would also be slow for min(a), max(a), list(a), set(a), etc., where a=np.random.rand(int(1e6)).

In general, if anything outside of numpy loops over a numpy array, then every datum has to be converted to a typed, reference-counted Python object before the function can begin to do its work.  These are the facts-of-life when dealing with numpy.  The usual advice is to manipulate numpy arrays only with numpy tools because they all know how to operate on the data natively without allocating Python objects for every datum.
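
The per-element conversion described above can be observed directly. The following is a small sketch, assuming NumPy is installed; the five-element array and the variable names are illustrative only.

```python
import numpy as np

a = np.random.rand(5)

# Inside the array, values live as raw 8-byte doubles in one buffer.
print(a.dtype, a.itemsize, a.nbytes)  # float64 8 40

# Iterating from plain Python wraps each value in its own scalar object.
boxed = list(a)
print(type(boxed[0]))

# Every access builds a fresh object, so two reads of the same slot
# are equal in value but are not the same object.
first, again = a[0], a[0]
print(first == again, first is again)  # True False
```

Builtins such as sum(), min(), and list() all go through this iteration path, which is why they share the same overhead.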
msg312502 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2018-02-21 18:44
I concur with Raymond.

NumPy is a third-party library, and Python builtins can't depend on it. This could be solved by introducing a special protocol, but that isn't free. It adds the burden of writing and maintaining code and tests, and adds the runtime overhead of checking whether the protocol is supported. This will slow down sum() for every other collection. It can even slow it down for short NumPy arrays.

And, as Raymond has noted, if we introduce the __sum__ protocol, why not introduce protocols for min(), max(), sorted(), all(), and every other builtin? This has the same drawbacks as for sum(), but multiplied many times over.
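
To make the cost concrete, here is a pure-Python sketch of what such a protocol might look like. No __sum__ hook exists in Python; sum_with_protocol and Range1ToN are hypothetical names invented for this illustration, and the real sum() is implemented in C.

```python
def sum_with_protocol(iterable, start=0):
    """Hypothetical sum() that consults a made-up __sum__ hook first."""
    # This lookup runs on every call, even for lists and tuples
    # that will never define the hook.
    hook = getattr(type(iterable), "__sum__", None)
    if hook is not None:
        return start + hook(iterable)
    # Fallback: the ordinary element-by-element loop.
    total = start
    for item in iterable:
        total = total + item
    return total


class Range1ToN:
    """Toy container for 1..n that can sum itself in closed form."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return iter(range(1, self.n + 1))

    def __sum__(self):
        return self.n * (self.n + 1) // 2


print(sum_with_protocol([1, 2, 3]))       # 6
print(sum_with_protocol(Range1ToN(100)))  # 5050
```

The same kind of check would have to be added, tested, and maintained separately for each builtin that gained a hook.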

If you use NumPy arrays, just use NumPy array methods. Otherwise, what is the reason for using NumPy?

History
Date                 User              Action  Args
2022-04-11 14:58:58  admin             set     github: 77076
2018-02-21 23:41:12  rhettinger        set     status: open -> closed; resolution: rejected; stage: resolved
2018-02-21 18:44:40  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg312502
2018-02-21 17:54:40  rhettinger        set     nosy: + rhettinger; messages: + msg312497
2018-02-21 17:49:45  ppperry           set     type: performance -> enhancement
2018-02-21 17:49:29  ppperry           set     type: enhancement -> performance
2018-02-21 17:48:42  ppperry           set     components: + Library (Lib), - 2to3 (2.x to 3.x conversion tool)
2018-02-21 14:06:26  urielias          create