This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: alternate fast builtin sum
Type: performance Stage:
Components: Interpreter Core Versions: Python 3.4
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: MrJean1, pitrou, rhettinger
Priority: normal Keywords: patch

Created on 2008-05-07 19:53 by MrJean1, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
bltinmodule2.c.diff MrJean1, 2008-05-07 19:53 Forward diff against rev 62728 from the trunk
Messages (7)
msg66369 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-05-07 19:53
The attached patch bltinmodule2.c.diff is a different implementation of 
the fast summation code for your consideration.

It uses three separate sums to add ints, floats and other objects.  All 
ints are accumulated into a C long, which is reset after an overflow.  
All floats are added into a C double without mixing in ints.  Other 
objects are handled as before.
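The strategy can be sketched in Python (illustrative only; the actual patch works on C longs and doubles inside bltinmodule.c, and the function name here is invented):

```python
def three_way_sum(seq):
    """Sketch of the patch's strategy: accumulate ints, floats, and
    other objects separately, combining the partial sums at the end."""
    i_sum = 0          # stands in for the C long accumulator
    f_sum = 0.0        # stands in for the C double accumulator
    saw_float = False
    other = None       # running result for non-int, non-float items
    for item in seq:
        if type(item) is int:
            i_sum += item
        elif type(item) is float:
            f_sum += item
            saw_float = True
        else:
            other = item if other is None else other + item
    # combine the partial sums once, at the very end
    result = i_sum + f_sum if saw_float else i_sum
    if other is not None:
        result = result + other
    return result
```

Note that this reorders the additions relative to a plain left-to-right sum(), which becomes the central objection later in this thread.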

This version has been tested with Python 2.5.2 as well and passed the 
existing regressions.

/Jean Brouwers
msg66378 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-05-07 22:09
What is the benefit of this new implementation? If it is faster, can you
give us any performance numbers? (for example using timeit)
msg66382 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-05-08 00:21
I did not compare performance (yet).  If there is a particular test 
case, I am happy to measure that.  Also, I made sure that the results 
and errors (!) for non-numeric items are the same as for the original, 
Slow Summation case.

The version I submitted is perhaps only slightly faster for most int and 
float cases, since it never creates and adds the 0 int default for the 
optional 'start' argument.

For other int and float cases it will certainly be faster.  Here are a 
few (extreme) examples.

1) Assume a long list consisting of one float followed by all ints.  The 
existing code will switch to the float loop at the very first float 
encountered and then use C double add (plus the FPE protection overhead) 
to add all subsequent ints.  My version will use C long adds for all 
ints, C double add for the float plus one C double add to combine the 
long and double results.

2) Consider a list of just 2 items, one float and one int (and no 
'start' argument).  The current code will first create a result object 
for the 'start' default 0.  Then, it enters the int loop (since result 
is an int, 0) and gets the first item, the float.  Since the item is not 
an int, it is added to the current result using PyNumber_Add.  Next, it 
falls into the float loop, since the result is now a float, presumably.  
Next, it gets the int item and adds that int as C double to the C double 
in f_result.  The end result C double f_result is returned as float.  
Quite costly for just one float and one int, and maybe worse than 
Slow Summation.
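The control flow described in this example can be sketched in Python (a simplification; the real code is C in bltinmodule.c, and the helper name is made up):

```python
def existing_fast_sum(seq, start=0):
    """Simplified model of the existing fast path: an int loop that
    falls through to a float loop once the running result is a float."""
    it = iter(seq)
    result = start                  # the 0 default is materialized up front
    for item in it:                 # "int loop"
        result = result + item      # C long add, or PyNumber_Add for a float item
        if type(result) is float:
            break                   # fall through to the float loop
    for item in it:                 # "float loop": C double adds in the real code
        result = result + item
    return result
```

For the two-item list above, the first addition goes through the generic path (0 + float) before the float loop takes over for the remaining int.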

3) My version does not create a 0 int for the missing 'start'.  That is 
only created when absolutely needed at some later time.  In addition, 
all int items are added in C long i_sum and all floats are added in C 
double f_sum, regardless of the order they occur in the sequence.  At 
the end of the iteration loop, the C long i_sum is added to C double 
f_sum once and a single float object is created and returned.  That is 
very efficient and without any unnecessary overhead.

In other words, for sequences with mixed int and float objects the 
difference can be significant, depending on where the very first float 
item is located in the sequence and how many ints follow.

However, for sequences with items of the same type, all int or all 
float, there will be little or no difference.  

/Jean Brouwers

PS) There are two additional things to consider for the float case:

- Overflow (ERANGE) and domain (EDOM) errors in C double adds

- Using a compensating summation for floats, like Kahan's method.
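Kahan's compensated summation, mentioned above, can be sketched as follows (illustrative only; this is not part of the patch):

```python
def kahan_sum(values):
    """Compensated (Kahan) summation: track the low-order bits lost
    in each floating-point addition and feed them back into the next."""
    total = 0.0
    c = 0.0                  # running compensation for lost low-order bits
    for x in values:
        y = x - c            # apply the correction from the previous step
        t = total + y
        c = (t - total) - y  # recover what the addition just lost
        total = t
    return total
```

On the classic example of summing 0.1 ten times, plain left-to-right addition drifts away from 1.0 while the compensated sum lands on it exactly.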
msg66387 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-05-08 01:44
Here is one, simple comparison between 3 different builtin_sum's, all 
measured on the same machine, MacOS X 10.4.11 (Intel Duo):

1) Python 2.5.1 ActiveState with standard Slow Summation:

% python2.5.1 -mtimeit --setup="x=[1.0e12, 7]*1000000" "sum(x)"
10 loops, best of 3: 177 msec per loop


2) Python 2.6a2 (built from source) with the existing Fast Summation:

% python2.6a2 -mtimeit --setup="x=[1.0e12, 7]*1000000" "sum(x)"
10 loops, best of 3: 52.5 msec per loop


3) Python 2.5.2 (built from source) with the submitted Fast Summation:

% python2.5.2 -mtimeit --setup="x=[1.0e12, 7]*1000000" "sum(x)"
10 loops, best of 3: 48.2 msec per loop
msg66389 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-05-08 02:12
Will take a look at this one but am inclined to reject it.  The timing 
benefits are marginal at best and only help in the atypical case of 
mixed floats and ints. The patch changes the type and ordering of 
intermediate sums, resulting in different answers than the standard 
sum().

In contrast, the existing fast sum() was carefully designed to mimic 
the operations of the original sum(), so it always produces the same 
results.
msg66409 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-05-08 07:13
The one example given may not be convincing; other ones may be.

What is it in the submitted version which is not designed carefully or 
which may not produce the same results?
msg66410 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-05-08 07:45
The approach of using separate accumulations is intrinsically flawed if 
you want the same result as a regular sum().  Here's a small dataset 
that shows why you can't accumulate separate sums by type:

>>> d = [1000000000000000, -1000000000000000.0, .0000000000000001, .0000000000000001]
>>> sum(d)
2e-16
>>> d[0] + sum(d[1:])
0.0
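The same discrepancy can be reproduced by simulating the per-type accumulators directly (a sketch; the i_sum/f_sum names mirror the patch's accumulators):

```python
d = [1000000000000000, -1000000000000000.0, 1e-16, 1e-16]

# Left-to-right, as the regular sum() does:
left_to_right = 0
for x in d:
    left_to_right = left_to_right + x   # 0 -> 0.0 -> 1e-16 -> 2e-16

# Per-type accumulation, as in the patch:
i_sum = sum(x for x in d if type(x) is int)      # 1000000000000000
f_sum = sum(x for x in d if type(x) is float)    # -1e15 absorbs both 1e-16 terms
combined = i_sum + f_sum                         # 0.0, not 2e-16
```

The two tiny floats vanish against -1e15 before the large terms get a chance to cancel, so the separately accumulated result differs from the ordinary left-to-right one.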

Closing this one.  The approach is doomed and the possible benefits 
over the current approach are microscopic at best and only apply to 
unusual corner cases. 

I also prefer the current approach because it is easily extended to 
LongLongs and it is easy to show that the code is correct (at least 
with respect to producing the same result as a regular sum()).
History
Date User Action Args
2022-04-11 14:56:34adminsetgithub: 47034
2012-12-06 16:24:23eric.araujosetversions: + Python 3.4, - Python 2.6
2008-05-08 07:45:09rhettingersetstatus: open -> closed
resolution: rejected
messages: + msg66410
2008-05-08 07:13:47MrJean1setmessages: + msg66409
2008-05-08 02:12:36rhettingersetassignee: rhettinger
messages: + msg66389
nosy: + rhettinger
2008-05-08 01:45:13MrJean1setmessages: + msg66387
2008-05-08 00:21:12MrJean1setmessages: + msg66382
2008-05-07 22:09:42pitrousetnosy: + pitrou
messages: + msg66378
2008-05-07 19:53:42MrJean1create