classification
Title: Full precision summation
Type: performance Stage:
Components: Extension Modules Versions: Python 2.6
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: mark.dickinson Nosy List: MrJean1, haypo, jcea, mark.dickinson, rhettinger, terry.reedy, tim.peters
Priority: normal Keywords: patch

Created on 2008-05-11 16:16 by MrJean1, last changed 2008-12-11 19:54 by mark.dickinson. This issue is now closed.

Files
File name Uploaded Description Edit
cmathmodule.c.2.6a3.diff MrJean1, 2008-05-11 16:16
mathmodule.c.2.6a3.diff MrJean1, 2008-05-11 16:17
test_math_sum1.py MrJean1, 2008-05-11 16:17
mathmodule.c.2.6a3.take2.diff MrJean1, 2008-05-12 16:22
floatsum.c MrJean1, 2008-05-12 17:35
msum.py mark.dickinson, 2008-05-16 14:48 Version of msum that takes care of overflow
msum3.py MrJean1, 2008-05-16 15:57 Modified msum.py with overflow file
crsum.py mark.dickinson, 2008-05-16 19:43 Correctly rounded sum of nonadjacent floats
msum4.py mark.dickinson, 2008-05-17 16:28 msum, with overflow, correct rounding and special-value handling
mathmodule4.c.2.6a3.diff MrJean1, 2008-05-18 21:21 msum4 patch for mathmodule.c of Python 2.6a3
cmathmodule4.c.2.6a3.diff MrJean1, 2008-05-18 21:22 msum4 patch for cmathmodule.c of Python 2.6a3
test_math_sum4.py MrJean1, 2008-05-18 21:23 msum4.py tests for math- and cmathmodule.c
mathmodule5.c.2.6a3.diff MrJean1, 2008-05-19 22:18 update #5 of mathmodule.c for Python 2.6a3
cmathmodule5.c.2.6a3.diff MrJean1, 2008-05-19 22:18 update #5 of cmathmodule.c for Python 2.6a3
test_math_sum5.py MrJean1, 2008-05-19 22:20 update #5 of the test script
mathmodule11.c.2.6a3.diff MrJean1, 2008-05-22 16:05 rev 11 patch for mathmodule.c for Python 2.6a3
test_math_sum11.py MrJean1, 2008-05-22 16:06 rev 11 of test_math_sum.py for Python 2.6a3
mathmodule12.c.2.6a3.diff MrJean1, 2008-05-22 16:11 rev 12 of test_math_sum.py for Python 2.6a3
test_math_sum12.py MrJean1, 2008-05-22 16:12 rev 12 of test_math_sum.py for Python 2.6a3
mathmodule19.c.2.6a3.diff MrJean1, 2008-05-23 00:12 rev 19 patch for mathmodule.c for Python 2.6a3
test_math_sum19.py MrJean1, 2008-05-23 00:13 rev 19 of test_math_sum.py for Python 2.6a3
mathsum_IA32.patch mark.dickinson, 2008-06-03 01:19
fsum11.patch mark.dickinson, 2008-08-14 11:10
Messages (47)
msg66641 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-05-11 16:16
Attached are 2 patches and a test script adding a function sum to the 
math and cmath modules of Python 2.6a3.  The sum is calculated using a 
full precision summation method.

The test script compares the result of the functions with the original 
implementation in Python.  All tests pass with 4 different builds of 
Python 2.6a3:

- GNU gcc 4.0.1 on MacOS X 10.4.11 (Intel Core Duo), 32-bit
- GNU gcc 4.1.2 on RHEL 3 update 7 (Opteron), 64-bit
- Sun C 5.8 on Solaris 10 (Opteron), both 32- and 64-bit
msg66655 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-05-11 19:31
This looks pretty good at first glance.  Will review more throughly 
later this week.  It does need docs and unittests.
msg66663 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-05-11 20:41
Some comments/questions:

(1) It seems wasteful to wrap every addition in PyFPE_START/END_PROTECT, 
and to check for NaNs and infinities after every addition.  I'd wrap the 
whole thing in a single PyFPE_START/END_PROTECT, replace _math_sum_add 
with an inline addition, and just let NaNs and Infs sort themselves out.

If the result comes out finite (as it surely will in almost all 
applications), then all the summands were necessarily finite and there's 
nothing more to do.  If the result comes out as an infinity or NaN, you 
need to decide whether it's appropriate to return a NaN, an infinity, or 
to raise OverflowError or ValueError.  I'm not sure it's worth trying to 
do the right thing for all special value cases, but if you do want to 
follow 'spirit of IEEE 754' rules for special values, they should look 
something like this:

  (1) if the summands include a NaN, return a NaN
  (2) else if the summands include infinities of both signs, raise
      ValueError,
  (3) else if the summands include infinities of only one sign, return
      infinity with that sign,
  (4) else (all summands are finite) if the result is infinite, raise
      OverflowError.  (The result can never be a NaN if all summands
      are finite.)

Note that some sums involving overflow won't be computed correctly:  
e.g. [1e308, 1e308, -1e308] will likely sum to infinity instead of
returning 1e308.  I don't see any easy way around this, and it's 
probably not worth worrying about.

(2) The algorithm is only guaranteed to work correctly assuming IEEE 754 
semantics.  Python currently doesn't insist on IEEE 754 floating point, 
so what should happen on non IEEE-754 machines?

(3) Rather than duplicating the math module code in cmathmodule.c, why 
not have the complex version simply sum real parts and imaginary parts 
separately, using a version of the code that's already in mathmodule.c?
msg66666 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-05-11 20:49
One more question:

What are the use cases for an exact summation algorithm?  That is, in what 
situations does one care about exactness rather than simply accuracy?  I 
know that loss of accuracy is a problem in things like numeric integration 
routines, but something like Kahan summation (faster and simpler, but not 
exact) usually takes care of that.
msg66734 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-05-12 16:22
Mark,
Thank you very much for your comments.  Here is my initial response to the
first 3.

(1) Attached is an attempt to address the 1st issue (just the
mathmodule). The macros PyFPE_START_PROTECT/_END_PROTECT have been moved
outside the main loop and the errno is set following the IEEE 754 rules as
you suggested.

One related issue is testing these, how can a NaN and +/-Infinity float
object be created in Python?

(2) On your 2nd comment, supporting non-IEEE floating point, perhaps the
Kahan method should be used in that case.  If so, the next question is how
to detect that?  There are two symbols in pyconfig.h HAVE_IEEEFP_H and
HAVE_LIBIEEE.  Are those the proper ones to determine IEEE floating point
support?

(3) On the 3rd comment, Raymond and I did discus using a single function to
be called by the math and cmath modules.  The question is where should that
function reside?  The math and cmath modules are not the right place since
both are loadable modules.  It will have to be somewhere inside the Python
main/core.

Also, depending on the implementation of that function, it may require
iterating the complex sequence twice.  And that will force the C complex
numbers to be created twice by the PyComplex_AsCComplex() call.  Would that
be a concern?

/Jean Brouwers

On Sun, May 11, 2008 at 1:49 PM, Mark Dickinson <report@bugs.python.org>
wrote:

>
> Mark Dickinson <dickinsm@gmail.com> added the comment:
>
> One more question:
>
> What are the use cases for an exact summation algorithm?  That is, in what
> situations does one care about exactness rather than simply accuracy?  I
> know that loss of accuracy is a problem in things like numeric integration
> routines, but something like Kahan summation (faster and simpler, but not
> exact) usually takes care of that.
>
> __________________________________
> Tracker <report@bugs.python.org>
> <http://bugs.python.org/issue2819>
> __________________________________
>
msg66745 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-05-12 17:35
My apologies for the messy post.  I replied to the email instead of 
posting my response.

/Jean Brouwers

PS) Attached is *an* example of the math_sum() and cmath_sum() functions 
using the same, shared function float_sum().  Perhaps, that resides in 
Objects/floatobject.c?
msg66749 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-05-12 18:42
> One related issue is testing these, how can a NaN and +/-Infinity 
> float object be created in Python?

In 2.6 and 3.0 (but not 2.5 and older), float('nan'), float('inf') and 
float('-inf') should all work reliably across platforms (or at least 
those platforms that support infs and nans).  If they don't it's a bug.

> (2) On your 2nd comment, supporting non-IEEE floating point, perhaps
> the Kahan method should be used in that case.  If so, the next
> question is how to detect that?

Actually, I think you could probably just leave the algorithm exactly as 
it is, but put a warning in the documentation that the exactness only 
applies in the presence of IEEE 754 semantics.  Practically everybody's 
on an IEEE 754 platform anyway.

> There are two symbols in pyconfig.h HAVE_IEEEFP_H and
> HAVE_LIBIEEE.  Are those the proper ones to determine IEEE floating
> point support?

I'm not sure that either of these is the right thing.  Neither is 
defined on my MacBook, for example.

> (3) On the 3rd comment, Raymond and I did discus using a single 
> function to be called by the math and cmath modules.

I think you're right that it's easier to just duplicate the code.
It's a nice feature that this function only has to pass once through the 
data, and it doesn't seem worth losing that to save a little bit of code 
duplication.

I still wonder whether there's a way to avoid incorrectly signaling 
overflow in the case where the result is finite. (e.g. sum([1e308, 
1e308, -1e308])).
msg66750 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-05-12 18:45
It turns out, float('nan') creates a Nan and float('[+|-]inf') creates a 
[signed] Infinity object.  That answers my earlier question.

/Jean Brouwers
msg66753 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-05-12 19:16
The current results are quite "interesting"

>>> math.sum([1e308, 1e308, -1e308])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: math range error

>>> math.sum([1e308, -1e308, 1e308])
1e+308


Handling this case would require holding the finite, prior sum at an 
intermediate overflow in a separate array, say overflows.  Then, add the 
partials which may create additional overflows.  Finally, keep adding 
the overflows (accurately?) until none remain or until none can be added 
without overflow.

/Jean Brouwers
msg66755 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-05-12 19:45
There may be another reason to use a single summation function.  The 
summation functions does need (a copy of) the is_error() function from 
the math module.

The cmath module has a similar function called math_error() which 
slightly different from is_error().

It would be better, more consistent if both modules used the same 
function, say is_error() moved** to file Object/floatobject.c

Then, the math and cmath module can use ist_error() instead of each 
their own.  Also, the summation function can use it and function 
float_pow() in floatobject.c could.

/Jean Brouwers

**) and renamed to e.g. float_error().
msg66764 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-05-12 23:27
Rather keep it in mathmodule.c, not floatobject.c.
We should keep extension code out of the core types.
msg66844 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-05-15 05:00
When you need full precision, the Kahan approach helps but doesn't make 
guarantees and can sometimes hurt (it makes some assumptions about the 
data).  One use case in is computing stats like a mean where many of 
the larger magnitude entries tend to cancel out but only after clipping 
bits off of the lower magnitude components.
msg66947 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-05-16 14:47
Here's (msum.py) an example in Python of one fairly straightforward way of 
dealing with overflow correctly, without needing more than one pass 
through the data, and without significant slowdown in the normal case.
(The Python code is needlessly inefficient in places, notably in that 
partials[1:] creates a new list;  this is obviously not a problem in C.)

The idea is essentially just to maintain the sum modulo integer multiples 
of 2.**1024.  partials[0] is reserved for keeping track of the current 
multiple of 2.**!024.  So at each stage, the sum so far is 
sum(partials[1:], 0.0) + 2.**1024 * partials[0].

I'm 97.3% convinced that the proof of correctness goes through:  it's 
still true with this modification that partials always consists of 
nonadjacent, nonzero values of increasing magnitude.  One of the keys to proving this is to note that for any value x between 2**1023 and 2**1024, 
both x-2**1023 and x-2**1024 are exactly representable.

---

There's one more 'nice-to-have' that I think should be considered:  it 
would be nice if the result of msum were always correctly rounded.  One 
aspect of correct rounding is that it provides a guarantee that the sum is 
independent of the order of the summands, so

msum(list) == msum(sorted(list))

would hold true.
msg66953 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-05-16 15:57
Two tests failed with Python 2.6a3 on MacOS X Intel.

Test 11 failed: 9007199254740992.0 vs 9007199254740991.0 expected for 
[9007199254740992.0, -0.5, -5.5511151231257827e-17].

Test 12 failed: inf vs 1.7976931348623157e+308 expected for 
[8.9884656743115785e+307, -1.0, 8.9884656743115795e+307].

/Jeab Brouwers
msg66954 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-05-16 16:03
> Test 11 failed: 9007199254740992.0 vs 9007199254740991.0 expected for 
> [9007199254740992.0, -0.5, -5.5511151231257827e-17].

Yes: that's the lack of correct rounding rearing its ugly head...

> Test 12 failed: inf vs 1.7976931348623157e+308 expected for 
> [8.9884656743115785e+307, -1.0, 8.9884656743115795e+307].

I'm still trying to work out how to get around this one;  again, the
result's not out by much, but it would be nice to be able to guarantee 
correctly rounded answers *all* the time, instead of just *most* of the 
time.
msg66965 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-05-16 19:43
Ensuring correct rounding isn't as onerous as I expected it to be.  
crsum.py is a snippet of Python code showing how to add nonadjacent floats 
and get the correctly rounded result.
msg67006 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-05-17 16:27
Okay, just to show it's possible:

Here (msum4.py) is a modified version of Raymond's recipe that deals 
correctly with:

 (1) intermediate overflows
 (2) special values (infs and nans) in the input, and
 (3) always gives correctly rounded results.

The file contains more tests, and a proof of correctness.  The algorithm 
still makes only a single pass through the given iterable, and there 
should be minimal slowdown in the common case.  It's still only 60-70 
lines of Python code, so I don't think it would be unreasonable to aim 
to include these modifications in the C version.
msg67021 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-05-18 01:40
I intend to submit a C version of msum4 shortly.
msg67034 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-05-18 19:25
Attached is the patch for the the mathmodule.c file of Python 2.6a3 
containing the C version of the msum() function from Marks's msum4.py 
Python implementation.

Please review the C code, in particular the setting of _do_sum_pow_2 for 
FLT_RADIX not equal to 2.

The results of the C and Python versions match (on 32-bit MacOS X 
Intel), using the test_math_sum4.py script.  That includes all the tests 
from msum4.py and from Raymond's recipe.

More testing and the cmath version are still to be done.

/Jean Brouwers
msg67045 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-05-18 21:21
Attached are updated patches for both the mathmodule.c and cmathmodule.c 
files for Python 2.6a3 and for the test_math_sum4.py script.  Please 
ignore the ones posted earlier.
msg67053 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-05-19 01:21
The tests of the test_math_sum4.py script all pass with the latest math- 
and cmathmodule.c files for 4 different builds of Python 2.6a3:

- GNU gcc 4.0.1 on MacOS X 10.4.11 (Intel Core Duo), 32-bit
- GNU gcc 4.1.2 on RHEL 3 update 7 (Opteron), 64-bit
- Sun C 5.8 on Solaris 10 (Opteron), both 32- and 64-bit

Also interesting might be that the tests run 25+ times faster with the C 
math.sum compared to the Python implementation of msum().
 
/Jean Brouwers
msg67067 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-05-19 17:56
The latest mathmodule patch (file 10371) looks pretty good to me.  I 
haven't looked at the complex version yet.  A few comments:

 - for setting _do_sum_pow_2 I think you should use ldexp instead of pow;  
there's a much greater chance of pow giving inexact results.

 - trying to support FLT_RADIX != 2 seems futile;  for a start, the whole 
algorithm and its proof would need to be reexamined.  Any code specific to 
FLT_RADIX != 2 would be very difficult to test---all of the Python 
buildbots are IEEE 754, for example.  I'm not sure what the best solution 
is:  perhaps math.sum should only be included when FLT_RADIX = 2, or (even 
better) when IEEE 754 is in use.  It shouldn't be too hard to add autoconf 
tests to detect the float format.

 - there are long lines in _do_sum, in violation of PEP 7;  maybe 
factoring out the memory-management code for the partials array would 
help?

 - it would be nice to have some brief comments before each of the _do_sum 
functions indicating its purpose.
msg67080 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-05-19 22:18
Here is another patch for the mathmodule.c and cmathmodule.c files updated 
with the suggested and other modifications.  Note, if FLT_RADIX is not 2, 
the sum functions still exist but will raise a NotImplementedError.

An update of the test script is also attached.  The tests all pass on the 
same 4 Python 2.6a3 builds mentioned before.

/Jean Brouwers
msg67197 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-05-22 16:04
Attached is revision 11 of the mathmodule.c patch for Python 2.6a3.  

This one includes Raymond's full precision summation, Mark's rounding 
partials addition and correct non-finites and error handling.

However, intermediate overflow will raise an OverflowError and a 
FLT_RADIX not equal 2 cause a NotImplementedError.

An updated test_math_sum11.py script is also attached.  All test cases 
pass on 5 different builds of Python 2.6a3:

- 32-bit MacOS X 10.4.11 (Intel) with gcc 4.0.1
- 32-bit MacOS X 10.3.9 (PPC) with gcc 3.3
- 64-bit RHEL 3 update 7 (Opteron) using gcc 4.1.2
- 32- and 64-bit Solaris 10 (Opteron) with Sun C 5.8.

/Jean Brouwers
msg67198 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-05-22 16:11
Here is rev 12 of the mathmodule.c patch.  It is the same as rev 11 but 
with additional code removed as requested:

- no FLT_RADIX 2 check
- no errno illustration in _do_sum_add2()
- no _do_sum() callback function argument
- no option 'start' argument for math.sum.

The test_math.sum12.py script and results are the same as rev 11.

/Jean Brouwers
msg67210 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-05-23 00:12
Here is another, cleaner revision 19 of the same mathmodule.c patch and 
the corresponding test_math_sum19.py script.

/Jean Brouwers
msg67211 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-05-23 00:22
Nice work Jean.  Marking the patch accepted. Mark please go ahead with 
commit.  

Once the commit has settled for a couple of days, go ahead with a 
separate patch to cover the rest of 754R logic for special values. 

After that one settles, close this issue and open a new one for a 
comparable patch in cmath.
msg67212 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-05-23 01:37
math module patch committed, r63542.

I'm still working on converting the tests to the unittest framework.
msg67213 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-05-23 02:37
Tests committed in r63543
msg67647 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-06-03 01:19
Here's a patch that fixes the problems with math.sum on x86 machines that 
aren't using SSE2.

On these machines, the patched version of math.sum does almost the entire 
calculation using long doubles instead of doubles.  Then in the final 
summation, the top long double is split into a double and a residue long 
double.
msg70295 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-07-26 10:38
Here's a patch giving an alternative implementation of math.fsum; it's 
based on Tim Peter's suggestions, works mostly with integer arithmetic, 
and so bypasses problems with double rounding and extended precision 
floats.

The patch is experimental:  it doesn't have sufficient tests, has no 
documentation, and it adds math.fsum alongside the current math.sum, to 
make it easy to compare the two implementations.

On my MacBook, math.fsum is a factor of 2-3 times slower than math.sum.
It's also longer and distinctly less elegant.  So its only real 
advantage is that it should fix the difficulties on x86 hardware.

We *really* need to sort math.sum out, one way or another, before the 
next beta.  Georg recently discovered another problem on x86/Linux:  see 
issue 3421.

Some options:

 (1) leave math.sum as it is, skip all tests on x86/Linux, and document
     the current behaviour.

 (2) investigate a version of math.sum that plays with the FPU control 
word to force 53-bit rounding (and round-half-even)

 (3) replace math.sum with the slower but (presumably) less erratic 
math.fsum, possibly just as a temporary measure.  This would at least 
get all tests passing.

Jean, Raymond:  what do you think?
msg70317 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-07-27 07:19
Tests related to overflow, special-values, intermediate overflow, and
results at or near the boundary of the floating-point range have been
removed in r65258.
msg70409 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-07-29 18:47
See r65292 for more updates to the test-suite:  I've replaced the Python
version of msum by a version based on lsum in the original ASPN recipe.
msg70424 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-07-30 12:02
Minor code cleanups, and fixes to special-value handling in r65299
msg70430 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-07-30 16:21
Renamed math.sum to math.fsum (as previously discussed) in r65308.

I think all that's left now is to add a note to the docs about the
problems on x86.
msg70438 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-07-30 20:24
Added documentation note about x86 problems in r65315.

Jean, Raymond:  is it okay to close this issue now?
msg70439 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-07-30 22:00
Here (fsum8.patch) is a clean version of the alternative fsum algorithm.

I'd like to push for using this in place of the existing algorithm.
msg70440 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2008-07-30 22:26
Mark, I don't currently have a machine with SVN and a compiler
installed, so can't play with patches.  I just want to note here that,
if you're concerned about speed, it would probably pay to eliminate all
library calls except one to frexp().  fmod() in particular is typically
way too expensive, taking time proportional to the difference between
its inputs' exponents (it emulates "long division" one bit at a time). 
While float->integer conversion is also "too expensive" on Pentium
chips, multiply-and-convert-to-integer is probably a substantially
cheaper way to extract bits from the mantissa frexp() delivers; note
that this is how the Cookbook lsum() function gets bits, although it
gets all 53 bits in one gulp while in C you'd probably want to get,
e.g., 30 bits at a time.

Something that's surprised me for decades is how slow platform ldexp()
functions are too, given how little they do.  Whenever you have a fixed
offset E you'd like to add to an exponent, it's almost certainly very
much faster to multiply by 2.0**E (when that's a compile-time constant)
than to call ldexp(whatever, E).
msg70508 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-07-31 14:27
[Tim]
> if you're concerned about speed, it would probably pay to eliminate all
> library calls except one to frexp().

Indeed it would!  Here's a revised patch that avoids use of fmod.  Speed is comparable with the current 
version.  Here are some timings for summing random values on OS X/Core 2 Duo, on a non-debug build of the 
trunk. lsum is the patched version, msum is the version in the current trunk; timings are in seconds.

                                                |  lsum    |  msum
------------------------------------------------+----------+---------
50000 random values in [0, 1)                   | 0.003091 | 0.002540
50000 random values in [0, 1), sorted           | 0.003202 | 0.003043
50000 random values in [0, 1), reverse order    | 0.003201 | 0.002587
50000 random values in [-1, 1)                  | 0.003229 | 0.002829
50000 random values in [-1, 1), sorted          | 0.003183 | 0.002629
50000 random values in [-1, 1), reverse order   | 0.003195 | 0.002731
50000 random exponential values                 | 0.003994 | 0.006178
50000 random exponential values, sorted         | 0.003713 | 0.007933
50000 random exponential values, reverse order  | 0.003714 | 0.002849

Note that lsum doesn't suffer from the 'fragmentation of partials' problem that slows down msum for sorted 
datasets.

I'll do some timings on Linux/x86 as well.
msg70522 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-07-31 15:55
Timings on x86/Linux are similar:  the lsum-based version is around
10% slower on average, 25% slower in the worst case, and significantly 
faster for the msum worst cases.

There's probably still some snot left to optimize out, though.  Some
tempting ideas are:

  (1) to try using doubles instead of longs for the accumulator digits
      (with 51 or 52 bits of precision), and
  (2) to split each mantissa into (nearest_integer, fraction) instead
      of (next_smallest_integer, fraction), using rint or lrint.

Anything else?
msg70545 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-08-01 09:18
Toned down note in docs (at Raymond's request) in r65366.

While I'd really like to see an lsum-based version go in instead, I 
think it's too close to the release to make this change right now.

There was also originally some talk of a complex math version.  What do 
people think about this?  Personally, I suspect that the use cases would 
be few and far between, and anyone who *really* needs a complex high-
precision sum can just apply math.fsum to real and imaginary parts.  
(This is easier now that x.imag and x.real make sense for integers as 
well as floats.)
msg70585 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2008-08-01 19:16
Mark, changing API is something to avoid after beta cycles begin (so,
e.g., it's late in the game to /add/ a new API, like a new method for
complex summation).  But there's no particular reason to fear changing
implementation for an API that's already in.  If the test suite supplies
good coverage, the buildbots will catch significant portability snafus
in plenty of time for the next release.

The big attraction to the lsum-like approach for users is that it "just
works", without requiring weasel words in the docs about overflows,
underflows, or esoteric details about various HW FP quirks.  Integer
arithmetic is just plain better behaved across platforms ;-)
msg70611 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2008-08-02 00:51
Comment: as a user (on x86) I agree with Tim.  I could almost see the
new version as a bugfix.

Question: I see "math module patch committed, r63542" in May 22.  But in
3.0b2, there is no math.fsum and math.sum seems to be a wrapper for
builtin sum.  Has this not been forward ported (merged) yet?
>>> math.sum
<built-in function sum>
>>> sum
<built-in function sum>
>>> math.sum is sum
False
msg70623 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-08-02 08:57
> Question: I see "math module patch committed, r63542" in May 22.  But in
> 3.0b2, there is no math.fsum and math.sum seems to be a wrapper for
> builtin sum.  Has this not been forward ported (merged) yet?

I'm pretty sure it *was* merged:  math.sum should be the full-precision 
summation in both recent betas (2.6b2 and 3.0b2).  Try comparing
sum([1e100, 1, -1e100, -1]) and math.sum([1e100, 1, -1e100, -1])---they 
should produce -1.0 and 0.0 respectively.

The name change to fsum only happened in the last few days.
msg70641 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2008-08-02 20:16
> I'm pretty sure it *was* merged:  math.sum should be the full-precision 
> summation in both recent betas (2.6b2 and 3.0b2).  Try comparing
> sum([1e100, 1, -1e100, -1]) and math.sum([1e100, 1, -1e100, -1])---they 
> should produce -1.0 and 0.0 respectively.

They do.  I realize now that two different built-in funcs in two 
different modules but with the same name will give the same 
representation.  That is so unusual, I was not expecting it.

> The name change to fsum only happened in the last few days.

Which will prevent the confusion I had ;-).  Good idea.
msg71118 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-08-14 11:10
Here's a patch, in final form, that replaces fsum with an lsum-based
implementation.  In brief, the accumulated sum-so-far is represented in
the form

  huge_integer * 2**(smallest_possible_exponent)

and the huge_integer is stored in base 2**30, with a signed-digit representation (digits 
in the range [-2**29, 2**29).

What are the chances of getting this in before next week's beta?

I did toy with a base 2**52 version, with digits stored as doubles.  It's attractive for 
a couple of reasons: (1) each 53-bit double straddles exactly two digits, which makes 
the inner loop more predictable and removes some branches, and (2) one can make some 
optimizations (e.g. being sloppy about transferring single-bit carries to the next digit 
up) based on the assumption that the input is unlikely to have more than 2**51 summands.  The result was slightly faster on OS X, and slower on Linux;  the final rounding code 
also became a little more complicated (as a result of not being able to do bit 
operations on a double easily), and making sure that things work for non IEEE doubles is 
a bit of a pain.  So in the end I abandoned this approach.
msg77628 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-12-11 19:54
Closing this.  Let's stick with what we have.
History
Date User Action Args
2008-12-11 19:54:45mark.dickinsonsetstatus: open -> closed
messages: + msg77628
2008-10-13 20:58:54hayposetnosy: + haypo
2008-10-13 12:12:46jceasetnosy: + jcea
2008-08-14 11:18:57mark.dickinsonsetfiles: - fsum10.patch
2008-08-14 11:18:40mark.dickinsonsetfiles: - fsum8.patch
2008-08-14 11:18:27mark.dickinsonsetfiles: - fsum7.patch
2008-08-14 11:10:12mark.dickinsonsetfiles: + fsum11.patch
messages: + msg71118
2008-08-02 20:16:56terry.reedysetmessages: + msg70641
2008-08-02 08:57:13mark.dickinsonsetmessages: + msg70623
2008-08-02 00:51:39terry.reedysetnosy: + terry.reedy
messages: + msg70611
2008-08-01 19:16:28tim.peterssetmessages: + msg70585
2008-08-01 09:18:22mark.dickinsonsetmessages: + msg70545
2008-07-31 15:55:22mark.dickinsonsetmessages: + msg70522
2008-07-31 14:28:01mark.dickinsonsetfiles: + fsum10.patch
messages: + msg70508
2008-07-30 22:26:25tim.peterssetnosy: + tim.peters
messages: + msg70440
2008-07-30 22:01:05mark.dickinsonsetfiles: + fsum8.patch
messages: + msg70439
2008-07-30 20:24:34mark.dickinsonsetmessages: + msg70438
2008-07-30 16:21:23mark.dickinsonsetmessages: + msg70430
2008-07-30 12:02:44mark.dickinsonsetmessages: + msg70424
2008-07-29 18:47:17mark.dickinsonsetmessages: + msg70409
2008-07-27 07:19:31mark.dickinsonsetmessages: + msg70317
2008-07-26 10:38:41mark.dickinsonsetfiles: + fsum7.patch
messages: + msg70295
2008-06-03 01:20:01mark.dickinsonsetfiles: + mathsum_IA32.patch
messages: + msg67647
2008-05-23 02:37:32mark.dickinsonsetmessages: + msg67213
2008-05-23 01:37:19mark.dickinsonsetmessages: + msg67212
2008-05-23 00:23:01rhettingersetassignee: rhettinger -> mark.dickinson
resolution: accepted
messages: + msg67211
2008-05-23 00:13:37MrJean1setfiles: + test_math_sum19.py
2008-05-23 00:12:22MrJean1setfiles: + mathmodule19.c.2.6a3.diff
messages: + msg67210
2008-05-22 16:12:22MrJean1setfiles: + test_math_sum12.py
2008-05-22 16:11:49MrJean1setfiles: + mathmodule12.c.2.6a3.diff
messages: + msg67198
2008-05-22 16:06:07MrJean1setfiles: + test_math_sum11.py
2008-05-22 16:05:06MrJean1setfiles: + mathmodule11.c.2.6a3.diff
messages: + msg67197
2008-05-19 22:20:09MrJean1setfiles: + test_math_sum5.py
2008-05-19 22:18:49MrJean1setfiles: + cmathmodule5.c.2.6a3.diff
2008-05-19 22:18:24MrJean1setfiles: + mathmodule5.c.2.6a3.diff
messages: + msg67080
2008-05-19 17:57:16mark.dickinsonsetmessages: + msg67067
2008-05-19 01:22:05MrJean1setmessages: + msg67053
2008-05-18 21:23:21MrJean1setfiles: + test_math_sum4.py
2008-05-18 21:22:25MrJean1setfiles: + cmathmodule4.c.2.6a3.diff
2008-05-18 21:21:37MrJean1setfiles: + mathmodule4.c.2.6a3.diff
messages: + msg67045
2008-05-18 21:18:10MrJean1setfiles: - test_math_sum4.py
2008-05-18 21:18:04MrJean1setfiles: - mathmodule4.c.2.6a3.diff
2008-05-18 19:29:32MrJean1setfiles: + test_math_sum4.py
2008-05-18 19:28:07MrJean1setfiles: + mathmodule4.c.2.6a3.diff
2008-05-18 19:25:47MrJean1setfiles: - mathmodule4.c.2.6a3.diff
2008-05-18 19:25:36MrJean1setfiles: + mathmodule4.c.2.6a3.diff
messages: + msg67034
2008-05-18 01:40:56MrJean1setmessages: + msg67021
2008-05-17 16:28:15mark.dickinsonsetfiles: + msum4.py
messages: + msg67006
2008-05-16 19:44:08mark.dickinsonsetfiles: + crsum.py
messages: + msg66965
2008-05-16 16:03:19mark.dickinsonsetmessages: + msg66954
2008-05-16 15:57:22MrJean1setfiles: + msum3.py
messages: + msg66953
2008-05-16 15:54:17MrJean1setfiles: - unnamed
2008-05-16 14:48:38mark.dickinsonsetfiles: + msum.py
messages: + msg66947
2008-05-15 05:00:47rhettingersetmessages: + msg66844
2008-05-12 23:27:50rhettingersetmessages: + msg66764
2008-05-12 19:45:02MrJean1setmessages: + msg66755
2008-05-12 19:16:57MrJean1setmessages: + msg66753
2008-05-12 18:45:01MrJean1setmessages: + msg66750
2008-05-12 18:42:28mark.dickinsonsetmessages: + msg66749
2008-05-12 17:35:41MrJean1setfiles: + floatsum.c
messages: + msg66745
2008-05-12 16:22:36MrJean1setfiles: + unnamed, mathmodule.c.2.6a3.take2.diff
messages: + msg66734
2008-05-11 20:49:24mark.dickinsonsetmessages: + msg66666
2008-05-11 20:41:52mark.dickinsonsetnosy: + mark.dickinson
messages: + msg66663
2008-05-11 19:31:20rhettingersetassignee: rhettinger
messages: + msg66655
2008-05-11 16:17:37MrJean1setfiles: + test_math_sum1.py
2008-05-11 16:17:24MrJean1setfiles: + mathmodule.c.2.6a3.diff
2008-05-11 16:16:52MrJean1create