Issue 2819: Full precision summation

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/47068

classification

Title:	Full precision summation
Type:	performance	Stage:
Components:	Extension Modules	Versions:	Python 2.6

process

Status:	closed	Resolution:	accepted
Dependencies:		Superseder:
Assigned To:	mark.dickinson	Nosy List:	MrJean1, jcea, mark.dickinson, rhettinger, terry.reedy, tim.peters, vstinner
Priority:	normal	Keywords:	patch

Created on 2008-05-11 16:16 by MrJean1, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
cmathmodule.c.2.6a3.diff	MrJean1, 2008-05-11 16:16
mathmodule.c.2.6a3.diff	MrJean1, 2008-05-11 16:17
test_math_sum1.py	MrJean1, 2008-05-11 16:17
mathmodule.c.2.6a3.take2.diff	MrJean1, 2008-05-12 16:22
floatsum.c	MrJean1, 2008-05-12 17:35
msum.py	mark.dickinson, 2008-05-16 14:48	Version of msum that takes care of overflow
msum3.py	MrJean1, 2008-05-16 15:57	Modified msum.py with overflow file
crsum.py	mark.dickinson, 2008-05-16 19:43	Correctly rounded sum of nonadjacent floats
msum4.py	mark.dickinson, 2008-05-17 16:28	msum, with overflow, correct rounding and special-value handling
mathmodule4.c.2.6a3.diff	MrJean1, 2008-05-18 21:21	msum4 patch for mathmodule.c of Python 2.6a3
cmathmodule4.c.2.6a3.diff	MrJean1, 2008-05-18 21:22	msum4 patch for cmathmodule.c of Python 2.6a3
test_math_sum4.py	MrJean1, 2008-05-18 21:23	msum4.py tests for math- and cmathmodule.c
mathmodule5.c.2.6a3.diff	MrJean1, 2008-05-19 22:18	update #5 of mathmodule.c for Python 2.6a3
cmathmodule5.c.2.6a3.diff	MrJean1, 2008-05-19 22:18	update #5 of cmathmodule.c for Python 2.6a3
test_math_sum5.py	MrJean1, 2008-05-19 22:20	update #5 of the test script
mathmodule11.c.2.6a3.diff	MrJean1, 2008-05-22 16:05	rev 11 patch for mathmodule.c for Python 2.6a3
test_math_sum11.py	MrJean1, 2008-05-22 16:06	rev 11 of test_math_sum.py for Python 2.6a3
mathmodule12.c.2.6a3.diff	MrJean1, 2008-05-22 16:11	rev 12 of test_math_sum.py for Python 2.6a3
test_math_sum12.py	MrJean1, 2008-05-22 16:12	rev 12 of test_math_sum.py for Python 2.6a3
mathmodule19.c.2.6a3.diff	MrJean1, 2008-05-23 00:12	rev 19 patch for mathmodule.c for Python 2.6a3
test_math_sum19.py	MrJean1, 2008-05-23 00:13	rev 19 of test_math_sum.py for Python 2.6a3
mathsum_IA32.patch	mark.dickinson, 2008-06-03 01:19
fsum11.patch	mark.dickinson, 2008-08-14 11:10

Messages (47)
msg66641 - (view)	Author: Jean Brouwers (MrJean1)	Date: 2008-05-11 16:16
Attached are 2 patches and a test script adding a function sum to the math and cmath modules of Python 2.6a3. The sum is calculated using a full precision summation method. The test script compares the result of the functions with the original implementation in Python. All tests pass with 4 different builds of Python 2.6a3: - GNU gcc 4.0.1 on MacOS X 10.4.11 (Intel Core Duo), 32-bit - GNU gcc 4.1.2 on RHEL 3 update 7 (Opteron), 64-bit - Sun C 5.8 on Solaris 10 (Opteron), both 32- and 64-bit
msg66655 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2008-05-11 19:31
This looks pretty good at first glance. Will review more throughly later this week. It does need docs and unittests.
msg66663 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-05-11 20:41
Some comments/questions: (1) It seems wasteful to wrap every addition in PyFPE_START/END_PROTECT, and to check for NaNs and infinities after every addition. I'd wrap the whole thing in a single PyFPE_START/END_PROTECT, replace _math_sum_add with an inline addition, and just let NaNs and Infs sort themselves out. If the result comes out finite (as it surely will in almost all applications), then all the summands were necessarily finite and there's nothing more to do. If the result comes out as an infinity or NaN, you need to decide whether it's appropriate to return a NaN, an infinity, or to raise OverflowError or ValueError. I'm not sure it's worth trying to do the right thing for all special value cases, but if you do want to follow 'spirit of IEEE 754' rules for special values, they should look something like this: (1) if the summands include a NaN, return a NaN (2) else if the summands include infinities of both signs, raise ValueError, (3) else if the summands include infinities of only one sign, return infinity with that sign, (4) else (all summands are finite) if the result is infinite, raise OverflowError. (The result can never be a NaN if all summands are finite.) Note that some sums involving overflow won't be computed correctly: e.g. [1e308, 1e308, -1e308] will likely sum to infinity instead of returning 1e308. I don't see any easy way around this, and it's probably not worth worrying about. (2) The algorithm is only guaranteed to work correctly assuming IEEE 754 semantics. Python currently doesn't insist on IEEE 754 floating point, so what should happen on non IEEE-754 machines? (3) Rather than duplicating the math module code in cmathmodule.c, why not have the complex version simply sum real parts and imaginary parts separately, using a version of the code that's already in mathmodule.c?
msg66666 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-05-11 20:49
One more question: What are the use cases for an exact summation algorithm? That is, in what situations does one care about exactness rather than simply accuracy? I know that loss of accuracy is a problem in things like numeric integration routines, but something like Kahan summation (faster and simpler, but not exact) usually takes care of that.
msg66734 - (view)	Author: Jean Brouwers (MrJean1)	Date: 2008-05-12 16:22
Mark, Thank you very much for your comments. Here is my initial response to the first 3. (1) Attached is an attempt to address the 1st issue (just the mathmodule). The macros PyFPE_START_PROTECT/_END_PROTECT have been moved outside the main loop and the errno is set following the IEEE 754 rules as you suggested. One related issue is testing these, how can a NaN and +/-Infinity float object be created in Python? (2) On your 2nd comment, supporting non-IEEE floating point, perhaps the Kahan method should be used in that case. If so, the next question is how to detect that? There are two symbols in pyconfig.h HAVE_IEEEFP_H and HAVE_LIBIEEE. Are those the proper ones to determine IEEE floating point support? (3) On the 3rd comment, Raymond and I did discus using a single function to be called by the math and cmath modules. The question is where should that function reside? The math and cmath modules are not the right place since both are loadable modules. It will have to be somewhere inside the Python main/core. Also, depending on the implementation of that function, it may require iterating the complex sequence twice. And that will force the C complex numbers to be created twice by the PyComplex_AsCComplex() call. Would that be a concern? /Jean Brouwers On Sun, May 11, 2008 at 1:49 PM, Mark Dickinson <report@bugs.python.org> wrote: > > Mark Dickinson <dickinsm@gmail.com> added the comment: > > One more question: > > What are the use cases for an exact summation algorithm? That is, in what > situations does one care about exactness rather than simply accuracy? I > know that loss of accuracy is a problem in things like numeric integration > routines, but something like Kahan summation (faster and simpler, but not > exact) usually takes care of that. > > __________________________________ > Tracker <report@bugs.python.org> > <http://bugs.python.org/issue2819> > __________________________________ >
msg66745 - (view)	Author: Jean Brouwers (MrJean1)	Date: 2008-05-12 17:35
My apologies for the messy post. I replied to the email instead of posting my response. /Jean Brouwers PS) Attached is an example of the math_sum() and cmath_sum() functions using the same, shared function float_sum(). Perhaps, that resides in Objects/floatobject.c?
msg66749 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-05-12 18:42
> One related issue is testing these, how can a NaN and +/-Infinity > float object be created in Python? In 2.6 and 3.0 (but not 2.5 and older), float('nan'), float('inf') and float('-inf') should all work reliably across platforms (or at least those platforms that support infs and nans). If they don't it's a bug. > (2) On your 2nd comment, supporting non-IEEE floating point, perhaps > the Kahan method should be used in that case. If so, the next > question is how to detect that? Actually, I think you could probably just leave the algorithm exactly as it is, but put a warning in the documentation that the exactness only applies in the presence of IEEE 754 semantics. Practically everybody's on an IEEE 754 platform anyway. > There are two symbols in pyconfig.h HAVE_IEEEFP_H and > HAVE_LIBIEEE. Are those the proper ones to determine IEEE floating > point support? I'm not sure that either of these is the right thing. Neither is defined on my MacBook, for example. > (3) On the 3rd comment, Raymond and I did discus using a single > function to be called by the math and cmath modules. I think you're right that it's easier to just duplicate the code. It's a nice feature that this function only has to pass once through the data, and it doesn't seem worth losing that to save a little bit of code duplication. I still wonder whether there's a way to avoid incorrectly signaling overflow in the case where the result is finite. (e.g. sum([1e308, 1e308, -1e308])).
msg66750 - (view)	Author: Jean Brouwers (MrJean1)	Date: 2008-05-12 18:45
It turns out, float('nan') creates a Nan and float('[+\|-]inf') creates a [signed] Infinity object. That answers my earlier question. /Jean Brouwers
msg66753 - (view)	Author: Jean Brouwers (MrJean1)	Date: 2008-05-12 19:16
The current results are quite "interesting" >>> math.sum([1e308, 1e308, -1e308]) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: math range error >>> math.sum([1e308, -1e308, 1e308]) 1e+308 Handling this case would require holding the finite, prior sum at an intermediate overflow in a separate array, say overflows. Then, add the partials which may create additional overflows. Finally, keep adding the overflows (accurately?) until none remain or until none can be added without overflow. /Jean Brouwers
msg66755 - (view)	Author: Jean Brouwers (MrJean1)	Date: 2008-05-12 19:45
There may be another reason to use a single summation function. The summation functions does need (a copy of) the is_error() function from the math module. The cmath module has a similar function called math_error() which slightly different from is_error(). It would be better, more consistent if both modules used the same function, say is_error() moved to file Object/floatobject.c Then, the math and cmath module can use ist_error() instead of each their own. Also, the summation function can use it and function float_pow() in floatobject.c could. /Jean Brouwers ) and renamed to e.g. float_error().
msg66764 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2008-05-12 23:27
Rather keep it in mathmodule.c, not floatobject.c. We should keep extension code out of the core types.
msg66844 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2008-05-15 05:00
When you need full precision, the Kahan approach helps but doesn't make guarantees and can sometimes hurt (it makes some assumptions about the data). One use case in is computing stats like a mean where many of the larger magnitude entries tend to cancel out but only after clipping bits off of the lower magnitude components.
msg66947 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-05-16 14:47
Here's (msum.py) an example in Python of one fairly straightforward way of dealing with overflow correctly, without needing more than one pass through the data, and without significant slowdown in the normal case. (The Python code is needlessly inefficient in places, notably in that partials[1:] creates a new list; this is obviously not a problem in C.) The idea is essentially just to maintain the sum modulo integer multiples of 2.1024. partials[0] is reserved for keeping track of the current multiple of 2.!024. So at each stage, the sum so far is sum(partials[1:], 0.0) + 2.*1024 partials[0]. I'm 97.3% convinced that the proof of correctness goes through: it's still true with this modification that partials always consists of nonadjacent, nonzero values of increasing magnitude. One of the keys to proving this is to note that for any value x between 21023 and 21024, both x-21023 and x-21024 are exactly representable. --- There's one more 'nice-to-have' that I think should be considered: it would be nice if the result of msum were always correctly rounded. One aspect of correct rounding is that it provides a guarantee that the sum is independent of the order of the summands, so msum(list) == msum(sorted(list)) would hold true.
msg66953 - (view)	Author: Jean Brouwers (MrJean1)	Date: 2008-05-16 15:57
Two tests failed with Python 2.6a3 on MacOS X Intel. Test 11 failed: 9007199254740992.0 vs 9007199254740991.0 expected for [9007199254740992.0, -0.5, -5.5511151231257827e-17]. Test 12 failed: inf vs 1.7976931348623157e+308 expected for [8.9884656743115785e+307, -1.0, 8.9884656743115795e+307]. /Jeab Brouwers
msg66954 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-05-16 16:03
> Test 11 failed: 9007199254740992.0 vs 9007199254740991.0 expected for > [9007199254740992.0, -0.5, -5.5511151231257827e-17]. Yes: that's the lack of correct rounding rearing its ugly head... > Test 12 failed: inf vs 1.7976931348623157e+308 expected for > [8.9884656743115785e+307, -1.0, 8.9884656743115795e+307]. I'm still trying to work out how to get around this one; again, the result's not out by much, but it would be nice to be able to guarantee correctly rounded answers all the time, instead of just most of the time.
msg66965 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-05-16 19:43
Ensuring correct rounding isn't as onerous as I expected it to be. crsum.py is a snippet of Python code showing how to add nonadjacent floats and get the correctly rounded result.
msg67006 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-05-17 16:27
Okay, just to show it's possible: Here (msum4.py) is a modified version of Raymond's recipe that deals correctly with: (1) intermediate overflows (2) special values (infs and nans) in the input, and (3) always gives correctly rounded results. The file contains more tests, and a proof of correctness. The algorithm still makes only a single pass through the given iterable, and there should be minimal slowdown in the common case. It's still only 60-70 lines of Python code, so I don't think it would be unreasonable to aim to include these modifications in the C version.
msg67021 - (view)	Author: Jean Brouwers (MrJean1)	Date: 2008-05-18 01:40
I intend to submit a C version of msum4 shortly.
msg67034 - (view)	Author: Jean Brouwers (MrJean1)	Date: 2008-05-18 19:25
Attached is the patch for the the mathmodule.c file of Python 2.6a3 containing the C version of the msum() function from Marks's msum4.py Python implementation. Please review the C code, in particular the setting of _do_sum_pow_2 for FLT_RADIX not equal to 2. The results of the C and Python versions match (on 32-bit MacOS X Intel), using the test_math_sum4.py script. That includes all the tests from msum4.py and from Raymond's recipe. More testing and the cmath version are still to be done. /Jean Brouwers
msg67045 - (view)	Author: Jean Brouwers (MrJean1)	Date: 2008-05-18 21:21
Attached are updated patches for both the mathmodule.c and cmathmodule.c files for Python 2.6a3 and for the test_math_sum4.py script. Please ignore the ones posted earlier.
msg67053 - (view)	Author: Jean Brouwers (MrJean1)	Date: 2008-05-19 01:21
The tests of the test_math_sum4.py script all pass with the latest math- and cmathmodule.c files for 4 different builds of Python 2.6a3: - GNU gcc 4.0.1 on MacOS X 10.4.11 (Intel Core Duo), 32-bit - GNU gcc 4.1.2 on RHEL 3 update 7 (Opteron), 64-bit - Sun C 5.8 on Solaris 10 (Opteron), both 32- and 64-bit Also interesting might be that the tests run 25+ times faster with the C math.sum compared to the Python implementation of msum(). /Jean Brouwers
msg67067 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-05-19 17:56
The latest mathmodule patch (file 10371) looks pretty good to me. I haven't looked at the complex version yet. A few comments: - for setting _do_sum_pow_2 I think you should use ldexp instead of pow; there's a much greater chance of pow giving inexact results. - trying to support FLT_RADIX != 2 seems futile; for a start, the whole algorithm and its proof would need to be reexamined. Any code specific to FLT_RADIX != 2 would be very difficult to test---all of the Python buildbots are IEEE 754, for example. I'm not sure what the best solution is: perhaps math.sum should only be included when FLT_RADIX = 2, or (even better) when IEEE 754 is in use. It shouldn't be too hard to add autoconf tests to detect the float format. - there are long lines in _do_sum, in violation of PEP 7; maybe factoring out the memory-management code for the partials array would help? - it would be nice to have some brief comments before each of the _do_sum functions indicating its purpose.
msg67080 - (view)	Author: Jean Brouwers (MrJean1)	Date: 2008-05-19 22:18
Here is another patch for the mathmodule.c and cmathmodule.c files updated with the suggested and other modifications. Note, if FLT_RADIX is not 2, the sum functions still exist but will raise a NotImplementedError. An update of the test script is also attached. The tests all pass on the same 4 Python 2.6a3 builds mentioned before. /Jean Brouwers
msg67197 - (view)	Author: Jean Brouwers (MrJean1)	Date: 2008-05-22 16:04
Attached is revision 11 of the mathmodule.c patch for Python 2.6a3. This one includes Raymond's full precision summation, Mark's rounding partials addition and correct non-finites and error handling. However, intermediate overflow will raise an OverflowError and a FLT_RADIX not equal 2 cause a NotImplementedError. An updated test_math_sum11.py script is also attached. All test cases pass on 5 different builds of Python 2.6a3: - 32-bit MacOS X 10.4.11 (Intel) with gcc 4.0.1 - 32-bit MacOS X 10.3.9 (PPC) with gcc 3.3 - 64-bit RHEL 3 update 7 (Opteron) using gcc 4.1.2 - 32- and 64-bit Solaris 10 (Opteron) with Sun C 5.8. /Jean Brouwers
msg67198 - (view)	Author: Jean Brouwers (MrJean1)	Date: 2008-05-22 16:11
Here is rev 12 of the mathmodule.c patch. It is the same as rev 11 but with additional code removed as requested: - no FLT_RADIX 2 check - no errno illustration in _do_sum_add2() - no _do_sum() callback function argument - no option 'start' argument for math.sum. The test_math.sum12.py script and results are the same as rev 11. /Jean Brouwers
msg67210 - (view)	Author: Jean Brouwers (MrJean1)	Date: 2008-05-23 00:12
Here is another, cleaner revision 19 of the same mathmodule.c patch and the corresponding test_math_sum19.py script. /Jean Brouwers
msg67211 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2008-05-23 00:22
Nice work Jean. Marking the patch accepted. Mark please go ahead with commit. Once the commit has settled for a couple of days, go ahead with a separate patch to cover the rest of 754R logic for special values. After that one settles, close this issue and open a new one for a comparable patch in cmath.
msg67212 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-05-23 01:37
math module patch committed, r63542. I'm still working on converting the tests to the unittest framework.
msg67213 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-05-23 02:37
Tests committed in r63543
msg67647 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-06-03 01:19
Here's a patch that fixes the problems with math.sum on x86 machines that aren't using SSE2. On these machines, the patched version of math.sum does almost the entire calculation using long doubles instead of doubles. Then in the final summation, the top long double is split into a double and a residue long double.
msg70295 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-07-26 10:38
Here's a patch giving an alternative implementation of math.fsum; it's based on Tim Peter's suggestions, works mostly with integer arithmetic, and so bypasses problems with double rounding and extended precision floats. The patch is experimental: it doesn't have sufficient tests, has no documentation, and it adds math.fsum alongside the current math.sum, to make it easy to compare the two implementations. On my MacBook, math.fsum is a factor of 2-3 times slower than math.sum. It's also longer and distinctly less elegant. So its only real advantage is that it should fix the difficulties on x86 hardware. We really need to sort math.sum out, one way or another, before the next beta. Georg recently discovered another problem on x86/Linux: see issue 3421. Some options: (1) leave math.sum as it is, skip all tests on x86/Linux, and document the current behaviour. (2) investigate a version of math.sum that plays with the FPU control word to force 53-bit rounding (and round-half-even) (3) replace math.sum with the slower but (presumably) less erratic math.fsum, possibly just as a temporary measure. This would at least get all tests passing. Jean, Raymond: what do you think?
msg70317 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-07-27 07:19
Tests related to overflow, special-values, intermediate overflow, and results at or near the boundary of the floating-point range have been removed in r65258.
msg70409 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-07-29 18:47
See r65292 for more updates to the test-suite: I've replaced the Python version of msum by a version based on lsum in the original ASPN recipe.
msg70424 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-07-30 12:02
Minor code cleanups, and fixes to special-value handling in r65299
msg70430 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-07-30 16:21
Renamed math.sum to math.fsum (as previously discussed) in r65308. I think all that's left now is to add a note to the docs about the problems on x86.
msg70438 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-07-30 20:24
Added documentation note about x86 problems in r65315. Jean, Raymond: is it okay to close this issue now?
msg70439 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-07-30 22:00
Here (fsum8.patch) is a clean version of the alternative fsum algorithm. I'd like to push for using this in place of the existing algorithm.
msg70440 - (view)	Author: Tim Peters (tim.peters) *	Date: 2008-07-30 22:26
Mark, I don't currently have a machine with SVN and a compiler installed, so can't play with patches. I just want to note here that, if you're concerned about speed, it would probably pay to eliminate all library calls except one to frexp(). fmod() in particular is typically way too expensive, taking time proportional to the difference between its inputs' exponents (it emulates "long division" one bit at a time). While float->integer conversion is also "too expensive" on Pentium chips, multiply-and-convert-to-integer is probably a substantially cheaper way to extract bits from the mantissa frexp() delivers; note that this is how the Cookbook lsum() function gets bits, although it gets all 53 bits in one gulp while in C you'd probably want to get, e.g., 30 bits at a time. Something that's surprised me for decades is how slow platform ldexp() functions are too, given how little they do. Whenever you have a fixed offset E you'd like to add to an exponent, it's almost certainly very much faster to multiply by 2.0**E (when that's a compile-time constant) than to call ldexp(whatever, E).
msg70508 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-07-31 14:27
[Tim] > if you're concerned about speed, it would probably pay to eliminate all > library calls except one to frexp(). Indeed it would! Here's a revised patch that avoids use of fmod. Speed is comparable with the current version. Here are some timings for summing random values on OS X/Core 2 Duo, on a non-debug build of the trunk. lsum is the patched version, msum is the version in the current trunk; timings are in seconds. \| lsum \| msum ------------------------------------------------+----------+--------- 50000 random values in [0, 1) \| 0.003091 \| 0.002540 50000 random values in [0, 1), sorted \| 0.003202 \| 0.003043 50000 random values in [0, 1), reverse order \| 0.003201 \| 0.002587 50000 random values in [-1, 1) \| 0.003229 \| 0.002829 50000 random values in [-1, 1), sorted \| 0.003183 \| 0.002629 50000 random values in [-1, 1), reverse order \| 0.003195 \| 0.002731 50000 random exponential values \| 0.003994 \| 0.006178 50000 random exponential values, sorted \| 0.003713 \| 0.007933 50000 random exponential values, reverse order \| 0.003714 \| 0.002849 Note that lsum doesn't suffer from the 'fragmentation of partials' problem that slows down msum for sorted datasets. I'll do some timings on Linux/x86 as well.
msg70522 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-07-31 15:55
Timings on x86/Linux are similar: the lsum-based version is around 10% slower on average, 25% slower in the worst case, and significantly faster for the msum worst cases. There's probably still some snot left to optimize out, though. Some tempting ideas are: (1) to try using doubles instead of longs for the accumulator digits (with 51 or 52 bits of precision), and (2) to split each mantissa into (nearest_integer, fraction) instead of (next_smallest_integer, fraction), using rint or lrint. Anything else?
msg70545 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-08-01 09:18
Toned down note in docs (at Raymond's request) in r65366. While I'd really like to see an lsum-based version go in instead, I think it's too close to the release to make this change right now. There was also originally some talk of a complex math version. What do people think about this? Personally, I suspect that the use cases would be few and far between, and anyone who really needs a complex high- precision sum can just apply math.fsum to real and imaginary parts. (This is easier now that x.imag and x.real make sense for integers as well as floats.)
msg70585 - (view)	Author: Tim Peters (tim.peters) *	Date: 2008-08-01 19:16
Mark, changing API is something to avoid after beta cycles begin (so, e.g., it's late in the game to /add/ a new API, like a new method for complex summation). But there's no particular reason to fear changing implementation for an API that's already in. If the test suite supplies good coverage, the buildbots will catch significant portability snafus in plenty of time for the next release. The big attraction to the lsum-like approach for users is that it "just works", without requiring weasel words in the docs about overflows, underflows, or esoteric details about various HW FP quirks. Integer arithmetic is just plain better behaved across platforms ;-)
msg70611 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2008-08-02 00:51
Comment: as a user (on x86) I agree with Tim. I could almost see the new version as a bugfix. Question: I see "math module patch committed, r63542" in May 22. But in 3.0b2, there is no math.fsum and math.sum seems to be a wrapper for builtin sum. Has this not been forward ported (merged) yet? >>> math.sum <built-in function sum> >>> sum <built-in function sum> >>> math.sum is sum False
msg70623 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-08-02 08:57
> Question: I see "math module patch committed, r63542" in May 22. But in > 3.0b2, there is no math.fsum and math.sum seems to be a wrapper for > builtin sum. Has this not been forward ported (merged) yet? I'm pretty sure it was merged: math.sum should be the full-precision summation in both recent betas (2.6b2 and 3.0b2). Try comparing sum([1e100, 1, -1e100, -1]) and math.sum([1e100, 1, -1e100, -1])---they should produce -1.0 and 0.0 respectively. The name change to fsum only happened in the last few days.
msg70641 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2008-08-02 20:16
> I'm pretty sure it was merged: math.sum should be the full-precision > summation in both recent betas (2.6b2 and 3.0b2). Try comparing > sum([1e100, 1, -1e100, -1]) and math.sum([1e100, 1, -1e100, -1])---they > should produce -1.0 and 0.0 respectively. They do. I realize now that two different built-in funcs in two different modules but with the same name will give the same representation. That is so unusual, I was not expecting it. > The name change to fsum only happened in the last few days. Which will prevent the confusion I had ;-). Good idea.
msg71118 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-08-14 11:10
Here's a patch, in final form, that replaces fsum with an lsum-based implementation. In brief, the accumulated sum-so-far is represented in the form huge_integer * 2(smallest_possible_exponent) and the huge_integer is stored in base 230, with a signed-digit representation (digits in the range [-229, 229). What are the chances of getting this in before next week's beta? I did toy with a base 252 version, with digits stored as doubles. It's attractive for a couple of reasons: (1) each 53-bit double straddles exactly two digits, which makes the inner loop more predictable and removes some branches, and (2) one can make some optimizations (e.g. being sloppy about transferring single-bit carries to the next digit up) based on the assumption that the input is unlikely to have more than 251 summands. The result was slightly faster on OS X, and slower on Linux; the final rounding code also became a little more complicated (as a result of not being able to do bit operations on a double easily), and making sure that things work for non IEEE doubles is a bit of a pain. So in the end I abandoned this approach.
msg77628 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-12-11 19:54
Closing this. Let's stick with what we have.

History
Date	User	Action	Args
2022-04-11 14:56:34	admin	set	github: 47068
2008-12-11 19:54:45	mark.dickinson	set	status: open -> closed messages: + msg77628
2008-10-13 20:58:54	vstinner	set	nosy: + vstinner
2008-10-13 12:12:46	jcea	set	nosy: + jcea
2008-08-14 11:18:57	mark.dickinson	set	files: - fsum10.patch
2008-08-14 11:18:40	mark.dickinson	set	files: - fsum8.patch
2008-08-14 11:18:27	mark.dickinson	set	files: - fsum7.patch
2008-08-14 11:10:12	mark.dickinson	set	files: + fsum11.patch messages: + msg71118
2008-08-02 20:16:56	terry.reedy	set	messages: + msg70641
2008-08-02 08:57:13	mark.dickinson	set	messages: + msg70623
2008-08-02 00:51:39	terry.reedy	set	nosy: + terry.reedy messages: + msg70611
2008-08-01 19:16:28	tim.peters	set	messages: + msg70585
2008-08-01 09:18:22	mark.dickinson	set	messages: + msg70545
2008-07-31 15:55:22	mark.dickinson	set	messages: + msg70522
2008-07-31 14:28:01	mark.dickinson	set	files: + fsum10.patch messages: + msg70508
2008-07-30 22:26:25	tim.peters	set	nosy: + tim.peters messages: + msg70440
2008-07-30 22:01:05	mark.dickinson	set	files: + fsum8.patch messages: + msg70439
2008-07-30 20:24:34	mark.dickinson	set	messages: + msg70438
2008-07-30 16:21:23	mark.dickinson	set	messages: + msg70430
2008-07-30 12:02:44	mark.dickinson	set	messages: + msg70424
2008-07-29 18:47:17	mark.dickinson	set	messages: + msg70409
2008-07-27 07:19:31	mark.dickinson	set	messages: + msg70317
2008-07-26 10:38:41	mark.dickinson	set	files: + fsum7.patch messages: + msg70295
2008-06-03 01:20:01	mark.dickinson	set	files: + mathsum_IA32.patch messages: + msg67647
2008-05-23 02:37:32	mark.dickinson	set	messages: + msg67213
2008-05-23 01:37:19	mark.dickinson	set	messages: + msg67212
2008-05-23 00:23:01	rhettinger	set	assignee: rhettinger -> mark.dickinson resolution: accepted messages: + msg67211
2008-05-23 00:13:37	MrJean1	set	files: + test_math_sum19.py
2008-05-23 00:12:22	MrJean1	set	files: + mathmodule19.c.2.6a3.diff messages: + msg67210
2008-05-22 16:12:22	MrJean1	set	files: + test_math_sum12.py
2008-05-22 16:11:49	MrJean1	set	files: + mathmodule12.c.2.6a3.diff messages: + msg67198
2008-05-22 16:06:07	MrJean1	set	files: + test_math_sum11.py
2008-05-22 16:05:06	MrJean1	set	files: + mathmodule11.c.2.6a3.diff messages: + msg67197
2008-05-19 22:20:09	MrJean1	set	files: + test_math_sum5.py
2008-05-19 22:18:49	MrJean1	set	files: + cmathmodule5.c.2.6a3.diff
2008-05-19 22:18:24	MrJean1	set	files: + mathmodule5.c.2.6a3.diff messages: + msg67080
2008-05-19 17:57:16	mark.dickinson	set	messages: + msg67067
2008-05-19 01:22:05	MrJean1	set	messages: + msg67053
2008-05-18 21:23:21	MrJean1	set	files: + test_math_sum4.py
2008-05-18 21:22:25	MrJean1	set	files: + cmathmodule4.c.2.6a3.diff
2008-05-18 21:21:37	MrJean1	set	files: + mathmodule4.c.2.6a3.diff messages: + msg67045
2008-05-18 21:18:10	MrJean1	set	files: - test_math_sum4.py
2008-05-18 21:18:04	MrJean1	set	files: - mathmodule4.c.2.6a3.diff
2008-05-18 19:29:32	MrJean1	set	files: + test_math_sum4.py
2008-05-18 19:28:07	MrJean1	set	files: + mathmodule4.c.2.6a3.diff
2008-05-18 19:25:47	MrJean1	set	files: - mathmodule4.c.2.6a3.diff
2008-05-18 19:25:36	MrJean1	set	files: + mathmodule4.c.2.6a3.diff messages: + msg67034
2008-05-18 01:40:56	MrJean1	set	messages: + msg67021
2008-05-17 16:28:15	mark.dickinson	set	files: + msum4.py messages: + msg67006
2008-05-16 19:44:08	mark.dickinson	set	files: + crsum.py messages: + msg66965
2008-05-16 16:03:19	mark.dickinson	set	messages: + msg66954
2008-05-16 15:57:22	MrJean1	set	files: + msum3.py messages: + msg66953
2008-05-16 15:54:17	MrJean1	set	files: - unnamed
2008-05-16 14:48:38	mark.dickinson	set	files: + msum.py messages: + msg66947
2008-05-15 05:00:47	rhettinger	set	messages: + msg66844
2008-05-12 23:27:50	rhettinger	set	messages: + msg66764
2008-05-12 19:45:02	MrJean1	set	messages: + msg66755
2008-05-12 19:16:57	MrJean1	set	messages: + msg66753
2008-05-12 18:45:01	MrJean1	set	messages: + msg66750
2008-05-12 18:42:28	mark.dickinson	set	messages: + msg66749
2008-05-12 17:35:41	MrJean1	set	files: + floatsum.c messages: + msg66745
2008-05-12 16:22:36	MrJean1	set	files: + unnamed, mathmodule.c.2.6a3.take2.diff messages: + msg66734
2008-05-11 20:49:24	mark.dickinson	set	messages: + msg66666
2008-05-11 20:41:52	mark.dickinson	set	nosy: + mark.dickinson messages: + msg66663
2008-05-11 19:31:20	rhettinger	set	assignee: rhettinger messages: + msg66655
2008-05-11 16:17:37	MrJean1	set	files: + test_math_sum1.py
2008-05-11 16:17:24	MrJean1	set	files: + mathmodule.c.2.6a3.diff
2008-05-11 16:16:52	MrJean1	create