File name |
Uploaded |
Description |
Edit |
cmathmodule.c.2.6a3.diff
|
MrJean1,
2008-05-11 16:16
|
|
|
mathmodule.c.2.6a3.diff
|
MrJean1,
2008-05-11 16:17
|
|
|
test_math_sum1.py
|
MrJean1,
2008-05-11 16:17
|
|
|
mathmodule.c.2.6a3.take2.diff
|
MrJean1,
2008-05-12 16:22
|
|
|
floatsum.c
|
MrJean1,
2008-05-12 17:35
|
|
|
msum.py
|
mark.dickinson,
2008-05-16 14:48
|
Version of msum that takes care of overflow |
|
msum3.py
|
MrJean1,
2008-05-16 15:57
|
Modified msum.py with overflow file |
|
crsum.py
|
mark.dickinson,
2008-05-16 19:43
|
Correctly rounded sum of nonadjacent floats |
|
msum4.py
|
mark.dickinson,
2008-05-17 16:28
|
msum, with overflow, correct rounding and special-value handling |
|
mathmodule4.c.2.6a3.diff
|
MrJean1,
2008-05-18 21:21
|
msum4 patch for mathmodule.c of Python 2.6a3 |
|
cmathmodule4.c.2.6a3.diff
|
MrJean1,
2008-05-18 21:22
|
msum4 patch for cmathmodule.c of Python 2.6a3 |
|
test_math_sum4.py
|
MrJean1,
2008-05-18 21:23
|
msum4.py tests for math- and cmathmodule.c |
|
mathmodule5.c.2.6a3.diff
|
MrJean1,
2008-05-19 22:18
|
update #5 of mathmodule.c for Python 2.6a3 |
|
cmathmodule5.c.2.6a3.diff
|
MrJean1,
2008-05-19 22:18
|
update #5 of cmathmodule.c for Python 2.6a3 |
|
test_math_sum5.py
|
MrJean1,
2008-05-19 22:20
|
update #5 of the test script |
|
mathmodule11.c.2.6a3.diff
|
MrJean1,
2008-05-22 16:05
|
rev 11 patch for mathmodule.c for Python 2.6a3 |
|
test_math_sum11.py
|
MrJean1,
2008-05-22 16:06
|
rev 11 of test_math_sum.py for Python 2.6a3 |
|
mathmodule12.c.2.6a3.diff
|
MrJean1,
2008-05-22 16:11
|
rev 12 of test_math_sum.py for Python 2.6a3 |
|
test_math_sum12.py
|
MrJean1,
2008-05-22 16:12
|
rev 12 of test_math_sum.py for Python 2.6a3 |
|
mathmodule19.c.2.6a3.diff
|
MrJean1,
2008-05-23 00:12
|
rev 19 patch for mathmodule.c for Python 2.6a3 |
|
test_math_sum19.py
|
MrJean1,
2008-05-23 00:13
|
rev 19 of test_math_sum.py for Python 2.6a3 |
|
mathsum_IA32.patch
|
mark.dickinson,
2008-06-03 01:19
|
|
|
fsum11.patch
|
mark.dickinson,
2008-08-14 11:10
|
|
|
msg66641 - (view) |
Author: Jean Brouwers (MrJean1) |
Date: 2008-05-11 16:16 |
Attached are 2 patches and a test script adding a function sum to the
math and cmath modules of Python 2.6a3. The sum is calculated using a
full precision summation method.
The test script compares the result of the functions with the original
implementation in Python. All tests pass with 4 different builds of
Python 2.6a3:
- GNU gcc 4.0.1 on MacOS X 10.4.11 (Intel Core Duo), 32-bit
- GNU gcc 4.1.2 on RHEL 3 update 7 (Opteron), 64-bit
- Sun C 5.8 on Solaris 10 (Opteron), both 32- and 64-bit
|
msg66655 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2008-05-11 19:31 |
This looks pretty good at first glance. Will review more throughly
later this week. It does need docs and unittests.
|
msg66663 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-05-11 20:41 |
Some comments/questions:
(1) It seems wasteful to wrap every addition in PyFPE_START/END_PROTECT,
and to check for NaNs and infinities after every addition. I'd wrap the
whole thing in a single PyFPE_START/END_PROTECT, replace _math_sum_add
with an inline addition, and just let NaNs and Infs sort themselves out.
If the result comes out finite (as it surely will in almost all
applications), then all the summands were necessarily finite and there's
nothing more to do. If the result comes out as an infinity or NaN, you
need to decide whether it's appropriate to return a NaN, an infinity, or
to raise OverflowError or ValueError. I'm not sure it's worth trying to
do the right thing for all special value cases, but if you do want to
follow 'spirit of IEEE 754' rules for special values, they should look
something like this:
(1) if the summands include a NaN, return a NaN
(2) else if the summands include infinities of both signs, raise
ValueError,
(3) else if the summands include infinities of only one sign, return
infinity with that sign,
(4) else (all summands are finite) if the result is infinite, raise
OverflowError. (The result can never be a NaN if all summands
are finite.)
Note that some sums involving overflow won't be computed correctly:
e.g. [1e308, 1e308, -1e308] will likely sum to infinity instead of
returning 1e308. I don't see any easy way around this, and it's
probably not worth worrying about.
(2) The algorithm is only guaranteed to work correctly assuming IEEE 754
semantics. Python currently doesn't insist on IEEE 754 floating point,
so what should happen on non IEEE-754 machines?
(3) Rather than duplicating the math module code in cmathmodule.c, why
not have the complex version simply sum real parts and imaginary parts
separately, using a version of the code that's already in mathmodule.c?
|
msg66666 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-05-11 20:49 |
One more question:
What are the use cases for an exact summation algorithm? That is, in what
situations does one care about exactness rather than simply accuracy? I
know that loss of accuracy is a problem in things like numeric integration
routines, but something like Kahan summation (faster and simpler, but not
exact) usually takes care of that.
|
msg66734 - (view) |
Author: Jean Brouwers (MrJean1) |
Date: 2008-05-12 16:22 |
Mark,
Thank you very much for your comments. Here is my initial response to the
first 3.
(1) Attached is an attempt to address the 1st issue (just the
mathmodule). The macros PyFPE_START_PROTECT/_END_PROTECT have been moved
outside the main loop and the errno is set following the IEEE 754 rules as
you suggested.
One related issue is testing these, how can a NaN and +/-Infinity float
object be created in Python?
(2) On your 2nd comment, supporting non-IEEE floating point, perhaps the
Kahan method should be used in that case. If so, the next question is how
to detect that? There are two symbols in pyconfig.h HAVE_IEEEFP_H and
HAVE_LIBIEEE. Are those the proper ones to determine IEEE floating point
support?
(3) On the 3rd comment, Raymond and I did discus using a single function to
be called by the math and cmath modules. The question is where should that
function reside? The math and cmath modules are not the right place since
both are loadable modules. It will have to be somewhere inside the Python
main/core.
Also, depending on the implementation of that function, it may require
iterating the complex sequence twice. And that will force the C complex
numbers to be created twice by the PyComplex_AsCComplex() call. Would that
be a concern?
/Jean Brouwers
On Sun, May 11, 2008 at 1:49 PM, Mark Dickinson <report@bugs.python.org>
wrote:
>
> Mark Dickinson <dickinsm@gmail.com> added the comment:
>
> One more question:
>
> What are the use cases for an exact summation algorithm? That is, in what
> situations does one care about exactness rather than simply accuracy? I
> know that loss of accuracy is a problem in things like numeric integration
> routines, but something like Kahan summation (faster and simpler, but not
> exact) usually takes care of that.
>
> __________________________________
> Tracker <report@bugs.python.org>
> <http://bugs.python.org/issue2819>
> __________________________________
>
|
msg66745 - (view) |
Author: Jean Brouwers (MrJean1) |
Date: 2008-05-12 17:35 |
My apologies for the messy post. I replied to the email instead of
posting my response.
/Jean Brouwers
PS) Attached is *an* example of the math_sum() and cmath_sum() functions
using the same, shared function float_sum(). Perhaps, that resides in
Objects/floatobject.c?
|
msg66749 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-05-12 18:42 |
> One related issue is testing these, how can a NaN and +/-Infinity
> float object be created in Python?
In 2.6 and 3.0 (but not 2.5 and older), float('nan'), float('inf') and
float('-inf') should all work reliably across platforms (or at least
those platforms that support infs and nans). If they don't it's a bug.
> (2) On your 2nd comment, supporting non-IEEE floating point, perhaps
> the Kahan method should be used in that case. If so, the next
> question is how to detect that?
Actually, I think you could probably just leave the algorithm exactly as
it is, but put a warning in the documentation that the exactness only
applies in the presence of IEEE 754 semantics. Practically everybody's
on an IEEE 754 platform anyway.
> There are two symbols in pyconfig.h HAVE_IEEEFP_H and
> HAVE_LIBIEEE. Are those the proper ones to determine IEEE floating
> point support?
I'm not sure that either of these is the right thing. Neither is
defined on my MacBook, for example.
> (3) On the 3rd comment, Raymond and I did discus using a single
> function to be called by the math and cmath modules.
I think you're right that it's easier to just duplicate the code.
It's a nice feature that this function only has to pass once through the
data, and it doesn't seem worth losing that to save a little bit of code
duplication.
I still wonder whether there's a way to avoid incorrectly signaling
overflow in the case where the result is finite. (e.g. sum([1e308,
1e308, -1e308])).
|
msg66750 - (view) |
Author: Jean Brouwers (MrJean1) |
Date: 2008-05-12 18:45 |
It turns out, float('nan') creates a Nan and float('[+|-]inf') creates a
[signed] Infinity object. That answers my earlier question.
/Jean Brouwers
|
msg66753 - (view) |
Author: Jean Brouwers (MrJean1) |
Date: 2008-05-12 19:16 |
The current results are quite "interesting"
>>> math.sum([1e308, 1e308, -1e308])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
OverflowError: math range error
>>> math.sum([1e308, -1e308, 1e308])
1e+308
Handling this case would require holding the finite, prior sum at an
intermediate overflow in a separate array, say overflows. Then, add the
partials which may create additional overflows. Finally, keep adding
the overflows (accurately?) until none remain or until none can be added
without overflow.
/Jean Brouwers
|
msg66755 - (view) |
Author: Jean Brouwers (MrJean1) |
Date: 2008-05-12 19:45 |
There may be another reason to use a single summation function. The
summation functions does need (a copy of) the is_error() function from
the math module.
The cmath module has a similar function called math_error() which
slightly different from is_error().
It would be better, more consistent if both modules used the same
function, say is_error() moved** to file Object/floatobject.c
Then, the math and cmath module can use ist_error() instead of each
their own. Also, the summation function can use it and function
float_pow() in floatobject.c could.
/Jean Brouwers
**) and renamed to e.g. float_error().
|
msg66764 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2008-05-12 23:27 |
Rather keep it in mathmodule.c, not floatobject.c.
We should keep extension code out of the core types.
|
msg66844 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2008-05-15 05:00 |
When you need full precision, the Kahan approach helps but doesn't make
guarantees and can sometimes hurt (it makes some assumptions about the
data). One use case in is computing stats like a mean where many of
the larger magnitude entries tend to cancel out but only after clipping
bits off of the lower magnitude components.
|
msg66947 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-05-16 14:47 |
Here's (msum.py) an example in Python of one fairly straightforward way of
dealing with overflow correctly, without needing more than one pass
through the data, and without significant slowdown in the normal case.
(The Python code is needlessly inefficient in places, notably in that
partials[1:] creates a new list; this is obviously not a problem in C.)
The idea is essentially just to maintain the sum modulo integer multiples
of 2.**1024. partials[0] is reserved for keeping track of the current
multiple of 2.**!024. So at each stage, the sum so far is
sum(partials[1:], 0.0) + 2.**1024 * partials[0].
I'm 97.3% convinced that the proof of correctness goes through: it's
still true with this modification that partials always consists of
nonadjacent, nonzero values of increasing magnitude. One of the keys to proving this is to note that for any value x between 2**1023 and 2**1024,
both x-2**1023 and x-2**1024 are exactly representable.
---
There's one more 'nice-to-have' that I think should be considered: it
would be nice if the result of msum were always correctly rounded. One
aspect of correct rounding is that it provides a guarantee that the sum is
independent of the order of the summands, so
msum(list) == msum(sorted(list))
would hold true.
|
msg66953 - (view) |
Author: Jean Brouwers (MrJean1) |
Date: 2008-05-16 15:57 |
Two tests failed with Python 2.6a3 on MacOS X Intel.
Test 11 failed: 9007199254740992.0 vs 9007199254740991.0 expected for
[9007199254740992.0, -0.5, -5.5511151231257827e-17].
Test 12 failed: inf vs 1.7976931348623157e+308 expected for
[8.9884656743115785e+307, -1.0, 8.9884656743115795e+307].
/Jeab Brouwers
|
msg66954 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-05-16 16:03 |
> Test 11 failed: 9007199254740992.0 vs 9007199254740991.0 expected for
> [9007199254740992.0, -0.5, -5.5511151231257827e-17].
Yes: that's the lack of correct rounding rearing its ugly head...
> Test 12 failed: inf vs 1.7976931348623157e+308 expected for
> [8.9884656743115785e+307, -1.0, 8.9884656743115795e+307].
I'm still trying to work out how to get around this one; again, the
result's not out by much, but it would be nice to be able to guarantee
correctly rounded answers *all* the time, instead of just *most* of the
time.
|
msg66965 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-05-16 19:43 |
Ensuring correct rounding isn't as onerous as I expected it to be.
crsum.py is a snippet of Python code showing how to add nonadjacent floats
and get the correctly rounded result.
|
msg67006 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-05-17 16:27 |
Okay, just to show it's possible:
Here (msum4.py) is a modified version of Raymond's recipe that deals
correctly with:
(1) intermediate overflows
(2) special values (infs and nans) in the input, and
(3) always gives correctly rounded results.
The file contains more tests, and a proof of correctness. The algorithm
still makes only a single pass through the given iterable, and there
should be minimal slowdown in the common case. It's still only 60-70
lines of Python code, so I don't think it would be unreasonable to aim
to include these modifications in the C version.
|
msg67021 - (view) |
Author: Jean Brouwers (MrJean1) |
Date: 2008-05-18 01:40 |
I intend to submit a C version of msum4 shortly.
|
msg67034 - (view) |
Author: Jean Brouwers (MrJean1) |
Date: 2008-05-18 19:25 |
Attached is the patch for the the mathmodule.c file of Python 2.6a3
containing the C version of the msum() function from Marks's msum4.py
Python implementation.
Please review the C code, in particular the setting of _do_sum_pow_2 for
FLT_RADIX not equal to 2.
The results of the C and Python versions match (on 32-bit MacOS X
Intel), using the test_math_sum4.py script. That includes all the tests
from msum4.py and from Raymond's recipe.
More testing and the cmath version are still to be done.
/Jean Brouwers
|
msg67045 - (view) |
Author: Jean Brouwers (MrJean1) |
Date: 2008-05-18 21:21 |
Attached are updated patches for both the mathmodule.c and cmathmodule.c
files for Python 2.6a3 and for the test_math_sum4.py script. Please
ignore the ones posted earlier.
|
msg67053 - (view) |
Author: Jean Brouwers (MrJean1) |
Date: 2008-05-19 01:21 |
The tests of the test_math_sum4.py script all pass with the latest math-
and cmathmodule.c files for 4 different builds of Python 2.6a3:
- GNU gcc 4.0.1 on MacOS X 10.4.11 (Intel Core Duo), 32-bit
- GNU gcc 4.1.2 on RHEL 3 update 7 (Opteron), 64-bit
- Sun C 5.8 on Solaris 10 (Opteron), both 32- and 64-bit
Also interesting might be that the tests run 25+ times faster with the C
math.sum compared to the Python implementation of msum().
/Jean Brouwers
|
msg67067 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-05-19 17:56 |
The latest mathmodule patch (file 10371) looks pretty good to me. I
haven't looked at the complex version yet. A few comments:
- for setting _do_sum_pow_2 I think you should use ldexp instead of pow;
there's a much greater chance of pow giving inexact results.
- trying to support FLT_RADIX != 2 seems futile; for a start, the whole
algorithm and its proof would need to be reexamined. Any code specific to
FLT_RADIX != 2 would be very difficult to test---all of the Python
buildbots are IEEE 754, for example. I'm not sure what the best solution
is: perhaps math.sum should only be included when FLT_RADIX = 2, or (even
better) when IEEE 754 is in use. It shouldn't be too hard to add autoconf
tests to detect the float format.
- there are long lines in _do_sum, in violation of PEP 7; maybe
factoring out the memory-management code for the partials array would
help?
- it would be nice to have some brief comments before each of the _do_sum
functions indicating its purpose.
|
msg67080 - (view) |
Author: Jean Brouwers (MrJean1) |
Date: 2008-05-19 22:18 |
Here is another patch for the mathmodule.c and cmathmodule.c files updated
with the suggested and other modifications. Note, if FLT_RADIX is not 2,
the sum functions still exist but will raise a NotImplementedError.
An update of the test script is also attached. The tests all pass on the
same 4 Python 2.6a3 builds mentioned before.
/Jean Brouwers
|
msg67197 - (view) |
Author: Jean Brouwers (MrJean1) |
Date: 2008-05-22 16:04 |
Attached is revision 11 of the mathmodule.c patch for Python 2.6a3.
This one includes Raymond's full precision summation, Mark's rounding
partials addition and correct non-finites and error handling.
However, intermediate overflow will raise an OverflowError and a
FLT_RADIX not equal 2 cause a NotImplementedError.
An updated test_math_sum11.py script is also attached. All test cases
pass on 5 different builds of Python 2.6a3:
- 32-bit MacOS X 10.4.11 (Intel) with gcc 4.0.1
- 32-bit MacOS X 10.3.9 (PPC) with gcc 3.3
- 64-bit RHEL 3 update 7 (Opteron) using gcc 4.1.2
- 32- and 64-bit Solaris 10 (Opteron) with Sun C 5.8.
/Jean Brouwers
|
msg67198 - (view) |
Author: Jean Brouwers (MrJean1) |
Date: 2008-05-22 16:11 |
Here is rev 12 of the mathmodule.c patch. It is the same as rev 11 but
with additional code removed as requested:
- no FLT_RADIX 2 check
- no errno illustration in _do_sum_add2()
- no _do_sum() callback function argument
- no option 'start' argument for math.sum.
The test_math.sum12.py script and results are the same as rev 11.
/Jean Brouwers
|
msg67210 - (view) |
Author: Jean Brouwers (MrJean1) |
Date: 2008-05-23 00:12 |
Here is another, cleaner revision 19 of the same mathmodule.c patch and
the corresponding test_math_sum19.py script.
/Jean Brouwers
|
msg67211 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2008-05-23 00:22 |
Nice work Jean. Marking the patch accepted. Mark please go ahead with
commit.
Once the commit has settled for a couple of days, go ahead with a
separate patch to cover the rest of 754R logic for special values.
After that one settles, close this issue and open a new one for a
comparable patch in cmath.
|
msg67212 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-05-23 01:37 |
math module patch committed, r63542.
I'm still working on converting the tests to the unittest framework.
|
msg67213 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-05-23 02:37 |
Tests committed in r63543
|
msg67647 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-06-03 01:19 |
Here's a patch that fixes the problems with math.sum on x86 machines that
aren't using SSE2.
On these machines, the patched version of math.sum does almost the entire
calculation using long doubles instead of doubles. Then in the final
summation, the top long double is split into a double and a residue long
double.
|
msg70295 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-07-26 10:38 |
Here's a patch giving an alternative implementation of math.fsum; it's
based on Tim Peter's suggestions, works mostly with integer arithmetic,
and so bypasses problems with double rounding and extended precision
floats.
The patch is experimental: it doesn't have sufficient tests, has no
documentation, and it adds math.fsum alongside the current math.sum, to
make it easy to compare the two implementations.
On my MacBook, math.fsum is a factor of 2-3 times slower than math.sum.
It's also longer and distinctly less elegant. So its only real
advantage is that it should fix the difficulties on x86 hardware.
We *really* need to sort math.sum out, one way or another, before the
next beta. Georg recently discovered another problem on x86/Linux: see
issue 3421.
Some options:
(1) leave math.sum as it is, skip all tests on x86/Linux, and document
the current behaviour.
(2) investigate a version of math.sum that plays with the FPU control
word to force 53-bit rounding (and round-half-even)
(3) replace math.sum with the slower but (presumably) less erratic
math.fsum, possibly just as a temporary measure. This would at least
get all tests passing.
Jean, Raymond: what do you think?
|
msg70317 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-07-27 07:19 |
Tests related to overflow, special-values, intermediate overflow, and
results at or near the boundary of the floating-point range have been
removed in r65258.
|
msg70409 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-07-29 18:47 |
See r65292 for more updates to the test-suite: I've replaced the Python
version of msum by a version based on lsum in the original ASPN recipe.
|
msg70424 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-07-30 12:02 |
Minor code cleanups, and fixes to special-value handling in r65299
|
msg70430 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-07-30 16:21 |
Renamed math.sum to math.fsum (as previously discussed) in r65308.
I think all that's left now is to add a note to the docs about the
problems on x86.
|
msg70438 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-07-30 20:24 |
Added documentation note about x86 problems in r65315.
Jean, Raymond: is it okay to close this issue now?
|
msg70439 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-07-30 22:00 |
Here (fsum8.patch) is a clean version of the alternative fsum algorithm.
I'd like to push for using this in place of the existing algorithm.
|
msg70440 - (view) |
Author: Tim Peters (tim.peters) * |
Date: 2008-07-30 22:26 |
Mark, I don't currently have a machine with SVN and a compiler
installed, so can't play with patches. I just want to note here that,
if you're concerned about speed, it would probably pay to eliminate all
library calls except one to frexp(). fmod() in particular is typically
way too expensive, taking time proportional to the difference between
its inputs' exponents (it emulates "long division" one bit at a time).
While float->integer conversion is also "too expensive" on Pentium
chips, multiply-and-convert-to-integer is probably a substantially
cheaper way to extract bits from the mantissa frexp() delivers; note
that this is how the Cookbook lsum() function gets bits, although it
gets all 53 bits in one gulp while in C you'd probably want to get,
e.g., 30 bits at a time.
Something that's surprised me for decades is how slow platform ldexp()
functions are too, given how little they do. Whenever you have a fixed
offset E you'd like to add to an exponent, it's almost certainly very
much faster to multiply by 2.0**E (when that's a compile-time constant)
than to call ldexp(whatever, E).
|
msg70508 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-07-31 14:27 |
[Tim]
> if you're concerned about speed, it would probably pay to eliminate all
> library calls except one to frexp().
Indeed it would! Here's a revised patch that avoids use of fmod. Speed is comparable with the current
version. Here are some timings for summing random values on OS X/Core 2 Duo, on a non-debug build of the
trunk. lsum is the patched version, msum is the version in the current trunk; timings are in seconds.
| lsum | msum
------------------------------------------------+----------+---------
50000 random values in [0, 1) | 0.003091 | 0.002540
50000 random values in [0, 1), sorted | 0.003202 | 0.003043
50000 random values in [0, 1), reverse order | 0.003201 | 0.002587
50000 random values in [-1, 1) | 0.003229 | 0.002829
50000 random values in [-1, 1), sorted | 0.003183 | 0.002629
50000 random values in [-1, 1), reverse order | 0.003195 | 0.002731
50000 random exponential values | 0.003994 | 0.006178
50000 random exponential values, sorted | 0.003713 | 0.007933
50000 random exponential values, reverse order | 0.003714 | 0.002849
Note that lsum doesn't suffer from the 'fragmentation of partials' problem that slows down msum for sorted
datasets.
I'll do some timings on Linux/x86 as well.
|
msg70522 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-07-31 15:55 |
Timings on x86/Linux are similar: the lsum-based version is around
10% slower on average, 25% slower in the worst case, and significantly
faster for the msum worst cases.
There's probably still some snot left to optimize out, though. Some
tempting ideas are:
(1) to try using doubles instead of longs for the accumulator digits
(with 51 or 52 bits of precision), and
(2) to split each mantissa into (nearest_integer, fraction) instead
of (next_smallest_integer, fraction), using rint or lrint.
Anything else?
|
msg70545 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-08-01 09:18 |
Toned down note in docs (at Raymond's request) in r65366.
While I'd really like to see an lsum-based version go in instead, I
think it's too close to the release to make this change right now.
There was also originally some talk of a complex math version. What do
people think about this? Personally, I suspect that the use cases would
be few and far between, and anyone who *really* needs a complex high-
precision sum can just apply math.fsum to real and imaginary parts.
(This is easier now that x.imag and x.real make sense for integers as
well as floats.)
|
msg70585 - (view) |
Author: Tim Peters (tim.peters) * |
Date: 2008-08-01 19:16 |
Mark, changing API is something to avoid after beta cycles begin (so,
e.g., it's late in the game to /add/ a new API, like a new method for
complex summation). But there's no particular reason to fear changing
implementation for an API that's already in. If the test suite supplies
good coverage, the buildbots will catch significant portability snafus
in plenty of time for the next release.
The big attraction to the lsum-like approach for users is that it "just
works", without requiring weasel words in the docs about overflows,
underflows, or esoteric details about various HW FP quirks. Integer
arithmetic is just plain better behaved across platforms ;-)
|
msg70611 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2008-08-02 00:51 |
Comment: as a user (on x86) I agree with Tim. I could almost see the
new version as a bugfix.
Question: I see "math module patch committed, r63542" in May 22. But in
3.0b2, there is no math.fsum and math.sum seems to be a wrapper for
builtin sum. Has this not been forward ported (merged) yet?
>>> math.sum
<built-in function sum>
>>> sum
<built-in function sum>
>>> math.sum is sum
False
|
msg70623 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-08-02 08:57 |
> Question: I see "math module patch committed, r63542" in May 22. But in
> 3.0b2, there is no math.fsum and math.sum seems to be a wrapper for
> builtin sum. Has this not been forward ported (merged) yet?
I'm pretty sure it *was* merged: math.sum should be the full-precision
summation in both recent betas (2.6b2 and 3.0b2). Try comparing
sum([1e100, 1, -1e100, -1]) and math.sum([1e100, 1, -1e100, -1])---they
should produce -1.0 and 0.0 respectively.
The name change to fsum only happened in the last few days.
|
msg70641 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2008-08-02 20:16 |
> I'm pretty sure it *was* merged: math.sum should be the full-precision
> summation in both recent betas (2.6b2 and 3.0b2). Try comparing
> sum([1e100, 1, -1e100, -1]) and math.sum([1e100, 1, -1e100, -1])---they
> should produce -1.0 and 0.0 respectively.
They do. I realize now that two different built-in funcs in two
different modules but with the same name will give the same
representation. That is so unusual, I was not expecting it.
> The name change to fsum only happened in the last few days.
Which will prevent the confusion I had ;-). Good idea.
|
msg71118 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-08-14 11:10 |
Here's a patch, in final form, that replaces fsum with an lsum-based
implementation. In brief, the accumulated sum-so-far is represented in
the form
huge_integer * 2**(smallest_possible_exponent)
and the huge_integer is stored in base 2**30, with a signed-digit representation (digits
in the range [-2**29, 2**29).
What are the chances of getting this in before next week's beta?
I did toy with a base 2**52 version, with digits stored as doubles. It's attractive for
a couple of reasons: (1) each 53-bit double straddles exactly two digits, which makes
the inner loop more predictable and removes some branches, and (2) one can make some
optimizations (e.g. being sloppy about transferring single-bit carries to the next digit
up) based on the assumption that the input is unlikely to have more than 2**51 summands. The result was slightly faster on OS X, and slower on Linux; the final rounding code
also became a little more complicated (as a result of not being able to do bit
operations on a double easily), and making sure that things work for non IEEE doubles is
a bit of a pain. So in the end I abandoned this approach.
|
msg77628 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-12-11 19:54 |
Closing this. Let's stick with what we have.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:34 | admin | set | github: 47068 |
2008-12-11 19:54:45 | mark.dickinson | set | status: open -> closed messages:
+ msg77628 |
2008-10-13 20:58:54 | vstinner | set | nosy:
+ vstinner |
2008-10-13 12:12:46 | jcea | set | nosy:
+ jcea |
2008-08-14 11:18:57 | mark.dickinson | set | files:
- fsum10.patch |
2008-08-14 11:18:40 | mark.dickinson | set | files:
- fsum8.patch |
2008-08-14 11:18:27 | mark.dickinson | set | files:
- fsum7.patch |
2008-08-14 11:10:12 | mark.dickinson | set | files:
+ fsum11.patch messages:
+ msg71118 |
2008-08-02 20:16:56 | terry.reedy | set | messages:
+ msg70641 |
2008-08-02 08:57:13 | mark.dickinson | set | messages:
+ msg70623 |
2008-08-02 00:51:39 | terry.reedy | set | nosy:
+ terry.reedy messages:
+ msg70611 |
2008-08-01 19:16:28 | tim.peters | set | messages:
+ msg70585 |
2008-08-01 09:18:22 | mark.dickinson | set | messages:
+ msg70545 |
2008-07-31 15:55:22 | mark.dickinson | set | messages:
+ msg70522 |
2008-07-31 14:28:01 | mark.dickinson | set | files:
+ fsum10.patch messages:
+ msg70508 |
2008-07-30 22:26:25 | tim.peters | set | nosy:
+ tim.peters messages:
+ msg70440 |
2008-07-30 22:01:05 | mark.dickinson | set | files:
+ fsum8.patch messages:
+ msg70439 |
2008-07-30 20:24:34 | mark.dickinson | set | messages:
+ msg70438 |
2008-07-30 16:21:23 | mark.dickinson | set | messages:
+ msg70430 |
2008-07-30 12:02:44 | mark.dickinson | set | messages:
+ msg70424 |
2008-07-29 18:47:17 | mark.dickinson | set | messages:
+ msg70409 |
2008-07-27 07:19:31 | mark.dickinson | set | messages:
+ msg70317 |
2008-07-26 10:38:41 | mark.dickinson | set | files:
+ fsum7.patch messages:
+ msg70295 |
2008-06-03 01:20:01 | mark.dickinson | set | files:
+ mathsum_IA32.patch messages:
+ msg67647 |
2008-05-23 02:37:32 | mark.dickinson | set | messages:
+ msg67213 |
2008-05-23 01:37:19 | mark.dickinson | set | messages:
+ msg67212 |
2008-05-23 00:23:01 | rhettinger | set | assignee: rhettinger -> mark.dickinson resolution: accepted messages:
+ msg67211 |
2008-05-23 00:13:37 | MrJean1 | set | files:
+ test_math_sum19.py |
2008-05-23 00:12:22 | MrJean1 | set | files:
+ mathmodule19.c.2.6a3.diff messages:
+ msg67210 |
2008-05-22 16:12:22 | MrJean1 | set | files:
+ test_math_sum12.py |
2008-05-22 16:11:49 | MrJean1 | set | files:
+ mathmodule12.c.2.6a3.diff messages:
+ msg67198 |
2008-05-22 16:06:07 | MrJean1 | set | files:
+ test_math_sum11.py |
2008-05-22 16:05:06 | MrJean1 | set | files:
+ mathmodule11.c.2.6a3.diff messages:
+ msg67197 |
2008-05-19 22:20:09 | MrJean1 | set | files:
+ test_math_sum5.py |
2008-05-19 22:18:49 | MrJean1 | set | files:
+ cmathmodule5.c.2.6a3.diff |
2008-05-19 22:18:24 | MrJean1 | set | files:
+ mathmodule5.c.2.6a3.diff messages:
+ msg67080 |
2008-05-19 17:57:16 | mark.dickinson | set | messages:
+ msg67067 |
2008-05-19 01:22:05 | MrJean1 | set | messages:
+ msg67053 |
2008-05-18 21:23:21 | MrJean1 | set | files:
+ test_math_sum4.py |
2008-05-18 21:22:25 | MrJean1 | set | files:
+ cmathmodule4.c.2.6a3.diff |
2008-05-18 21:21:37 | MrJean1 | set | files:
+ mathmodule4.c.2.6a3.diff messages:
+ msg67045 |
2008-05-18 21:18:10 | MrJean1 | set | files:
- test_math_sum4.py |
2008-05-18 21:18:04 | MrJean1 | set | files:
- mathmodule4.c.2.6a3.diff |
2008-05-18 19:29:32 | MrJean1 | set | files:
+ test_math_sum4.py |
2008-05-18 19:28:07 | MrJean1 | set | files:
+ mathmodule4.c.2.6a3.diff |
2008-05-18 19:25:47 | MrJean1 | set | files:
- mathmodule4.c.2.6a3.diff |
2008-05-18 19:25:36 | MrJean1 | set | files:
+ mathmodule4.c.2.6a3.diff messages:
+ msg67034 |
2008-05-18 01:40:56 | MrJean1 | set | messages:
+ msg67021 |
2008-05-17 16:28:15 | mark.dickinson | set | files:
+ msum4.py messages:
+ msg67006 |
2008-05-16 19:44:08 | mark.dickinson | set | files:
+ crsum.py messages:
+ msg66965 |
2008-05-16 16:03:19 | mark.dickinson | set | messages:
+ msg66954 |
2008-05-16 15:57:22 | MrJean1 | set | files:
+ msum3.py messages:
+ msg66953 |
2008-05-16 15:54:17 | MrJean1 | set | files:
- unnamed |
2008-05-16 14:48:38 | mark.dickinson | set | files:
+ msum.py messages:
+ msg66947 |
2008-05-15 05:00:47 | rhettinger | set | messages:
+ msg66844 |
2008-05-12 23:27:50 | rhettinger | set | messages:
+ msg66764 |
2008-05-12 19:45:02 | MrJean1 | set | messages:
+ msg66755 |
2008-05-12 19:16:57 | MrJean1 | set | messages:
+ msg66753 |
2008-05-12 18:45:01 | MrJean1 | set | messages:
+ msg66750 |
2008-05-12 18:42:28 | mark.dickinson | set | messages:
+ msg66749 |
2008-05-12 17:35:41 | MrJean1 | set | files:
+ floatsum.c messages:
+ msg66745 |
2008-05-12 16:22:36 | MrJean1 | set | files:
+ unnamed, mathmodule.c.2.6a3.take2.diff messages:
+ msg66734 |
2008-05-11 20:49:24 | mark.dickinson | set | messages:
+ msg66666 |
2008-05-11 20:41:52 | mark.dickinson | set | nosy:
+ mark.dickinson messages:
+ msg66663 |
2008-05-11 19:31:20 | rhettinger | set | assignee: rhettinger messages:
+ msg66655 |
2008-05-11 16:17:37 | MrJean1 | set | files:
+ test_math_sum1.py |
2008-05-11 16:17:24 | MrJean1 | set | files:
+ mathmodule.c.2.6a3.diff |
2008-05-11 16:16:52 | MrJean1 | create | |