
Full precision summation #47068

Closed

MrJean1 (mannequin) opened this issue May 11, 2008 · 47 comments
Labels: extension-modules (C modules in the Modules dir), performance (Performance or resource usage)

Comments

@MrJean1 (mannequin) commented May 11, 2008

BPO 2819
Nosy @tim-one, @rhettinger, @terryjreedy, @jcea, @mdickinson, @vstinner
Files
  • cmathmodule.c.2.6a3.diff
  • mathmodule.c.2.6a3.diff
  • test_math_sum1.py
  • mathmodule.c.2.6a3.take2.diff
  • floatsum.c
  • msum.py: Version of msum that takes care of overflow
  • msum3.py: Modified msum.py with overflow file
  • crsum.py: Correctly rounded sum of nonadjacent floats
  • msum4.py: msum, with overflow, correct rounding and special-value handling
  • mathmodule4.c.2.6a3.diff: msum4 patch for mathmodule.c of Python 2.6a3
  • cmathmodule4.c.2.6a3.diff: msum4 patch for cmathmodule.c of Python 2.6a3
  • test_math_sum4.py: msum4.py tests for math- and cmathmodule.c
  • mathmodule5.c.2.6a3.diff: update #5 of mathmodule.c for Python 2.6a3
  • cmathmodule5.c.2.6a3.diff: update #5 of cmathmodule.c for Python 2.6a3
  • test_math_sum5.py: update #5 of the test script
  • mathmodule11.c.2.6a3.diff: rev 11 patch for mathmodule.c for Python 2.6a3
  • test_math_sum11.py: rev 11 of test_math_sum.py for Python 2.6a3
  • mathmodule12.c.2.6a3.diff: rev 12 patch for mathmodule.c for Python 2.6a3
  • test_math_sum12.py: rev 12 of test_math_sum.py for Python 2.6a3
  • mathmodule19.c.2.6a3.diff: rev 19 patch for mathmodule.c for Python 2.6a3
  • test_math_sum19.py: rev 19 of test_math_sum.py for Python 2.6a3
  • mathsum_IA32.patch
  • fsum11.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/mdickinson'
    closed_at = <Date 2008-12-11.19:54:45.020>
    created_at = <Date 2008-05-11.16:16:51.847>
    labels = ['extension-modules', 'performance']
    title = 'Full precision summation'
    updated_at = <Date 2008-12-11.19:54:45.008>
    user = 'https://bugs.python.org/MrJean1'

    bugs.python.org fields:

    activity = <Date 2008-12-11.19:54:45.008>
    actor = 'mark.dickinson'
    assignee = 'mark.dickinson'
    closed = True
    closed_date = <Date 2008-12-11.19:54:45.020>
    closer = 'mark.dickinson'
    components = ['Extension Modules']
    creation = <Date 2008-05-11.16:16:51.847>
    creator = 'MrJean1'
    dependencies = []
    files = ['10285', '10286', '10287', '10307', '10308', '10345', '10347', '10351', '10357', '10371', '10372', '10373', '10379', '10380', '10381', '10405', '10406', '10407', '10408', '10410', '10411', '10505', '11108']
    hgrepos = []
    issue_num = 2819
    keywords = ['patch']
    message_count = 47.0
    messages = ['66641', '66655', '66663', '66666', '66734', '66745', '66749', '66750', '66753', '66755', '66764', '66844', '66947', '66953', '66954', '66965', '67006', '67021', '67034', '67045', '67053', '67067', '67080', '67197', '67198', '67210', '67211', '67212', '67213', '67647', '70295', '70317', '70409', '70424', '70430', '70438', '70439', '70440', '70508', '70522', '70545', '70585', '70611', '70623', '70641', '71118', '77628']
    nosy_count = 7.0
    nosy_names = ['tim.peters', 'rhettinger', 'terry.reedy', 'jcea', 'mark.dickinson', 'vstinner', 'MrJean1']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue2819'
    versions = ['Python 2.6']

    @MrJean1 (mannequin, author) commented May 11, 2008

    Attached are 2 patches and a test script adding a function sum to the
    math and cmath modules of Python 2.6a3. The sum is calculated using a
    full precision summation method.

    The test script compares the result of the functions with the original
    implementation in Python. All tests pass with 4 different builds of
    Python 2.6a3:

    • GNU gcc 4.0.1 on MacOS X 10.4.11 (Intel Core Duo), 32-bit
    • GNU gcc 4.1.2 on RHEL 3 update 7 (Opteron), 64-bit
    • Sun C 5.8 on Solaris 10 (Opteron), both 32- and 64-bit
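
    For context, the method in question is the partials-based exact
    summation from Raymond Hettinger's ASPN recipe, referenced throughout
    this thread as "Raymond's recipe". A minimal Python sketch of it,
    assuming IEEE 754 doubles with round-half-even:

        def msum(iterable):
            "Exact sum of floats, kept as a list of non-overlapping partials."
            partials = []              # sorted, non-overlapping partial sums
            for x in iterable:
                i = 0
                for y in partials:
                    if abs(x) < abs(y):
                        x, y = y, x
                    hi = x + y         # rounded sum
                    lo = y - (hi - x)  # exact roundoff of that addition
                    if lo:
                        partials[i] = lo
                        i += 1
                    x = hi
                partials[i:] = [x]     # drop absorbed partials, append hi
            # the partials sum *exactly* to the true total; this final
            # addition is the only step that still rounds
            return sum(partials, 0.0)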

    MrJean1 (mannequin) added the extension-modules and performance labels on May 11, 2008
    @rhettinger (Contributor) commented

    This looks pretty good at first glance. Will review more thoroughly
    later this week. It does need docs and unit tests.

    rhettinger self-assigned this on May 11, 2008
    @mdickinson (Member) commented

    Some comments/questions:

    (1) It seems wasteful to wrap every addition in PyFPE_START/END_PROTECT,
    and to check for NaNs and infinities after every addition. I'd wrap the
    whole thing in a single PyFPE_START/END_PROTECT, replace _math_sum_add
    with an inline addition, and just let NaNs and Infs sort themselves out.

    If the result comes out finite (as it surely will in almost all
    applications), then all the summands were necessarily finite and there's
    nothing more to do. If the result comes out as an infinity or NaN, you
    need to decide whether it's appropriate to return a NaN, an infinity, or
    to raise OverflowError or ValueError. I'm not sure it's worth trying to
    do the right thing for all special value cases, but if you do want to
    follow 'spirit of IEEE 754' rules for special values, they should look
    something like this:

    (1) if the summands include a NaN, return a NaN;
    (2) else if the summands include infinities of both signs, raise
        ValueError;
    (3) else if the summands include infinities of only one sign, return
        infinity with that sign;
    (4) else (all summands are finite) if the result is infinite, raise
        OverflowError. (The result can never be a NaN if all summands
        are finite.)
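
    As a hedged Python sketch of the rules above (special_sum_result is a
    hypothetical helper, not the patch code; the real implementation would
    only run such checks once the accumulated sum comes out non-finite):

        import math

        def special_sum_result(summands):
            # assumes the accumulated sum was already found non-finite
            if any(math.isnan(x) for x in summands):
                return float('nan')                            # rule (1)
            pos = any(x == float('inf') for x in summands)
            neg = any(x == float('-inf') for x in summands)
            if pos and neg:
                raise ValueError('infinities of both signs')   # rule (2)
            if pos or neg:
                return float('inf') if pos else float('-inf')  # rule (3)
            raise OverflowError('intermediate overflow')       # rule (4)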

    Note that some sums involving overflow won't be computed correctly:
    e.g. [1e308, 1e308, -1e308] will likely sum to infinity instead of
    returning 1e308. I don't see any easy way around this, and it's
    probably not worth worrying about.

    (2) The algorithm is only guaranteed to work correctly assuming IEEE 754
    semantics. Python currently doesn't insist on IEEE 754 floating point,
    so what should happen on non IEEE-754 machines?

    (3) Rather than duplicating the math module code in cmathmodule.c, why
    not have the complex version simply sum real parts and imaginary parts
    separately, using a version of the code that's already in mathmodule.c?

    @mdickinson (Member) commented

    One more question:

    What are the use cases for an exact summation algorithm? That is, in what
    situations does one care about exactness rather than simply accuracy? I
    know that loss of accuracy is a problem in things like numeric integration
    routines, but something like Kahan summation (faster and simpler, but not
    exact) usually takes care of that.
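
    For reference, a minimal sketch of the Kahan (compensated) summation
    mentioned here; it is far more accurate than a naive loop, but not
    exact:

        def kahan_sum(iterable):
            total = 0.0
            c = 0.0                  # running compensation for lost low bits
            for x in iterable:
                y = x - c            # re-inject the bits lost last time
                t = total + y        # low-order bits of y are lost here...
                c = (t - total) - y  # ...and recovered into c
                total = t
            return total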

    @MrJean1 (mannequin, author) commented May 12, 2008

    Mark,
    Thank you very much for your comments. Here is my initial response to the
    first 3.

    (1) Attached is an attempt to address the 1st issue (just the
    mathmodule). The macros PyFPE_START_PROTECT/_END_PROTECT have been moved
    outside the main loop and the errno is set following the IEEE 754 rules as
    you suggested.

    One related issue is testing these: how can NaN and +/-Infinity float
    objects be created in Python?

    (2) On your 2nd comment, supporting non-IEEE floating point, perhaps the
    Kahan method should be used in that case. If so, the next question is how
    to detect that? There are two symbols in pyconfig.h HAVE_IEEEFP_H and
    HAVE_LIBIEEE. Are those the proper ones to determine IEEE floating point
    support?

    (3) On the 3rd comment, Raymond and I did discuss using a single function to
    be called by the math and cmath modules. The question is where should that
    function reside? The math and cmath modules are not the right place since
    both are loadable modules. It will have to be somewhere inside the Python
    main/core.

    Also, depending on the implementation of that function, it may require
    iterating the complex sequence twice. And that will force the C complex
    numbers to be created twice by the PyComplex_AsCComplex() call. Would that
    be a concern?

    /Jean Brouwers


    @MrJean1 (mannequin, author) commented May 12, 2008

    My apologies for the messy post. I replied to the email instead of
    posting my response.

    /Jean Brouwers

    PS) Attached is *an* example of the math_sum() and cmath_sum() functions
    using the same, shared function float_sum(). Perhaps that could reside in
    Objects/floatobject.c?

    @mdickinson (Member) commented

    One related issue is testing these, how can a NaN and +/-Infinity
    float object be created in Python?

    In 2.6 and 3.0 (but not 2.5 and older), float('nan'), float('inf') and
    float('-inf') should all work reliably across platforms (or at least
    those platforms that support infs and nans). If they don't it's a bug.

    (2) On your 2nd comment, supporting non-IEEE floating point, perhaps
    the Kahan method should be used in that case. If so, the next
    question is how to detect that?

    Actually, I think you could probably just leave the algorithm exactly as
    it is, but put a warning in the documentation that the exactness only
    applies in the presence of IEEE 754 semantics. Practically everybody's
    on an IEEE 754 platform anyway.

    There are two symbols in pyconfig.h HAVE_IEEEFP_H and
    HAVE_LIBIEEE. Are those the proper ones to determine IEEE floating
    point support?

    I'm not sure that either of these is the right thing. Neither is
    defined on my MacBook, for example.

    (3) On the 3rd comment, Raymond and I did discus using a single
    function to be called by the math and cmath modules.

    I think you're right that it's easier to just duplicate the code.
    It's a nice feature that this function only has to pass once through the
    data, and it doesn't seem worth losing that to save a little bit of code
    duplication.

    I still wonder whether there's a way to avoid incorrectly signaling
    overflow in the case where the result is finite. (e.g. sum([1e308,
    1e308, -1e308])).

    @MrJean1 (mannequin, author) commented May 12, 2008

    It turns out, float('nan') creates a NaN and float('[+|-]inf') creates a
    [signed] Infinity object. That answers my earlier question.

    /Jean Brouwers

    @MrJean1 (mannequin, author) commented May 12, 2008

    The current results are quite "interesting":

    >>> math.sum([1e308, 1e308, -1e308])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: math range error
    
    >>> math.sum([1e308, -1e308, 1e308])
    1e+308

    Handling this case would require holding the finite prior sum at an
    intermediate overflow in a separate array, say overflows. Then, add the
    partials, which may create additional overflows. Finally, keep adding
    the overflows (accurately?) until none remain or until none can be added
    without overflow.

    /Jean Brouwers

    @MrJean1 (mannequin, author) commented May 12, 2008

    There may be another reason to use a single summation function. The
    summation function does need (a copy of) the is_error() function from
    the math module.

    The cmath module has a similar function called math_error() which is
    slightly different from is_error().

    It would be better and more consistent if both modules used the same
    function, say is_error() moved** to file Objects/floatobject.c.

    Then, the math and cmath modules can use is_error() instead of each
    having their own. The summation function can use it too, and so could
    the function float_pow() in floatobject.c.

    /Jean Brouwers

    **) and renamed to e.g. float_error().

    @rhettinger (Contributor) commented

    I'd rather keep it in mathmodule.c, not floatobject.c.
    We should keep extension code out of the core types.

    @rhettinger (Contributor) commented

    When you need full precision, the Kahan approach helps but doesn't make
    guarantees and can sometimes hurt (it makes some assumptions about the
    data). One use case is computing stats like a mean where many of
    the larger magnitude entries tend to cancel out but only after clipping
    bits off of the lower magnitude components.

    @mdickinson (Member) commented

    Here's (msum.py) an example in Python of one fairly straightforward way of
    dealing with overflow correctly, without needing more than one pass
    through the data, and without significant slowdown in the normal case.
    (The Python code is needlessly inefficient in places, notably in that
    partials[1:] creates a new list; this is obviously not a problem in C.)

    The idea is essentially just to maintain the sum modulo integer multiples
    of 2.**1024. partials[0] is reserved for keeping track of the current
    multiple of 2.**1024. So at each stage, the sum so far is
    sum(partials[1:], 0.0) + 2.**1024 * partials[0].

    I'm 97.3% convinced that the proof of correctness goes through: it's
    still true with this modification that partials always consists of
    nonadjacent, nonzero values of increasing magnitude. One of the keys to
    proving this is to note that for any value x between 2**1023 and 2**1024,
    both x-2**1023 and x-2**1024 are exactly representable.
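
    The invariant can be stated as a check (a sketch of the bookkeeping
    only, not the msum.py code itself; Fraction keeps the comparison exact,
    and note that 2.0**1024 is never materialized as a float, since it
    would overflow):

        from fractions import Fraction

        def exact_value(partials):
            # partials[0]: integer multiple of 2**1024; partials[1:]: floats
            n = int(partials[0])
            return sum((Fraction(x) for x in partials[1:]),
                       Fraction(0)) + n * Fraction(2) ** 1024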

    ---

    There's one more 'nice-to-have' that I think should be considered: it
    would be nice if the result of msum were always correctly rounded. One
    aspect of correct rounding is that it provides a guarantee that the sum is
    independent of the order of the summands, so

    msum(list) == msum(sorted(list))

    would hold true.
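
    With the function as eventually shipped (under its final name
    math.fsum; see later in this thread), this property does hold; a quick
    illustration:

        import math, random

        xs = [0.1] * 10 + [1e100, -1e100]
        random.shuffle(xs)
        assert math.fsum(xs) == math.fsum(sorted(xs))  # order never matters
        assert math.fsum([0.1] * 10) == 1.0            # correctly rounded
        assert sum([0.1] * 10) != 1.0                  # naive sum drifts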

    @MrJean1 (mannequin, author) commented May 16, 2008

    Two tests failed with Python 2.6a3 on MacOS X Intel.

    Test 11 failed: 9007199254740992.0 vs 9007199254740991.0 expected for
    [9007199254740992.0, -0.5, -5.5511151231257827e-17].

    Test 12 failed: inf vs 1.7976931348623157e+308 expected for
    [8.9884656743115785e+307, -1.0, 8.9884656743115795e+307].

    /Jean Brouwers

    @mdickinson (Member) commented

    Test 11 failed: 9007199254740992.0 vs 9007199254740991.0 expected for
    [9007199254740992.0, -0.5, -5.5511151231257827e-17].

    Yes: that's the lack of correct rounding rearing its ugly head...

    Test 12 failed: inf vs 1.7976931348623157e+308 expected for
    [8.9884656743115785e+307, -1.0, 8.9884656743115795e+307].

    I'm still trying to work out how to get around this one; again, the
    result's not out by much, but it would be nice to be able to guarantee
    correctly rounded answers *all* the time, instead of just *most* of the
    time.

    @mdickinson (Member) commented

    Ensuring correct rounding isn't as onerous as I expected it to be.
    crsum.py is a snippet of Python code showing how to add nonadjacent floats
    and get the correctly rounded result.

    @mdickinson (Member) commented

    Okay, just to show it's possible:

    Here (msum4.py) is a modified version of Raymond's recipe that deals
    correctly with:

    (1) intermediate overflows,
    (2) special values (infs and nans) in the input, and
    (3) correct rounding: the results are always correctly rounded.

    The file contains more tests, and a proof of correctness. The algorithm
    still makes only a single pass through the given iterable, and there
    should be minimal slowdown in the common case. It's still only 60-70
    lines of Python code, so I don't think it would be unreasonable to aim
    to include these modifications in the C version.

    @MrJean1 (mannequin, author) commented May 18, 2008

    I intend to submit a C version of msum4 shortly.

    @MrJean1 (mannequin, author) commented May 18, 2008

    Attached is the patch for the mathmodule.c file of Python 2.6a3
    containing the C version of the msum() function from Mark's msum4.py
    Python implementation.

    Please review the C code, in particular the setting of _do_sum_pow_2 for
    FLT_RADIX not equal to 2.

    The results of the C and Python versions match (on 32-bit MacOS X
    Intel), using the test_math_sum4.py script. That includes all the tests
    from msum4.py and from Raymond's recipe.

    More testing and the cmath version are still to be done.

    /Jean Brouwers

    @MrJean1 (mannequin, author) commented May 18, 2008

    Attached are updated patches for both the mathmodule.c and cmathmodule.c
    files for Python 2.6a3 and for the test_math_sum4.py script. Please
    ignore the ones posted earlier.

    @MrJean1 (mannequin, author) commented May 19, 2008

    The tests of the test_math_sum4.py script all pass with the latest math-
    and cmathmodule.c files for 4 different builds of Python 2.6a3:

    • GNU gcc 4.0.1 on MacOS X 10.4.11 (Intel Core Duo), 32-bit
    • GNU gcc 4.1.2 on RHEL 3 update 7 (Opteron), 64-bit
    • Sun C 5.8 on Solaris 10 (Opteron), both 32- and 64-bit

    Also of interest: the tests run 25+ times faster with the C math.sum
    than with the Python implementation of msum().

    /Jean Brouwers

    @mdickinson (Member) commented

    The latest mathmodule patch (file 10371) looks pretty good to me. I
    haven't looked at the complex version yet. A few comments:

    • for setting _do_sum_pow_2 I think you should use ldexp instead of pow;
      there's a much greater chance of pow giving inexact results (see the
      sketch after this list).

    • trying to support FLT_RADIX != 2 seems futile; for a start, the whole
      algorithm and its proof would need to be reexamined. Any code specific to
      FLT_RADIX != 2 would be very difficult to test---all of the Python
      buildbots are IEEE 754, for example. I'm not sure what the best solution
      is: perhaps math.sum should only be included when FLT_RADIX = 2, or (even
      better) when IEEE 754 is in use. It shouldn't be too hard to add autoconf
      tests to detect the float format.

    • there are long lines in _do_sum, in violation of PEP-7; maybe
      factoring out the memory-management code for the partials array would
      help?

    • it would be nice to have some brief comments before each of the _do_sum
      functions indicating its purpose.
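
    On the ldexp-versus-pow point in the first bullet, the difference is
    that ldexp adjusts the exponent field directly while pow goes through
    the platform libm; a small sketch:

        import math

        exact = math.ldexp(1.0, 1023)  # exactly 2.0**1023, by construction
        maybe = math.pow(2.0, 1023)    # usually equal, but exactness depends
                                       # on the quality of the platform's pow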

    @MrJean1 (mannequin, author) commented May 19, 2008

    Here is another patch for the mathmodule.c and cmathmodule.c files updated
    with the suggested and other modifications. Note, if FLT_RADIX is not 2,
    the sum functions still exist but will raise a NotImplementedError.

    An update of the test script is also attached. The tests all pass on the
    same 4 Python 2.6a3 builds mentioned before.

    /Jean Brouwers

    @MrJean1 (mannequin, author) commented May 22, 2008

    Attached is revision 11 of the mathmodule.c patch for Python 2.6a3.

    This one includes Raymond's full precision summation, Mark's correctly
    rounded addition of the partials, and correct handling of non-finite
    values and errors.

    However, intermediate overflow will raise an OverflowError, and a
    FLT_RADIX not equal to 2 causes a NotImplementedError.

    An updated test_math_sum11.py script is also attached. All test cases
    pass on 5 different builds of Python 2.6a3:

    • 32-bit MacOS X 10.4.11 (Intel) with gcc 4.0.1
    • 32-bit MacOS X 10.3.9 (PPC) with gcc 3.3
    • 64-bit RHEL 3 update 7 (Opteron) using gcc 4.1.2
    • 32- and 64-bit Solaris 10 (Opteron) with Sun C 5.8.

    /Jean Brouwers

    @MrJean1 (mannequin, author) commented May 22, 2008

    Here is rev 12 of the mathmodule.c patch. It is the same as rev 11 but
    with additional code removed as requested:

    • no FLT_RADIX 2 check
    • no errno illustration in _do_sum_add2()
    • no _do_sum() callback function argument
    • no optional 'start' argument for math.sum.

    The test_math_sum12.py script and results are the same as for rev 11.

    /Jean Brouwers

    @MrJean1 (mannequin, author) commented May 23, 2008

    Here is another, cleaner revision 19 of the same mathmodule.c patch and
    the corresponding test_math_sum19.py script.

    /Jean Brouwers

    @rhettinger (Contributor) commented

    Nice work, Jean. Marking the patch accepted. Mark, please go ahead with
    the commit.

    Once the commit has settled for a couple of days, go ahead with a
    separate patch to cover the rest of the 754R logic for special values.

    After that one settles, close this issue and open a new one for a
    comparable patch in cmath.

    rhettinger assigned mdickinson and unassigned rhettinger on May 23, 2008
    @mdickinson (Member) commented

    math module patch committed, r63542.

    I'm still working on converting the tests to the unittest framework.

    @mdickinson (Member) commented

    Tests committed in r63543.

    @mdickinson (Member) commented

    Here's a patch that fixes the problems with math.sum on x86 machines that
    aren't using SSE2.

    On these machines, the patched version of math.sum does almost the entire
    calculation using long doubles instead of doubles. Then in the final
    summation, the top long double is split into a double and a residue long
    double.

    @mdickinson (Member) commented

    Here's a patch giving an alternative implementation of math.fsum; it's
    based on Tim Peters's suggestions, works mostly with integer arithmetic,
    and so bypasses problems with double rounding and extended precision
    floats.

    The patch is experimental: it doesn't have sufficient tests, has no
    documentation, and it adds math.fsum alongside the current math.sum, to
    make it easy to compare the two implementations.

    On my MacBook, math.fsum is 2-3 times slower than math.sum.
    It's also longer and distinctly less elegant. So its only real
    advantage is that it should fix the difficulties on x86 hardware.

    We *really* need to sort math.sum out, one way or another, before the
    next beta. Georg recently discovered another problem on x86/Linux: see
    bpo-3421.

    Some options:

    (1) leave math.sum as it is, skip all tests on x86/Linux, and document
    the current behaviour.

    (2) investigate a version of math.sum that plays with the FPU control
    word to force 53-bit rounding (and round-half-even)

    (3) replace math.sum with the slower but (presumably) less erratic
    math.fsum, possibly just as a temporary measure. This would at least
    get all tests passing.

    Jean, Raymond: what do you think?

    @mdickinson (Member) commented

    Tests related to overflow, special values, intermediate overflow, and
    results at or near the boundary of the floating-point range have been
    removed in r65258.

    @mdickinson (Member) commented

    See r65292 for more updates to the test-suite: I've replaced the Python
    version of msum by a version based on lsum in the original ASPN recipe.

    @mdickinson (Member) commented

    Minor code cleanups and fixes to special-value handling in r65299.

    @mdickinson (Member) commented

    Renamed math.sum to math.fsum (as previously discussed) in r65308.

    I think all that's left now is to add a note to the docs about the
    problems on x86.

    @mdickinson (Member) commented

    Added documentation note about x86 problems in r65315.

    Jean, Raymond: is it okay to close this issue now?

    @mdickinson (Member) commented

    Here (fsum8.patch) is a clean version of the alternative fsum algorithm.

    I'd like to push for using this in place of the existing algorithm.

    @tim-one (Member) commented Jul 30, 2008

    Mark, I don't currently have a machine with SVN and a compiler
    installed, so can't play with patches. I just want to note here that,
    if you're concerned about speed, it would probably pay to eliminate all
    library calls except one to frexp(). fmod() in particular is typically
    way too expensive, taking time proportional to the difference between
    its inputs' exponents (it emulates "long division" one bit at a time).
    While float->integer conversion is also "too expensive" on Pentium
    chips, multiply-and-convert-to-integer is probably a substantially
    cheaper way to extract bits from the mantissa frexp() delivers; note
    that this is how the Cookbook lsum() function gets bits, although it
    gets all 53 bits in one gulp while in C you'd probably want to get,
    e.g., 30 bits at a time.

    Something that's surprised me for decades is how slow platform ldexp()
    functions are too, given how little they do. Whenever you have a fixed
    offset E you'd like to add to an exponent, it's almost certainly very
    much faster to multiply by 2.0**E (when that's a compile-time constant)
    than to call ldexp(whatever, E).
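
    A sketch of the bit extraction Tim describes (mantissa_chunks is a
    hypothetical helper): one frexp() call, then multiply-and-truncate to
    pull the 53 mantissa bits out as integers, with no fmod() or ldexp()
    in the loop:

        import math

        def mantissa_chunks(x, width=30):
            m, e = math.frexp(x)       # x == m * 2**e, 0.5 <= abs(m) < 1
            mant = int(m * 2.0 ** 53)  # all 53 bits in one gulp, exactly
            e -= 53
            sign = -1 if mant < 0 else 1
            mant = abs(mant)
            chunks = []                # least-significant chunk first
            while mant:
                chunks.append(sign * (mant & ((1 << width) - 1)))
                mant >>= width
            # x == sum(c * 2**(e + i*width) for i, c in enumerate(chunks))
            return chunks, e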

    @mdickinson (Member) commented

    [Tim]

    if you're concerned about speed, it would probably pay to eliminate all
    library calls except one to frexp().

    Indeed it would! Here's a revised patch that avoids use of fmod. Speed
    is comparable with the current version. Here are some timings for
    summing random values on OS X/Core 2 Duo, on a non-debug build of the
    trunk. lsum is the patched version, msum is the version in the current
    trunk; timings are in seconds.

                                                     |   lsum   |   msum
    -------------------------------------------------+----------+----------
    50000 random values in [0, 1)                    | 0.003091 | 0.002540
    50000 random values in [0, 1), sorted            | 0.003202 | 0.003043
    50000 random values in [0, 1), reverse order     | 0.003201 | 0.002587
    50000 random values in [-1, 1)                   | 0.003229 | 0.002829
    50000 random values in [-1, 1), sorted           | 0.003183 | 0.002629
    50000 random values in [-1, 1), reverse order    | 0.003195 | 0.002731
    50000 random exponential values                  | 0.003994 | 0.006178
    50000 random exponential values, sorted          | 0.003713 | 0.007933
    50000 random exponential values, reverse order   | 0.003714 | 0.002849

    Note that lsum doesn't suffer from the 'fragmentation of partials'
    problem that slows down msum for sorted datasets.

    I'll do some timings on Linux/x86 as well.

    @mdickinson (Member) commented

    Timings on x86/Linux are similar: the lsum-based version is around
    10% slower on average, 25% slower in the worst case, and significantly
    faster for the msum worst cases.

    There's probably still some snot left to optimize out, though. Some
    tempting ideas are:

    (1) to try using doubles instead of longs for the accumulator digits
    (with 51 or 52 bits of precision), and
    (2) to split each mantissa into (nearest_integer, fraction) instead
    of (next_smallest_integer, fraction), using rint or lrint.

    Anything else?

    @mdickinson (Member) commented

    Toned down note in docs (at Raymond's request) in r65366.

    While I'd really like to see an lsum-based version go in instead, I
    think it's too close to the release to make this change right now.

    There was also originally some talk of a complex math version. What do
    people think about this? Personally, I suspect that the use cases would
    be few and far between, and anyone who *really* needs a complex high-
    precision sum can just apply math.fsum to real and imaginary parts.
    (This is easier now that x.imag and x.real make sense for integers as
    well as floats.)
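
    That workaround might look like this (complex_fsum is a hypothetical
    helper, not a stdlib function; the list() is there because two real
    fsums need two passes over the data):

        import math

        def complex_fsum(values):
            zs = [complex(z) for z in values]
            return complex(math.fsum(z.real for z in zs),
                           math.fsum(z.imag for z in zs))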

    @tim-one (Member) commented Aug 1, 2008

    Mark, changing API is something to avoid after beta cycles begin (so,
    e.g., it's late in the game to /add/ a new API, like a new method for
    complex summation). But there's no particular reason to fear changing
    implementation for an API that's already in. If the test suite supplies
    good coverage, the buildbots will catch significant portability snafus
    in plenty of time for the next release.

    The big attraction to the lsum-like approach for users is that it "just
    works", without requiring weasel words in the docs about overflows,
    underflows, or esoteric details about various HW FP quirks. Integer
    arithmetic is just plain better behaved across platforms ;-)

    @terryjreedy (Member) commented

    Comment: as a user (on x86) I agree with Tim. I could almost see the
    new version as a bugfix.

    Question: I see "math module patch committed, r63542" on May 22. But in
    3.0b2, there is no math.fsum and math.sum seems to be a wrapper for
    builtin sum. Has this not been forward ported (merged) yet?
    >>> math.sum
    <built-in function sum>
    >>> sum
    <built-in function sum>
    >>> math.sum is sum
    False

    @mdickinson (Member) commented

    Question: I see "math module patch committed, r63542" on May 22. But in
    3.0b2, there is no math.fsum and math.sum seems to be a wrapper for
    builtin sum. Has this not been forward ported (merged) yet?

    I'm pretty sure it *was* merged: math.sum should be the full-precision
    summation in both recent betas (2.6b2 and 3.0b2). Try comparing
    sum([1e100, 1, -1e100, -1]) and math.sum([1e100, 1, -1e100, -1])---they
    should produce -1.0 and 0.0 respectively.

    The name change to fsum only happened in the last few days.

    @terryjreedy (Member) commented

    I'm pretty sure it *was* merged: math.sum should be the full-precision
    summation in both recent betas (2.6b2 and 3.0b2). Try comparing
    sum([1e100, 1, -1e100, -1]) and math.sum([1e100, 1, -1e100, -1])---they
    should produce -1.0 and 0.0 respectively.

    They do. I realize now that two different built-in funcs in two
    different modules but with the same name will give the same
    representation. That is so unusual, I was not expecting it.

    The name change to fsum only happened in the last few days.

    Which will prevent the confusion I had ;-). Good idea.

    @mdickinson (Member) commented

    Here's a patch, in final form, that replaces fsum with an lsum-based
    implementation. In brief, the accumulated sum-so-far is represented in
    the form

    huge_integer * 2**(smallest_possible_exponent)

    and the huge_integer is stored in base 2**30, with a signed-digit
    representation (digits in the range [-2**29, 2**29)).
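
    For comparison, the pure-Python lsum from the original ASPN recipe
    holds the same quantity as a single arbitrary-precision int; a sketch
    (the C patch splits this integer into the base-2**30 signed digits
    described above):

        from math import frexp, ldexp

        def lsum(iterable):
            tmant, texp = 0, 0      # running sum == tmant * 2**texp
            for x in iterable:
                mant, exp = frexp(x)
                mant, exp = int(mant * 2.0 ** 53), exp - 53  # exact split
                if texp > exp:
                    tmant <<= texp - exp                     # rescale down
                    texp = exp
                else:
                    mant <<= exp - texp
                tmant += mant
            # int -> float rounds half-to-even and the power-of-two scaling
            # is exact, so this is correctly rounded (barring overflow of
            # float(tmant) and double rounding on subnormal results)
            return ldexp(float(tmant), texp)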

    What are the chances of getting this in before next week's beta?

    I did toy with a base 2**52 version, with digits stored as doubles.
    It's attractive for a couple of reasons: (1) each 53-bit double
    straddles exactly two digits, which makes the inner loop more
    predictable and removes some branches, and (2) one can make some
    optimizations (e.g. being sloppy about transferring single-bit carries
    to the next digit up) based on the assumption that the input is
    unlikely to have more than 2**51 summands.

    The result was slightly faster on OS X, and slower on Linux; the final
    rounding code also became a little more complicated (as a result of not
    being able to do bit operations on a double easily), and making sure
    that things work for non-IEEE doubles is a bit of a pain. So in the end
    I abandoned this approach.

    @mdickinson (Member) commented

    Closing this. Let's stick with what we have.

    ezio-melotti transferred this issue from another repository on Apr 10, 2022