
High accuracy math.hypot() #85685

Closed
rhettinger opened this issue Aug 10, 2020 · 39 comments
Labels
3.10 (only security fixes) · stdlib (Python modules in the Lib dir) · type-feature (A feature request or enhancement)

Comments

@rhettinger
Contributor

BPO 41513
Nosy @tim-one, @rhettinger, @terryjreedy, @mdickinson, @vstinner, @serhiy-storchaka
PRs
  • bpo-41513: Improve speed and accuracy of math.hypot() #21803
  • bpo-41513: More accurate hypot() #21916
  • bpo-41513: Save unnecessary steps in the hypot() calculation #21994
  • Further improve accuracy of math.hypot() #22013
  • bpo-41513: Improve hypot() accuracy with three separate accumulators #22032
  • bpo-41513: Expand comments #22123
  • bpo-41513: Add docs and tests for hypot() #22238
  • bpo-41513: Remove broken tests that fail on Gentoo #22249
  • bpo-41513: Make steps after the loop consistent with the loop body #22315
  • bpo-41513: Second attempt to add accuracy tests for math.hypot() #22327
  • bpo-41513: Improve order of adding fractional values. Improve variable names. #22368
Files
  • test_hypot.py: Script for measuring accuracy
  • test_hypot_accuracy.py: Cross version accuracy tester
  • hypot_flow.pdf: Data flow visualization
  • hypot.png: Updated data flow visualization
  • test_hypot_commutativity.py: Test commutativity
  • best_frac.py: Measure accuracy of hypot() variants
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2020-09-15.00:15:26.344>
    created_at = <Date 2020-08-10.01:14:14.750>
    labels = ['type-feature', 'library', '3.10']
    title = 'High accuracy math.hypot()'
    updated_at = <Date 2020-10-02.00:19:15.829>
    user = 'https://github.com/rhettinger'

    bugs.python.org fields:

    activity = <Date 2020-10-02.00:19:15.829>
    actor = 'rhettinger'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-09-15.00:15:26.344>
    closer = 'rhettinger'
    components = ['Library (Lib)']
    creation = <Date 2020-08-10.01:14:14.750>
    creator = 'rhettinger'
    dependencies = []
    files = ['49380', '49399', '49424', '49439', '49448', '49484']
    hgrepos = []
    issue_num = 41513
    keywords = ['patch']
    message_count = 39.0
    messages = ['375089', '375096', '375166', '375437', '375440', '375442', '375447', '375448', '375483', '375489', '375498', '375500', '375501', '375591', '375600', '375623', '375634', '375635', '375651', '375685', '375721', '375762', '375763', '375770', '375829', '375865', '375866', '375867', '375868', '375896', '376068', '376098', '376470', '376869', '376910', '376919', '376920', '377242', '377354']
    nosy_count = 6.0
    nosy_names = ['tim.peters', 'rhettinger', 'terry.reedy', 'mark.dickinson', 'vstinner', 'serhiy.storchaka']
    pr_nums = ['21803', '21916', '21994', '22013', '22032', '22123', '22238', '22249', '22315', '22327', '22368']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue41513'
    versions = ['Python 3.10']

    @rhettinger
    Contributor Author

    I'd like to resurrect Serhiy's idea for using power-of-two scaling in math.hypot() rather than dividing each coordinate by the maximum value.

    I drafted a patch. It looks to be a little faster than what we have now and is generally (but not always) more accurate.

    For accuracy, the current approach has the advantage that for the largest coordinate, (x/max)**2 is exactly 1.0. If that element is much larger than the others, it overpowers the 1/2 ulp error in each of the other divisions.

    In contrast, scaling by a power of two is always exact, but the squaring of the largest scaled coordinate is no longer exact.

    The two approaches each have some cases that are more accurate than the other. I generated 10,000 samples. In 66% of the cases, the two results agree. In 26% of the cases, scaling by a power of two was more accurate. In the remaining 8%, the division by max was more accurate. The standard deviation of relative errors was better using power-of-two scaling. These results were consistent whether using 2, 3, or 20 dimensions.

    As expected, multiplying by a power of two was a bit faster than dividing by an arbitrary float.
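
    For reference, here is a minimal pure-Python sketch of the two strategies being compared (the actual patch is in C; the function names here are only illustrative):

        from math import fabs, frexp, ldexp, sqrt

        def hypot_div_by_max(*coords):
            # Current approach: divide every coordinate by the largest magnitude.
            # The largest term becomes exactly 1.0, but each of the other
            # divisions contributes up to 1/2 ulp of error.
            m = max(map(fabs, coords))
            if m == 0.0:
                return 0.0
            return m * sqrt(sum((x / m) ** 2 for x in coords))

        def hypot_scaled_by_2(*coords):
            # Proposed approach: multiply by a power of two taken from the
            # exponent of the largest coordinate.  The scaling itself is exact;
            # only the squaring and the summation round.
            m = max(map(fabs, coords))
            if m == 0.0:
                return 0.0
            _, e = frexp(m)
            scale = ldexp(1.0, -e)
            return sqrt(sum((x * scale) ** 2 for x in coords)) / scale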

    @rhettinger rhettinger added the 3.10, stdlib, and type-feature labels Aug 10, 2020
    @rhettinger
    Contributor Author

    Added a revised script for testing accuracy measured in ulps. It shows an improvement for all dimensions tested, with the largest gain in 2 dimensions. Also, the maximum error is now 1 ulp; formerly, it was 2 ulps.

    ======== 2 dimensions ========
    div-by-max: [(-2.0, 14), (-1.0, 1721), (0.0, 6533), (1.0, 1722), (2.0, 10)]
    scaled-by-2: [(-1.0, 841), (0.0, 8339), (1.0, 820)]

    ======== 3 dimensions ========
    div-by-max: [(-2.0, 3), (-1.0, 1525), (0.0, 6933), (1.0, 1535), (2.0, 4)]
    scaled-by-2: [(-1.0, 739), (0.0, 8515), (1.0, 746)]

    ======== 5 dimensions ========
    div-by-max: [(-2.0, 2), (-1.0, 1377), (0.0, 7263), (1.0, 1358)]
    scaled-by-2: [(-1.0, 695), (0.0, 8607), (1.0, 698)]

    ======== 10 dimensions ========
    div-by-max: [(-2.0, 13), (-1.0, 1520), (0.0, 6873), (1.0, 1580), (2.0, 14)]
    scaled-by-2: [(-1.0, 692), (0.0, 8605), (1.0, 703)]

    ======== 20 dimensions ========
    div-by-max: [(-1.0, 1285), (0.0, 7310), (1.0, 1405)]
    scaled-by-2: [(-1.0, 555), (0.0, 8822), (1.0, 623)]

    @mdickinson
    Member

    Fine by me in principle; I haven't had a chance to look at the code yet.

    While we're doing this, any chance we could special-case the two-argument hypot to use the libm hypot directly? On many platforms the libm hypot will be correctly rounded, or close to it. There was a noticeable accuracy regression for two-argument hypot when hypot gained the ability to support multiple arguments.

    @rhettinger
    Contributor Author

    While we're doing this, any chance we could special-case
    the two-argument hypot to use the libm hypot directly?

    Yes, that is an easy amendment (see below).

    I just tried it out on the macOS build but didn't see a change in accuracy from the current PR, which uses lossless power-of-two scaling and Neumaier summation.

    In GCC's libquadmath implementation, the comments say that the error is less than 1 ulp, falling short of being correctly rounded within ±½ ulp.

    If the platform hypots have nearly the same accuracy as the current PR, it may make sense to skip the special case and opt for consistent cross-platform results.

    ==================================================================

    $ git diff
    diff --git a/Modules/mathmodule.c b/Modules/mathmodule.c
    index d0621f59df..3a42ea5318 100644
    --- a/Modules/mathmodule.c
    +++ b/Modules/mathmodule.c
    @@ -2457,6 +2457,10 @@ vector_norm(Py_ssize_t n, double *vec, double max, int found_nan)
         if (max == 0.0 || n <= 1) {
             return max;
         }
    +    if (n == 2) {
    +        /* C library hypot() implementations tend to be very accurate */
    +        return hypot(vec[0], vec[1]);
    +    }
         frexp(max, &max_e);
         if (max_e >= -1023) {
             scale = ldexp(1.0, -max_e)
    $ gcc --version
    Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
    Apple clang version 11.0.3 (clang-1103.0.32.62)
    Target: x86_64-apple-darwin19.6.0
    Thread model: posix
    InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
    
    $ ./python.exe test_hypot.py

    ======== 2 dimensions ========
    platform hypot(): [(-1.0, 8398), (0.0, 83152), (1.0, 8450)]
    scaled-by-2 : [(-1.0, 8412), (0.0, 83166), (1.0, 8422)]

    @rhettinger
    Contributor Author

    This paper addresses the topic directly: https://arxiv.org/pdf/1904.09481.pdf

    I tried to reproduce the results shown in Table 1 of the paper. For the C library hypot(), they get 1 ulp errors 12.91% of the time. On my build, using the same Gaussian distribution, I get 14.08% for both the clang hypot() and the current version of the PR, showing that the two are fundamentally the same.

    @rhettinger
    Contributor Author

    Per the referenced paper, here's what is involved in further reducing error:

        from math import fabs, sqrt

        def hypot(*coords):
            # Corrected, unfused: https://arxiv.org/pdf/1904.09481.pdf
            # Simplified version omits scaling and handling of wide ranges.
            # Only handle the 2-dimensional cases.  
            # Has 1 ulp error 0.88% of the time (0.54% per the paper).
            # Can be reduced to 0.00% with a reliable fused multiply-add.
            a, b = map(fabs, coords)
            h = sqrt(a*a + b*b)
            if h <= 2*b:
                delta = h - b
                x = a*(2*delta - a) + (delta - 2*(a - b))*delta
            else:
                delta = h - a
                x = 2*delta*(a - 2*b) + (4*delta - b)*b + delta*delta
            return h - x/(2*h)

    @rhettinger
    Contributor Author

    Update:

        from math import fabs, sqrt

        def hypot(*coords):
            # Corrected, unfused: https://arxiv.org/pdf/1904.09481.pdf
            # Simplified version omits scaling and handling of wide ranges
            # Has 1 ulp error 0.44% of the time (0.54% per the paper).
            # Can be reduced to 0% with a fused multiply-add.
            a, b = map(fabs, coords)
            if a < b:
                a, b = b, a
            h = sqrt(a*a + b*b)
            if h <= 2*b:
                delta = h - b
                x = a*(2*delta - a) + (delta - 2*(a - b))*delta
            else:
                delta = h - a
                x = 2*delta*(a - 2*b) + (4*delta - b)*b + delta*delta
            return h - x/(2*h)

    @tim-one
    Member

    tim-one commented Aug 15, 2020

    Cute: for any number of arguments, try computing h**2, then one at a time subtract a**2 (an argument squared) in descending order of magnitude. Call that (h**2 - a1**2 - a2**2 - ...) x.

    Then

    h -= x/(2*h)
    

    That should reduce errors too, although not nearly so effectively, since it's a cheap way of estimating (& correcting for) the discrepancy between sum(a**2) and h**2.

    Note that "descending order" is important: it's trying to cancel as many leading bits as possible as early as possible, so that lower-order bits come into play.

    Then again ... why bother? ;-)
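
    A rough Python sketch of that correction (illustrative only; it ignores zero inputs and overflow, and assumes the plain sum of squares is computed first):

        from math import fabs, sqrt

        def corrected_hypot(*coords):
            h = sqrt(sum(a*a for a in coords))
            if h == 0.0:
                return 0.0
            # Subtract the squared arguments from h**2 in descending order of
            # magnitude so that leading bits cancel as early as possible.
            x = h * h
            for a in sorted(coords, key=fabs, reverse=True):
                x -= a * a
            return h - x / (2.0 * h)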

    @tim-one
    Member

    tim-one commented Aug 15, 2020

    ...
    one at a time subtract a**2 (an argument squared) in descending
    order of magnitude
    ...

    But that doesn't really help unless the sum of squares was computed without care to begin with. Can do as well by skipping that but instead computing the original sum of squares in increasing order of magnitude.

    Cheapest way I know of that "seemingly always" reproduces the Decimal result (when that's rounded back to float) combines fsum(), Veltkamp splitting, and the correction trick from the paper:

        from math import fsum, ldexp, sqrt

        def split(x, T27=ldexp(1.0, 27)+1.0):
            t = x * T27
            hi = t - (t - x)
            lo = x - hi
            assert hi + lo == x
            return hi, lo
    
        # result := hypot(*xs)
        parts = []
        for x in xs:
            a, b = split(x)
            parts.append(a*a)
            parts.append(2.0*a*b)
            parts.append(b*b)
        result = sqrt(fsum(parts))
        a, b = split(result)
        parts.append(-a*a)
        parts.append(-2.0*a*b)
        parts.append(-b*b)
        x = fsum(parts) # best float approx to sum(x_i ** 2) - result**2
        result += x / (2.0 * result)

    @rhettinger
    Contributor Author

    Cheapest way I know of that "seemingly always" reproduces
    the Decimal result (when that's rounded back to float)
    combines fsum(), Veltkamp splitting, and the correction
    trick from the paper.

    That's impressive. Do you think this is worth implementing? Or should we declare victory with the current PR which is both faster and more accurate than what we have now?

    Having 1-ulp error 17% of the time and correctly rounded 83% of the time is pretty darned good (and on par with C library code for the two-argument case).

    Unless we go all-out with the technique you described, the paper shows that we're already near the limit of what can be done by trying to make the sum of squares more accurate, "... the correctly rounded square root of the correctly rounded a**2+b**2 can still be off by as much as one ulp. This hints at the possibility that working harder to compute a**2+b**2 accurately may not be the best path to a better answer".

    FWIW, in my use cases, the properties that matter most are monotonicity, commutativity, cross-platform portability, and speed. Extra accuracy would be nice to have but isn't essential and would likely never be noticed.

    @tim-one
    Member

    tim-one commented Aug 16, 2020

    Oh no - I wouldn't use this as a default implementation. Too expensive. There is one aspect you may find especially attractive, though: unlike even the Decimal approach, it should be 100% insensitive to argument order (no info is lost before fsum() is called, and fsum's output should be unaffected by summand order).
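
    A quick way to spot-check that claim (purely illustrative; it assumes the fsum-based snippet above has been wrapped in a function, here called fsum_hypot):

        import random

        # fsum_hypot is a hypothetical wrapper around the snippet above.
        xs = [random.uniform(1e-3, 1e3) for _ in range(50)]
        reference = fsum_hypot(*xs)
        for _ in range(1000):
            random.shuffle(xs)
            assert fsum_hypot(*xs) == reference   # bit-for-bit identical result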

    @rhettinger
    Contributor Author

    New changeset fff3c28 by Raymond Hettinger in branch 'master':
    bpo-41513: Improve speed and accuracy of math.hypot() (GH-21803)
    fff3c28

    @rhettinger
    Contributor Author

    If someone thinks there is a case for using the C library hypot() for the two-argument form, feel free to reopen this.

    Likewise, if someone thinks there is a case for doing the expensive but more accurate algorithm, go ahead and reopen this.

    Otherwise, I think we're done for now :-)

    @rhettinger
    Contributor Author

    Here's a much cheaper way to get correctly rounded results almost all of the time. It uses Serhiy's idea for scaling by a power of two, Tim's idea for Veltkamp-Dekker splitting, my variant of Neumaier summation, and the paper's differential correction. Together, these allow the algorithm to simulate 128-bit quad precision throughout.

        from math import fabs, frexp, ldexp, sqrt

        # Veltkamp-Dekker splitting
        def split(x, T27=ldexp(1.0, 27)+1.0):
            t = x * T27
            hi = t - (t - x)
            lo = x - hi
            assert hi + lo == x
            return hi, lo
    
        # Variant of Neumaier summation specialized
        # for cases where |csum| >= |x| for every x.
        # Establishes the invariant by setting *csum*
        # to 1.0, only adding *x* values of smaller
        # magnitude, and subtracting 1.0 at the end.
        
        def zero():  
            return 1.0, 0.0     # csum, frac
        
        def add_on(x, state):
            csum, frac = state
            assert fabs(csum) >= fabs(x)
            oldcsum = csum
            csum += x
            frac += (oldcsum - csum) + x
            return csum, frac
    
        def to_float(state):
            csum, frac = state
            return csum - 1.0 + frac
    
        def hypot(*xs):
            max_value = max(xs, key=fabs)
            _, max_e = frexp(max_value)
            scalar = ldexp(1.0, -max_e)        
            parts = zero()
            for x in xs:
                x *= scalar
                a, b = split(x)
                parts = add_on(a*a, parts)
                parts = add_on(2.0*a*b, parts)
                parts = add_on(b*b, parts)
            result = sqrt(to_float(parts))
            a, b = split(result)
            parts = add_on(-a*a, parts)
            parts = add_on(-2.0*a*b, parts)
            parts = add_on(-b*b, parts)
            x = to_float(parts) # best float approx to sum(x_i ** 2) - result**2
            result += x / (2.0 * result)
            return result / scalar
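
    As a quick sanity check, the sketch reproduces an exactly representable case (the Decimal reference below is computed at deliberately high precision and is illustrative only):

        from decimal import Decimal, getcontext

        getcontext().prec = 200
        xs = [3.0, 4.0, 12.0]
        reference = float(sum(Decimal(x) ** 2 for x in xs).sqrt())
        print(hypot(*xs), reference)    # both print 13.0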

    @rhettinger
    Contributor Author

    Hmm, I tried out a C implementation and the timings weren't bad: 60% slower in exchange for always being correctly rounded. Do you all think that would be worth it?

    # New correctly rounded
    $ ./python.exe -m timeit -r11 -s 'from math import hypot' 'hypot(1.7, 15.5, 0.3, 5.2)'
    2000000 loops, best of 11: 101 nsec per loop
    $ ./python.exe -m timeit -r11 -s 'from math import hypot' 'hypot(1.7, 15.81)'
    5000000 loops, best of 11: 82.8 nsec per loop

    # Current baseline for Python 3.10
    $ ./python.exe -m timeit -r11 -s 'from math import hypot' 'hypot(1.7, 15.5, 0.3, 5.2)'
    5000000 loops, best of 11: 65.6 nsec per loop
    $ ./python.exe -m timeit -r11 -s 'from math import hypot' 'hypot(1.7, 15.81)'
    5000000 loops, best of 11: 56.6 nsec per loop

    # Current baseline for Python 3.9
    $ ./python.exe -m timeit -r11 -s 'from math import hypot' 'hypot(1.7, 15.5, 0.3, 5.2)'
    5000000 loops, best of 11: 72.2 nsec per loop
    ~/npython $ ./python.exe -m timeit -r11 -s 'from math import hypot' 'hypot(1.7, 15.81)'
    5000000 loops, best of 11: 56.2 nsec per loop

    @rhettinger rhettinger reopened this Aug 18, 2020
    @tim-one
    Member

    tim-one commented Aug 18, 2020

    About speed, the fsum() version I posted ran about twice as fast as the all-Decimal approach, but the add_on() version runs a little slower than all-Decimal. I assume that's because fsum() is coded in C while the add_on() prototype makes mounds of additional Python-level function calls. This was with the released Windows 3.8.5 CPython, on argument vectors of length 50.

    Is it worth it? Up to you ;-) There are good arguments to be made in favor of increased accuracy, and in favor of speed.

    About "correctly rounded", such a claim can only be justified by proof, not by testing (unless testing exhaustively covers the entire space of inputs). Short of that, best you can claim is stuff like "max error < 0.51 ulp" (or whatever other bound can be proved).

    @tim-one
    Member

    tim-one commented Aug 19, 2020

    Here's a "correct rounding" fail for the add_on approach:

    xs = [16.000000000000004] * 9

    decimal result = 48.00000000000001065814103642
    which rounds to float 48.000000000000014

    add_on result: 48.00000000000001

    That's about 0.5000000000026 ulp too small - shocking ;-)

    The fsum approach got this one right, but has similar failures on other inputs of the same form; i.e.,

    [i + ulp(i)] * 9
    

    for various integers i.
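
    Those inputs are easy to generate for testing; a hypothetical check against a high-precision Decimal reference might look like this (math.ulp needs Python 3.9+, and the precision is set high enough to sidestep the Decimal pitfall described a couple of comments below):

        from decimal import Decimal, getcontext
        from math import ulp

        getcontext().prec = 200    # enough digits for an exact-enough reference

        def reference_hypot(*xs):
            # illustrative helper, not part of the patch
            return float(sum(Decimal(x) ** 2 for x in xs).sqrt())

        for i in (16.0, 32.0, 48.0, 64.0):
            b = i + ulp(i)
            print(b, reference_hypot(*[b] * 9))    # compare against the variant under test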

    @tim-one
    Member

    tim-one commented Aug 19, 2020

    That's about 0.5000000000026 ulp too small - shocking ;-)

    Actually, that's an illusion due to the limited precision of Decimal. The rounded result is exactly 1/2 ulp away from the infinitely precise result - it's a nearest/even tie case.

    To make analysis easy, just note that the mathematical hypot(*([x] * 9)) is exactly 3*x. For x = 16 + ulp(16), 3*x moves to a higher binade and has two trailing one bits. The last has to be rounded away, rounding up to leave the last retained bit even. Any source of error, no matter how tiny, can be enough so that doesn't happen.
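
    The halfway claim is easy to verify exactly with Fraction (math.ulp needs Python 3.9+):

        from fractions import Fraction
        from math import ulp

        x = 16.0 + ulp(16.0)
        exact = 3 * Fraction(x)                    # the mathematically exact hypot of nine x's
        below = Fraction(48.0 + ulp(48.0))         # the two adjacent representable doubles
        above = Fraction(48.0 + 2.0 * ulp(48.0))
        print(exact - below == above - exact)      # True: an exact round-half-to-even tie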

    @tim-one
    Member

    tim-one commented Aug 19, 2020

    Here's an amusing cautionary tale: when looking at correct-rounding failures for the fsum approach, I was baffled until I realized it was actually the _decimal_ method that was failing. Simplest example I have is 9 instances of b=4.999999999999999, which is 1 ulp less than 5. Of course the best-possible hypot is 3*b rounded, but that's not what default decimal precision computes (when rounded back). In fact, it bounces between "off by 1" and "correct" until decimal precision exceeds 96:

    >>> import decimal
    >>> b = 4.999999999999999
    >>> pfailed = []
    >>> for p in range(28, 300):
    ...     with decimal.localcontext() as c:
    ...         c.prec = p
    ...         if float(sum([decimal.Decimal(b) ** 2] * 9).sqrt()) != 3*b:
    ...             pfailed.append(p)
    >>> pfailed
    [28, 29, 30, 31, 34, 36, 37, 39, 41, 42, 43, 44, 45, 57, 77, 96]

    In a nutshell, it takes in the ballpark of 50 decimal digits to represent b exactly, then twice as many to represent its square exactly, then another to avoid losing a digit when summing 9 of those. So if decimal prec is less than about 100, it can lose info.

    So the fsum method is actually more reliable (it doesn't lose any info until the fsum step).

    @tim-one
    Member

    tim-one commented Aug 20, 2020

    Just FYI, if the "differential correction" step seems obscure to anyone, here's some insight, following a chain of mathematically equivalent respellings:

    result + x / (2 * result) =
    result + (sumsq - result**2) / (2 * result) =
    result + (sumsq - result**2) / result / 2 =
    result + (sumsq / result - result**2 / result) / 2 =
    result + (sumsq / result - result) / 2 =
    result + sumsq / result / 2 - result / 2 =
    result / 2 + sumsq / result / 2 =
    (result + sumsq / result) / 2

    I hope the last line is an "aha!" for you then: it's one iteration of Newton's square root method, for improving a guess "result" for the square root of "sumsq". Which shouldn't be too surprising, since Newton's method is also derived from a first-order Taylor expansion around 0.

    Note an implication: the quality of the initial square root is pretty much irrelevant, since a Newton iteration basically doubles the number of "good bits". Pretty much the whole trick relies on computing "sumsq - result**2" to greater than basic machine precision.
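
    A tiny numeric illustration of that point (the starting error is contrived, but one Newton step recovers nearly full double precision):

        from math import sqrt

        sumsq = 2.0
        guess = sqrt(sumsq) * (1.0 + 1e-8)        # a deliberately sloppy square root
        better = (guess + sumsq / guess) / 2.0    # one Newton iteration
        print(guess - sqrt(2.0))                  # about 1.4e-8
        print(better - sqrt(2.0))                 # 0.0, or within an ulp or two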

    @tim-one
    Member

    tim-one commented Aug 20, 2020

    My apologies if nobody cares about this ;-) I just think it's helpful if we all understand what really helps here.

    Pretty much the whole trick relies on computing
    "sumsq - result**2" to greater than basic machine
    precision.

    But why is that necessary at all? Because we didn't compute the sqrt of the sum of the squares to begin with - the input to sqrt was a _rounded_-to-double approximation to the sum of squares. The correction step tries to account for the sum-of-squares bits sqrt() didn't see to begin with.

    So, e.g., instead of doing

        result = sqrt(to_float(parts))
        a, b = split(result)
        parts = add_on(-a*a, parts)
        parts = add_on(-b*b, parts)
        parts = add_on(-2.0*a*b, parts)
        x = to_float(parts)
        result += x / (2.0 * result)

    it works just as well to do just

        result = float((D(parts[0] - 1.0) + D(parts[1])).sqrt())
        
    where D is the default-precision decimal.Decimal constructor. Then sqrt sees all the sum-of-squares bits we have (although the "+" rounds away those following the 28th decimal digit).

    @rhettinger
    Contributor Author

    My apologies if nobody cares about this ;-)

    I care :-)

    Am in crunch right now, so won't have a chance to work through it for a week or so.

    @tim-one
    Member

    tim-one commented Aug 21, 2020

    won't have a chance to work through it for a week or so

    These have been much more in the way of FYI glosses. There's no "suggestion" here to be pursued - just trying to get a deeper understanding of code already written :-)

    While I can concoct any number of cases where the add_on() method fails correct rounding, all the ones I dug into turned out to be "oops! went the wrong way" in nearest/even tie cases. By any non-anal measure, its accuracy is excellent.

    @terryjreedy
    Member

    Tim, I have been reading this, without examining all the details, to learn more about FP accuracy problems.

    @rhettinger
    Contributor Author

    I'm ready to move forward on this one. Have improved the assertions, explanations, and variable names. Have also added references to show that the foundations are firm. After tens of millions of trials, I haven't found a single misrounding. I can't prove "correctly rounded" but am content with "high accuracy". The speed is comparable to what we had in 3.7, so I think it is a net win all around.

    @rhettinger rhettinger changed the title from "Scale by power of two in math.hypot()" to "High accuracy math.hypot()" Aug 23, 2020
    @rhettinger
    Contributor Author

    All the review comments have been addressed. I think it is as good as it can get. Any objections to committing the PR?

    @tim-one
    Member

    tim-one commented Aug 24, 2020

    Do it! It's elegant and practical :-)

    @rhettinger
    Contributor Author

    New changeset 8e19c8b by Raymond Hettinger in branch 'master':
    bpo-41513: More accurate hypot() (GH-21916)
    8e19c8b

    @rhettinger
    Contributor Author

    Done!

    Thank you.

    @tim-one
    Member

    tim-one commented Aug 25, 2020

    One more implication: since the quality of the initial square root doesn't really matter much, instead of

        result = sqrt(to_float(parts))
        a, b = split(result)
        parts = add_on(-a*a, parts)
        parts = add_on(-2.0*a*b, parts)
        parts = add_on(-b*b, parts)
        x = to_float(parts)
        result += x / (2.0 * result)

    at the end, it should work just as well (in fact, probably a little better, due to getting more beneficial cancellation) to do:

        a = parts[0] - 1.0
        result = sqrt(a)
        x, y = split(result)
        result += (a - x*x - 2.0*x*y - y*y + parts[1]) / (2.0 * result)

    at the end. Although this crucially relies on doing the last-line chain of subtracts and adds "left to right".

    @rhettinger
    Contributor Author

    New changeset 27de286 by Raymond Hettinger in branch 'master':
    bpo-41513: Save unnecessary steps in the hypot() calculation (GH-21994)
    27de286

    @tim-one
    Member

    tim-one commented Aug 30, 2020

    About test_frac.py, I changed the main loop like so:

                got = [float(expected)] # NEW
                for hypot in hypots:
                    actual = hypot(*coords)
                    got.append(float(actual)) # NEW
                    err = (actual - expected) / expected
                    bits = round(1 / err).bit_length()
                    errs[hypot][bits] += 1
                if len(set(got)) > 1: # NEW
                    print(got) # NEW

    That is, to display every case where the four float results weren't identical.

    Result: nothing was displayed (although it's still running the n=1000 chunk, there's no sign that will change). None of these variations made any difference to results users actually get.

    Even the "worst" of these reliably develops dozens of "good bits" beyond IEEE double precision, but invisibly (under the covers, with no visible effect on delivered results).

    So if there's something else that speeds the code, perhaps it's worth pursuing, but we're already long beyond the point of getting any payback for pursuing accuracy.

    @rhettinger
    Contributor Author

    New changeset 67c998d by Raymond Hettinger in branch 'master':
    bpo-41513: Expand comments and add references for a better understanding (GH-22123)
    67c998d

    @rhettinger
    Contributor Author

    New changeset 457d4e9 by Raymond Hettinger in branch 'master':
    bpo-41513: Add docs and tests for hypot() (GH-22238)
    457d4e9

    @vstinner
    Member

    Multiple tests failed on x86 Gentoo Non-Debug with X 3.x:
    https://buildbot.python.org/all/#/builders/58/builds/73

    Example of failure:

    ======================================================================
    FAIL: testHypotAccuracy (test.test_math.MathTests) (hx='0x1.10106eb4b44a2p+29', hy='0x1.ef0596cdc97f8p+29')
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/buildbot/buildarea/cpython/3.x.ware-gentoo-x86.nondebug/build/Lib/test/test_math.py", line 855, in testHypotAccuracy
        self.assertEqual(hypot(x, y), z)
    AssertionError: 1184594898.793983 != 1184594898.7939832

    (I didn't investigate the bug, I just report the failure.)

    @vstinner vstinner reopened this Sep 14, 2020
    @rhettinger
    Contributor Author

    New changeset 95a8a0e by Raymond Hettinger in branch 'master':
    bpo-41513: Remove broken tests that fail on Gentoo (GH-22249)
    95a8a0e

    @rhettinger
    Contributor Author

    I saw that the Gentoo bot reacted badly to the new tests, so I reverted them.

    @rhettinger
    Contributor Author

    New changeset bc6b7fa by Raymond Hettinger in branch 'master':
    bpo-41513: Add accuracy tests for math.hypot() (GH-22327)
    bc6b7fa

    @rhettinger
    Contributor Author

    New changeset 438e9fc by Raymond Hettinger in branch 'master':
    bpo-41513: Improve order of adding fractional values. Improve variable names. (GH-22368)
    438e9fc

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022