classification
Title: Private _nth_root function loses accuracy
Type: behavior Stage: needs patch
Components: Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: steven.daprano Nosy List: haypo, mark.dickinson, ned.deily, rhettinger, serhiy.storchaka, steven.daprano, tim.peters, wolma
Priority: normal Keywords:

Created on 2016-08-14 02:26 by steven.daprano, last changed 2016-11-23 15:34 by wolma.

Files
File name Uploaded Description Edit
roots.py tim.peters, 2016-09-02 00:59 random-ish test driver
Messages (51)
msg272638 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2016-08-14 02:26
First reported by Martin Panter here:

http://bugs.python.org/issue27181#msg272488

I'm afraid I don't know enough about PowerPC to suggest a fix other than weakening the test on that platform.
msg272639 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2016-08-14 02:27
Tests fail on a Power PC buildbot:

http://buildbot.python.org/all/builders/PPC64LE%20Fedora%203.x/builds/1476/steps/test/logs/stdio
======================================================================
FAIL: testExactPowers (test.test_statistics.Test_Nth_Root) (i=29, n=11)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/shager/cpython-buildarea/3.x.edelsohn-fedora-ppc64le/build/Lib/test/test_statistics.py", line 1216, in testExactPowers
    self.assertEqual(self.nroot(x, n), i)
AssertionError: 29.000000000000004 != 29

======================================================================
FAIL: testExactPowersNegatives (test.test_statistics.Test_Nth_Root) (i=-29, n=11)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/shager/cpython-buildarea/3.x.edelsohn-fedora-ppc64le/build/Lib/test/test_statistics.py", line 1228, in testExactPowersNegatives
    self.assertEqual(self.nroot(x, n), i)
AssertionError: -29.000000000000004 != -29
msg272643 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-08-14 03:16
You can do no better than to consult with Uncle Timmy.
msg272649 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-08-14 06:02
The problem is more widespread than just Power PC. The same failures (+/-29) are also seen on the three s390x buildbots. There are multiple failures of testExactPowers(Negatives) on
http://buildbot.python.org/all/builders/x86%20Tiger%203.x/builds/11140/steps/test/logs/stdio

The first is
AssertionError: 9.000000000000002 != 9

An slightly different failure happens on AMD64 Free BSD and Open Indiana buildbots:
http://buildbot.python.org/all/builders/AMD64%20OpenIndiana%203.x/builds/11077/steps/test/logs/stdio
======================================================================
FAIL: testFraction (test.test_statistics.Test_Nth_Root)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/export/home/buildbot/64bits/3.x.cea-indiana-amd64/build/Lib/test/test_statistics.py", line 1247, in testFraction
    self.assertEqual(self.nroot(x**12, 12), float(x))
AssertionError: 1.1866666666666665 != 1.1866666666666668
msg272658 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-08-14 08:29
Steven: can you explain why you think your code *should* be giving exact results for exact powers? Do you have an error analysis that says that should be the case?

One issue here is that libm pow functions vary hugely in quality, so any algorithm that depends on ** or math.pow is going to be hard to prove anything about.

I think the tests should simply be weakened: it's unreasonable to expect perfect results in this case.
msg272663 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2016-08-14 10:44
On Sun, Aug 14, 2016 at 08:29:39AM +0000, Mark Dickinson wrote:
> Steven: can you explain why you think your code *should* be giving 
> exact results for exact powers? Do you have an error analysis that 
> says that should be the case?

No error analysis, only experimental results.

> One issue here is that libm pow functions vary hugely in quality, so 
> any algorithm that depends on ** or math.pow is going to be hard to 
> prove anything about.

pow() is just the starting point. An earlier version of the function 
started with an initial guess of x for the nth root of x, but it was 
very slow, and sometimes failed to converge in any reasonable time. By 
swapping to an initial guess calculated with pow(), I cut the number of 
iterations down to a much smaller number.

Because I'm not to worried about being super-fast, the nth root function 
goes through the following steps:

- use y = pow(x, 1/n) for an initial guess;
- if y**n == x, return y;
- improve the guess with the "Nth root algorithm";
- if that doesn't converge after a while, return y;
- finally, compare the epsilons abs(y**n - x) for the initial guess
  and improved version, returning whichever gives the smaller epsilon.

So I'm confident that nth_root() should never be worse than pow().

> I think the tests should simply be weakened: it's unreasonable to 
> expect perfect results in this case.

Okay. I'll weaken the tests.
msg272664 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-08-14 10:48
> - if y**n == x, return y;

Here's the problem, though: you're using the libm pow again in this test, so the result of this being True doesn't give tight information about how close y is to the nth root of x (and the test result may well differ from platform to platform).
msg272665 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-08-14 10:52
> - finally, compare the epsilons abs(y**n - x) for the initial guess
>   and improved version, returning whichever gives the smaller epsilon.
> 
> So I'm confident that nth_root() should never be worse than pow().

Same deal here: those aren't the actual errors; they're approximations to the errors, since the computations of the epsilons depends on (a) the usual floating-point rounding, and more significantly (b) the accuracy of the `y**n` computation. It's entirely possible that the value giving the smaller epsilon is actually the worse of the two approximations.
msg272667 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2016-08-14 11:50
On Sun, Aug 14, 2016 at 10:52:42AM +0000, Mark Dickinson wrote:
> Same deal here: those aren't the actual errors; they're approximations 
> to the errors, since the computations of the epsilons depends on (a) 
> the usual floating-point rounding, and more significantly (b) the 
> accuracy of the `y**n` computation. It's entirely possible that the 
> value giving the smaller epsilon is actually the worse of the two 
> approximations.

Let's call the two calculated roots y and w, where y is generated by 
pow() and w is the "improved" version, and the actual true 
mathematical root is r (which might fall between floats).

You're saying that it might turn out that abs(y**n - x) < abs(w**n - x), 
but abs(w - r) < abs(y - r), so I return the wrong calculated root.

I can see how that could happen, but not what to do about it. I'm 
tempted to say "that platform's pow() is weird, and the best I can do 
is return whatever it returns". That way you're no worse than if I just 
used pow() and didn't try to improve the result at all.

I think this would be a problem if I wanted to make nth_root() a public 
function with a guarantee that it will be better than pow(), but at 
worst here it just slows the geometric mean down a bit compared to a 
naive pow(product, 1/n).

What else can I do? Since I'm only dealing with integer powers, should I 
try using my own ipow(y, n) for testing?

def ipow(y, n):
    x = 1.0
    while n > 0:
        if n % 2 == 1:
            x *= y
        n //= 2
        y *= y
    return x
msg272668 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-08-14 12:05
> What else can I do? Since I'm only dealing with integer powers, should I 
> try using my own ipow(y, n) for testing?

I'd expect that a square-and-multiply-style pow would be less accurate than math.pow, in general, simply because of the number of floating-point operations involved.

But I don't think there's a real problem here so long as you don't have an expectation of getting super-accurate (e.g., correctly rounded or faithfully rounded) results; testing that results are accurate to within 10 ulps or so is probably good enough.

If you want an actual correctly-rounded nth root just for testing purposes, it's certainly possible to construct one with integer arithmetic, but an easier alternative might simply be to use MPFR's "mpfr_root" function (as wrapped by gmpy2) to generate test values.
msg272670 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2016-08-14 12:34
On Sun, Aug 14, 2016 at 12:05:37PM +0000, Mark Dickinson wrote:
> But I don't think there's a real problem here so long as you don't 
> have an expectation of getting super-accurate (e.g., correctly rounded 
> or faithfully rounded) results; testing that results are accurate to 
> within 10 ulps or so is probably good enough.

Thanks for the advice Mark.

This has certainly showed my naivety when it comes to floating point :-( 
I thought IEEE 754 was supposed to put an end to these sorts of cross- 
platform differences. If this was a trig function, I wouldn't have been 
so surprised, but I was not expecting pow() to differ like this. Once I 
started getting exact results on my machine, I thought everyone would 
see the same thing.
msg272672 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-08-14 13:34
> I thought IEEE 754 was supposed to put an end to these sorts of cross-platform differences.

Maybe one day. (Though I'm not holding my breath.)

IEEE 754-2008 does indeed *recommend* (but not actually *require*) correctly rounded implementations of all the usual transcendental and power functions (including nth root). And if everyone followed those recommendations, then yes, the correct rounding would imply that those cross-platform differences would be a thing of the past.

I don't see that happening any time soon, though: writing an implementation of log (for example) that's provably correctly rounded for all possible inputs is much harder than writing an implementation that's simply "almost always" correctly rounded (e.g., provably accurate to 0.53 ulps instead of 0.5 ulps), and the latter is going to be good enough in the vast majority of contexts. Going from "almost always correctly rounded" to "correctly rounded" isn't going to have much effect on the overall accuracy of a multistep numerical algorithm, so all you're buying (apart from a likely degraded running time) is the cross-platform reproducibility, and it's not really clear how useful cross-platform reproducibility is as a goal in its own right. Overall, it doesn't seem like a good tradeoff.

The pow function is especially hard to make correctly rounded, not least because of its two inputs. At least for single-input transcendental functions there's some (perhaps remote) hope of doing exhaustive testing on the 18 million million million possible double-precision inputs; for a function taking two inputs that's completely infeasible.

You can take a look at CRlibm [1] for efforts in the direction of a correctly-rounded libm; AFAIK it hasn't had wide adoption, and its pow function is still work in progress.

[1] http://lipforge.ens-lyon.fr/www/crlibm/
msg272713 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-08-15 03:44
A meta-note:  one iteration of Newton's method generally, roughly speaking, doubles the number of "good bits" in the initial approximation.

For floating n'th root, it would an astonishingly bad libm pow() that didn't get more than half the leading bits in pow(x, 1/n) right.

So a single Newton iteration, if carried out in infinite precision, should be enough to get all the bits "right" (meaning not significantly worse than 0.5 ulp error when converted back to a float).

So if you find yourself doing more than one Newton iteration, you're just fighting floating-point noise.  It's an illusion - nothing is actually getting better, except perhaps by accident.

Which suggests one approach for doubles (in C - same as a Python float):  get the pow() approximation.  Feed it to a `fractions.Fraction()` constructor.  Do one Newton iteration using the Fraction type.  Then use `float()` to convert the result to a float.

I believe that's the best you can possibly do without doing real work ;-)
msg272782 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-08-15 17:11
Just for fun, here's a recipe for a correctly-rounded nth root operation for positive finite floats. I'm not suggesting using this in the business logic: it's likely way too slow (especially for large n), but it may have a use in the tests. The logic for the Newton iteration in floor_nroot follows that outlined in this Stack Overflow answer: http://stackoverflow.com/a/35276426

from math import frexp, ldexp

def floor_nroot(x, n):
    """ For positive integers x, n, return the floor of the nth root of x. """

    # Initial guess: here we use the smallest power of 2 that exceeds the nth
    # root. But any value greater than or equal to the target result will do.
    a = 1 << -(-x.bit_length() // n)
    while True:
        d = x // a**(n-1)
        if a <= d:
            return a
        a = ((n-1) * a + d) // n

def nroot(x, n):
    """
    Correctly-rounded nth root (n >= 2) of x, for a finite positive float x.
    """
    if not (x > 0 and n >= 2):
        raise ValueError("x should be positive; n should be at least 2")

    m, e = frexp(x)
    rootm = floor_nroot(int(m * 2**53) << (53*n + (e-1)%n - 52), n)
    return ldexp(rootm + rootm % 2, (e-1)//n - 53)
msg272784 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-08-15 17:19
> for positive finite floats

... assuming IEEE 754 binary64 format, of course.
msg272806 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2016-08-15 23:01
Since test_NTh_Root is failing on so many different buildbot platforms and in different ways, I'm marking this as a "release blocker".  It would be very helpful to get this resolved prior to 3.6.0b1 next month.  Suggest checking the test results outputs of the various 3x build slaves to ensure all the cases have been addressed.

https://www.python.org/dev/buildbot/
msg272871 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-08-16 18:00
Thanks, Mark!  I had worked out the `floor_nroot` algorithm many years ago, but missed the connection to the AM-GM inequality.  As a result, instead of being easy, proving correctness was a pain that stretched over pages.  Delighted to see how obvious it _can_ be!

Just FYI, this was my "cheap" suggestion:

    from fractions import Fraction

    def rootn(x, n):
        g = Fraction(x**(1.0/n))
        g = ((n-1)*g + Fraction(x)/g**(n-1)) / n
        return float(g)

For fun, I ran that and your `nroot` over several million random cases with `x` of the form

    ldexp(random(), randrange(-500, 501))

and `n` in randrange(2, 501).

They always delivered the same result, which differed from `x**(1/n)` about a quarter of the time.  On average `nroot` took about 3.7 times longer.

But reducing the `n` range to randrange(1, 100), over half the time the (common) result differed from `x**(1/n)`, and `nroot` was significantly faster.

Moral of the story:  none I can see ;-)
msg272902 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-08-17 05:17
Noting that `floor_nroot` can be sped a lot by giving it a better starting guess.  In the context of `nroot`, the latter _could_ pass `int(x**(1/n))` as an excellent starting guess.  In the absence of any help, this version figures that out on its own; an optional `a=None` argument (to supply a starting guess, if desired) would make sense.

    def floor_nroot(x, n):
        """For positive integers x, n, return the floor of the nth root of x."""

        bl = x.bit_length()
        if bl <= 1000:
            a = int(float(x) ** (1.0/n))
        else:
            xhi = x >> (bl - 53)
            # x ~= xhi * 2**(bl-53)
            # x**(1/n) ~= xhi**(1/n) * 2**((bl-53)/n)
            # Let(bl-53)/n = w+f where w is the integer part.
            # 2**(w+f) is then 2**w * 2**f, where 2**w is a shift.
            a = xhi ** (1.0 / n)
            t = (bl - 53) / n
            w = int(t)
            a *= 2.0 ** (t - w)
            m, e = frexp(a)
            a = int(m * 2**53)
            e += w - 53
            if e >= 0:
                a <<= e
            else:
                a >>= -e
        # A guess of 1 can be horribly slow, since then the next
        # guess is approximately x/n.  So force the first guess to
        # be at least 2.  If that's too large, fine, it will be
        # cut down to 1 right away.
        a = max(a, 2)
        a = ((n-1)*a + x // a**(n-1)) // n
        while True:
            d = x // a**(n-1)
            if a <= d:
                return a
            a = ((n-1) * a + d) // n

I haven't yet found a case in the context of `nroot` where it doesn't get out on the first `if a <= d:` test.  Of course you can provoke as many iterations as you like by passing `x` with a lot more than 53 "significant" bits (the large `x` passed by `nroot` are mostly long strings of trailing 0 bits).
msg272911 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-08-17 08:31
"Just for fun, here's a recipe for a correctly-rounded nth root operation for positive finite floats. I'm not suggesting using this in the business logic: it's likely way too slow (especially for large n), but it may have a use in the tests."

I don't know well the statistics module, but it looks like it doesn't use directly floats, more a somehow higher level type of numbers to try to reduce rounding errors. For me, the math module is a thin wrapper on C library math functions, except of a few functions specific to Python like math.factorial. But the statistics module is at a higher level. Maybe we should draw a line between accuracy and speed. For example, explain in the statistics module that the module is designed for accuracy?

Sorry if I completly misunderstood the design of the statistics module :-)
msg272912 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-08-17 08:39
About tradeoff, would it be possible to add an option to choose the quality of the accuracy? For example, a flag to choose between "fast nth root" or "accurate nth root".

Python already has two kind of numbers: Decimal and float. Maybe the "flag" should be the type of input numbers, Decimal or float?

I'm asking because I looked at http://lipforge.ens-lyon.fr/www/crlibm/ a few years ago, and such library is designed for accuracy, not for speed. crlibm is based on the scslib library which compose a number using multiple float numbers to get a better precision. To get correct rounding, crlibm requires more loop iterations and more floating point number operations, so as expected, it is slower.

FYI my old blog article (in french!): http://www.haypocalc.com/blog/index.php/2009/02/20/188-bibliotheque-mathematique-arrondi-exact-crlibm
msg272914 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-08-17 08:56
Just to share my little experience with rounding numbers.

Last years, I worked on a API to convert timestamps between float and integers, the private "PyTime C API":

   https://haypo.github.io/pytime.html

At the beginning, I used various floatting point numbers which looks fine in decimal. But quickly, I got rounding issues on various buildbots. After many years fighting against compilers and trying to write a complete test suite, I decided to only use numbers which can be stored exactly in IEEE 754:

        if use_float:
            # numbers with an exact representation in IEEE 754 (base 2)
            for pow2 in (3, 7, 10, 15):
                ns = 2.0 ** (-pow2)
                ns_timestamps.extend((-ns, ns))

If you are curious, look at CPyTimeTestCase, TestCPyTime and TestOldPyTime classes in Lib/test/test_time.py.

At the end, I decided to reimplement each conversion function in pure Python using decimal.Decimal and compare the result with the C implementation. It makes the unit tests shorter and simpler.
msg272929 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-08-17 12:31
Victor: by "way too slow", I really *do* mean way too slow. :-)

floor_nroot does arithmetic with integers of bit-length approximately 54*n. For small n, that's fine, but if someone tried to take the geometric mean of a list of 50000 values (which it seems to me should be a perfectly reasonable use-case), floor_nroot would then be trying to do computations with multi-million-bit integers. The `a**(n-1)` operation in particular would be a killer for large `n`.
msg272948 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-08-17 14:19
> floor_nroot does arithmetic with integers of bit-length approximately 54*n.

Oh, I see.

But maybe the Decimal is fast enough for such giant numbers, since we
can control the precision?
msg273753 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-08-27 02:49
I don't care about correct rounding here, but it is, e.g., a bit embarrassing that

>>> 64**(1/3)
3.9999999999999996

Which you may or may not see on your box, depending on your platform pow(), but which you "should" see:  1/3 is not a third, it's a binary approximation that's a little less than 1/3, so 64 to that power should be less than 4.

There are two practical problems with `nroot`, both already noted:  (1) Its very cheap initial guess comes at the cost of requiring multiple iterations to get a guess good to 54 bits.  That can be repaired, and already showed how.  (2) The larger `n`, the more bits it requires.  As Mark noted, at n=50000 we're doing multi-million bit arithmetic.  That's inherent - and slow.

My simple fractions.Fraction function suffers the second problem too, but is even slower for large n.

But if you give up on guaranteed correct rounding, this is really quite easy:  as I noted before, a single Newton iteration approximately doubles the number of "good bits", so _any_ way of getting the effect of extra precision in a single iteration will deliver a result good enough for all practical purposes.  Note that an error bound of strictly less than 1 ulp is good enough to guarantee that the exact result is delivered whenever the exact result is representable.

Here's one way, exploiting that the decimal module has adjustable precision:

    import decimal
    def rootn(x, n):
        c = decimal.getcontext()
        oldprec = c.prec
        try:
            c.prec = 40
            g = decimal.Decimal(x**(1.0/n))
            g = ((n-1)*g + decimal.Decimal(x)/g**(n-1)) / n
            return float(g)
        finally:
            c.prec = oldprec

Now memory use remains trivial regardless of n.  I haven't yet found a case where the result differs from the correctly-rounded `nroot()`, but didn't try very hard.  And, e.g., even at the relatively modest n=5000, it's over 500 times faster than `nroot` (as modified to use a scaled x**(1/n) as an excellent starting guess, so that it too always gets out after its first Newton step).  At n=50000, it's over 15000 times faster.

For n less than about 70, though, `nroot` is faster.  It's plenty fast for me regardless.

Note:  just in case there's confusion about this, `rootn(64.0, 3)` returns exactly 4.0 _not_ necessarily because it's "more accurate" than pow() on this box, but because it sees the exact 3 instead of the approximation to 1/3 pow() sees.  That approximation is perfectly usable to get a starting guess (64**(1/3)) good to nearly 53 bits.  The real improvement comes from the Newton step using _exactly_ 3.

That said, I have seen rare cases where this (and `nroot`) give a better result than the platform pow() even for n=2 (where 1/n _is_ exactly representable).
msg273759 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-08-27 05:03
What if use pow() with exactly represented degree in approximating step?

    def rootn(x, n):
        g = x**(1.0/n)
        m = 1 << (n-1).bit_length()
        if n != m:
            g = (x*g**(m-n))**(1.0/m)
        return g

Maybe it needs several iterations, because it converges slower than Newton approximation. But every step can be faster.
msg273760 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-08-27 05:44
Serhiy, I don't know what you're thinking there, and the code doesn't make much sense to me.  For example, consider n=2.  Then m == n, so you accept the initial `g = x**(1.0/n)` guess.  But, as I said, there are cases where that doesn't give the best result, while the other algorithms do.  For example, on this box:

>>> serhiy(7.073208563506701e+46, 2)
2.6595504438733062e+23
>>> pow(7.073208563506701e+46, 0.5)  # exactly the same as your code
2.6595504438733062e+23

>>> nroot(7.073208563506701e+46, 2)  # best possible result
2.6595504438733066e+23
>>> import math
>>> math.sqrt(7.073208563506701e+46) # also achieved by sqrt()
2.6595504438733066e+23

On general principle, you can't expect to do better than plain pow() unless you do _something_ that gets the effect of using more than 53 mantissa bits - unless the platform pow() is truly horrible.  Using pow() multiple times is useless; doing Newton steps _in_ native float (C double) precision is useless; even some form of "binary search" is useless because just computing g**n (in native precision) suffers rounding errors worse than pow() suffered to begin with.

So long as you stick to native precision, you're fighting rounding errors at least as bad as the initial rounding errors you're trying to correct.  There's no a priori reason to even hope iterating will converge.
msg273804 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-08-28 04:01
Adding one more version of the last code, faster by cutting the number of extra digits used, and by playing "the usual" low-level CPython speed tricks.

I don't claim it's always correctly rounded - although I haven't found a specific case where it isn't.  I do claim it will return the exact result whenever the exact result is representable as a float.

I also claim it's "fast enough" - this version does on the high end of 50 to 100 thousand roots per second on my box, pretty much independent of `n`.

    import decimal
    c = decimal.DefaultContext.copy()
    c.prec = 25
    def rootn(x, n,
              D=decimal.Decimal,
              pow=c.power,
              mul=c.multiply,
              add=c.add,
              div=c.divide):
        g = D(x**(1.0/n))
        n1 = D(n-1)
        g = div(add(mul(n1, g),
                    div(D(x), pow(g, n1))),
                n)
        return float(g)
    del decimal, c
msg273822 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-08-28 13:11
Hum, "g = D(x**(1.0/n))" computes a float and then convert it to a decimal,
right?

Can we please add comments to explain why you start with float, compute
decimal and finish with float?

And maybe also document the algorithm?
msg273826 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-08-28 15:42
Victor, happy to add comments, but only if there's sufficient interest in actually using this.  In the context of this issue report, it's really only important that Mark understands it, and he already does ;-)

For example, it starts with float `**` because that's by far the fastest way to get a very good approximation.  Then it switches to Decimal to perform a Newton step using greater-than-float precision, which is necessary to absorb rounding errors in intermediate steps.  Then it rounds back to float, because it has to - it's a "float in, float out" function.  It's for the same reason, e.g., that Mark's `nroot` converts floats to potentially gigantic integers for its Newton step(s), and my other `fractions.Fraction` function converts floats to potentially gigantic rationals.  "Potentially gigantic" may be necessary to guarantee always-correct rounding of the final result, but isn't necessary to guarantee < 1 ulp error in the final result.
msg273827 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2016-08-28 16:08
This has been really eye-opening, and I just wanted to drop you a note 
that I am watching this thread carefully. My first priority is to get 
the tests all passing before beta 1 on 2016-09-12, even if (as seems 
likely) that means weakening the tests, and then come back and see if we 
can tighten it up again later.

I haven't checked it in yet, but I've already managed to simplify the 
nth_root code by taking Tim's advice that more than one iteration of 
Newton's method is a waste of time. Thanks!

Can somebody comment on my reasoning here?

I start by taking an initial guess for the root by using 
r = pow(x, 1.0/n). Then I test if r**n == x, if it does I conclude that 
r is either the exact root, or as close as representable as a float, and 
just return it without bothering with even one iteration. Sensible?

Or should I just always run one iteration of Newton and trust that it 
won't make things worse?
msg273829 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-08-28 16:27
Wouldn't following implementation be faster?

    import decimal
    c = decimal.DefaultContext.copy()
    c.prec = 25
    def rootn(x, n,
              D=decimal.Decimal,
              sub=c.subtract,
              mul=c.multiply,
              log=c.ln):
        g = x ** (1.0/n)
        g += float(sub(log(D(x)), mul(log(D(g)), D(n)))) * g / n
        return g
    del decimal, c
msg273833 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-08-28 17:49
That's clever, Serhiy!  Where did it come from?  It's not Newton's method, but it also appears to enjoy quadratic convergence.

As to speed, why are you asking?  You should be able to time it, yes?  On my box, it's about 6 times slower than the last code I posted.  I assume (but don't know) that's because + - * / have trivial cost compared to power() and ln(), and your code uses two ln() while the last code I posted uses only one power().  Also, my power() has an exact integer exponent, which - for whatever reasons - appears to run much faster than using a non-integer Decimal exponent.  It's for the latter reason that I didn't suggest just doing `return float(D(x) ** (D(1)/n))` (much slower).

Or maybe it's "a platform thing".  FYI, I'm using the released Python 3.5.2 on 64-bit Win 10 Pro.
msg273834 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-08-28 18:08
Steven, you certainly _can_ ;-) check first whether `r**n == x`, but can you prove `r` is the best possible result when it's true?  Offhand, I can't.  I question it because it rarely seems to _be_ true (in well less than 1% of the random-ish test cases I tried).  An expensive test that's rarely true tends to make things slower overall rather than faster.

As to whether a Newton step won't make things worse, that requires seeing the exact code you're using.  There are many mathematically equivalent ways to code "a Newton step" that have different numeric behavior.  If for some unfathomable (to me) reason you're determined to stick to native precision, then - generally speaking - the numerically best way to do "a correction step" is to code it in the form:

    r += correction  # where `correction` is small compared to `r`

Coding a Newton step here as, e.g., `r = ((n-1)*r + x/r**(n-1))/n` in native precision would be essentially useless:  multiple rounding errors show up in the result's last few bits, but the last few bits are the only ones we're _trying_ to correct.

When `correction` is small compared to `r` in `r += correction`, the rounding errors in the computation of `correction` show up in correction's last few bits, which are far removed from `r`'s last few bits (_because_ `correction` is small compared to `r`).  So that way of writing it _may_ be helpful.
msg273843 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-08-28 21:35
Tim Peters: "Victor, happy to add comments, but only if there's
sufficient interest in actually using this."

It looks like tests fail on some platforms. I guess that it depends on
the different factors: quality of the C math library, quality of the
IEEE 754 hardware implementation, etc.

I like the idea of providing an accurate and *portable* function
(_nth_root) thanks to the usage of the decimal module to get a better
precision.

If someone considers that the performance of
statistics.geometric_mean() matters, I suggest to document it. It's
new feature of Python 3.6, it's not like a regression, so it's ok.

What do you think?
msg273844 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-08-28 22:02
As I said, the last code I posted is "fast enough" - I can't imagine a real application can't live with being able to do "only" tens of thousands of roots per second.  A geometric mean is typically an output summary statistic, not a transformation applied billions of times to input data.

To get a 100% guarantee of 100% portability (although still confined to IEEE 754 format boxes), we can't use the libm pow() at all.  Then you can kiss speed goodbye.  Mark's `nroot` is portable in that way, but can take tens of thousands of times longer to compute a root.

Here's another way, but routinely 100 times slower (than the last code I posted):

    import decimal
    c = decimal.DefaultContext.copy()
    c.prec = 25

    def slow(x, n,
             D=decimal.Decimal,
             pow=c.power,
             div=c.divide,    
             d1=decimal.Decimal(1)):
        return float(pow(D(x), div(d1, n)))

    del decimal, c

Too slow for my tastes, but at least it's obvious ;-)
msg273849 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-08-29 02:04
Let's spell one of these out, to better understand why sticking to native precision is inadequate.  Here's one way to write the Newton step in "guess + relatively_small_correction" form:

    def plain(x, n):
        g = x**(1.0/n)
        return g - (g - x/g**(n-1))/n

Alas, it's pretty much useless.  That's because when g is a really good guess, `g` and `x/g**(n-1)` are, in native precision, almost the same.  So the subtraction cancels out almost all the bits, leaving only a bit or two of actual information.  For this kind of approach to be helpful in native precision, it generally requires a clever way of rewriting the correction computation that _doesn't_ suffer massive cancellation.

Example:

>>> x = 5.283415219603294e-14

and we want the square root.  Mark's `nroot` always gives the best possible result:

>>> nroot(x, 2)
2.298568080262861e-07

In this case, so does sqrt(), and also x**0.5 (on my box - pow() may not on yours):

>>> sqrt(x)
2.298568080262861e-07
>>> x**0.5
2.298568080262861e-07

You're going to think it needs "improvement", because the square of that is not x:

>>> sqrt(x)**2 < x
True

Let's see what happens in the `plain()` function above:

>>> g = x**0.5
>>> temp = x/g**(2-1)

>>> g.hex()
'0x1.ed9d1dd7ce57fp-23'
>>> temp.hex()
'0x1.ed9d1dd7ce580p-23'

They differ by just 1 in the very last bit.  There's nothing wrong with "the math", but _in native precision_ the subtraction leaves only 1 bit of information.

Then:

>>> correction = (g - temp)/2
>>> correction
-1.3234889800848443e-23

is indeed small compared to g, but it only has one "real" bit:

>>> correction.hex()
'-0x1.0000000000000p-76'

Finally,

>>> g - correction
2.2985680802628612e-07

is _worse_ than the guess we started with.  Not because of "the math", but because sticking to native precision leaves us with an extremely crude approximation to the truth.

This can be repaired in a straightforward way by computing the crucial subtraction with greater than native precision.  For example,

    import decimal
    c = decimal.DefaultContext.copy()
    c.prec = 25

    def blarg(x, n,
              D=decimal.Decimal,
              pow=c.power,
              sub=c.subtract,
              div=c.divide):
        g = x**(1.0/n)
        Dg = D(g)
        return g - float(sub(Dg, div(D(x), pow(Dg, n-1)))) / n

    del decimal, c

Then the difference is computed as

Decimal('-2.324439989147273024835272E-23')

and the correction (convert that to float and divide by 2) as:

-1.1622199945736365e-23

The magnitude of that is smaller than of the -1.3234889800848443e-23 we got in native precision, and adding the new correction to g makes no difference:  the correction is (correctly!) viewed as being too small (compared with g) to matter.

So, bottom line:  there is no known way of writing the Newton step that won't make things worse at times, so long as it sticks to native precision.  Newton iterations in native precision are wonderful when, e.g., you know you have about 20 good bits and want to get about 40 good bits quickly.  The last bit or two remain pure noise, unless you can write the correction computation in a way that retains "a bunch" of significant bits.  By far the easiest way to do the latter is simply to use more precision.
msg274190 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-09-02 00:59
Attched file "roots.py" you can run to get a guess as to how bad pow(x, 1/n) typically is on your box.

Note that it's usually "pretty darned good" the larger `n` is.  There's a reason for that.  For example, when n=1000, all x satisfying 1 <= x < 2**1000 have a root r satisfying 1 <= r < 2.  Mathematically, that's a 1-to-1 function, but in floating point there are only 2**52 representable r in the lstter range:  the larger n, the more of a many-to-1 function the n'th root is in native floating point precision.  That makes it much easier to get the best-possible result even by accident ;-)  So, e.g., this kind of output is common for "large" n:

n = 2272
    x**(1/n)
           -1    10
            0   982
            1     8
    with 1 native-precision step
        all correct
    with 1 extended-precision step
        all correct

That means that, out of 1000 "random-ish" x, x**(1/2272) returned a result 1 ulp smaller than best-possible 10 times, 1 ulp too large 8 times, and best-possible 982 times ("ulp" is relative to the best-possible result).  Doing any form of a single "guess + small_correction" Newton step (in either native or extended precision) repaired all the errors.

Things look very different for small n.  Like:

n = 2
    x**(1/n)
           -1     1
            0   997
            1     2
    with 1 native-precision step
           -1   117
            0   772
            1   111
    with 1 extended-precision step
        all correct

1/2 is exactly representable, so errors are few.  But doing a native-precsion "correction" made results significantly worse.

n=3 is more interesting, because 1/3 is not exactly representable:

n = 3
    x**(1/n)
          -55     1
          -54     2
          -53     2
          -52     1
          -51     3
          -50     3
          -49     2
          -48     3
          -47     4
          -46     2
          -45     6
          -44     5
          -43     7
          -42     5
          -41     6
          -40     4
          -39     7
          -38     7
          -37     8
          -36     7
          -35     5
          -34     8
          -33     8
          -32    14
          -31     8
          -30     9
          -29    15
          -28    13
          -27    21
          -26    14
          -25     7
          -24    14
          -23    15
          -22     7
          -21    18
          -20    14
          -19    12
          -18    17
          -17     8
          -16    13
          -15    15
          -14     9
          -13    10
          -12    14
          -11    11
          -10     7
           -9    10
           -8    14
           -7    11
           -6     7
           -5    11
           -4    12
           -3    12
           -2    12
           -1     8
            0    12
            1     7
            2    11
            3    11
            4    12
            5     5
            6     9
            7    14
            8    12
            9     8
           10    14
           11    15
           12    13
           13    12
           14     9
           15    15
           16    17
           17     9
           18    10
           19    17
           20    15
           21     9
           22     9
           23     6
           24    11
           25    20
           26    24
           27    21
           28    16
           29    11
           30    12
           31     3
           32    11
           33    12
           34    11
           35    11
           36     2
           37     8
           38     8
           39     9
           40     3
           41     5
           42     4
           43     7
           44     4
           45     7
           46     5
           47     3
           48     4
           50     2
           53     3
           54     3
           56     1
    with 1 native-precision step
           -1    42
            0   917
            1    41
    with 1 extended-precision step
        all correct

There a single native-precision step helped a lot overall.  But for n=4 (where 1/n is again exactly representable), it again hurts:

n = 4
    x**(1/n)
            0   999
            1     1
    with 1 native-precision step
           -1    50
            0   894
            1    56
    with 1 extended-precision step
        all correct

And at n=5 it helps again; etc.

Of course my story hasn't changed ;-)  That is, use the single-step extended-precision code.  It's "almost always" correctly rounded (I still haven't seen a case where it isn't, and the test code here assert-fails if it finds one), and is plenty fast.  The reason this code gets very visibly slower as `n` increases is due to using the always-correctly-rounded `nroot()` for comparison.
msg274196 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-09-02 02:07
BTW, add this other way of writing a native-precision Newton step to see that it's much worse (numerically) than writing it in the "guess + small_correction" form used in roots.py.  Mathematically they're identical, but numerically they behave differently:


def native2(x, n):
    g = x**(1.0/n)
    if g**n == x:
        return g
    return ((n-1)*g + x/g**(n-1)) / n
msg275930 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2016-09-12 03:07
As discussed with Ned via email, this issue shouldn't affect the public geometric_function and the failing tests (currently skipped) are too strict.  The existing tests demand an unrealistic precision which should be loosened,  e.g. from assertEqual to assertAlmostEqual. Ned wrote:

"If you are only planning to make changes to the tests themselves, I think that can wait for b2."

I am currently unable to build 3.6, and I won't have time to resolve that before b1.
msg276732 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-09-16 18:21
I think this whole nth root discussion has become way more complicated than it needs to be, and there's a simple and obvious solution.

Two observations:

1. What we actually need is not nth_root(x), but nth_root(x*2**e) for a float x and integer e. That's easily computed as exp((log(x) + e*log(2)) / n), and if we (a) rescale x to lie in [1.0, 2.0) and (b) remove multiples of n from e beforehand and so replace e with e % n, all intermediate values in this computation are small (less than 2.0).

2. It's easy to compute the above expression either directly using the math module (which gives an nth root operation with an error of a handful of ulps), or with extra precision using the Decimal module (which gives an nth root operation that's almost always correctly rounded).

If we do this, this also fixes issue #28111, which is caused by the current algorithm getting into difficulties when computing the nth root of the 2**e part of x*2**e.

Here's a direct solution using the Decimal module. If my back-of-the-envelope forward error analysis is correct, the result is always within 0.502 ulps of the correct result. That means it's *usually* going to be correctly rounded, and *always* going to be faithfully rounded. If 0.502 ulps isn't good enough for you, increase the context precision from 20 to 30, then we'll always be within 0.5000000000002 ulps instead.

The code deals with the core problem where x is finite and positive. Extending to negative values, zeros, nans and infinities is left as an exercise for the reader.


import decimal
import math

ROOTN_CONTEXT = decimal.Context(prec=20)
LOG2 = ROOTN_CONTEXT.ln(2)


def rootn(x, e, n):
    """
    Accurate value for (x*2**e)**(1/n).

    Result is within 0.502 ulps of the correct result.
    """
    if not 0.0 < x < math.inf:
        raise ValueError("x should be finite and positive")

    m, exp = math.frexp(x)
    q, r = divmod(exp + e - 1, n)
    d = decimal.Decimal(m*2.0)
    c = ROOTN_CONTEXT  # for brevity
    y = c.exp(c.divide(c.add(c.ln(d), c.multiply(r, LOG2)), n))
    return math.ldexp(y, q)
msg276735 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-09-16 18:57
Here's the error analysis, for the record. It's crude, but effective.  Assume
our decimal context precision is p (so p = 20 for the above code).

1. d is represented exactly with value in [1.0, 2.0).  (Conversions from float
   to Decimal are exact.)

2. c.ln(d) < 1.0, and Decimal's ln operation is correctly rounded, so the
   *absolute* error in c.ln(d) is at most 0.5 * 10**-p.

3. Similarly, LOG2 has an absolute error of at most 0.5 * 10**-p.

4. Let q be the unique integer satisfying 10**q <= n < 10**(q+1).

5. r * LOG2 < n < 10**(q+1), so ulp(r * LOG2) <= 10**(q + 1 - p),
   and the rounding error in computing r * LOG2 is at most
   0.5 * 10**(q + 1 - p). We also inherit the error from LOG2, scaled
   up by r (which is less than 10**(q+1)), so this inherited error
   is also bounded by 0.5 * 10**(q + 1 - p). Our total absolute error
   is now bounded by 10**(q + 1 - p).

6. The result of the addition of d to r * LOG2 is also bounded by
   n < 10**(q+1), so introduces a new rounding error of up to
   0.5 * 10**(q + 1 - p) on top of the errors in d and r * LOG2
   (steps 2 and 5). Our total error is now bounded by 1.5*10**(q + 1 - p).

7. Division by n produces a result <= 1.0. It divides our absolute error so
   far by n (>= 10**q), giving a new error of at most 15 * 10**-p, and
   introduces a new rounding error of at most 0.5 * 10**-p. Bound
   so far is 15.5 * 10**-p.

8. The exponential operation gives a result in the range [1.0, 2.0],
   and converts our absolute error of 15.5 * 10**-p into a relative
   error of at most 15.5 * 10**-p (let's call it 16 * 10**-p to be
   safe), which since our result is bounded by 2.0 amounts to an absolute
   error of at most 32 * 10**-p. We get a rounding error on the result
   of at most 5 * 10**-p (like ln, the decimal module's "exp" operation
   is also correctly rounded), making our total error bounded by 37 * 10**-p.

Finally, we want to express the error bound above in terms of ulps of the
original IEEE 754 binary64 format that float (almost certainly) uses.
Since our result (before the ldexp) is in the range [1.0, 2.0], one ulp
for the binary format is 2**-52. So the error bound in step 8, expressed
in binary64 ulps, is 37 / 2 **-52 * 10**-p < 2 * 10**(17-p) ulps. Rounding
from the final Decimal value to float also incurs an error of at most
0.5 ulps, so we end up with an error bound of

   final_error <= 0.5 + 2 * 10**(17-p) ulps.

For p = 20, this is 0.502 ulps.
msg276739 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-09-16 19:12
Mark, the code I showed in roots.py is somewhat more accurate and highly significantly faster than the code you just posted.  It's not complicated at all:  it just uses Decimal to do a single Newton correction with extended precision.

Since it doesn't use the Decimal exp() or ln(), it's faster.  It does use the Decimal pow(), but with an integer exponent, so this specific use of pow() doesn't invoke the Decimal exp() or ln() either.  And it's still the case that I haven't found a case where its result isn't correctly rounded.  My testing framework found multiple not-correctly-rounded cases in your new code within seconds.

Presumably you could boost the precision to improve that, but then it would get even slower.
msg276741 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-09-16 19:35
> Mark, the code I showed in roots.py is somewhat more accurate and highly significantly faster than the code you just posted.

Okay, fair enough. In that case, we still need a solution for computing rootn(x * 2**e) in the case where x*2**e itself overflows / underflows an IEEE 754 float.
msg276742 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-09-16 19:36
> It does use the Decimal pow(), but with an integer exponent, so this specific use of pow() doesn't invoke the Decimal exp() or ln() either.

Decimal pow doesn't special-case integer exponents; the solution will still be based on exp and ln.
msg276743 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-09-16 19:45
> Decimal pow doesn't special-case integer exponents; the solution will still be based on exp and ln.

Ah, sorry; I'm wrong. That was true for the Python version of the decimal module, not for the C implementation.
msg276744 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-09-16 19:58
> And it's still the case that I haven't found a case where its result isn't correctly rounded.

Here's one: :-)

>>> rootn(1 + 2**-52, 2)
1.0000000000000002

The correctly rounded result would be 1.0.
msg276754 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-09-16 21:51
Mark, thanks for the counterexample!  I think I can fairly accuse you of thinking ;-)

I expect the same approach would be zippy for scaling x by 2**e, provided that the scaled value doesn't exceed the dynamic range of the decimal context.  Like so:

def erootn(x, e, n,
          D=decimal.Decimal,
          D2=decimal.Decimal(2),
          pow=c.power,
          sub=c.subtract,
          div=c.divide):
    g = x**(1.0/n) * 2.0**(e/n) # force e to float for Python 2?
    Dg = D(g)
    return g - float(sub(Dg, div(D(x)*D2**e, pow(Dg, n-1)))) / n

The multiple errors in the native-float-precision starting guess shouldn't hurt.  In this case D(x)*D2**e may not exactly equal the scaled input either, but the error(s) are "way at the end" of the extended-precision decimal format.

I don't know the range of `e` you have to worry about.  On my box, decimal.MAX_EMAX == 999999999999999999 ... but the docs say that on a 32-bit box it's "only" 425000000.  If forcing the context Emax to that isn't enough, then your code is still graceful but this other approach would need to get uglier.
msg276756 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-09-16 22:11
Oops!  The `D2**e` in that code should be `pow(D2, e)`, to make it use the correct decimal context.
msg276772 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-09-17 04:51
Let me give complete code for the last idea, also forcing the scaling multiplication to use the correct context:

    import decimal
    c = decimal.DefaultContext.copy()
    c.prec = 25
    c.Emax = decimal.MAX_EMAX
    c.Emin = decimal.MIN_EMIN

    def erootn(x, e, n,
              D=decimal.Decimal,
              D2=decimal.Decimal(2),
              pow=c.power,
              sub=c.subtract,
              mul=c.multiply,
              div=c.divide):
        g = x**(1.0/n) * 2.0**(e/n)
        Dg = D(g)
        return g - float(sub(Dg, div(mul(D(x), pow(D2, e)),
                                     pow(Dg, n-1)))) / n

    del decimal, c

Both pow() instances use an integer exponent, so the C implementation of decimal still avoids expensive exp() and ln() computations - it's all just basic - * / under the covers.

On a 32-bit box, the scaled input (x*2**e) must be <= 9.999...999E425000000, which is about 2**1411819443 (an exponent of about 1.4 billion).  On the close-to-0 side, approximately the reciprocals of those giant numbers.

The range is much wider on a 64-bit box (because Emax and Emin have much larger absolute values).

Then, e.g.,

>>> erootn(1, 1411819443, 1411819443)
2.0
>>> erootn(1, -1411819443, 1411819443)
0.5

So it's obviously perfect ;-)  Except that, because it reduces to the previous algorithm when e=0, Mark's counterexample (to correct rounding) still holds:

>>> erootn(1 + 2**-52, 0, 2)
1.0000000000000002

Boosting precision to 28 (from 25) is enough to repair that, but it doesn't bother me (there was never a claim that it was always correctly rounded - but at prec=25 it went through hours & hours of testing randomish inputs and `n` values without finding a less-than-perfect case - but those tests effectively always used e=0 too).

Basic error analysis for Newton's method usually goes like this:  suppose the true root is `t` and the current guess is `g = t*(1+e)` for some small `|e|`.  Then plug `t*(1+e)` into the equation and do a Taylor expansion around e=0.  In this case, dropping terms at or above cubic in `e` leaves that the new guess is about

t*(1 + (n-1)/2 * e**2)

That the relative error goes from `e` to (about) `e**2` (which is typical of Newton's method) is the source of the claim that the number of good bits approximately doubles on each iteration.  In this specific case, it's not quite that good:  the (n-1)/2 factor means convergence becomes slower the larger `n` is.  But `e` in this case is perhaps on the order of 2**-50, and `(n-1)/2` times 2**-100 (e**2) remains tiny for any plausible value of `n`.

Good enough for me ;-)
msg276865 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2016-09-18 01:19
Let me clarify something about the extended algorithm:  the starting guess is no longer the most significant source of error.  It's the `mul(D(x), pow(D2, e))` part.  `D(x)` is exact, but `pow(D2, e)` may not be exactly representable with 26 decimal digits, and the multiplication may also need to round.  Those 2 operations may each suffer a half ulp error (but "ulp" in this case is 1 in the 26th decimal digit).

The rough error analysis is straightforward to extend, but I used wxMaxima to do the Taylor expansions and to ensure I didn't drop any terms.  

So suppose that instead of exactly x*2**e we get x*2**e*(1+r1) for some small |r1| (on the order of 10**-25), and instead of the exact n'th root `t` the starting guess g = t*(1+r2) for some small |r2| (on the order of 2**-50).  Then the mathematical value of the corrected `g` is `t` times (dropping terms higher than quadratic in `r1` and `r2`):

1   + r1/n   - (n-1)/n * r1*r2   + (n-1)/2 * r2**2

The largest term (besides 1) by far is now `r1/n` (assuming a not-insanely-large `n`).  But that's still "way at the end" of the extended-precision decimal format, which _should be_ ;-) intuitively obvious.  It's too small to have a direct effect on the last bit of the 53-bit result (but can certainly affect "close call" rounding cases).
msg276982 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2016-09-19 18:36
> Good enough for me ;-)

Me too.
History
Date User Action Args
2016-11-23 15:34:51wolmasetnosy: + wolma
2016-10-10 16:09:13ned.deilysetpriority: release blocker -> normal
2016-10-04 16:33:27steven.dapranosetversions: + Python 3.7, - Python 3.6
2016-09-19 18:36:35mark.dickinsonsetmessages: + msg276982
2016-09-18 01:19:11tim.peterssetmessages: + msg276865
2016-09-17 07:27:32martin.pantersetnosy: - martin.panter
2016-09-17 04:51:09tim.peterssetmessages: + msg276772
2016-09-16 22:11:04tim.peterssetmessages: + msg276756
2016-09-16 21:51:09tim.peterssetmessages: + msg276754
2016-09-16 19:58:42mark.dickinsonsetmessages: + msg276744
2016-09-16 19:45:10mark.dickinsonsetmessages: + msg276743
2016-09-16 19:36:31mark.dickinsonsetmessages: + msg276742
2016-09-16 19:35:30mark.dickinsonsetmessages: + msg276741
2016-09-16 19:12:25tim.peterssetmessages: + msg276739
2016-09-16 18:57:44mark.dickinsonsetmessages: + msg276735
2016-09-16 18:21:47mark.dickinsonsetmessages: + msg276732
2016-09-12 03:07:20steven.dapranosetmessages: + msg275930
2016-09-02 02:07:31tim.peterssetmessages: + msg274196
2016-09-02 00:59:16tim.peterssetfiles: + roots.py

messages: + msg274190
2016-08-29 02:04:56tim.peterssetmessages: + msg273849
2016-08-28 22:02:35tim.peterssetmessages: + msg273844
2016-08-28 21:35:25hayposetmessages: + msg273843
2016-08-28 18:08:10tim.peterssetmessages: + msg273834
2016-08-28 17:49:44tim.peterssetmessages: + msg273833
2016-08-28 16:27:53serhiy.storchakasetmessages: + msg273829
2016-08-28 16:08:16steven.dapranosetmessages: + msg273827
2016-08-28 15:42:34tim.peterssetmessages: + msg273826
2016-08-28 13:11:29hayposetmessages: + msg273822
2016-08-28 04:01:34tim.peterssetmessages: + msg273804
2016-08-27 05:44:20tim.peterssetmessages: + msg273760
2016-08-27 05:03:25serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg273759
2016-08-27 02:49:26tim.peterssetmessages: + msg273753
2016-08-17 14:19:52hayposetmessages: + msg272948
2016-08-17 12:31:18mark.dickinsonsetmessages: + msg272929
2016-08-17 08:56:41hayposetmessages: + msg272914
2016-08-17 08:39:01hayposetmessages: + msg272912
2016-08-17 08:31:24hayposetmessages: + msg272911
2016-08-17 08:24:40hayposetnosy: + haypo
2016-08-17 05:17:28tim.peterssetmessages: + msg272902
2016-08-16 18:00:04tim.peterssetmessages: + msg272871
2016-08-15 23:01:24ned.deilysetpriority: normal -> release blocker

nosy: + ned.deily
messages: + msg272806

stage: needs patch
2016-08-15 17:19:02mark.dickinsonsetmessages: + msg272784
2016-08-15 17:11:47mark.dickinsonsetmessages: + msg272782
2016-08-15 03:44:41tim.peterssetmessages: + msg272713
2016-08-14 13:34:57mark.dickinsonsetmessages: + msg272672
2016-08-14 12:34:12steven.dapranosetmessages: + msg272670
2016-08-14 12:05:37mark.dickinsonsetmessages: + msg272668
2016-08-14 11:50:40steven.dapranosetmessages: + msg272667
2016-08-14 10:52:42mark.dickinsonsetmessages: + msg272665
2016-08-14 10:48:17mark.dickinsonsetmessages: + msg272664
2016-08-14 10:44:53steven.dapranosetmessages: + msg272663
2016-08-14 08:29:39mark.dickinsonsetnosy: + mark.dickinson
messages: + msg272658
2016-08-14 06:02:51martin.pantersetmessages: + msg272649
title: Private _nth_root function loses accuracy on PowerPC -> Private _nth_root function loses accuracy
2016-08-14 03:16:26rhettingersetnosy: + rhettinger, tim.peters
messages: + msg272643
2016-08-14 02:27:27steven.dapranosetmessages: + msg272639
2016-08-14 02:26:23steven.dapranocreate