Message 410587 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	tim.peters
Recipients	PedanticHacker, Stefan Pochmann, mark.dickinson, mcognetta, rhettinger, serhiy.storchaka, tim.peters
Date	2022-01-14.18:50:27
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1642186227.39.0.0319442855926.issue37295@roundup.psfhosted.org>
In-reply-to

Content
Another trick, building on the last one: computing factorial(k) isn't cheap, in time or space, and neither is dividing by it. But we know it will entirely cancel out. Indeed, for each outer loop iteration, prod(p) is divisible by the current k. But, unlike as in Stefan's code, which materializes range(n, n-k, -1) as an explicit list, we have no way to calculate "in advance" which elements of p[] are divisible by what. What we _can_ do is march over all of p[], and do a gcd of each element with the current k. If greater than 1, it can be divided out of both that element of p[], and the current k. Later, rinse, repeat - the current k must eventually be driven to 1 then. But that slows things down: gcd() is also expensive. But there's a standard trick to speed that too: as in serious implementations of Pollard's rho factorization method, "chunk it". That is, don't do it on every outer loop iteration, but instead accumulate the running product of several denominators first, then do the expensive gcd pass on that product. Here's a replacement for "the main loop" of the last code that delays doing gcds until the running product is at least 2000 bits: fold_into_p(n) kk = 1 for k in range(2, k+1): n -= 1 # Merge into p[]. fold_into_p(n) # Divide by k. kk *= k if kk.bit_length() < 2000: continue for i, pi in enumerate(p): if pi > 1: g = gcd(pi, kk) if g > 1: p[i] = pi // g kk //= g if kk == 1: break assert kk == 1 showp() return prod(x for x in p if x > 1) // kk That runs in under half the time (for n=1000000, k=500000), down to under 7.5 seconds. And, of course, the largest denominator consumes only about 2000 bits instead of 500000!'s 8,744,448 bits. Raising the kk bit limit from 2000 to 10000 cuts another 2.5 seconds off, down to about 5 seconds. Much above that, it starts getting slower again. Seems to hard to out-think! And highly dubious to fine-tune it based on a single input case ;-) Curious: at a cutoff of 10000 bits, we're beyond the point where Karatsuba would have paid off for computing denominator partial products too.

Another trick, building on the last one: computing factorial(k) isn't cheap, in time or space, and neither is dividing by it. But we know it will entirely cancel out. Indeed, for each outer loop iteration, prod(p) is divisible by the current k. But, unlike as in Stefan's code, which materializes range(n, n-k, -1) as an explicit list, we have no way to calculate "in advance" which elements of p[] are divisible by what.

What we _can_ do is march over all of p[], and do a gcd of each element with the current k. If greater than 1, it can be divided out of both that element of p[], and the current k. Later, rinse, repeat - the current k must eventually be driven to 1 then.

But that slows things down: gcd() is also expensive.

But there's a standard trick to speed that too: as in serious implementations of Pollard's rho factorization method, "chunk it". That is, don't do it on every outer loop iteration, but instead accumulate the running product of several denominators first, then do the expensive gcd pass on that product.

Here's a replacement for "the main loop" of the last code that delays doing gcds until the running product is at least 2000 bits:

        fold_into_p(n)

        kk = 1
        for k in range(2, k+1):
            n -= 1
            # Merge into p[].
            fold_into_p(n)
            # Divide by k.
            kk *= k
            if kk.bit_length() < 2000:
                continue
            for i, pi in enumerate(p):
                if pi > 1:
                    g = gcd(pi, kk)
                    if g > 1:
                        p[i] = pi // g
                        kk //= g
                        if kk == 1:
                            break
            assert kk == 1
        showp()
        return prod(x for x in p if x > 1) // kk

That runs in under half the time (for n=1000000, k=500000), down to under 7.5 seconds. And, of course, the largest denominator consumes only about 2000 bits instead of 500000!'s 8,744,448 bits.

Raising the kk bit limit from 2000 to 10000 cuts another 2.5 seconds off, down to about 5 seconds.

Much above that, it starts getting slower again.

Seems to hard to out-think! And highly dubious to fine-tune it based on a single input case ;-)

Curious: at a cutoff of 10000 bits, we're beyond the point where Karatsuba would have paid off for computing denominator partial products too.

History
Date	User	Action	Args
2022-01-14 18:50:27	tim.peters	set	recipients: + tim.peters, rhettinger, mark.dickinson, serhiy.storchaka, PedanticHacker, mcognetta, Stefan Pochmann
2022-01-14 18:50:27	tim.peters	set	messageid: <1642186227.39.0.0319442855926.issue37295@roundup.psfhosted.org>
2022-01-14 18:50:27	tim.peters	link	issue37295 messages
2022-01-14 18:50:27	tim.peters	create