Message 411343 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	tim.peters
Recipients	PedanticHacker, Stefan Pochmann, mark.dickinson, mcognetta, rhettinger, serhiy.storchaka, tim.peters
Date	2022-01-23.05:08:22
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1642914502.99.0.0626123055768.issue37295@roundup.psfhosted.org>
In-reply-to

Content
Ya, I don't expect anyone will check in a change without doing comparative timings in C first. Not worried about that. I'd be happy to declare victory and move on at this point ;-) But that's me. Near the start of this, I noted that we just won't compete with GMP's vast array of tricks. I noted that they use a special routine for division when it's known in advance that the remainder is 0 (as it is, e.g., in every division performed by "our" recursion (which GMP also uses, in some cases)). But I didn't let on just how much that can buy them. Under 3.10.1 on my box, the final division alone in math.factorial(1000000) // math.factorial(500000)*2 takes over 20 seconds. But a pure Python implementation of what I assume (don't know for sure) is the key idea() in their exact-division algorithm does the same thing in under 0.4 seconds. Huge difference - and while the pure Python version only ever wants the lower N bits of an NxN product, there's no real way to do that in pure Python except via throwing away the higher N bits of a double-width int product. In C, of course, the high half of the bits wouldn't be computed to begin with. () Modular arithmetic again. Given n and d such that it's known n = qd for some integer q, shift n and d right until d is odd. q is unaffected. A good upper bound on the bit length of q is then n.bit_length() - d.bit_length() + 1. Do the remaining work modulo 2 raised to that power. Call that base B. We "merely" need to solve for q in the equation n = qd (mod B). Because d is odd, I = pow(d, -1, B) exists. Just multiply both sides by I to get n I = q (mod B). No divisions of any kind are needed. More, there's also a very efficient, division-free algorithm for finding an inverse modulo a power of 2. To start with, every odd int is its own inverse mod 8, so we start with 3 good bits. A modular Newton-like iteration can double the number of correct bits on each iteration. But I won't post code (unless someone asks) because I don't want to encourage anyone :-)

Ya, I don't expect anyone will check in a change without doing comparative timings in C first. Not worried about that.

I'd be happy to declare victory and move on at this point ;-) But that's me. Near the start of this, I noted that we just won't compete with GMP's vast array of tricks.

I noted that they use a special routine for division when it's known in advance that the remainder is 0 (as it is, e.g., in every division performed by "our" recursion (which GMP also uses, in some cases)).

But I didn't let on just how much that can buy them. Under 3.10.1 on my box, the final division alone in math.factorial(1000000) // math.factorial(500000)**2 takes over 20 seconds. But a pure Python implementation of what I assume (don't know for sure) is the key idea(*) in their exact-division algorithm does the same thing in under 0.4 seconds. Huge difference - and while the pure Python version only ever wants the lower N bits of an NxN product, there's no real way to do that in pure Python except via throwing away the higher N bits of a double-width int product. In C, of course, the high half of the bits wouldn't be computed to begin with.

(*) Modular arithmetic again. Given n and d such that it's known n = q*d for some integer q, shift n and d right until d is odd. q is unaffected. A good upper bound on the bit length of q is then n.bit_length() - d.bit_length() + 1. Do the remaining work modulo 2 raised to that power. Call that base B.

We "merely" need to solve for q in the equation n = q*d (mod B). Because d is odd, I = pow(d, -1, B) exists. Just multiply both sides by I to get n * I = q (mod B).

No divisions of any kind are needed. More, there's also a very efficient, division-free algorithm for finding an inverse modulo a power of 2. To start with, every odd int is its own inverse mod 8, so we start with 3 good bits. A modular Newton-like iteration can double the number of correct bits on each iteration.

But I won't post code (unless someone asks) because I don't want to encourage anyone :-)

History
Date	User	Action	Args
2022-01-23 05:08:23	tim.peters	set	recipients: + tim.peters, rhettinger, mark.dickinson, serhiy.storchaka, PedanticHacker, mcognetta, Stefan Pochmann
2022-01-23 05:08:22	tim.peters	set	messageid: <1642914502.99.0.0626123055768.issue37295@roundup.psfhosted.org>
2022-01-23 05:08:22	tim.peters	link	issue37295 messages
2022-01-23 05:08:22	tim.peters	create