Message361220

Author | steven.daprano |
---|---|

Recipients | Ananthakrishnan, lemburg, mark.dickinson, rhettinger, serhiy.storchaka, steven.daprano, stutzbach, tim.peters, veky, vstinner |

Date | 2020-02-02.07:30:48 |

SpamBayes Score | -1.0 |

Marked as misclassified | Yes |

Message-id | <20200202073043.GH6362@ando.pearwood.info> |

In-reply-to | <1580472937.77.0.880294060551.issue39479@roundup.psfhosted.org> |

Content | |
---|---|

On Fri, Jan 31, 2020 at 12:15:37PM +0000, Vedran Čačić wrote: > > Vedran Čačić <vedgar@gmail.com> added the comment: > > is_prime that's always correct is probably not the right thing to go into math. Besides, now we have isqrt, it's just > > n>1 and n&1 and all(n%d for d in range(3,isqrt(n)+1,2)) > > -- yes, it's damn slow, but so is everything else you want to be absolutely correct. :-] Lots of things are fast and still absolutely correct. And I think that you probably are underestimating just how slow trial division is for testing primes. I can imagine you've probably tested it on "large numbers" like a trillion-trillion (1e18) and thought that's acceptably fast. Try it on numbers like this: P = int('29674495668685510550154174642905332730771991' '79985304335099507553127683875317177019959423' '8596428121188033664754218345562493168782883') Q = P*(313*(P-1)+1)*(353*(P-1)+1) Q is a 397 digit Carmichael Number. Its smallest factor is P, which has 131 digits. If your computer is fast enough to perform a thousand trillion (1e15) divisions per second, your trial division will take more than 3e107 years to complete. That's 10 million trillion trillion trillion trillion trillion trillion trillion trillion trillion times longer than the universe has existed. In a practical sense, your algorithm is never going to terminate. The sun will burn out and die first. Upgrading your CPU is not going to help. My old and slow PC can prove that Q is a composite number, using pure Python, in less than six seconds. And I'm sure that a better programmer than me would be able to shave off some of that time. > is_probable_prime is another matter, but there is an enormous amount > of bikeshedding about the API of that one. What is there to bikeshed about the API? The API should be a function that takes a single integer, and returns False if that number is certainly composite[1] otherwise True. I will argue that any other details -- including whether it is probabilistic or not -- are *quality of implementation* issues and not part of the API. For the same reason, the name should be just "is_prime" (or "isprime") and not disturb naive users by calling it "probably prime". Experts will read the documentation and/or the source code, and everyone else can happily ignore the microscopic[2] chance of a false positive with really huge numbers. (Sometimes ignorance really is bliss.) We can argue on technical grounds just what the implementation should be, but that's not part of the API, and the implementation could change: - Miller-Rabin, or Baillie–PSW, or AKS, or something else; - whether probabilistic or deterministic; - what error rate (if any) we are willing to accept; etc. If we choose the fast, simple Miller-Rabin test, with just 30 random iterations the error rate is less than approximately one in a trillion trillion. If we tested a million numbers ever second, it would take over 18,000 years (on average) to find one false positive. If that isn't good enough, increase the number of iterations to 50, in which case you would expect one false positive every 20,000 trillion years. In comparison, it is estimated that cosmic rays cause memory bit flips as often one error per 4GB of RAM per day. This probabilistic algorithm is more reliable than your determinstic computer. https://blogs.oracle.com/linux/attack-of-the-cosmic-rays-v2 https://www.johndcook.com/blog/2019/05/20/cosmic-rays-flipping-bits/ I don't have much time for worries about Miller-Rabin being "probabilistic". When I was young and naive I worried about it a lot, and maybe that was a legitimate worry for a naive Miller-Rabin implementation that did, say, five iterations (probability of a false positive about 0.1%). But with 30 or 50 rounds, I am confident that nobody will ever experience such a false positive, not if they spend their entire lifetime doing nothing but calling `is_prime`. Let's be real: if the banks are willing to trust the transfer of billions of dollars to probabilistic primality testing, why are we worring about this? (By the way: for smallish numbers, under 2**64, no more than twelve rounds of Miller-Rabin are sufficient to deterministically decide whether a number is prime or not. Likewise for Baillie–PSW, which is also completely deterministic for numbers up to 2**64.) [1] For ease of discussion, we'll count zero and one as composite. [2] More like nanoscopic or picoscopic. |

History | |||
---|---|---|---|

Date | User | Action | Args |

2020-02-02 07:30:48 | steven.daprano | set | recipients: + steven.daprano, lemburg, tim.peters, rhettinger, mark.dickinson, vstinner, stutzbach, serhiy.storchaka, veky, Ananthakrishnan |

2020-02-02 07:30:48 | steven.daprano | link | issue39479 messages |

2020-02-02 07:30:48 | steven.daprano | create |