On Fri, Jan 31, 2020 at 12:15:37PM +0000, Vedran Čačić wrote:
>
> Vedran Čačić <vedgar@gmail.com> added the comment:
>
> is_prime that's always correct is probably not the right thing to go into math. Besides, now we have isqrt, it's just
>
> n>1 and n&1 and all(n%d for d in range(3,isqrt(n)+1,2))
>
> -- yes, it's damn slow, but so is everything else you want to be absolutely correct. :-]
Lots of things are fast and still absolutely correct.
And I think that you probably are underestimating just how slow trial
division is for testing primes. I can imagine you've probably tested it
on "large numbers" like a trillion-trillion (1e18) and thought that's
acceptably fast. Try it on numbers like this:
P = int('29674495668685510550154174642905332730771991'
'79985304335099507553127683875317177019959423'
'8596428121188033664754218345562493168782883')
Q = P*(313*(P-1)+1)*(353*(P-1)+1)
Q is a 397 digit Carmichael Number. Its smallest factor is P, which has
131 digits. If your computer is fast enough to perform a thousand
trillion (1e15) divisions per second, your trial division will take more
than 3e107 years to complete. That's 10 million trillion trillion
trillion trillion trillion trillion trillion trillion trillion times
longer than the universe has existed.
In a practical sense, your algorithm is never going to terminate. The
sun will burn out and die first.
Upgrading your CPU is not going to help.
My old and slow PC can prove that Q is a composite number, using pure
Python, in less than six seconds. And I'm sure that a better programmer
than me would be able to shave off some of that time.
> is_probable_prime is another matter, but there is an enormous amount
> of bikeshedding about the API of that one.
What is there to bikeshed about the API?
The API should be a function that takes a single integer, and returns
False if that number is certainly composite[1] otherwise True.
I will argue that any other details -- including whether it is
probabilistic or not -- are *quality of implementation* issues and not
part of the API.
For the same reason, the name should be just "is_prime" (or "isprime")
and not disturb naive users by calling it "probably prime".
Experts will read the documentation and/or the source code, and everyone
else can happily ignore the microscopic[2] chance of a false positive
with really huge numbers. (Sometimes ignorance really is bliss.)
We can argue on technical grounds just what the implementation should
be, but that's not part of the API, and the implementation could change:
- Miller-Rabin, or Baillie–PSW, or AKS, or something else;
- whether probabilistic or deterministic;
- what error rate (if any) we are willing to accept;
etc.
If we choose the fast, simple Miller-Rabin test, with just 30 random
iterations the error rate is less than approximately one in a trillion
trillion. If we tested a million numbers ever second, it would take over
18,000 years (on average) to find one false positive.
If that isn't good enough, increase the number of iterations to 50, in
which case you would expect one false positive every 20,000 trillion
years.
In comparison, it is estimated that cosmic rays cause memory bit
flips as often one error per 4GB of RAM per day. This probabilistic
algorithm is more reliable than your determinstic computer.
https://blogs.oracle.com/linux/attack-of-the-cosmic-rays-v2
https://www.johndcook.com/blog/2019/05/20/cosmic-rays-flipping-bits/
I don't have much time for worries about Miller-Rabin being
"probabilistic". When I was young and naive I worried about it a lot,
and maybe that was a legitimate worry for a naive Miller-Rabin
implementation that did, say, five iterations (probability of a false
positive about 0.1%).
But with 30 or 50 rounds, I am confident that nobody will ever
experience such a false positive, not if they spend their entire
lifetime doing nothing but calling `is_prime`.
Let's be real: if the banks are willing to trust the transfer of
billions of dollars to probabilistic primality testing, why are we
worring about this?
(By the way: for smallish numbers, under 2**64, no more than twelve
rounds of Miller-Rabin are sufficient to deterministically decide
whether a number is prime or not. Likewise for Baillie–PSW, which is
also completely deterministic for numbers up to 2**64.)
[1] For ease of discussion, we'll count zero and one as composite.
[2] More like nanoscopic or picoscopic. |