
Author steven.daprano
Recipients Ananthakrishnan, lemburg, mark.dickinson, rhettinger, serhiy.storchaka, steven.daprano, stutzbach, tim.peters, veky, vstinner
Date 2020-02-02.07:30:48
Message-id <20200202073043.GH6362@ando.pearwood.info>
In-reply-to <1580472937.77.0.880294060551.issue39479@roundup.psfhosted.org>
Content
On Fri, Jan 31, 2020 at 12:15:37PM +0000, Vedran Čačić wrote:
> is_prime that's always correct is probably not the right thing to go into math. Besides, now we have isqrt, it's just
> 
>     n>1 and n&1 and all(n%d for d in range(3,isqrt(n)+1,2))
> 
> -- yes, it's damn slow, but so is everything else you want to be absolutely correct. :-]

Lots of things are fast and still absolutely correct.

And I think that you probably are underestimating just how slow trial 
division is for testing primes. I imagine you've tested it on "large 
numbers" like a million trillion (1e18) and thought that's acceptably 
fast. Try it on numbers like this:

P = int('29674495668685510550154174642905332730771991'
        '79985304335099507553127683875317177019959423'
        '8596428121188033664754218345562493168782883')
Q = P*(313*(P-1)+1)*(353*(P-1)+1)

Q is a 397-digit Carmichael number. Its smallest factor is P, which 
has 131 digits. If your computer is fast enough to perform a thousand 
trillion (1e15) divisions per second, your trial division will take 
more than 3e107 years to complete. That's about twenty trillion 
trillion trillion trillion trillion trillion trillion trillion (2e97) 
times longer than the universe has existed.
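
For anyone who wants to check that figure, the back-of-the-envelope 
arithmetic looks like this (assuming, generously, that we only need 
to reach the smallest factor P, about 1e130 trial divisors away):

divisions = 10**130              # divisors to try before reaching P
seconds = divisions / 10**15     # at a thousand trillion divisions/sec
years = seconds / 31_536_000     # seconds in one year
print(years)                     # roughly 3e107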

In a practical sense, your algorithm is never going to terminate. The 
sun will burn out and die first.

Upgrading your CPU is not going to help.

My old and slow PC can prove that Q is a composite number, using pure 
Python, in less than six seconds. And I'm sure that a better programmer 
than me would be able to shave off some of that time.
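
For the curious, one way to get such a proof is a single lucky 
Miller-Rabin round. Here is a sketch of the idea in pure Python (an 
illustration, not the exact code I used): the three-argument pow does 
fast modular exponentiation, which is why a 397-digit number is no 
obstacle. The random bases matter: a carefully constructed composite 
can fool any particular fixed small base, but no composite can fool 
more than a quarter of all bases.

import random

def mr_witness(a, n):
    # One Miller-Rabin round: True means `a` proves odd n > 2 composite.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, n)             # fast modular exponentiation
    if x == 1 or x == n - 1:
        return False             # not a witness; n might be prime
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return False
    return True                  # n is certainly composite

# One witness is enough to prove Q composite, and with 30 random
# bases we are all but guaranteed to find one.
print(any(mr_witness(random.randrange(2, Q - 1), Q) for _ in range(30)))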

> is_probable_prime is another matter, but there is an enormous amount 
> of bikeshedding about the API of that one.

What is there to bikeshed about the API?

The API should be a function that takes a single integer, and returns 
False if that number is certainly composite[1], and True otherwise.
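
In other words, the entire public contract could be as small as this 
sketch:

def is_prime(n):
    """Return False if n is certainly composite, otherwise True.

    Zero, one and negative integers count as composite[1].
    """
    ...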

I will argue that any other details -- including whether it is 
probabilistic or not -- are *quality of implementation* issues and not 
part of the API.

For the same reason, the name should be just "is_prime" (or "isprime") 
and not disturb naive users by calling it "probably prime".

Experts will read the documentation and/or the source code, and everyone 
else can happily ignore the microscopic[2] chance of a false positive 
with really huge numbers. (Sometimes ignorance really is bliss.)

We can argue on technical grounds just what the implementation should 
be, but that's not part of the API, and the implementation could change:

- Miller-Rabin, or Baillie–PSW, or AKS, or something else;
- whether probabilistic or deterministic;
- what error rate (if any) we are willing to accept;

etc.

If we choose the fast, simple Miller-Rabin test, then even under the 
worst-case bound of a one in four chance of error per round, 30 random 
iterations push the error rate below one in a million trillion (4**-30 
is roughly 1e-18). If we tested a million numbers every second, it 
would take over 36,000 years (on average) to find one false positive.

If that isn't good enough, increase the number of iterations to 50, in 
which case you would expect one false positive every 40,000 trillion 
years.
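
The arithmetic behind those figures, using the conservative worst-case 
bound of a one in four chance of error per round:

p = 0.25**30                          # worst-case error, 30 rounds
tests = 1 / p                         # expected tests per false positive
years = tests / (10**6 * 31_536_000)  # at a million tests per second
print(p, years)                       # about 8.7e-19 and 36,500 years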

In comparison, it is estimated that cosmic rays cause memory bit 
flips as often as one error per 4GB of RAM per day. This probabilistic 
algorithm is more reliable than your deterministic computer.

https://blogs.oracle.com/linux/attack-of-the-cosmic-rays-v2

https://www.johndcook.com/blog/2019/05/20/cosmic-rays-flipping-bits/

I don't have much time for worries about Miller-Rabin being 
"probabilistic". When I was young and naive I worried about it a lot, 
and maybe that was a legitimate worry for a naive Miller-Rabin 
implementation that did, say, five iterations (probability of a false 
positive about 0.1%).

But with 30 or 50 rounds, I am confident that nobody will ever 
experience such a false positive, not if they spend their entire 
lifetime doing nothing but calling `is_prime`.

Let's be real: if the banks are willing to trust the transfer of 
billions of dollars to probabilistic primality testing, why are we 
worrying about this?

(By the way: for smallish numbers, under 2**64, Miller-Rabin can be 
made fully deterministic: twelve rounds, using the first twelve primes 
as the bases, are enough to decide with certainty whether any such 
number is prime or not. Likewise Baillie–PSW, which has been verified 
to give the correct answer for every number up to 2**64.)
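
Sketched out, that deterministic variant barely differs from the 
probabilistic one above (that the first twelve primes suffice as bases 
below 2**64 is a published result which this snippet simply relies on; 
mr_witness is the function from the earlier sketch):

BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)

def is_prime_u64(n):
    # Deterministic for all n < 2**64.
    if n < 2:
        return False
    for p in BASES:
        if n == p:
            return True
        if n % p == 0:
            return False
    return not any(mr_witness(a, n) for a in BASES)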

[1] For ease of discussion, we'll count zero and one as composite.

[2] More like nanoscopic or picoscopic.