Message 361220 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	steven.daprano
Recipients	Ananthakrishnan, lemburg, mark.dickinson, rhettinger, serhiy.storchaka, steven.daprano, stutzbach, tim.peters, veky, vstinner
Date	2020-02-02.07:30:48
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<20200202073043.GH6362@ando.pearwood.info>
In-reply-to	<1580472937.77.0.880294060551.issue39479@roundup.psfhosted.org>

Content
On Fri, Jan 31, 2020 at 12:15:37PM +0000, Vedran Čačić wrote: > > Vedran Čačić <vedgar@gmail.com> added the comment: > > is_prime that's always correct is probably not the right thing to go into math. Besides, now we have isqrt, it's just > > n>1 and n&1 and all(n%d for d in range(3,isqrt(n)+1,2)) > > -- yes, it's damn slow, but so is everything else you want to be absolutely correct. :-] Lots of things are fast and still absolutely correct. And I think that you probably are underestimating just how slow trial division is for testing primes. I can imagine you've probably tested it on "large numbers" like a trillion-trillion (1e18) and thought that's acceptably fast. Try it on numbers like this: P = int('29674495668685510550154174642905332730771991' '79985304335099507553127683875317177019959423' '8596428121188033664754218345562493168782883') Q = P(313(P-1)+1)(353(P-1)+1) Q is a 397 digit Carmichael Number. Its smallest factor is P, which has 131 digits. If your computer is fast enough to perform a thousand trillion (1e15) divisions per second, your trial division will take more than 3e107 years to complete. That's 10 million trillion trillion trillion trillion trillion trillion trillion trillion trillion times longer than the universe has existed. In a practical sense, your algorithm is never going to terminate. The sun will burn out and die first. Upgrading your CPU is not going to help. My old and slow PC can prove that Q is a composite number, using pure Python, in less than six seconds. And I'm sure that a better programmer than me would be able to shave off some of that time. > is_probable_prime is another matter, but there is an enormous amount > of bikeshedding about the API of that one. What is there to bikeshed about the API? The API should be a function that takes a single integer, and returns False if that number is certainly composite[1] otherwise True. I will argue that any other details -- including whether it is probabilistic or not -- are quality of implementation issues and not part of the API. For the same reason, the name should be just "is_prime" (or "isprime") and not disturb naive users by calling it "probably prime". Experts will read the documentation and/or the source code, and everyone else can happily ignore the microscopic[2] chance of a false positive with really huge numbers. (Sometimes ignorance really is bliss.) We can argue on technical grounds just what the implementation should be, but that's not part of the API, and the implementation could change: - Miller-Rabin, or Baillie–PSW, or AKS, or something else; - whether probabilistic or deterministic; - what error rate (if any) we are willing to accept; etc. If we choose the fast, simple Miller-Rabin test, with just 30 random iterations the error rate is less than approximately one in a trillion trillion. If we tested a million numbers ever second, it would take over 18,000 years (on average) to find one false positive. If that isn't good enough, increase the number of iterations to 50, in which case you would expect one false positive every 20,000 trillion years. In comparison, it is estimated that cosmic rays cause memory bit flips as often one error per 4GB of RAM per day. This probabilistic algorithm is more reliable than your determinstic computer. https://blogs.oracle.com/linux/attack-of-the-cosmic-rays-v2 https://www.johndcook.com/blog/2019/05/20/cosmic-rays-flipping-bits/ I don't have much time for worries about Miller-Rabin being "probabilistic". When I was young and naive I worried about it a lot, and maybe that was a legitimate worry for a naive Miller-Rabin implementation that did, say, five iterations (probability of a false positive about 0.1%). But with 30 or 50 rounds, I am confident that nobody will ever experience such a false positive, not if they spend their entire lifetime doing nothing but calling `is_prime`. Let's be real: if the banks are willing to trust the transfer of billions of dollars to probabilistic primality testing, why are we worring about this? (By the way: for smallish numbers, under 264, no more than twelve rounds of Miller-Rabin are sufficient to deterministically decide whether a number is prime or not. Likewise for Baillie–PSW, which is also completely deterministic for numbers up to 264.) [1] For ease of discussion, we'll count zero and one as composite. [2] More like nanoscopic or picoscopic.

On Fri, Jan 31, 2020 at 12:15:37PM +0000, Vedran Čačić wrote:
> 
> Vedran Čačić <vedgar@gmail.com> added the comment:
> 
> is_prime that's always correct is probably not the right thing to go into math. Besides, now we have isqrt, it's just
> 
>     n>1 and n&1 and all(n%d for d in range(3,isqrt(n)+1,2))
> 
> -- yes, it's damn slow, but so is everything else you want to be absolutely correct. :-]

Lots of things are fast and still absolutely correct.

And I think that you probably are underestimating just how slow trial 
division is for testing primes. I can imagine you've probably tested it 
on "large numbers" like a trillion-trillion (1e18) and thought that's 
acceptably fast. Try it on numbers like this:

P = int('29674495668685510550154174642905332730771991'
        '79985304335099507553127683875317177019959423'
        '8596428121188033664754218345562493168782883')
Q = P*(313*(P-1)+1)*(353*(P-1)+1)

Q is a 397 digit Carmichael Number. Its smallest factor is P, which has 
131 digits. If your computer is fast enough to perform a thousand 
trillion (1e15) divisions per second, your trial division will take more 
than 3e107 years to complete. That's 10 million trillion trillion 
trillion trillion trillion trillion trillion trillion trillion times 
longer than the universe has existed.

In a practical sense, your algorithm is never going to terminate. The 
sun will burn out and die first.

Upgrading your CPU is not going to help.

My old and slow PC can prove that Q is a composite number, using pure 
Python, in less than six seconds. And I'm sure that a better programmer 
than me would be able to shave off some of that time.

> is_probable_prime is another matter, but there is an enormous amount 
> of bikeshedding about the API of that one.

What is there to bikeshed about the API?

The API should be a function that takes a single integer, and returns 
False if that number is certainly composite[1] otherwise True.

I will argue that any other details -- including whether it is 
probabilistic or not -- are *quality of implementation* issues and not 
part of the API.

For the same reason, the name should be just "is_prime" (or "isprime") 
and not disturb naive users by calling it "probably prime".

Experts will read the documentation and/or the source code, and everyone 
else can happily ignore the microscopic[2] chance of a false positive 
with really huge numbers. (Sometimes ignorance really is bliss.)

We can argue on technical grounds just what the implementation should 
be, but that's not part of the API, and the implementation could change:

- Miller-Rabin, or Baillie–PSW, or AKS, or something else;
- whether probabilistic or deterministic;
- what error rate (if any) we are willing to accept;

etc.

If we choose the fast, simple Miller-Rabin test, with just 30 random 
iterations the error rate is less than approximately one in a trillion 
trillion. If we tested a million numbers ever second, it would take over 
18,000 years (on average) to find one false positive.

If that isn't good enough, increase the number of iterations to 50, in 
which case you would expect one false positive every 20,000 trillion 
years.

In comparison, it is estimated that cosmic rays cause memory bit 
flips as often one error per 4GB of RAM per day. This probabilistic 
algorithm is more reliable than your determinstic computer.

https://blogs.oracle.com/linux/attack-of-the-cosmic-rays-v2

https://www.johndcook.com/blog/2019/05/20/cosmic-rays-flipping-bits/

I don't have much time for worries about Miller-Rabin being 
"probabilistic". When I was young and naive I worried about it a lot, 
and maybe that was a legitimate worry for a naive Miller-Rabin 
implementation that did, say, five iterations (probability of a false 
positive about 0.1%).

But with 30 or 50 rounds, I am confident that nobody will ever 
experience such a false positive, not if they spend their entire 
lifetime doing nothing but calling `is_prime`.

Let's be real: if the banks are willing to trust the transfer of 
billions of dollars to probabilistic primality testing, why are we 
worring about this?

(By the way: for smallish numbers, under 2**64, no more than twelve 
rounds of Miller-Rabin are sufficient to deterministically decide 
whether a number is prime or not. Likewise for Baillie–PSW, which is 
also completely deterministic for numbers up to 2**64.)

[1] For ease of discussion, we'll count zero and one as composite.

[2] More like nanoscopic or picoscopic.

History
Date	User	Action	Args
2020-02-02 07:30:48	steven.daprano	set	recipients: + steven.daprano, lemburg, tim.peters, rhettinger, mark.dickinson, vstinner, stutzbach, serhiy.storchaka, veky, Ananthakrishnan
2020-02-02 07:30:48	steven.daprano	link	issue39479 messages
2020-02-02 07:30:48	steven.daprano	create