Message 361263 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	tim.peters
Recipients	Ananthakrishnan, lemburg, mark.dickinson, rhettinger, serhiy.storchaka, steven.daprano, stutzbach, tim.peters, veky, vstinner
Date	2020-02-03.03:10:03
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1580699405.15.0.279873378259.issue39479@roundup.psfhosted.org>
In-reply-to

Content
[Tim] >> The pure-Python Miller-Rabin code i posted in the >> aforementioned thread is typically at least 100 >> times faster than that. [Steven] > This is exactly the sort of argument about quality of > implementation which isn't, or shouldn't be, part of > the argument about the API, IMO. I wasn't at all talking about API at that point. I was backing the argument _you_ were making, that trial division is a hopelessly inefficient implementation, compared to what's possible with probabilistic methods, regardless of API. You were in fact underselling that argument, because it's straightforward to get an order or two of magnitude faster than you demonstrated. > the Q I quoted above was carefully designed (not by me, I hasten > to add!) I know the paper it came from. > to be a preudoprime to the first 307 prime bases, so it's > something of a worst-case scenario for my version. Which is why I have no problem picturing how this "should be" approached: the original Miller-Rabin (which uses random bases, not succumbing to the "premature optimization" catastrophe magnet) has no problem with Q (or with anything else!), and hasn't really been improved on for general-purpose use since it was published. It's a darned-near perfect mix of simplicity, brevity, clarity, robustness, generality, and speed. "Good enough" by a long shot on all counts. >> def probably_prime(n, maxsteps=20) >> >> supplies a _default_ instead. I don't want an API that's useful >> _only_ to people who don't know what they're doing ;-) > It's not just for people who don't know what they're doing. It > is for people who don't want to sweat the small details, Then they can accept the default. In what conceivable sense is that a burden? > they just want an nswer True or False and are prepared > to trust that the function author knows what they are > doing. But the function author can't possibly know what the _user_ needs this for. In some apps degree of certainty is far more important than speed, while in other apps it's the opposite. Be realistic here? Your argument here makes sense for always-right functions, but you're not willing to accept one of those for this specific purpose (& neither am I). > If someone cares about the small details like how > many bases to try, It's not a "small detail" where it matters: it is THE way to trade off computation time against confidence in the results. It's a _fundamental_ aspect of how Miller-Rabin works. > they might also care about details like: > > - which specific bases are used; > - whether to use Miller-Rabin at all; > - how many trial divisions to do first; > - which specific primality test to use; Then they should go use a special-purpose library ;-) Letting them fiddle the single most important parameter isn't down in the weeds, it's a top-level control knob. My answers to all the above: - bases are picked via randrange(2, n-1) - yes, because no better general-purpose algorithm is known - none - although I'll allow that there may be a speed advantage in some apps if a gcd is done with a relatively small primorial first (more efficient than one-at-a-time trial divisions by small primes) - Miller-Rabin > What if the implementation shifts away from Miller-Rabin > to (say) Baillie-PSW? It can't ;-) I would absolutely document that Miller-Rabin _is_ the algorithm being used, with the random-bases implementation choice. Then the curious can search the web for a mountain of good information about it. > Then your maxsteps parameter becomes meaningless. Do we > deprecate it or leave it there in case the implementation > changes again to something that has a concept of number > of steps? All moot, given that I have no interest in hiding the specific algorithm in use. YAGNI. > I think that if someone cares sufficiently about the details, > then they probably won't be satisfied with a single isprime > function, but may want is_fermat_prime, is_miller_rabin_prime, > is_lucas_prime etc. Again, go to a special-purpose library if that's what they want. And, again, I don't agree with your characterization of the Miller-Rabin maxsteps parameter as a "detail". It's a fundamental aspect of what the algorithm does. Which casual users can certainly ignore, but at their own risk. > ... > Correct. The first twelve primes make up such a minimal set. And if you don't care about picking the fixed bases from a prefix of the primes, you only need 7 bases for a 100% reliable test through 2**64: 2, 325, 9375, 28178, 450775, 9780504, and 1795265022. Or so this table claims: https://miller-rabin.appspot.com/ But I don't care here. Using a fixed set of bases is begging for embarrassment (for every fixed finite set of bases, there are an infinite number of composites they'll _claim_ are prime). There are no systemic failure modes when using random bases.

[Tim]
>> The pure-Python Miller-Rabin code i posted in the
>> aforementioned thread is typically at least 100
>> times faster than that.

[Steven]
> This is exactly the sort of argument about quality of
> implementation which isn't, or shouldn't be, part of
> the argument about the API, IMO.

I wasn't at all talking about API at that point.  I was backing the argument _you_ were making, that trial division is a hopelessly inefficient implementation, compared to what's possible with probabilistic methods, regardless of API.  You were in fact underselling that argument, because it's straightforward to get an order or two of magnitude faster than you demonstrated.


> the Q I quoted above was carefully designed (not by me, I hasten
> to add!)

I know the paper it came from.

> to be a preudoprime to the first 307 prime bases, so it's
> something of a worst-case scenario for my version.

Which is why I have no problem picturing how this "should be" approached:  the original Miller-Rabin (which uses random bases, not succumbing to the "premature optimization" catastrophe magnet) has no problem with Q (or with anything else!), and hasn't really been improved on for general-purpose use since it was published.  It's a darned-near perfect mix of simplicity, brevity, clarity, robustness, generality, and speed.  "Good enough" by a long shot on all counts.

>>    def probably_prime(n, maxsteps=20)
>>
>> supplies a _default_ instead.  I don't want an API that's useful
>> _only_ to people who don't know what they're doing ;-)

> It's not just for people who don't know what they're doing. It
> is for people who don't want to sweat the small details,

Then they can accept the default.  In what conceivable sense is that a burden?

> they just want an nswer True or False and are prepared
> to trust that the function author knows what they are
> doing.

But the function author can't possibly know what the _user_ needs this for.  In some apps degree of certainty is far more important than speed, while in other apps it's the opposite.  Be realistic here?  Your argument here makes sense for always-right functions, but you're not willing to accept one of those for this specific purpose (& neither am I).

> If someone cares about the small details like how
> many bases to try,

It's not a "small detail" where it matters:  it is THE way to trade off computation time against confidence in the results.  It's a _fundamental_ aspect of how Miller-Rabin works.

> they might also care about details like:
>
> - which specific bases are used;
> - whether to use Miller-Rabin at all;
> - how many trial divisions to do first;
> - which specific primality test to use;

Then they should go use a special-purpose library ;-)  Letting them fiddle the single most important parameter isn't down in the weeds, it's a top-level control knob.

My answers to all the above:

- bases are picked via randrange(2, n-1)
- yes, because no better general-purpose algorithm is known
- none - although I'll allow that there may be a
  speed advantage in some apps if a gcd is done with a
  relatively small primorial first (more efficient than
  one-at-a-time trial divisions by small primes)
- Miller-Rabin

> What if the implementation shifts away from Miller-Rabin
> to (say) Baillie-PSW?

It can't ;-)  I would absolutely document that Miller-Rabin _is_ the algorithm being used, with the random-bases implementation choice.  Then the curious can search the web for a mountain of good information about it.

> Then your maxsteps parameter becomes meaningless. Do we
> deprecate it or leave it there in case the implementation
> changes again to something that has a concept of number
> of steps?

All moot, given that I have no interest in hiding the specific algorithm in use.  YAGNI.

> I think that if someone cares sufficiently about the details,
> then they probably won't be satisfied with a single isprime
> function, but may want is_fermat_prime, is_miller_rabin_prime,
> is_lucas_prime etc.

Again, go to a special-purpose library if that's what they want.  And, again, I don't agree with your characterization of the Miller-Rabin maxsteps parameter as a "detail".  It's a fundamental aspect of what the algorithm does.  Which casual users can certainly ignore, but at their own risk.

> ...
> Correct. The first twelve primes make up such a minimal set.

And if you don't care about picking the fixed bases from a prefix of the primes, you only need 7 bases for a 100% reliable test through 2**64:  2, 325, 9375, 28178, 450775, 9780504, and 1795265022.  Or so this table claims:

https://miller-rabin.appspot.com/

But I don't care here.  Using a fixed set of bases is begging for embarrassment (for every fixed finite set of bases, there are an infinite number of composites they'll _claim_ are prime).  There are no systemic failure modes when using random bases.

History
Date	User	Action	Args
2020-02-03 03:10:05	tim.peters	set	recipients: + tim.peters, lemburg, rhettinger, mark.dickinson, vstinner, stutzbach, steven.daprano, serhiy.storchaka, veky, Ananthakrishnan
2020-02-03 03:10:05	tim.peters	set	messageid: <1580699405.15.0.279873378259.issue39479@roundup.psfhosted.org>
2020-02-03 03:10:05	tim.peters	link	issue39479 messages
2020-02-03 03:10:03	tim.peters	create