classification
Title: Math module method to find prime factors for non-negative int n
Type: enhancement Stage: needs patch
Components: Library (Lib) Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: jfine2358, mark.dickinson, rhettinger, serhiy.storchaka, steven.daprano, tim.peters, trrhodes
Priority: normal Keywords:

Created on 2020-03-20 19:52 by trrhodes, last changed 2020-03-23 04:14 by tim.peters.

Messages (13)
msg364711 - (view) Author: Ross Rhodes (trrhodes) * Date: 2020-03-20 19:52
Hello,

Thoughts on a new function in the math module to find prime factors for non-negative integer, n? After a brief search, I haven't found previous enhancement tickets raised for this proposal, and I am not aware of any built-in method within either Python's math module or numpy, but happy to be corrected on that front.

If there's no objection and the method does not already exist, I'm happy to implement it and open for review.

Ross
msg364719 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2020-03-20 21:14
Good idea, but yet another that really belongs in an `imath` module (which doesn't yet exist).

How ambitious should it be?  Sympy supplies a `factorint()` function for this, which uses 4 approaches under the covers:  perfect power, trial division, Pollard rho, and Pollard p-1.  All relatively simple to code with trivial memory burden, but not really practical (too slow) for "hard" composites well within the practical range of advanced methods.

But I'd be happy enough to settle for that.
msg364731 - (view) Author: Ross Rhodes (trrhodes) * Date: 2020-03-21 08:48
Hi Tim,

Are there any open discussions or threads following the proposed “imath” module? I’m a relatively new entrant to the Python community, so if there’s any ongoing discussion on that front I’d be happy to read further.

I think as a first step it would be good to implement this logic for a limited range of non-negative n, imposing an upper limit (suggestions welcome) to make sure all provided input can be safely processed. We can then build from there to support larger n going forward if the demand is out there.

Ross
msg364732 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-03-21 08:58
> Are there any open discussions or threads following the proposed “imath” module?

https://mail.python.org/archives/list/python-ideas@python.org/message/YYJ5YJBJNCVXQWK5K3WSVNMPUSV56LOR/

Issue37132.
msg364743 - (view) Author: Ross Rhodes (trrhodes) * Date: 2020-03-21 13:37
Hi Serhiy,

Thanks for sharing your thread. I support this proposal, and would be happy to help where time permits if we can gather sufficient support.

I inadvertently posted my support twice on your thread with no obvious means of deleting the duplicate post, since the first one had a delay appearing. Regardless, how can we best move forward with this idea?

Ross
msg364745 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-03-21 13:44
> Regardless, how can we best move forward with this idea?

Provide a pull request.
msg364747 - (view) Author: Ross Rhodes (trrhodes) * Date: 2020-03-21 13:51
Hi Serhiy,

> Provide a pull request.

Apologies, by "this idea" I should clarify I meant the "imath" module proposal. On this particular enhancement, yes, I'm happy to work on and later provide a pull request.
msg364749 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2020-03-21 14:16
I don't know... To my mind, if we are going to support working with primes, the minimum API is:

- is_prime(n)
- next_prime(n)
- prev_prime(n)
- factorise(n)
- generate_primes(start=0)

(I trust the names are self-explanatory.)

There are various other interesting prime-related factors which can be built on top of those, but the above five are, in my opinion, a minimal useful set.

Factorising negative numbers is simple: just include a factor of -1 with the prime factors. We would probably want to also support factorising 0 and 1 even though they don't have prime factors. The alternative is to raise an exception, which I expect would be more annoying than useful.
msg364750 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2020-03-21 14:25
Ross: 

"implement this logic for a limited range of non-negative n, imposing an upper limit (suggestions welcome) to make sure all provided input can be safely processed. We can then build from there to support larger n going forward if the demand is out there."

Urgh, please no! Arbitrary limits are horrible. Whatever maximum limit N you guess, somebody will want to factorise N+1. Consider this evidence of demand :-)

On what basis would you choose that limit? Basing it on the size of n is the wrong answer: factorising 2**10000000 is easy, and will be found by trial division almost instantly, even though it's a large number with over three million digits.

Another question: since factorization can take a long time, should it be a generator that yields the factors as they are found?
msg364754 - (view) Author: Ross Rhodes (trrhodes) * Date: 2020-03-21 14:35
Hi Steven,

I agree, your set of proposed methods seem sensible to me. I'm happy to start with an implementation of at least some of those methods and open for review, taking this one step at a time for easier review and regular feedback.

> Another question: since factorization can take a long time, should it be a generator that yields the factors as they are found?

Yes, I think a generator is a sensible shout. Happy to proceed with this suggestion.

Ross
msg364757 - (view) Author: Jonathan Fine (jfine2358) * Date: 2020-03-21 16:11
A pre-computed table of primes might be better. Of course, how long should the table be. There's an infinity of primes.

Consider
>>> 2**32
4294967296

This number is approximately 4 * (10**9). According to https://en.wikipedia.org/wiki/Prime_number_theorem, there are 50,847,534 primes less than 10**9. So, very roughly, there are 200,000,000 primes less than 2**32.

Thus, storing a list of all these prime numbers as 32 bit unsigned integers would occupy about
>>> 200_000_000 / (1024**3) * 4
0.7450580596923828
or in other words 3/4 gigabytes on disk.

A binary search into this list, using as starting point the expected location provided by the prime number theorem, might very well require on average less than two block reads into the file that holds the prime number list on disk. And if someone needs to find primes of this size, they've probably got a spare gigabyte or two.

I'm naturally inclined to this approach because by mathematical research involves spending gigahertz days computing tables. I then use the tables to examine hypotheses. See https://arxiv.org/abs/1011.4269. This involves subsets of the vertices of the 5-dimensional cube. There are of course 2**32 such subsets.
msg364785 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-03-22 05:52
I would just call gnu's gfactor for this task.
msg364835 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2020-03-23 04:14
Jonathan, _almost_ no factoring algorithms have any use for a table of primes.  Brute force trial division can use one, but that's about it.

A table of primes _is_ useful for implementing functions related to pi(x) = the number of primes <= x, and the bigger the table the better.  But that's not what this report is about.

Raymond, spinning up a process to factor a small integer is pretty wildly expensive and clumsy - even if you're on a box with gfactor.  This is the kind of frequently implemented thing where someone who knows what they're doing can easily whip up relatively simple Python code that's _far_ better than what most users come up with on their own.  For example, to judge from many stabs I've seen on StackOverflow, most users don't even realize that trial division can be stopped when the trial divisor exceeds the square root of what remains of the integer to be factored.

Trial division alone seems perfectly adequate for factoring 32-bit ints, even at Python speed, and even if it merely skips multiples of 2 and 3 (except, of course, for 2 and 3 themselves).

Pollard rho seems perfectly adequate for factoring 64-bit ints that _are_ composite (takes time roughly proportional to the square root of the smallest factor), but really needs to be backed by a fast "is it a prime?" test to avoid taking "seemingly forever" if fed a large prime.

To judge from the docs I could find, that's as far as gfactor goes too.  Doing that much in Python isn't a major project.  Arguing about the API would consume 10x the effort ;-)
History
Date User Action Args
2020-03-23 04:14:40tim.peterssetmessages: + msg364835
2020-03-22 05:52:04rhettingersetnosy: + rhettinger
messages: + msg364785
2020-03-21 16:11:06jfine2358setnosy: + jfine2358
messages: + msg364757
2020-03-21 14:35:22trrhodessetmessages: + msg364754
2020-03-21 14:25:26steven.dapranosetmessages: + msg364750
2020-03-21 14:16:10steven.dapranosetnosy: + steven.daprano
messages: + msg364749
2020-03-21 13:51:39trrhodessetmessages: + msg364747
2020-03-21 13:44:19serhiy.storchakasetmessages: + msg364745
2020-03-21 13:37:10trrhodessetmessages: + msg364743
2020-03-21 08:58:51serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg364732
2020-03-21 08:48:09trrhodessetmessages: + msg364731
2020-03-20 21:14:19tim.peterssetnosy: + tim.peters
messages: + msg364719

components: + Library (Lib)
type: enhancement
stage: needs patch
2020-03-20 20:31:18mark.dickinsonsetnosy: + mark.dickinson
2020-03-20 19:52:54trrhodescreate