This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: long.__str__ is quadratic time
Type: Stage:
Components: Library (Lib) Versions: Python 2.6
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Asymptotically faster divmod and str(long)
View: 3451
Assigned To: Nosy List: dbenbenn, mark.dickinson, rhettinger
Priority: normal Keywords:

Created on 2007-07-01 15:42 by dbenbenn, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (6)
msg32438 - (view) Author: David Benbennick (dbenbenn) Date: 2007-07-01 15:42
In the 2.5.1 source code, Objects/longobject.c:long_format() is used to convert a long int to a string.  When base is not a power of 2, it is quadratic time in the length of the output string.  Of course, this is fine on small numbers, but is catastrophic on huge numbers.

Suppose base is 10.  The problem is that the algorithm basically does the following: take the number mod 10 to get a digit, stick that digit on the output, divide the number by 10, and repeat.

To print an n digit number, there is an O(n log n) algorithm, using divide-and-conquer.  You break the number into 2 pieces, each n/2 digits long, and iterate on both pieces.

Converting string to long has the same quadratic time problem, in PyLong_FromString().  The solution is the same, in reverse: break the string in half, convert each piece to a long, and combine the two longs into one.


Alternatively, Python could just use GMP (GNU MP Bignum Library, http://gmplib.org/) to provide long integers.  That would make other operations, such as * and /, more efficient, too.  But it would require a much bigger change.
msg32439 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2007-07-03 14:22
I'd call this a feature request rather than a bug.

If I understand correctly, an O(n^(1+epsilon)) printing algorithm would rely on having an FFT-based fast multiplication algorithm, together with some form of divide-and-conquer division---is this right?  These algorithms are nontrivial to implement efficiently, and even then the crossover point (where the FFT-based method becomes faster than the quadratic method) is likely to be in the thousands of digits.  So I can't imagine there's much demand for this---even a 4096-bit RSA key is only 1233 (or 1234) digits long.  If you just want subquadratic printing (O(n^1.585) or so) then you'd still need a subquadratic division (Python already has Karatsuba multiplication for large integers); here I guess the crossover would be smaller.  A subquadratic division for Python might well be of interest to the developers, if someone could be persuaded to write and test one, and demonstrate a significant positive impact on performance.

What's your use-case for printing huge integers fast?  It doesn't seem like a very common need.

Regarding GMP, I believe there are licensing issues:  it's not legal to include GMP in core Python and release Python under its current non-GPL license, or something like that---I don't know anything about the details.  But have you encountered Martelli's gmpy? 

http://code.google.com/p/gmpy/

msg32440 - (view) Author: David Benbennick (dbenbenn) Date: 2007-07-03 18:23
> rely on having an FFT-based fast multiplication algorithm, together with
> some form of divide-and-conquer division---is this right?

Yes, that's true: fast str() relies on fast division.  I had assumed Python already had fast division; if it doesn't, I'd consider that a bug, too.

> What's your use-case for printing huge integers fast?

Note that it's not a question of printing them *fast*.  With a quadratic time algorithm, it's infeasible to print huge numbers *at all*.  My personal use case is doing computations in Thompson's group F; an element of F is a list of humongous fractions.  But I expect it's a problem that often comes up in mathematical programming.  I'll admit it isn't a very important bug, since anyone who is harmed by it will either use a different language, or use gmpy, or print in hex.  But it's still a bug.

> Regarding GMP, I believe there are licensing issues:  it's not legal to
> include GMP in core Python and release Python under its current non-GPL
> license, or something like that---I don't know anything about the details.

I don't see what the problem would be.  Python's LICENSE file says that Python's license is GPL compatible.  And in any case, GMP is LGPL, not GPL, so any program can link to it.
msg59938 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-01-14 23:33
I would be happy with a patch that does divide-and-conquer.  That code
would be much easier to get correct than the FFT algorithm and it would
still give nice Big-O results.
msg68586 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-06-22 22:24
Still waiting for the patch.
msg70703 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-08-04 15:57
Closing this as a duplicate; it's superseded by issue 3451.
History
Date User Action Args
2022-04-11 14:56:25adminsetgithub: 45145
2008-08-04 15:57:54mark.dickinsonsetstatus: open -> closed
resolution: duplicate
superseder: Asymptotically faster divmod and str(long)
messages: + msg70703
2008-06-22 22:25:00rhettingersetassignee: rhettinger ->
messages: + msg68586
2008-01-14 23:33:42rhettingersetmessages: + msg59938
2007-11-27 17:33:05rhettingersetassignee: rhettinger
nosy: + rhettinger
versions: + Python 2.6, - Python 2.5
2007-07-01 15:42:45dbenbenncreate