This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Quadratic time internal base conversions
Type: performance Stage: resolved
Components: Interpreter Core Versions:
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: Carl.Friedrich.Bolz, Dennis Sweeney, tim.peters
Priority: normal Keywords:

Created on 2022-01-28 02:31 by tim.peters, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
todecstr.py tim.peters, 2022-01-28 02:31
todecstr.py tim.peters, 2022-01-28 03:35
Messages (9)
msg411962 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2022-01-28 02:31
Our internal base conversion algorithms between power-of-2 and non-power-of-2 bases are quadratic time, and that's been annoying forever ;-) This applies to int<->str and int<->decimal.Decimal conversions. Sometimes the conversion is implicit, like when comparing an int to a Decimal.

For example:

>>> a = 1 << 1000000000 # yup! a billion and one bits
>>> s = str(a)

I gave up after waiting for over 8 hours, and the computation apparently can't be interrupted.

In contrast, using the function in the attached todecstr.py gets the result in under a minute:

>>> a = 1 << 1000000000
>>> s = todecstr(a)
>>> len(s)
301029996

That builds an equal decimal.Decimal in a "clever" recursive way, and then just applies str to _that_.

That's actually a best case for the function, which gets major benefit from the mountains of trailing 0 bits. A worst case is all 1-bits, but that still finishes in under 5 minutes:

>>> a = 1 << 1000000000
>>> s2 = todecstr(a - 1)
>>> len(s2)
301029996
>>> s[-10:], s2[-10:]
('1787109376', '1787109375')

A similar kind of function could certainly be written to convert from Decimal to int much faster, but it would probably be less effective. These things avoid explicit division entirely, but fat multiplies are key, and Decimal implements a fancier * algorithm than Karatsuba.

Not for the faint of heart ;-)
msg411966 - (view) Author: Dennis Sweeney (Dennis Sweeney) * (Python committer) Date: 2022-01-28 02:44
Is this similar to https://bugs.python.org/issue3451 ?
msg411969 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2022-01-28 03:12
Dennis, partly, although that was more aimed at speeding division, while the approach here doesn't use division at all.

However, thinking about it, the implementation I attached doesn't actually for many cases (it doesn't build as much of the power tree in advance as may be needed). Which I missed because all the test cases I tried had mountains of trailing 0 or 1 bits, not mixtures.

So I'm closing this anyway, at least until I can dream up an approach that always works. Thanks!
msg411971 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2022-01-28 03:35
Changed the code so that inner() only references one of the O(log log n) powers of 2 we actually precomputed (it could get lost before if `lo` was non-zero but within `n` had at least one leading zero bit - now we _pass_ the conceptual width instead of computing it on the fly).
msg412120 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2022-01-30 01:30
The test case here is a = (1 << 100000000) - 1, a solid string of 100 million 1 bits. The goal is to convert to a decimal string.

Methods:

native: str(a)

numeral: the Python numeral() function from bpo-3451's div.py after adapting to use the Python divmod_fast() from the same report's fast_div.py.

todecstr: from the Python file attached to this report.

gmp: str() applied to gmpy2.mpz(a).

Timings:

native: don't know; gave up after waiting over 2 1/2 hours.
numeral: about 5 1/2 minutes.
todecstr: under 30 seconds. (*)
gmp: under 6 seconds.

So there's room for improvement ;-)

But here's the thing: I've lost count of how many times someone has whipped up a pure-Python implementation of a bigint algorithm that leaves CPython in the dust. And they're generally pretty easy in Python.

But then they die there, because converting to C is soul-crushing, losing the beauty and elegance and compactness to mountains of low-level details of memory-management, refcounting, and checking for errors after every tiny operation.

So a new question in this endless dilemma: _why_ do we need to convert to C? Why not leave the extreme cases to far-easier to write and maintain Python code? When we're cutting runtime from hours down to minutes, we're focusing on entirely the wrong end to not settle for 2 minutes because it may be theoretically possible to cut that to 1 minute by resorting to C.


(*) I hope this algorithm tickles you by defying expectations ;-) It essentially stands `numeral()` on its head by splitting the input by a power of 2 instead of by a power of 10. As a result _no_ divisions are used. But instead of shifting decimal digits into place, it has to multiply the high-end pieces by powers of 2. That seems insane on the face of it, but hard to argue with the clock ;-) The "tricks" here are that the O(log log n) powers of 2 needed can be computed efficiently in advance of any splitting, and that all the heavy arithmetic is done _in_ the `decimal` module, which implements fancier-than-Karatsuba multiplication and whose values can be converted to decimal strings very quickly.
msg412122 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2022-01-30 03:01
Addendum: the "native" time (for built in str(a)) in the msg above turned out to be over 3 hours and 50 minutes.
msg412172 - (view) Author: Carl Friedrich Bolz-Tereick (Carl.Friedrich.Bolz) * Date: 2022-01-30 20:01
Somebody pointed me to V8's implementation of str(bigint) today:

https://github.com/v8/v8/blob/main/src/bigint/tostring.cc

They say that they can compute str(factorial(1_000_000)) (which is 5.5 million decimal digits) in 1.5s:

https://twitter.com/JakobKummerow/status/1487872478076620800

As far as I understand the code (I suck at C++) they recursively split the bigint into halves using % 10^n at each recursion step, but pre-compute and cache the divisors' inverses.
msg412191 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2022-01-31 05:27
The factorial of a million is much smaller than the case I was looking at. Here are rough timings on my box, for computing the decimal string from the bigint (and, yes, they all return the same string):

native:   475    seconds (about 8 minutes)
numeral:   22.3  seconds
todecstr:   4.10 seconds
gmp:        0.74 seconds

"They recursively split the bigint into halves using % 10^n at each recursion step". That's the standard trick for "output" conversions. Beyond that, there are different ways to try to use "fat" multiplications instead of division. The recursive splitting all on its own can help, but dramatic speedups need dramatically faster multiplication.

todecstr treats it as an "input" conversion instead, using the `decimal` module to work mostly _in_ base 10. That, by itself, reduces the role of division (to none at all in the Python code), and `decimal` has a more advanced multiplication algorithm than CPython's bigints have.
msg412192 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2022-01-31 06:11
> todecstr treats it as an "input" conversion instead, ...

Worth pointing this out since it doesn't seem widely known: "input" base conversions are _generally_ faster than "output" ones. Working in the destination base (or a power of it) is generally simpler.

In the math.factorial(1000000) example, it takes CPython more than 3x longer for str() to convert it to base 10 than for int() to reconstruct the bigint from that string. Not an O() thing (they're both quadratic time in CPython today).
History
Date User Action Args
2022-04-11 14:59:55adminsetgithub: 90716
2022-01-31 06:11:48tim.peterssetmessages: + msg412192
2022-01-31 05:27:19tim.peterssetmessages: + msg412191
2022-01-30 20:01:02Carl.Friedrich.Bolzsetnosy: + Carl.Friedrich.Bolz
messages: + msg412172
2022-01-30 03:01:06tim.peterssetmessages: + msg412122
2022-01-30 01:30:43tim.peterssetmessages: + msg412120
2022-01-28 03:35:22tim.peterssetfiles: + todecstr.py

messages: + msg411971
2022-01-28 03:12:39tim.peterssetstatus: open -> closed
resolution: wont fix
messages: + msg411969

stage: resolved
2022-01-28 02:44:55Dennis Sweeneysetnosy: + Dennis Sweeney
messages: + msg411966
2022-01-28 02:31:44tim.peterscreate