Message412120
The test case here is a = (1 << 100000000) - 1, a solid string of 100 million 1 bits. The goal is to convert to a decimal string.
Methods:
native: str(a)
numeral: the Python numeral() function from bpo-3451's div.py after adapting to use the Python divmod_fast() from the same report's fast_div.py.
todecstr: from the Python file attached to this report.
gmp: str() applied to gmpy2.mpz(a).
Timings:
native: don't know; gave up after waiting over 2 1/2 hours.
numeral: about 5 1/2 minutes.
todecstr: under 30 seconds. (*)
gmp: under 6 seconds.
So there's room for improvement ;-)
But here's the thing: I've lost count of how many times someone has whipped up a pure-Python implementation of a bigint algorithm that leaves CPython in the dust. And they're generally pretty easy in Python.
But then they die there, because converting to C is soul-crushing, losing the beauty and elegance and compactness to mountains of low-level details of memory-management, refcounting, and checking for errors after every tiny operation.
So a new question in this endless dilemma: _why_ do we need to convert to C? Why not leave the extreme cases to far-easier to write and maintain Python code? When we're cutting runtime from hours down to minutes, we're focusing on entirely the wrong end to not settle for 2 minutes because it may be theoretically possible to cut that to 1 minute by resorting to C.
(*) I hope this algorithm tickles you by defying expectations ;-) It essentially stands `numeral()` on its head by splitting the input by a power of 2 instead of by a power of 10. As a result _no_ divisions are used. But instead of shifting decimal digits into place, it has to multiply the high-end pieces by powers of 2. That seems insane on the face of it, but hard to argue with the clock ;-) The "tricks" here are that the O(log log n) powers of 2 needed can be computed efficiently in advance of any splitting, and that all the heavy arithmetic is done _in_ the `decimal` module, which implements fancier-than-Karatsuba multiplication and whose values can be converted to decimal strings very quickly. |
|
Date |
User |
Action |
Args |
2022-01-30 01:30:43 | tim.peters | set | recipients:
+ tim.peters, Dennis Sweeney |
2022-01-30 01:30:43 | tim.peters | set | messageid: <1643506243.72.0.655388844522.issue46558@roundup.psfhosted.org> |
2022-01-30 01:30:43 | tim.peters | link | issue46558 messages |
2022-01-30 01:30:43 | tim.peters | create | |
|