msg70309 - (view) |
Author: Fredrik Johansson (fredrikj) |
Date: 2008-07-27 01:11 |
A few weeks ago, I blogged about taking advantage of Karatsuba
multiplication and Newton's method to divide big integers quickly (some
of you may have read it, as it was posted to Daily Python-URL among
other places):
http://fredrik-j.blogspot.com/2008/07/making-division-in-python-faster.html
To summarize, this method blows the builtin division out of the water
already at ~(2000 digits)/(1000 digits).
The primary catch is that the result of the Newton division may be
slightly wrong (typically 1 ulp). However, a single extra multiplication
and a subtraction at the end allows one to compute a remainder, and
since the remainder must satisfy 0 <= r < q, the error is easily
corrected. From a quick test, the cost of the extra multiplication seems
to move the break-even point with the builtin division up to around
5000/2500 digits.
A pure Python implementation of divmod, with error correction based on
the remainder, can be found in this file:
http://www.dd.chalmers.se/~frejohl/code/libarith/libarith.py
(See the function idivmod)
Of particular note is that fast divmod gives a fast way to do radix
conversion, by recursively splitting the number in half. The function
numeral (see same .py file) does this, e.g:
>>> from time import clock
>>> a = 2**1257787-1
>>> t1=clock(); s1=str(a); t2=clock(); t2-t1
109.08065159119386
>>> t1=clock(); s2=numeral(a); t2=clock(); t2-t1
7.1394886780303182
>>> s1 == s2
True
(This recursive algorithm, by the way, is actually faster than str()
even with the slow builtin divmod.)
Would there be any interest in porting these algorithms to C and using
them in the standard Python long implementation?
There are likely some problems that I have overlooked. A critical review
will be most welcome.
|
msg70313 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-07-27 06:34 |
> Would there be any interest in porting these algorithms to C and using
> them in the standard Python long implementation?
Yes, definitely! Though it depends a little bit how much complication
is involved.
A subquadratic algorithm for converting strings to longs might also fit
well here.
Now I'm just waiting for you to propose an implementation of integer
square root :-).
|
msg70316 - (view) |
Author: Fredrik Johansson (fredrikj) |
Date: 2008-07-27 07:07 |
> Yes, definitely! Though it depends a little bit how much complication
is involved.
Not that much, I think, the pure Python version being just a few lines
of code (plus a few more lines of boilerplate). But I'm not too familiar
with converting Python to C myself, so I may underestimate the
difficulties. It might get a little more complicated if you want to
minimize temporary allocations, for example.
> Now I'm just waiting for you to propose an implementation of integer
square root :-).
Yes, there is also code for fast (O(M(n)) integer square roots in
libarith.py. This function would be much less useful overall as a
builtin, though.
|
msg70325 - (view) |
Author: Fredrik Johansson (fredrikj) |
Date: 2008-07-27 16:04 |
For your convenience, I have split the division and numeral code off to
a standalone .py file which I'm attaching here.
I also updated the remainder logic to use the default divmod instead of
repeated subtraction, which ensures worst-case performance similar to
the builtin divmod in the very unlikely case that div_newton would
return junk.
|
msg70326 - (view) |
Author: Fredrik Johansson (fredrikj) |
Date: 2008-07-27 16:17 |
And here some timings on my laptop.
Both int->str and balanced division become faster somewhere between 1000
and 2000 digits. This is after editing the division benchmark to use
random numbers instead of exact powers of ten (interestingly causing a
bit of slowdown). String conversion might be a little slower for lengths
that aren't close to powers of two.
|
msg70702 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-08-04 15:51 |
There's also the recursive division algorithm due
to Burnikel and Ziegler; this might be worth a look.
I think it's the same asymptotic complexity (constant
times karatsuba multiplication complexity), but may
turn out to be faster for one reason or another.
I had a Python implementation of this somewhere;
I'll see if I can dig it out.
Assigning this to me so that it doesn't get lost or
forgotten; but note that I don't intend to do
anything about it before 2.6/3.0 final. If anyone else
wants to take it off my hands before then, feel free.
|
msg70704 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-08-04 15:58 |
See also issue 1746088.
|
msg70719 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-08-04 20:38 |
Here's a pure Python implementation of the Burnikel and Ziegler recursive
division algorithm. I've no idea whether it's faster or slower than
Newton, but it might be worth a look. It depends heavily on bit
operations, which ought to be much faster when coded in C. (Some of the
shifts would be completely unnecessary---replaced by changes in indexing
instead.)
The original paper describing the algorithm is available here:
http://cr.yp.to/bib/1998/burnikel.ps
|
msg70720 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-08-04 20:38 |
Oops. Wrong file. Here's the right one.
|
msg70776 - (view) |
Author: Fredrik Johansson (fredrikj) |
Date: 2008-08-06 08:51 |
Indeed, that seems to be much faster. In the mean time, do you mind if I
steal the code? :-)
|
msg71010 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-08-11 10:40 |
> Indeed, that seems to be much faster. In the mean time, do you mind if I
> steal the code? :-)
Not at all!
I guess constant factors could well appear and/or disappear when
recoding in C; it might well be worth trying to implement both
methods to see which is faster.
There are some semi-obvious tricks available that might speed up the
recursive version. For one thing, all shifts should be in multiples
of 15 bits (because longs are stored in base 2**15). For another, it
ought to be possible to avoid the single-bit normalization shifts
every time the number of bits ('n') is odd---one can do a single
shift at the beginning of the calculation instead.
|
msg72772 - (view) |
Author: Pernici Mario (pernici) |
Date: 2008-09-08 09:39 |
I have translated in C the algorithm for long division by
Burnikel and Ziegler (BZ), using the Python version fast_div.py
and the suggestions by Mark.
Here is a benchmark for divmod(p. q), p = 7**np, q = 5**nq
digits = q_digits = p_digits/2; the time taken by Python division
is normalized to 1
tests on Debian, BZ with Python-2.6b3
Pentium 1.60GHz Athlon XP 2600+ Athlon 64 3800+
digits BZ fast_div BZ fast_div BZ fast_div
500 1.01 1.27 1.00 1.18 1.00 1.21
700 0.88 1.22 0.76 1.08 0.81 1.14
1000 0.82 1.17 0.72 1.04 0.76 1.10
2000 0.66 0.85 0.55 0.73 0.59 0.78
4000 0.51 0.62 0.43 0.52 0.45 0.56
10000 0.32 0.38 0.31 0.37 0.33 0.39
20000 0.24 0.25 0.23 0.25 0.25 0.27
100000 0.14 0.14 0.11 0.11 0.12 0.12
BZ starts being faster than Python division around 2000 bits, apart
from two cases:
for q with less than 4000 bits and p much smaller than q**2
x_divrem is faster than divmod_pos;
for p very much larger than q**2 x_divrem becomes
again faster.
I put a bound in long_divrem to fall back to x_divrem in those cases,
based on tests on my laptop Pentium 1.60GHz.
The treatment of exceptions is very rudimentary;
I use a simple macro STUB_EXC to return NULL,
but without releasing memory.
Help for doing this properly is welcome.
Please find attached the diff to the longobject.c file in Python-2.6b3 .
Mario
|
msg73698 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2008-09-24 09:55 |
Thanks for the patch, Mario! I'll give a look when I get the chance.
|
msg74679 - (view) |
Author: Pernici Mario (pernici) |
Date: 2008-10-13 10:14 |
In this patch I added to the patch by Mark in issue 3944 the code
in the previous patch, modified to release memory in case of exceptions.
Benchmark for division on Athlon 64 3800+ with respect to Python3.0:
(1) Python with patch 30bit.patch
(2) Python with this patch
up to 1000 digits (1) and (2) perform in the same way
digits (1) (2)
2000 4x 5x
4000 4x 7x
10000 4x 10x
20000 4x 13x
100000 4x 27x
|
msg84105 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2009-03-24 18:32 |
The longobject2.diff patch probably doesn't apply cleanly any more.
Anyone interested in updating it?
I think this patch looks like a promising beginning, but there's still
quite a lot of work to do. The main concerns at the moment are:
(1) the huge numbers of Py_DECREFs and Py_XDECREFs. Some refactoring
might help here ("goto" isn't all bad: it's fairly common to use lots of
"goto error" statements in Python's source code).
(2) I suspect that many of the operations could be turned into in-place
operations on digit vectors, thus saving lots of object allocations and
deallocations. This should also help out with (1).
I'm not yet 100% sold on getting subquadratic division into Python---it
remains to be seen how much complexity it adds. If it makes it easy to
implement subquadratic integer <-> string conversions that would be a big
plus.
|
msg84702 - (view) |
Author: Pernici Mario (pernici) |
Date: 2009-03-30 22:59 |
In this patch there is an implementation of the algorithm to convert
numbers in strings by recursively splitting the number in half, adapted
from Fredrik's div.py
|
msg84704 - (view) |
Author: Pernici Mario (pernici) |
Date: 2009-03-30 23:04 |
In this second patch to the above patch it is added the recursive
division algorithm by Burnikel and Ziegler (BZ)
from longobject2.diff, unmodified,
to see the effect of the subquadratic algorithm; there is still a lot of
work to be done on it, as Mark pointed out.
Here is a benchmark on hp pavilion Q8200 2.33GHz
a = 7**n; compute str(a) N times
n N unpatched patch1 patch2
100 100000 0.25 0.25 0.25
200 100000 0.63 0.64 0.63
300 100000 1.19 1.22 1.23
400 100000 1.87 1.74 1.75
500 10000 0.27 0.24 0.24
1000 10000 0.98 0.59 0.60
2000 1000 0.36 0.16 0.17
5000 1000 2.17 0.65 0.66
10000 100 0.86 0.20 0.19
20000 100 3.42 0.70 0.55
50000 10 2.13 0.37 0.24
100000 1 0.85 0.13 0.08
500000 1 21 2.9 0.94
1000000 1 85 11 2.8
2000000 1 339 44 8.4
str(n) in the first patch uses a quadratic algorithm, but asymptotically
it is almost 8 times faster than the unpatched version;
in the second patch the subquadratic algorithm starts showing
after 10000 digits.
|
msg103028 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2010-04-13 09:27 |
Unassigning myself for now. The most recent patch still needs work before it can be considered for inclusion.
|
msg190407 - (view) |
Author: Eric V. Smith (eric.smith) * |
Date: 2013-05-31 15:05 |
See also issue18107, in particular http://mail.python.org/pipermail/pypy-dev/2013-May/011433.html.
|
msg221270 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2014-06-22 16:12 |
As issue18107 has been closed as a duplicate of this, should this be moved to open from languishing?
|
msg390940 - (view) |
Author: Carl Friedrich Bolz-Tereick (Carl.Friedrich.Bolz) * |
Date: 2021-04-13 08:55 |
FWIW, we have implemented a faster algorithm for divmod for big numbers using Mark's fast_div.py in PyPy. In particular, this speeds up str(long) for large numbers significantly (eg calling str on the result of math.factorial(2**17) is now 15x faster than CPython ;-)).
|
msg390966 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2021-04-13 13:55 |
> we have implemented a faster algorithm for divmod for big numbers using Mark's fast_div.py in PyPy
Urk! I hope your implementation included a healthy dose of validation, testing and cleanup. :-) (I have no doubts that it did.)
I'd consider fast_div.py to be proof-of-concept code rather than something suitable for immediate use.
|
msg390968 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2021-04-13 14:57 |
FWIW, I think we should close this; the same comments as in issue 26256 apply. See msg259408 onwards in that discussion.
|
msg390971 - (view) |
Author: Carl Friedrich Bolz-Tereick (Carl.Friedrich.Bolz) * |
Date: 2021-04-13 15:29 |
yes, that sounds fair. In PyPy we improve things occasionally if somebody feels like working on it, but in general competing against GMP is a fools errand.
|
msg390972 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2021-04-13 15:47 |
Python has a reasonable efficiency up to 1000 decimal digits (according to benchmark) which is already really great! Operations with more than 1000 decimal digits is an uncommon. IMO you should use a dedicated library like GMP for that.
I would prefer to keep CPython code simple enough to maintain. Objects/longobject.c is already 5744 lines of C code. longformat_BZ.diff adds around 700 lines of code, but it only makes str(int) starting at 2000 decimal digits.
Yeah, we could do better for integers with <many digits>, but I would prefer not to :-) Python has a limited number of volunteers working on it, and it's not specialized in numbers. We should keep the maintenance burden at an acceptable level ;-)
|
msg412121 - (view) |
Author: Tim Peters (tim.peters) * |
Date: 2022-01-30 01:41 |
Ha! This will never die. More discussion in bpo-46558.
Ya, I already closed it, but don't want to. I opened it to begin with to record an int->str method that doesn't use division, so it didn't really belong on this report.
What if we _didn't_ convert these things to C? That's the primary question the new report gets around to asking. These things are far easier to write, understand, and maintain in Python, and converting to C is generally only improving a constant factor, or _maybe_ removing a log(n) factor. When we're looking at massive O() improvements, squeezing those out is of comparatively little value.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:36 | admin | set | github: 47701 |
2022-01-30 01:41:10 | tim.peters | set | nosy:
+ tim.peters messages:
+ msg412121
|
2021-04-13 15:47:42 | vstinner | set | status: languishing -> closed resolution: wont fix messages:
+ msg390972
stage: patch review -> resolved |
2021-04-13 15:29:44 | Carl.Friedrich.Bolz | set | messages:
+ msg390971 |
2021-04-13 15:05:16 | BreamoreBoy | set | nosy:
- BreamoreBoy
|
2021-04-13 14:57:33 | mark.dickinson | set | messages:
+ msg390968 |
2021-04-13 13:55:02 | mark.dickinson | set | messages:
+ msg390966 |
2021-04-13 08:55:52 | Carl.Friedrich.Bolz | set | nosy:
+ Carl.Friedrich.Bolz messages:
+ msg390940
|
2014-06-22 16:12:51 | BreamoreBoy | set | nosy:
+ serhiy.storchaka, BreamoreBoy messages:
+ msg221270
|
2013-05-31 15:05:07 | eric.smith | set | nosy:
+ eric.smith messages:
+ msg190407
|
2011-04-27 11:57:55 | xuanji | set | nosy:
+ xuanji
|
2010-04-13 09:27:29 | mark.dickinson | set | status: open -> languishing assignee: mark.dickinson -> messages:
+ msg103028
versions:
- Python 2.7 |
2010-01-03 14:07:45 | mark.dickinson | set | versions:
- Python 3.1 |
2009-09-21 16:32:16 | steve21 | set | nosy:
+ steve21
|
2009-05-16 20:48:09 | ajaksu2 | set | nosy:
+ vstinner versions:
+ Python 3.2
stage: patch review |
2009-03-30 23:04:52 | pernici | set | files:
+ longformat_BZ.diff
messages:
+ msg84704 |
2009-03-30 22:59:13 | pernici | set | files:
+ longformat.diff
messages:
+ msg84702 |
2009-03-24 18:32:11 | mark.dickinson | set | messages:
+ msg84105 |
2008-10-13 10:14:34 | pernici | set | files:
+ longobject2.diff keywords:
+ patch messages:
+ msg74679 |
2008-09-24 09:55:34 | mark.dickinson | set | messages:
+ msg73698 |
2008-09-08 09:39:16 | pernici | set | files:
+ diff nosy:
+ pernici messages:
+ msg72772 |
2008-08-11 10:40:27 | mark.dickinson | set | messages:
+ msg71010 |
2008-08-06 08:51:51 | fredrikj | set | messages:
+ msg70776 |
2008-08-04 20:39:07 | mark.dickinson | set | files:
- fast_str.py |
2008-08-04 20:38:59 | mark.dickinson | set | files:
+ fast_div.py messages:
+ msg70720 |
2008-08-04 20:38:23 | mark.dickinson | set | files:
+ fast_str.py messages:
+ msg70719 |
2008-08-04 15:58:06 | mark.dickinson | set | messages:
+ msg70704 |
2008-08-04 15:57:54 | mark.dickinson | link | issue1746088 superseder |
2008-08-04 15:51:56 | mark.dickinson | set | priority: normal assignee: mark.dickinson type: performance messages:
+ msg70702 versions:
+ Python 3.1, Python 2.7 |
2008-07-27 16:17:46 | fredrikj | set | files:
+ div_timings.txt messages:
+ msg70326 |
2008-07-27 16:04:54 | fredrikj | set | files:
+ div.py messages:
+ msg70325 |
2008-07-27 07:07:39 | fredrikj | set | messages:
+ msg70316 |
2008-07-27 06:34:28 | mark.dickinson | set | messages:
+ msg70313 |
2008-07-27 01:13:09 | facundobatista | set | nosy:
+ mark.dickinson |
2008-07-27 01:11:04 | fredrikj | create | |