
Asymptotically faster divmod and str(long) #47701

Closed
fredrikj mannequin opened this issue Jul 27, 2008 · 26 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) · performance (Performance or resource usage)

Comments


fredrikj mannequin commented Jul 27, 2008

BPO 3451
Nosy @tim-one, @mdickinson, @vstinner, @cfbolz, @ericvsmith, @IIIIllllIIIIllllIIIIllllIIIIllllIIIIll, @serhiy-storchaka
Files
  • div.py: python implementation of Newton division
  • div_timings.txt: Comparison of performance between builtin and Newton-based divmod and str
  • fast_div.py: Fast recursive division
  • diff: division with the Burnikel-Ziegler algorithm
  • longobject2.diff
  • longformat.diff: str(n) recursive algorithm
  • longformat_BZ.diff: str(n) and BZ division algorithm
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2021-04-13.15:47:42.644>
    created_at = <Date 2008-07-27.01:11:04.477>
    labels = ['interpreter-core', 'performance']
    title = 'Asymptotically faster divmod and str(long)'
    updated_at = <Date 2022-01-30.01:41:10.989>
    user = 'https://bugs.python.org/fredrikj'

    bugs.python.org fields:

    activity = <Date 2022-01-30.01:41:10.989>
    actor = 'tim.peters'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-04-13.15:47:42.644>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2008-07-27.01:11:04.477>
    creator = 'fredrikj'
    dependencies = []
    files = ['10990', '10991', '11060', '11423', '11773', '13496', '13497']
    hgrepos = []
    issue_num = 3451
    keywords = ['patch']
    message_count = 26.0
    messages = ['70309', '70313', '70316', '70325', '70326', '70702', '70704', '70719', '70720', '70776', '71010', '72772', '73698', '74679', '84105', '84702', '84704', '103028', '190407', '221270', '390940', '390966', '390968', '390971', '390972', '412121']
    nosy_count = 10.0
    nosy_names = ['tim.peters', 'mark.dickinson', 'pernici', 'vstinner', 'Carl.Friedrich.Bolz', 'eric.smith', 'fredrikj', 'steve21', 'xuanji', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'wont fix'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue3451'
    versions = ['Python 3.2']


    fredrikj mannequin commented Jul 27, 2008

    A few weeks ago, I blogged about taking advantage of Karatsuba
    multiplication and Newton's method to divide big integers quickly (some
    of you may have read it, as it was posted to Daily Python-URL among
    other places):

    http://fredrik-j.blogspot.com/2008/07/making-division-in-python-faster.html

    To summarize, this method blows the builtin division out of the water
    already at ~(2000 digits)/(1000 digits).

    The primary catch is that the result of the Newton division may be
    slightly wrong (typically 1 ulp). However, a single extra multiplication
    and a subtraction at the end allows one to compute a remainder, and
    since the remainder must satisfy 0 <= r < q, the error is easily
    corrected. From a quick test, the cost of the extra multiplication seems
    to move the break-even point with the builtin division up to around
    5000/2500 digits.

    A pure Python implementation of divmod, with error correction based on
    the remainder, can be found in this file:

    http://www.dd.chalmers.se/~frejohl/code/libarith/libarith.py

    (See the function idivmod)
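    The idea behind such a division can be sketched in a few lines. This is a hedged, unoptimized sketch, not the idivmod from libarith.py: the name newton_divmod and the fixed full working precision are illustrative choices (a tuned version doubles the working precision at each Newton step instead of iterating at full precision throughout).

```python
def newton_divmod(a, b):
    """divmod for nonnegative a and positive b via a Newton reciprocal.

    Compute x ~= 2**k // b by the iteration x <- x*(2*2**k - b*x) >> k,
    form a trial quotient q = a*x >> k, then fix the (at most a few ulp)
    error with an exact remainder check, as described above.
    """
    if a < b:
        return 0, a
    k = 2 * a.bit_length()          # fixed-point working precision
    t = b.bit_length()
    x = 1 << (k - t)                # initial guess, relative error <= 1/2
    # each Newton step roughly squares the relative error, so about
    # log2(k) iterations reach full precision
    for _ in range(k.bit_length() + 2):
        x = (x * ((2 << k) - b * x)) >> k
    q = (a * x) >> k                # trial quotient (never too large,
                                    # since x stays below 2**k / b)
    r = a - q * b
    # correct the off-by-a-few error using the remainder
    while r >= b:
        q += 1
        r -= b
    while r < 0:
        q -= 1
        r += b
    return q, r
```

    The final remainder check is what makes the result exact regardless of how sloppy the reciprocal is; as noted above, that extra multiplication is also what pushes the break-even point up.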

    Of particular note is that fast divmod gives a fast way to do radix
    conversion, by recursively splitting the number in half. The function
    numeral (see same .py file) does this, e.g:

    >>> from time import clock
    >>> a = 2**1257787-1
    >>> t1=clock(); s1=str(a); t2=clock(); t2-t1
    109.08065159119386
    >>> t1=clock(); s2=numeral(a); t2=clock(); t2-t1
    7.1394886780303182
    >>> s1 == s2
    True

    (This recursive algorithm, by the way, is actually faster than str()
    even with the slow builtin divmod.)
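    The recursive splitting described above can be sketched as follows. The names fast_str and _rec are illustrative, not the actual numeral from the linked file, and a production version would cache the powers of ten instead of recomputing them.

```python
def _rec(n, width):
    """Decimal string of n, zero-padded to exactly `width` digits.

    One divmod by 10**(width//2) splits n into two halves; each level
    halves the operand size, so with subquadratic multiplication and
    division the whole conversion is subquadratic as well.
    """
    if width <= 18:                            # small enough for the builtin
        return '%0*d' % (width, n)
    lo = width // 2
    hi_part, lo_part = divmod(n, 10 ** lo)     # a real version caches 10**lo
    return _rec(hi_part, width - lo) + _rec(lo_part, lo)

def fast_str(n):
    """str(n) for ints via recursive halving (a sketch)."""
    if n < 0:
        return '-' + fast_str(-n)
    if n == 0:
        return '0'
    # upper bound on the digit count: log10(n) < bit_length(n) * log10(2)
    width = int(n.bit_length() * 0.30103) + 1
    return _rec(n, width).lstrip('0')
```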

    Would there be any interest in porting these algorithms to C and using
    them in the standard Python long implementation?

    There are likely some problems that I have overlooked. A critical review
    will be most welcome.

    @fredrikj fredrikj mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Jul 27, 2008
    @mdickinson (Member):

    Would there be any interest in porting these algorithms to C and using
    them in the standard Python long implementation?

    Yes, definitely! Though it depends a little bit how much complication
    is involved.

    A subquadratic algorithm for converting strings to longs might also fit
    well here.

    Now I'm just waiting for you to propose an implementation of integer
    square root :-).


    fredrikj mannequin commented Jul 27, 2008

    Yes, definitely! Though it depends a little bit how much complication
    is involved.

    Not that much, I think, the pure Python version being just a few lines
    of code (plus a few more lines of boilerplate). But I'm not too familiar
    with converting Python to C myself, so I may underestimate the
    difficulties. It might get a little more complicated if you want to
    minimize temporary allocations, for example.

    Now I'm just waiting for you to propose an implementation of integer
    square root :-).

    Yes, there is also code for fast, O(M(n)) integer square roots in
    libarith.py. This function would be much less useful overall as a
    builtin, though.
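    For reference, the basic integer Newton iteration for square roots looks like this. This is a simple monotone variant, not the O(M(n)) routine from libarith.py (which would double its working precision per step), and modern Python ships math.isqrt (3.8+) for this job; the function name here is illustrative.

```python
def isqrt_newton(n):
    """Largest x with x*x <= n, by Newton's method on integers.

    Starting from a guess >= the true root, the iterate decreases
    monotonically; the first time it stops decreasing, we are done.
    """
    if n < 0:
        raise ValueError("isqrt of negative number")
    if n == 0:
        return 0
    x = 1 << ((n.bit_length() + 1) // 2)   # initial guess >= isqrt(n)
    while True:
        y = (x + n // x) // 2              # Newton step for f(x) = x**2 - n
        if y >= x:
            return x
        x = y
```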


    fredrikj mannequin commented Jul 27, 2008

    For your convenience, I have split the division and numeral code off to
    a standalone .py file which I'm attaching here.

    I also updated the remainder logic to use the default divmod instead of
    repeated subtraction, which ensures worst-case performance similar to
    the builtin divmod in the very unlikely case that div_newton would
    return junk.


    fredrikj mannequin commented Jul 27, 2008

    And here some timings on my laptop.

    Both int->str and balanced division become faster somewhere between 1000
    and 2000 digits. This is after editing the division benchmark to use
    random numbers instead of exact powers of ten (interestingly causing a
    bit of slowdown). String conversion might be a little slower for lengths
    that aren't close to powers of two.

    @mdickinson (Member):

    There's also the recursive division algorithm due
    to Burnikel and Ziegler; this might be worth a look.
    I think it's the same asymptotic complexity (constant
    times karatsuba multiplication complexity), but may
    turn out to be faster for one reason or another.

    I had a Python implementation of this somewhere;
    I'll see if I can dig it out.

    Assigning this to me so that it doesn't get lost or
    forgotten; but note that I don't intend to do
    anything about it before 2.6/3.0 final. If anyone else
    wants to take it off my hands before then, feel free.

    @mdickinson mdickinson self-assigned this Aug 4, 2008
    @mdickinson mdickinson added the performance Performance or resource usage label Aug 4, 2008
    @mdickinson (Member):

    See also bpo-1746088.

    @mdickinson (Member):

    Here's a pure Python implementation of the Burnikel and Ziegler recursive
    division algorithm. I've no idea whether it's faster or slower than
    Newton, but it might be worth a look. It depends heavily on bit
    operations, which ought to be much faster when coded in C. (Some of the
    shifts would be completely unnecessary---replaced by changes in indexing
    instead.)

    The original paper describing the algorithm is available here:

    http://cr.yp.to/bib/1998/burnikel.ps
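    The core of the algorithm is a recursive 2n-by-n division step built on a 3n-by-2n helper. Here is a hedged pure-Python sketch of that structure; the names _div2n1n/_div3n2n/bz_divmod and the 64-bit base-case cutoff are illustrative choices, not the attached file.

```python
def _div2n1n(a, b, n):
    """Divide a < b << n by an n-bit b (top bit set); return (q, r).

    Splits the divisor in half and reduces the problem to two
    3-halves-by-2-halves steps, each recursing on an n//2-bit divisor.
    """
    if n <= 64:                        # cutoff: fall back to the builtin
        return divmod(a, b)
    pad = n & 1
    if pad:                            # make n even; undo the shift on r below
        a <<= 1
        b <<= 1
        n += 1
    half = n >> 1
    mask = (1 << half) - 1
    b1, b2 = b >> half, b & mask       # high and low halves of the divisor
    q1, r = _div3n2n(a >> n, (a >> half) & mask, b, b1, b2, half)
    q2, r = _div3n2n(r, a & mask, b, b1, b2, half)
    if pad:
        r >>= 1
    return q1 << half | q2, r

def _div3n2n(a12, a3, b, b1, b2, n):
    """Divide the 3n-bit a12*2**n + a3 by the 2n-bit b = b1*2**n + b2."""
    if a12 >> n == b1:                 # quotient estimate would overflow
        q, r = (1 << n) - 1, a12 - (b1 << n) + b1
    else:
        q, r = _div2n1n(a12, b1, n)    # estimate q from the top halves
    r = (r << n | a3) - q * b2         # remainder for this estimate
    while r < 0:                       # estimate can be slightly too big
        q -= 1
        r += b
    return q, r

def bz_divmod(a, b):
    """divmod(a, b) for a >= 0, b > 0: long division in n-bit chunks,
    with _div2n1n as the per-'digit' step."""
    if a < b:
        return 0, a
    n = b.bit_length()                 # chunk size; b's top bit is set
    chunks = -(-a.bit_length() // n)   # ceiling division: chunk count
    q = r = 0
    for i in reversed(range(chunks)):
        piece = (a >> (i * n)) & ((1 << n) - 1)
        q_i, r = _div2n1n((r << n) | piece, b, n)
        q = (q << n) | q_i
    return q, r
```

    The correction loop in _div3n2n is the same trick as in the Newton method: the quotient estimate from the top halves is off by at most a small constant, and the exact remainder fixes it.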

    @mdickinson (Member):

    Oops. Wrong file. Here's the right one.


    fredrikj mannequin commented Aug 6, 2008

    Indeed, that seems to be much faster. In the mean time, do you mind if I
    steal the code? :-)

    @mdickinson (Member):

    Indeed, that seems to be much faster. In the mean time, do you mind if I
    steal the code? :-)

    Not at all!

    I guess constant factors could well appear and/or disappear when
    recoding in C; it might well be worth trying to implement both
    methods to see which is faster.

    There are some semi-obvious tricks available that might speed up the
    recursive version. For one thing, all shifts should be in multiples
    of 15 bits (because longs are stored in base 2**15). For another, it
    ought to be possible to avoid the single-bit normalization shifts
    every time the number of bits ('n') is odd---one can do a single
    shift at the beginning of the calculation instead.


    pernici mannequin commented Sep 8, 2008

    I have translated in C the algorithm for long division by
    Burnikel and Ziegler (BZ), using the Python version fast_div.py
    and the suggestions by Mark.

    Here is a benchmark for divmod(p, q), with p = 7**np and q = 5**nq,
    where q_digits = p_digits/2 (the "digits" column gives q's digit
    count); the time taken by Python division is normalized to 1.
    Tests on Debian, BZ built with Python-2.6b3:

                Pentium 1.60GHz     Athlon XP 2600+     Athlon 64 3800+
    digits      BZ     fast_div     BZ     fast_div     BZ     fast_div
    500         1.01   1.27         1.00   1.18         1.00   1.21
    700         0.88   1.22         0.76   1.08         0.81   1.14
    1000        0.82   1.17         0.72   1.04         0.76   1.10
    2000        0.66   0.85         0.55   0.73         0.59   0.78
    4000        0.51   0.62         0.43   0.52         0.45   0.56
    10000       0.32   0.38         0.31   0.37         0.33   0.39
    20000       0.24   0.25         0.23   0.25         0.25   0.27
    100000      0.14   0.14         0.11   0.11         0.12   0.12

    BZ starts being faster than Python division around 2000 bits, apart
    from two cases: for q with less than 4000 bits and p much smaller
    than q**2, x_divrem is faster than divmod_pos; and for p very much
    larger than q**2, x_divrem becomes faster again.
    I put a bound in long_divrem to fall back to x_divrem in those cases,
    based on tests on my laptop (Pentium 1.60GHz).

    The treatment of exceptions is very rudimentary;
    I use a simple macro STUB_EXC to return NULL,
    but without releasing memory.
    Help for doing this properly is welcome.

    Please find attached the diff to the longobject.c file in Python-2.6b3 .

    Mario

    @mdickinson (Member):

    Thanks for the patch, Mario! I'll give a look when I get the chance.


    pernici mannequin commented Oct 13, 2008

    In this patch I added to the patch by Mark in bpo-3944 the code
    in the previous patch, modified to release memory in case of exceptions.

    Benchmark for division on Athlon 64 3800+, speedup relative to Python 3.0:
    (1) Python with patch 30bit.patch
    (2) Python with this patch
    Up to 1000 digits, (1) and (2) perform the same.

    digits    (1)   (2)
    2000      4x    5x
    4000      4x    7x
    10000     4x    10x
    20000     4x    13x
    100000    4x    27x

    @mdickinson (Member):

    The longobject2.diff patch probably doesn't apply cleanly any more.
    Anyone interested in updating it?

    I think this patch looks like a promising beginning, but there's still
    quite a lot of work to do. The main concerns at the moment are:

    (1) the huge numbers of Py_DECREFs and Py_XDECREFs. Some refactoring
    might help here ("goto" isn't all bad: it's fairly common to use lots of
    "goto error" statements in Python's source code).

    (2) I suspect that many of the operations could be turned into in-place
    operations on digit vectors, thus saving lots of object allocations and
    deallocations. This should also help out with (1).

    I'm not yet 100% sold on getting subquadratic division into Python---it
    remains to be seen how much complexity it adds. If it makes it easy to
    implement subquadratic integer <-> string conversions that would be a big
    plus.


    pernici mannequin commented Mar 30, 2009

    In this patch there is an implementation of the algorithm to convert
    numbers in strings by recursively splitting the number in half, adapted
    from Fredrik's div.py


    pernici mannequin commented Mar 30, 2009

    This second patch adds, on top of the previous one, the recursive
    division algorithm by Burnikel and Ziegler (BZ) from longobject2.diff,
    unmodified, to show the effect of the subquadratic algorithm; there is
    still a lot of work to be done on it, as Mark pointed out.

    Here is a benchmark on an HP Pavilion Q8200 2.33GHz.
    a = 7**n; compute str(a) N times (times in seconds):

    n        N       unpatched  patch1  patch2
    100      100000  0.25       0.25    0.25
    200      100000  0.63       0.64    0.63
    300      100000  1.19       1.22    1.23
    400      100000  1.87       1.74    1.75
    500      10000   0.27       0.24    0.24
    1000     10000   0.98       0.59    0.60
    2000     1000    0.36       0.16    0.17
    5000     1000    2.17       0.65    0.66
    10000    100     0.86       0.20    0.19
    20000    100     3.42       0.70    0.55
    50000    10      2.13       0.37    0.24
    100000   1       0.85       0.13    0.08
    500000   1       21         2.9     0.94
    1000000  1       85         11      2.8
    2000000  1       339        44      8.4

    str(n) in the first patch uses a quadratic algorithm, but asymptotically
    it is almost 8 times faster than the unpatched version;
    in the second patch the subquadratic algorithm starts showing
    after 10000 digits.

    @mdickinson (Member):

    Unassigning myself for now. The most recent patch still needs work before it can be considered for inclusion.

    @mdickinson mdickinson added the stale Stale PR or inactive for long period of time. label Apr 13, 2010
    @mdickinson mdickinson removed their assignment Apr 13, 2010


    BreamoreBoy mannequin commented Jun 22, 2014

    As bpo-18107 has been closed as a duplicate of this, should this be moved to open from languishing?


    cfbolz mannequin commented Apr 13, 2021

    FWIW, we have implemented a faster algorithm for divmod for big numbers using Mark's fast_div.py in PyPy. In particular, this speeds up str(long) for large numbers significantly (e.g. calling str on the result of math.factorial(2**17) is now 15x faster than in CPython ;-)).

    @mdickinson (Member):

    we have implemented a faster algorithm for divmod for big numbers using Mark's fast_div.py in PyPy

    Urk! I hope your implementation included a healthy dose of validation, testing and cleanup. :-) (I have no doubts that it did.)

    I'd consider fast_div.py to be proof-of-concept code rather than something suitable for immediate use.

    @mdickinson (Member):

    FWIW, I think we should close this; the same comments as in bpo-26256 apply. See msg259408 onwards in that discussion.


    cfbolz mannequin commented Apr 13, 2021

    Yes, that sounds fair. In PyPy we improve things occasionally if somebody feels like working on it, but in general competing against GMP is a fool's errand.

    @vstinner (Member):

    Python has reasonable efficiency up to 1000 decimal digits (according to benchmarks), which is already really great! Operations on more than 1000 decimal digits are uncommon; IMO you should use a dedicated library like GMP for that.

    I would prefer to keep the CPython code simple enough to maintain. Objects/longobject.c is already 5744 lines of C code. longformat_BZ.diff adds around 700 lines of code, but it only speeds up str(int) starting at around 2000 decimal digits.

    Yeah, we could do better for integers with <many digits>, but I would prefer not to :-) Python has a limited number of volunteers working on it, and it's not specialized in numbers. We should keep the maintenance burden at an acceptable level ;-)

    @vstinner vstinner removed the stale Stale PR or inactive for long period of time. label Apr 13, 2021

    tim-one commented Jan 30, 2022

    Ha! This will never die. More discussion in bpo-46558.

    Ya, I already closed it, but don't want to. I opened it to begin with to record an int->str method that doesn't use division, so it didn't really belong on this report.

    What if we _didn't_ convert these things to C? That's the primary question the new report gets around to asking. These things are far easier to write, understand, and maintain in Python, and converting to C is generally only improving a constant factor, or _maybe_ removing a log(n) factor. When we're looking at massive O() improvements, squeezing those out is of comparatively little value.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022