classification
Title: Karatsuba multiplication
Type: Stage:
Components: Interpreter Core Versions: Python 2.3
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: tim.peters Nosy List: ccraig, gvanrossum, nnorwitz, tim.peters, tismer
Priority: normal Keywords: patch

Created on 2002-05-25 01:07 by ccraig, last changed 2002-08-12 22:13 by tim.peters. This issue is now closed.

Files
File name Uploaded Description Edit
k_mul.patch ccraig, 2002-05-25 01:07 patch
output.jpg ccraig, 2002-05-25 16:16 Timing graph
k_mul2.patch ccraig, 2002-05-25 23:41 patch, take 2 (splits on larger number)
k_mul3.patch ccraig, 2002-07-09 22:43 patch, take 3 (cleaned up, better comments)
Messages (11)
msg40123 - (view) Author: Christopher A. Craig (ccraig) Date: 2002-05-25 01:07
Adds Karatsuba multiplication to Python.  
Patches longobject.c to use Karatsuba multiplication in
place
of gradeschool math.

msg40124 - (view) Author: Christian Tismer (old) (tismer) Date: 2002-05-25 03:23
Logged In: YES 
user_id=105700

Hmm, not bad.

Q: You set the split fence at 40. Where does this number
come from? I think this could be optimzed per compiler/platform.

You say that you split based on the smaller number.
Why this? My intuitive guess would certainly be to always split
on the larger number. I just checked my Python implementation
which does this.
Open question: how to handle very small by very long the
best way? Probably the highschool version is better here,
and that might have led you to investigate the smaller one.
I'd say bosh should be checked.

good work! - cheers chris
msg40125 - (view) Author: Christopher A. Craig (ccraig) Date: 2002-05-25 05:53
Logged In: YES 
user_id=135050

I got 40 from testing.  Basically I generated 250 random
numbers each for a series of sizes between 5 and 2990 bits
long at 15 bit intervals (i.e. the word size), and stored it
in a dictionary.  Then timed 249 multiplies at each size for
a bunch of fence values and used gdchart to make a pretty
graph.   It cerntainly could be optimized better per
compiler/platform, but I don't know how much gain you'ld see.

I split on the smaller number because I guessed it would be
better.  My thought was that if I split on the smaller
number I'm guaranteed to reach the fence, at which point I
can use the gradeschool method at a near linear cost (since
it's O(n*m) and one of those two is at most the fence size).
 If I split on the larger number, I may run into a condition
where the smaller number is less than half the larger, but I
haven't reached the fence yet, and then gradeschool could be
much more expensive.
msg40126 - (view) Author: Christopher A. Craig (ccraig) Date: 2002-05-25 16:16
Logged In: YES 
user_id=135050

I just uploaded a graph with some sample timings in it.  
Red is a fence of 20. Green is a fence of 40. Blue is a
fence of 60. Black is done with unmodified Python 2.2.1.  

msg40127 - (view) Author: Christopher A. Craig (ccraig) Date: 2002-05-25 23:41
Logged In: YES 
user_id=135050

I made the needed changes to make to split on the bigger
number (basically chaged to split on bigger number, and
changed all of the places that need to check to see if there
are no bits left), and the new one is a little bit faster,
so I'm uploading it too.

I had been thinking about fixed precision numbers when I
wrote it, so I honestly didn't consider the fact that I
could just shift the smaller number to 0 and throw it
away... :-)
msg40128 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2002-06-05 21:38
Logged In: YES 
user_id=6380

Tim thinks this is cool, but the code can use cleanup and 
comments.

Also, let's not add platform specific hacks (Christian can sell 
those as an add-on :-).
msg40129 - (view) Author: Christopher A. Craig (ccraig) Date: 2002-07-09 22:43
Logged In: YES 
user_id=135050

I've brought the code into compliance with the coding
standards in the PEP7, and added some comments that I
thought were in line with the rest of the file.  If there is
something else you would like me to do, please tell me. 
msg40130 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-08-12 02:40
Logged In: YES 
user_id=31435

Thanks!  I checked in some code building on this.  Changes 
included:

+ Adjusted whitespace to meet the standard (spaces 
after "if" and "for", flanking binary operators, etc).

+ The refcount fiddling in x_mul caused assorted system 
crashes if KeyboardInterrupt was raised during a multiply.  
Repaired that.

+ More comments and asserts.

+ Removed k_join and built "the answer" piecemeal into the 
result object in k_mul.  This allows to free more chunks of 
memory sooner, reducing highwater mark and the probable 
size of the working set.

Since the cache behavior is quite different now, it would be 
cool if you could run your tuning tests again.  The cutoff 
value is now a #define, KARATSUBA_CUTOFF near the top 
of longobject.c.

Until I can make time for more thorough testing, k_mul isn't 
called by default:  multiplication invokes k_mul if and only if 
an environment variable named KARAT exists (its value is 
irrelevant; just its existence matters).
msg40131 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2002-08-12 03:36
Logged In: YES 
user_id=33168

Tim, did you want to leave this open?
msg40132 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-08-12 04:19
Logged In: YES 
user_id=31435

Yes, until the new algorithm is enabled w/o the envar 
trickery.
msg40133 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-08-12 22:13
Logged In: YES 
user_id=31435

Closing this, as I'm happy with the code now.  Added a 
new "lopsided" routine to remove the penalty (relative to 
2.2.1) when inputs are of vastly different sizes (that was a 
degenerate case for k_mul -- it didn't save any work then, 
but did entail a lot more overheads).
History
Date User Action Args
2002-05-25 01:07:03ccraigcreate