Issue 560379: Karatsuba multiplication

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/36647

classification

Title:	Karatsuba multiplication
Type:		Stage:
Components:	Interpreter Core	Versions:	Python 2.3

process

Status:	closed	Resolution:	accepted
Dependencies:		Superseder:
Assigned To:	tim.peters	Nosy List:	ccraig, gvanrossum, nnorwitz, tim.peters, tismer
Priority:	normal	Keywords:	patch

Created on 2002-05-25 01:07 by ccraig, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Files
File name	Uploaded	Description
k_mul.patch	ccraig, 2002-05-25 01:07	patch
output.jpg	ccraig, 2002-05-25 16:16	Timing graph
k_mul2.patch	ccraig, 2002-05-25 23:41	patch, take 2 (splits on larger number)
k_mul3.patch	ccraig, 2002-07-09 22:43	patch, take 3 (cleaned up, better comments)

Messages (11)
msg40123 - (view)	Author: Christopher A. Craig (ccraig)	Date: 2002-05-25 01:07
Adds Karatsuba multiplication to Python. Patches longobject.c to use Karatsuba multiplication in place of gradeschool math.
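For readers unfamiliar with the technique, here is a minimal Python sketch of Karatsuba multiplication on non-negative integers. It is not the code from k_mul.patch (which is C inside longobject.c); the name `karatsuba` and the constant `CUTOFF_BITS` are hypothetical, and the cutoff of 40 * 15 bits only assumes that the "fence of 40" discussed in this thread counts 15-bit digits.

```python
# Hypothetical cutoff: 40 digits of 15 bits each (an assumption, see above).
CUTOFF_BITS = 40 * 15


def karatsuba(a, b):
    """Multiply two non-negative ints with three recursive products."""
    # Base case: once either operand is short, fall back to the plain
    # product (a stand-in for the gradeschool loop in the C code).
    if a.bit_length() <= CUTOFF_BITS or b.bit_length() <= CUTOFF_BITS:
        return a * b

    # Split both operands at half the bit length of the larger one.
    shift = max(a.bit_length(), b.bit_length()) // 2
    mask = (1 << shift) - 1
    a_hi, a_lo = a >> shift, a & mask
    b_hi, b_lo = b >> shift, b & mask

    # Three multiplications instead of four.
    hi = karatsuba(a_hi, b_hi)
    lo = karatsuba(a_lo, b_lo)
    mid = karatsuba(a_hi + a_lo, b_hi + b_lo) - hi - lo

    # Recombine: a*b = hi*2^(2*shift) + mid*2^shift + lo
    return (hi << (2 * shift)) + (mid << shift) + lo
```

The saving comes from computing the middle term with one product rather than two, at the price of a few extra additions and subtractions.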
msg40124 - (view)	Author: Christian Tismer (old) (tismer)	Date: 2002-05-25 03:23
Hmm, not bad. Q: You set the split fence at 40. Where does this number come from? I think this could be optimized per compiler/platform. You say that you split based on the smaller number. Why? My intuitive guess would certainly be to always split on the larger number. I just checked my Python implementation, which does this. Open question: what is the best way to handle very small by very long? Probably the highschool version is better there, and that might have led you to investigate the smaller one. I'd say both should be checked. good work! - cheers chris
msg40125 - (view)	Author: Christopher A. Craig (ccraig)	Date: 2002-05-25 05:53
I got 40 from testing. Basically I generated 250 random numbers each for a series of sizes between 5 and 2990 bits long at 15-bit intervals (i.e. the word size), and stored them in a dictionary. Then I timed 249 multiplies at each size for a bunch of fence values and used gdchart to make a pretty graph. It certainly could be optimized better per compiler/platform, but I don't know how much gain you'd see. I split on the smaller number because I guessed it would be better. My thought was that if I split on the smaller number I'm guaranteed to reach the fence, at which point I can use the gradeschool method at a near linear cost (since it's O(n*m) and one of those two is at most the fence size). If I split on the larger number, I may run into a condition where the smaller number is less than half the larger, but I haven't reached the fence yet, and then gradeschool could be much more expensive.
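A rough Python sketch of a timing run along the lines described above. The original script is not attached to this issue, so everything here is an assumption: the helper names are hypothetical, the 249 multiplies are taken to be products of adjacent pairs among the 250 numbers, and the loop times whatever multiplication the running interpreter provides. Comparing fence values would additionally require rebuilding the interpreter with each cutoff.

```python
import random
import time


def make_samples(bits, count=250, rng=None):
    """Return `count` random ints that are exactly `bits` bits long."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so runs are repeatable
    top = 1 << (bits - 1)       # force the top bit so the size is exact
    return [rng.getrandbits(bits) | top for _ in range(count)]


def time_adjacent_products(samples):
    """Time len(samples) - 1 multiplies of neighbouring samples."""
    start = time.perf_counter()
    for x, y in zip(samples, samples[1:]):
        x * y
    return time.perf_counter() - start


def run(sizes=range(5, 2991, 15)):
    """Map each bit size to the seconds spent on its multiplies."""
    rng = random.Random(0)
    table = {bits: make_samples(bits, rng=rng) for bits in sizes}
    return {bits: time_adjacent_products(nums) for bits, nums in table.items()}
```

Calling `run()` yields one timing per size, which can then be plotted per build to compare cutoffs.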
msg40126 - (view)	Author: Christopher A. Craig (ccraig)	Date: 2002-05-25 16:16
I just uploaded a graph with some sample timings in it. Red is a fence of 20. Green is a fence of 40. Blue is a fence of 60. Black is done with unmodified Python 2.2.1.
msg40127 - (view)	Author: Christopher A. Craig (ccraig)	Date: 2002-05-25 23:41
I made the changes needed to make it split on the bigger number (basically changed it to split on the bigger number, and changed all of the places that need to check whether there are no bits left), and the new one is a little bit faster, so I'm uploading it too. I had been thinking about fixed precision numbers when I wrote it, so I honestly didn't consider the fact that I could just shift the smaller number to 0 and throw it away... :-)
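A small Python illustration of the "shift the smaller number to 0" observation, using hypothetical names and operand sizes; it is not taken from k_mul2.patch. When the split point is chosen from the larger operand and lies beyond the smaller operand's length, the smaller operand's high half is zero and can be dropped, leaving only two sub-products.

```python
def split(n, shift):
    """Return (high, low) halves of n around bit position `shift`."""
    return n >> shift, n & ((1 << shift) - 1)


a = (1 << 3000) - 1          # a 3000-bit operand
b = (1 << 900) - 1           # a 900-bit operand
shift = a.bit_length() // 2  # split point chosen from the larger operand

a_hi, a_lo = split(a, shift)
b_hi, b_lo = split(b, shift)

# b fits entirely below the split point, so its high half vanishes
# and every term involving b_hi can be skipped.
product = ((a_hi * b_lo) << shift) + a_lo * b_lo
```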
msg40128 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2002-06-05 21:38
Tim thinks this is cool, but the code can use cleanup and comments. Also, let's not add platform specific hacks (Christian can sell those as an add-on :-).
msg40129 - (view)	Author: Christopher A. Craig (ccraig)	Date: 2002-07-09 22:43
I've brought the code into compliance with the coding standards in PEP 7, and added some comments that I thought were in line with the rest of the file. If there is something else you would like me to do, please tell me.
msg40130 - (view)	Author: Tim Peters (tim.peters) *	Date: 2002-08-12 02:40
Thanks! I checked in some code building on this. Changes included:
+ Adjusted whitespace to meet the standard (spaces after "if" and "for", flanking binary operators, etc).
+ The refcount fiddling in x_mul caused assorted system crashes if KeyboardInterrupt was raised during a multiply. Repaired that.
+ More comments and asserts.
+ Removed k_join and built "the answer" piecemeal into the result object in k_mul. This allows freeing more chunks of memory sooner, reducing highwater mark and the probable size of the working set.
Since the cache behavior is quite different now, it would be cool if you could run your tuning tests again. The cutoff value is now a #define, KARATSUBA_CUTOFF, near the top of longobject.c. Until I can make time for more thorough testing, k_mul isn't called by default: multiplication invokes k_mul if and only if an environment variable named KARAT exists (its value is irrelevant; just its existence matters).
msg40131 - (view)	Author: Neal Norwitz (nnorwitz) *	Date: 2002-08-12 03:36
Tim, did you want to leave this open?
msg40132 - (view)	Author: Tim Peters (tim.peters) *	Date: 2002-08-12 04:19
Yes, until the new algorithm is enabled w/o the envar trickery.
msg40133 - (view)	Author: Tim Peters (tim.peters) *	Date: 2002-08-12 22:13
Closing this, as I'm happy with the code now. Added a new "lopsided" routine to remove the penalty (relative to 2.2.1) when inputs are of vastly different sizes (that was a degenerate case for k_mul -- it didn't save any work then, but did entail a lot more overhead).
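One way to picture handling for lopsided inputs is sketched below in Python. This is an assumption about the general idea, not a transcription of the routine that was checked in: the function name `lopsided_mul` is hypothetical, and the sketch simply cuts the larger operand into slices as wide as the smaller one so that each sub-product is balanced.

```python
def lopsided_mul(a, b):
    """Multiply non-negative ints of very different sizes slice by slice."""
    # Make `a` the shorter operand.
    if a.bit_length() > b.bit_length():
        a, b = b, a
    if a == 0:
        return 0

    width = a.bit_length()    # slice width matches the shorter operand
    mask = (1 << width) - 1
    result = 0
    shift = 0
    while b:
        # Each slice of b is about as long as a, so this product is
        # balanced (a stand-in for a Karatsuba call in a real version).
        result += (a * (b & mask)) << shift
        b >>= width
        shift += width
    return result
```

Each partial product has operands of similar length, which is the case where Karatsuba actually saves work.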

History
Date	User	Action	Args
2022-04-10 16:05:21	admin	set	github: 36647
2002-05-25 01:07:03	ccraig	create