Message 45791 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	trevp
Recipients
Date	2004-04-17.08:16:31
SpamBayes Score
Marked as misclassified
Message-id
In-reply-to

Content
For crypto-sized numbers, Python mod-exp is several times slower than GMP or OpenSSL (6x or more). Those libraries do crazy special-case stuff, + assembly, platform-specific tuning, and so on. However, there's some low-hanging fruit: this patch has a few basic optimizations giving a 2-3x speedup for numbers in the 1000-8000 bit range (that's what I've mostly tested; but the patch should improve, or at least not hurt, everything else): - x_mul() is special-cased for squaring, which is almost twice as fast as general multiplication. - x_mul() uses pointers instead of indices for iteration, giving ~10% speedup (under VC6). - the right-to-left square-and-multiply exponentiation algorithm is replaced with a left-to-right square-and-multiply, which takes advantage of small bases. - when the exponent is above a certain size, "k-ary" exponentiation is used to reduce the number of multiplications via precalculation. - when the modulus is odd, Montgomery reduction is used. - the Karatsuba cutoff seems too low. For multiplicands in the range of 500-5000 bits, Karatsuba slows multiplication by around ~25% (VC6sp4, Intel P4M 1.7 Ghz). For larger numbers, the benefits of Karatsuba are less than they could be. Currently, the cutoff is 35 digits (525 bits). I've tried 70, 140, 280, and 560. 70, 140, and 280 are roughly the same: they don't slow down small values, and they have good speedup on large ones. 560 is not quite as good for large values, but at least it doesn't hurt small ones. I know this is platform-dependent, but I think we should err on the side of making the cutoff too high and losing some optimization, instead of putting it too low and slowing things down. I suggest 70. A couple misc. things: - Negative exponents with a modulus no longer give an error, when the base is coprime with the modulus. Instead, it calculates the multiplicative inverse of the base with respect to the modulus, using the extended euclidean algorithm, and exponentiates that. Libraries like GMP and LibTomMath work the same way. Being able to take inverses mod a number is useful for cryptography (e.g. RSA, DSA, and Elgamal). - The diff includes patch 923643, which supports converting longs to byte-strings. Ignore the last few diff entries, if you don't want this. - I haven't looked into harmonizing with int_pow(). Something may have to be done.

For crypto-sized numbers, Python mod-exp is several
times slower than GMP or OpenSSL (6x or more).  Those
libraries do crazy special-case stuff, + assembly,
platform-specific tuning, and so on.

However, there's some low-hanging fruit: this patch has
a few basic optimizations giving a 2-3x speedup for
numbers in the 1000-8000 bit range (that's what I've
mostly tested; but the patch should improve, or at
least not hurt, everything else):

 - x_mul() is special-cased for squaring, which is
almost twice as fast as general multiplication.
 
 - x_mul() uses pointers instead of indices for
iteration, giving ~10% speedup (under VC6).
 
 - the right-to-left square-and-multiply exponentiation
algorithm is replaced with a left-to-right
square-and-multiply, which takes advantage of small bases.
 
 - when the exponent is above a certain size, "k-ary"
exponentiation is used to reduce the number of
multiplications via precalculation.
 
 - when the modulus is odd, Montgomery reduction is used.

 - the Karatsuba cutoff seems too low.  For
multiplicands in the range of 500-5000 bits, Karatsuba
slows multiplication by around ~25% (VC6sp4, Intel P4M
1.7 Ghz).  For larger numbers, the benefits of
Karatsuba are less than they could be.
 
 Currently, the cutoff is 35 digits (525 bits).  I've
tried 70, 140, 280, and 560.  70, 140, and 280 are
roughly the same: they don't slow down small values,
and they have good speedup on large ones.  560 is not
quite as good for large values, but at least it doesn't
hurt small ones.
 
I know this is platform-dependent, but I think we
should err on the side of making the cutoff too high
and losing some optimization, instead of putting it too
low and slowing things down.  I suggest 70.
 

A couple misc. things:

 - Negative exponents with a modulus no longer give an
error, when the base is coprime with the modulus. 
Instead, it calculates the multiplicative inverse of
the base with respect to the modulus, using the
extended euclidean algorithm, and exponentiates that.
 
 Libraries like GMP and LibTomMath work the same way. 
Being able to take inverses mod a number is useful for
cryptography (e.g. RSA, DSA, and Elgamal).
 
 - The diff includes patch 923643, which supports
converting longs to byte-strings.  Ignore the last few
diff entries, if you don't want this.
 
 - I haven't looked into harmonizing with int_pow(). 
Something may have to be done.

History
Date	User	Action	Args
2007-08-23 15:37:16	admin	link	issue936813 messages
2007-08-23 15:37:16	admin	create