classification
Title: PyLong_FromString optimization
Type: Stage:
Components: Interpreter Core Versions: Python 2.5
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: Nosy List: alanmcintyre, georg.brandl, tim.peters
Priority: normal Keywords: patch

Created on 2006-03-04 06:21 by alanmcintyre, last changed 2006-05-24 21:45 by tim.peters. This issue is now closed.

Files
File name Uploaded Description Edit
longopt.zip alanmcintyre, 2006-03-04 06:21 Patch, performance test code & results
longopt.tgz alanmcintyre, 2006-03-06 22:33 Patch, perf test code & results (version #2)
longopt3.tar.bz2 alanmcintyre, 2006-03-07 03:05 Patch, performance test code & results (version 3)
Messages (7)
msg49637 - (view) Author: Alan McIntyre (alanmcintyre) (Python committer) Date: 2006-03-04 06:21
The current implementation of PyLong_FromString in
Python 2.5 uses muladd1 to add each digit of the input
string into the final number.  Because muladd1 creates
a new long to hold the result on every call, an
intermediate long object is created/destroyed for each
digit in the input string.  

This patch improves on the current implementation of
PyLong_FromString in 3 main ways:

1. Creates and manipulates (in-place) a single long
object to hold the result, skipping the creation of all
those intermediate long objects.

2. Multiple digits from the input string are
consolidated into a single long digit before adding
them into the long integer object.  This greatly
reduces the number of "multiply/add" cycles required to
push all the digits into the long object.

3. Three chunks of code like "if (ch <= '9') k = ch -
'0'" in longobject.c are replaced by a digit value
lookup vector.  I'm not irreversibly stuck on this
idea; it doesn't measurably add to performance, but it
just seems (to me, anyway) to make the code in
long_from_binary_base and PyLong_FromString a little
less cluttered.  This is the same lookup table from
patch 1335972 (an optimization for int()).  I expect if
both patches get accepted it would be best to make them
both reference a single instance of this table; if it
looks like that's what will happen I'll tweak one or
both patches as necessary.


My cheezy test results (included in the attached file
in an OpenOffice spreadsheet) show that the patch makes
long() about 50% faster than the existing
implementation for decimal input strings of about 10
characters.   Longer input strings show even better
performance improvement, leveling off around 3x faster
for very long strings.

This patch passes regression tests on my machine
(WinXP, Visual C++ .net Standard 2003).  I plan to try
out the tests on my Linux box this weekend just to make
sure the performance boost still remains when Python
gets compiled by a C compiler that isn't neutered
(standard .net 2003 doesn't appear to allow any
optimizations).

The test and test data generation scripts I used for
this performance comparison are included in the
attached zip file. 

At the moment I don't have any added tests; if somebody
can suggest some things that ought to be tested I'll
gladly write some tests.
msg49638 - (view) Author: Alan McIntyre (alanmcintyre) (Python committer) Date: 2006-03-06 22:33
Logged In: YES 
user_id=1115903

Version #2 is attached.  I made a couple of tweaks and
tested the patch out on Linux just to make sure the
performace is still as good with compiler optimizations. 
For short numbers (numbers that would fit into an int),
long() is 10-30% *slower* than before applying the patch. 
For longer numbers, long() is up to 249% faster, with the
peak occurring around 1000 digits.

If the negative performance impact for int-sized digits is
unacceptable, I will see if I can do something about it. 
However, one always has the option of using int() on very
long strings anyway, and it will automatically fall through
to PyLong_FromString if the number is too long.  The
performance impact on int() for small numbers is so small as
to be negligible (<5%), which is to be expected since the
modified code isn't called when using int() on input strings
< 2**32. 
msg49639 - (view) Author: Alan McIntyre (alanmcintyre) (Python committer) Date: 2006-03-07 03:05
Logged In: YES 
user_id=1115903

Version #3 is attached; it has an across-the-board
improvement of ~10% over version 2.  The performance hit for
calling long() on 9-digit numbers is now only about -10%,
breakeven happens somewhere around 11 digits, and the best
performance is about +282% in the vicinity of 1000 digits.

Sorry to keep commenting on my own patch. :)  I think I'm
done now.
msg49640 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2006-05-16 07:40
Logged In: YES 
user_id=849994

Assigned to Tim. Perhaps something for Iceland?
msg49641 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2006-05-16 07:44
Logged In: YES 
user_id=31435

Thanks for reminding me, Georg!  This is a good possiblity
for the Iceland sprint.
msg49642 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2006-05-16 07:46
Logged In: YES 
user_id=849994

If someone ;) created a new tracker category, I'd go through
the patches and flag all I can find for the sprint.
msg49643 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2006-05-24 21:45
Logged In: YES 
user_id=31435

Thanks again, Alan!  I did some major fiddling for
portability and to try to eradicate the penalty for "short"
input strings.  It's still slower on 1-digit inputs, by
about 4-5%, but faster than before with at least 2-digit
inputs.  The peak speedup remains at around 800-1000 decimal
digits, but it's 6x faster there now.  Much of that came
from eliminating code :-)  For example, there was no actual
need for the memset(), and reducing the main loop to:

	for (; pz < pzstop; ++pz) {
		c += (twodigits)*pz * convmult;
		*pz = (digit)(c & MASK);
		c >>= SHIFT;
	}

was a huge win under VC 7.1 (in the patch, it has a branch
testing whether c is 0, and I didn't believe the comment
that said the branch made it faster ;-)).

Anyway, this is checked in now.  The table of digit values
is duplicated for the moment, and I hope to refactor that
soon (given that ints and longs have become increasingly
unified in Python, it's become increasingly confusing to
have two string->int routines -- while they have to remain
for backward compatibility, there's nothing to stop
rewriting the core to use a new unified conversion function).
History
Date User Action Args
2006-03-04 06:21:13alanmcintyrecreate