This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: alternate fast builtin sum, rev'd
Type: performance Stage:
Components: Interpreter Core Versions: Python 2.6
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: MrJean1, rhettinger
Priority: normal Keywords: patch

Created on 2008-05-08 19:53 by MrJean1, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
bltinmodule3.c.diff MrJean1, 2008-05-08 19:53 forward diff against bltinmodule,c rev 62728 from the trunk
bltinmodule4.c.diff MrJean1, 2008-05-08 21:33 forward diff against bltinmodule,c rev 62728 from the trunk
Messages (3)
msg66434 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-05-08 19:53
Here is one more, final attempt to improve fast summation, somewhat.

This version inspects the type of both the sum and the first item and 
adds all ints and floats without any PyNumber_Add() call and in order.

Also, the results for this test case are compatible:

>>> d = [1000000000000000, -1000000000000000.0, .0000000000000001, 
.0000000000000001]
>>> sum(d)
2e-16
>>> d[0] + sum(d[1:])
0.0

/Jean Brouwers
msg66449 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-05-08 21:33
Attached is a slightly better version, making the float loop cleaner.  Use 
this one and disregard the earlier one, bltinmodule3.c.diff.

/Jean Brouwers
msg66451 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-05-09 04:41
Sorry, I really do not want to change the existing code.  It's clear, 
does exactly what it's supposed to do, is easily checked for 
correctness, been seen and exercised in the alphas for a good while.  
It gives a good tenfold speed-up (YMMV depending on machine and 
compiler).  Also, it is setup in a way that will make it easy to add 
special handling of LongLongs.  

I spend several days on the existing code to make sure that the 
conversions and accumulations exactly matches what sum() was already 
doing behind the scenes.  Then, a good deal of time was spent making 
before/after timings of various cases and comparing the result to psyco 
generated code. I'm reluctant to throw-away those efforts and then 
invest the same time with another patch that cannot produce any 
signficant speedups.

The submitted code is longer and is harder for me to verify that it is 
correct -- I've spent hours with it already and am not confident about 
the code.  ISTM, that the code is twisting itself in knots just to 
avoid a single call to PyNumber_Add in an uncommon case.  I find the 
coding style uncomfortable and do not readily see how to extend it to 
the LongLong case.

The existing code follows are series of required steps to assume a type 
and continuously verify that assumption.  Those steps which comprise 
the bulk of the work are mandatory.  So, all you can do with alternate 
patches is rearrange the loops and in-inlining for a microscopic speed-
up at best.  There is no new approach here that is worth the time spent 
reviewing and re-reviewing patches.  Unless you can get meaningful 
speed-ups in the common cases, please stop re-arranging this code.

Also, the submission should have been accompanied by a full set of 
before/after timings across a variety of use cases including short 
lists, summing all ints, summing all floats, summing a random mix of 
ints and floats, all longs, etc.  And, it would have been nice to have 
a conceptual statement of what the code purports to do differently -- 
what you think makes it better.

Sorry.  I know you're having fun with this.  But micro-tweaks at this 
point are a total waste of time.  It takes too much effort to verify 
correctness, run all the timings, and create code that is 
maintainable/extendable. Try to be content with the tenfold speedup 
we've already gotten.

A better use of time would be to scan the code base for other places 
that would benefit from the pattern of assuming a type, verifying the 
assumption, running type specific code, and falling back if the 
assumption fails.  Focus the effort of common use cases (like for sum() 
where the effort focused on all ints or all floats and a mixed-case.
History
Date User Action Args
2022-04-11 14:56:34adminsetgithub: 47041
2008-05-09 04:42:00rhettingersetstatus: open -> closed
resolution: rejected
messages: + msg66451
2008-05-09 03:54:49rhettingersetassignee: rhettinger
nosy: + rhettinger
2008-05-08 21:33:32MrJean1setfiles: + bltinmodule4.c.diff
messages: + msg66449
2008-05-08 19:53:19MrJean1create