Author gregsmith
Date 2005-02-11.03:45:59
Content

I started by just factoring out the inner switch loop. But then
it becomes evident that when op == '^', you always have
maska == maskb, so there's no point in doing the ^mask at all.
And when op == '|', then maska == maskb == 0, so likewise.
And if you put a check in so that len(a) >= len(b), then the
calculation of len_z can be simplified. It also becomes easy
to break the end off the loops, so that, say, or'ing a small
number with a really long one becomes mostly a copy, etc.
It was just a series of small, simple changes following
from the refactoring of the loop/switch.
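
To illustrate the idea (not the actual patch, which is C), here is a
minimal Python sketch of one loop per operator with the longer operand
placed first. It is a toy model under stated assumptions: it handles only
non-negative numbers, and SHIFT, DIGIT_MASK and all function names are
made up for the illustration.

```python
# Toy model: a non-negative integer is a little-endian list of digits.
SHIFT = 15                      # assumed digit width, for illustration only
DIGIT_MASK = (1 << SHIFT) - 1

def to_digits(n):
    """Split a non-negative int into little-endian digits."""
    digits = []
    while n:
        digits.append(n & DIGIT_MASK)
        n >>= SHIFT
    return digits

def from_digits(digits):
    """Rebuild an int from little-endian digits (leading zeros are harmless)."""
    n = 0
    for d in reversed(digits):
        n = (n << SHIFT) | d
    return n

def _longer_first(a, b):
    """Order the operands so that len(a) >= len(b)."""
    return (a, b) if len(a) >= len(b) else (b, a)

def and_digits(a, b):
    a, b = _longer_first(a, b)
    # Digits of a beyond len(b) are and'ed with zero, so they are dropped.
    return [x & y for x, y in zip(a, b)]

def or_digits(a, b):
    a, b = _longer_first(a, b)
    z = [x | y for x, y in zip(a, b)]   # overlapping digits
    z.extend(a[len(b):])                # the rest is a plain copy
    return z

def xor_digits(a, b):
    a, b = _longer_first(a, b)
    z = [x ^ y for x, y in zip(a, b)]   # overlapping digits
    z.extend(a[len(b):])                # x ^ 0 == x, so again a plain copy
    return z

# Equal masks cancel under xor, which is why the ^mask step is redundant.
x, y, m = 0x1234, 0x0F0F, DIGIT_MASK
assert (x ^ m) ^ (y ^ m) == x ^ y

# Or'ing a small number with a long one only does real work on one digit.
small, big = 0x5A5A, 1 << 300
assert from_digits(or_digits(to_digits(small), to_digits(big))) == small | big
```

With the operands ordered this way, the result length is len(b) for '&'
and len(a) for '|' and '^' in this non-negative model.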

I see a repeatable 1.5x speedup at 300 bits, which
I think is significant (I wasn't using negative numbers, which
of course have their own extra overhead). The difference
should be even higher on CPUs that don't have several
hundred mW of branch-prediction circuitry.

One use case is that you can simulate an array
of hundreds or thousands of simple 1-bit processors
in pure Python using long operations, and get very
good performance, even better with this fix. This app
involves all the logical ops, with the occasional shift.
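
As a rough sketch of that use case (not the actual application), the code
below packs one bit per simulated processor into a single integer, so
that each logical op acts on every processor at once. It is written for
Python 3, where int is the former long; all function names here are
hypothetical. Each processor adds two numbers bit-serially with a
full adder built from ^, & and |.

```python
def pack(bits):
    """Pack a list of 0/1 values into one int; bit i belongs to processor i."""
    word = 0
    for i, bit in enumerate(bits):
        word |= (bit & 1) << i
    return word

def to_planes(values, width):
    """Plane k holds bit k of every processor's value."""
    return [pack([(v >> k) & 1 for v in values]) for k in range(width)]

def from_planes(planes, n):
    """Recover each of the n processors' values from the bit planes."""
    return [sum(((p >> i) & 1) << k for k, p in enumerate(planes))
            for i in range(n)]

def full_adder(a, b, carry):
    """One full-adder step, executed by all processors in parallel."""
    partial = a ^ b
    return partial ^ carry, (a & b) | (partial & carry)

def bit_serial_add(planes_a, planes_b):
    """Add two sets of bit planes, least significant plane first."""
    carry = 0
    out = []
    for a, b in zip(planes_a, planes_b):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    out.append(carry)               # final carry-out plane
    return out

# 1000 simulated processors, each adding its own pair of 10-bit numbers.
n = 1000
values_a = list(range(n))
values_b = [(7 * i + 5) % 1024 for i in range(n)]
planes = bit_serial_add(to_planes(values_a, 10), to_planes(values_b, 10))
totals = from_planes(planes, n)
assert totals == [a + b for a, b in zip(values_a, values_b)]
```

Here each plane is a 1000-bit integer, so the ten adder steps are all
long logical ops. A shift such as `plane << 1` would move every bit to
the neighbouring processor.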


IMHO, the changed code is not more complex; it's a
little longer, but it's more explicit about what is really
being done, and it doesn't roll together 3 cases, which
don't really have that much in common, for the sake of
brevity. It wasn't obvious to me that the masks were
redundant until after I did the factoring, and this is
my point: rolling it together hides that.
The original author may not have noticed the redundancy.

I see a lot of effort being expended on very complex
multiply operations; why should the logical ops be left
behind for the sake of a few lines?







History
Date                 User   Action  Args
2007-08-23 15:40:55  admin  link    issue1087418 messages
2007-08-23 15:40:55  admin  create