Message47380
Logged In: YES
user_id=292741
I started by just factoring out the inner switch loop. But then
it becomes evident that when op = '^', you always have
maska == maskb, so there's no point in doing the ^mask at all.
And when op == '|', then maska==maskb==0. So likewise.
And if you put a check in so that len(a) >= len(b), then the
calculation of len_z can be simplified. It also becomes easy
to break the end off the loops, so that, say, or'ing a small
number with a really long becomes mostly a copy. etc.
It's was just a series of small simple changes following
from the refactoring of the loop/switch.
I see a repeatable 1.5 x speedup at 300 bits, which
I think is significant (I wasn't using negative #s, which
of course have their own extra overhead). The difference
should be even higher on CPUs that don't have several
100 mW of branch-prediction circuitry.
One use case is that you can simulate an array
of hundreds or thousands of simple 1-bit processors
in pure python using long operations, and get very
good performance, even better with this fix. This app
involves all logical ops, with the occasional shift.
IMHO, I don't think the changed code is more complex; it's a
little longer, but it's more explicit in what is really
being done, and it doesn't roll together 3 cases, which
don't really have that much in common, for the sake of
brevity. It wasn't obvious to
me about the masks being redundant until after I did the
factoring, and this is my point - rolling it together hides
that.
The original author may not have noticed the redundancy.
I see a lot of effort being expended on very complex
multiply operations, why should the logical ops be left
behind for
the sake of a few lines?
|
|
Date |
User |
Action |
Args |
2007-08-23 15:40:55 | admin | link | issue1087418 messages |
2007-08-23 15:40:55 | admin | create | |
|