Message 47380 - Python tracker


Author	gregsmith
Date	2005-02-11.03:45:59

Content
I started by just factoring out the inner switch loop. But then
it becomes evident that when op == '^', you always have
maska == maskb, so there's no point in doing the ^mask at all.
And when op == '|', then maska == maskb == 0, so likewise.
And if you put a check in so that len(a) >= len(b), then the
calculation of len_z can be simplified. It also becomes easy
to break the end off the loops, so that, say, or'ing a small
number with a really long one becomes mostly a copy, etc.
It was just a series of small, simple changes following
from the refactoring of the loop/switch.
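
To make the two observations above concrete, here is a rough Python
model of the digit loop. It is only a sketch, not the patch itself: the
15-bit digit size, the MASK constant and every function name below are
assumptions made for illustration. It shows that complementing both
digits with the same mask cancels out under XOR, and that an OR of a
short and a long operand only needs real work on the overlapping digits.

```python
# Illustrative model only: digits are 15-bit values held in Python lists,
# least significant first. MASK and all function names are assumptions
# made for this sketch, not identifiers taken from the patch.
MASK = (1 << 15) - 1


def xor_with_masks(diga, digb, mask):
    """XOR two digits after complementing both with the same mask."""
    return (diga ^ mask) ^ (digb ^ mask)


def or_digits(a, b):
    """OR two non-negative digit lists; the longer one's tail is copied."""
    if len(a) < len(b):
        a, b = b, a                        # ensure len(a) >= len(b)
    z = [da | db for da, db in zip(a, b)]  # overlapping digits only
    z.extend(a[len(b):])                   # the rest is a plain copy
    return z


def to_digits(n):
    """Split a non-negative int into 15-bit digits, least significant first."""
    digits = []
    while n:
        digits.append(n & MASK)
        n >>= 15
    return digits or [0]


def from_digits(digits):
    """Inverse of to_digits."""
    n = 0
    for d in reversed(digits):
        n = (n << 15) | d
    return n
```

With equal masks, xor_with_masks(x, y, MASK) and xor_with_masks(x, y, 0)
both give x ^ y, which is why the mask step can be dropped for '^'.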

I see a repeatable 1.5x speedup at 300 bits, which
I think is significant (I wasn't using negative numbers, which
of course have their own extra overhead). The difference
should be even higher on CPUs that don't have several
hundred mW of branch-prediction circuitry.
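
For anyone who wants to repeat this kind of measurement, a minimal
timing sketch might look like the following. It is not the benchmark
used for the figure above; the operand generation, the repeat counts
and the time_op helper are all assumptions, and absolute numbers will
vary by machine and interpreter build.

```python
import random
import timeit

# Assumed setup: two random non-negative 300-bit operands.
random.seed(0)
a = random.getrandbits(300)
b = random.getrandbits(300)


def time_op(stmt, number=100_000):
    """Return the best-of-3 time in seconds for `number` runs of stmt."""
    return min(timeit.repeat(stmt, globals={"a": a, "b": b},
                             repeat=3, number=number))


if __name__ == "__main__":
    # Compare the three logical ops on the same operands.
    for stmt in ("a & b", "a | b", "a ^ b"):
        print(f"{stmt}: {time_op(stmt):.4f} s per 100,000 operations")
```

Running it against builds with and without the change gives a rough
before/after ratio for each operator.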

One use case is that you can simulate an array
of hundreds or thousands of simple 1-bit processors
in pure Python using long operations, and get very
good performance, even better with this fix. This app
involves all logical ops, with the occasional shift.
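
As a rough illustration of that use case (not the actual application),
the sketch below treats bit i of every long as the 1-bit register of
processor i, so a single logical op on a long acts on all processors at
once. The processor count N, the bit-plane layout and all helper names
are assumptions made for this example.

```python
N = 1000  # number of simulated 1-bit processors (assumed for the example)


def full_adder(a, b, carry):
    """One full-adder step executed by every processor at once."""
    partial = a ^ b
    return partial ^ carry, (a & b) | (partial & carry)


def add_bit_serial(x_planes, y_planes):
    """Add two numbers per processor, given as bit-planes, LSB plane first."""
    carry = 0
    out = []
    for xa, ya in zip(x_planes, y_planes):
        s, carry = full_adder(xa, ya, carry)
        out.append(s)
    out.append(carry)  # final carry becomes the top bit-plane
    return out


def pack(values, width):
    """Turn one small integer per processor into `width` bit-planes."""
    return [sum(((v >> k) & 1) << i for i, v in enumerate(values))
            for k in range(width)]


def unpack(planes, count):
    """Inverse of pack: recover one integer per processor."""
    return [sum(((p >> i) & 1) << k for k, p in enumerate(planes))
            for i in range(count)]
```

Adding two 4-bit values on each of the 1000 processors then costs a
handful of logical ops per bit-plane on 1000-bit longs. A shift such as
plane << 1 would hand each processor's bit to its neighbour.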


IMHO, the changed code isn't more complex; it's a
little longer, but it's more explicit about what is really
being done, and it doesn't roll together 3 cases, which
don't really have that much in common, for the sake of
brevity. It wasn't obvious to me that the masks were
redundant until after I did the factoring, and this is my
point: rolling it together hides that.
The original author may not have noticed the redundancy.

I see a lot of effort being expended on very complex
multiply operations; why should the logical ops be left
behind for the sake of a few lines?

History
Date	User	Action	Args
2007-08-23 15:40:55	admin	link	issue1087418 messages
2007-08-23 15:40:55	admin	create