Message 384975 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	jneb
Recipients	jneb, mark.dickinson, tim.peters
Date	2021-01-12.18:56:30
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1610477791.0.0.213498568866.issue42911@roundup.psfhosted.org>
In-reply-to

Content
Some more information for the interested: The algorithm I made tries to smoothly change the"chunk size" with growing length of the exponent. So the exponents that win the most (up to 14% fewer multiplication) are the long exponents that are just shorter than the FIVEARY_CUTOFF. But, I worked hard to make an algorithm that also saves multiplications for shorter numbers. Numbers of up to about 20 bits will be using the optimal chunk size. And, of course, the decision must be made quickly because for some frequently occurring parameters (e.g., 3**25), the routine doesn't take that long anyway. This is obtained by checking two properties of the exponent that strongly influence the addition chain: the higher four bits, and (surprise!) the number of pairs of bits with distance 2: in other words, (n&n>>2).bit_count(). After days of trying out all kinds of heuristics, and days of crunching,I measured the optimal parameters. I added the code I used to do that. Guido may remember that I wrote a chapter in my Ph.D. on the subject of addition chains. The interesting thing is that I then used Python for that too: that was almost 30 years ago! When implementing, I discovered that lots of the code around it had been in flux, so I didn't manage to "cherry pick" it into 3.10 yet. (One example: the bit_length and bit_count routines were renamed and moved around). And, I don't have windows 10:-( But anyway, have a look and let me hear what you think of it. I'll also want to test and measure it a bit more, but I am sure it is quite stable.

Some more information for the interested:
The algorithm I made tries to smoothly change the"chunk size" with growing length of the exponent. So the exponents that win the most (up to 14% fewer multiplication) are the long exponents that are just shorter than the FIVEARY_CUTOFF.
But, I worked hard to make an algorithm that also saves multiplications for shorter numbers. Numbers of up to about 20 bits will be using the optimal chunk size.
And, of course, the decision must be made quickly because for some frequently occurring parameters (e.g., 3**25), the routine doesn't take that long anyway.
This is obtained by checking two properties of the exponent that strongly influence the addition chain: the higher four bits, and (surprise!) the number of pairs of bits with distance 2: in other words, (n&n>>2).bit_count().
After days of trying out all kinds of heuristics, and days of crunching,I measured the optimal parameters. I added the code I used to do that.
Guido may remember that I wrote a chapter in my Ph.D. on the subject of addition chains. The interesting thing is that I then used Python for that too: that was almost 30 years ago!
When implementing, I discovered that lots of the code around it had been in flux, so I didn't manage to "cherry pick" it into 3.10 yet. (One example: the bit_length and bit_count routines were renamed and moved around). And, I don't have windows 10:-(
But anyway, have a look and let me hear what you think of it. I'll also want to test and measure it a bit more, but I am sure it is quite stable.

History
Date	User	Action	Args
2021-01-12 18:56:31	jneb	set	recipients: + jneb, tim.peters, mark.dickinson
2021-01-12 18:56:30	jneb	set	messageid: <1610477791.0.0.213498568866.issue42911@roundup.psfhosted.org>
2021-01-12 18:56:30	jneb	link	issue42911 messages
2021-01-12 18:56:30	jneb	create