Author tim.peters
Recipients mark.dickinson, serhiy.storchaka, steven.daprano, tim.peters
Date 2021-06-10.18:18:00
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <>
Under the released 3.9.5 for 64-bit Win10, raising to the power 2 is clearly much slower than multiplying directly:

C:\Windows\System32>py -3 -m timeit -s "x=151" "x*x"
10000000 loops, best of 5: 30 nsec per loop

C:\Windows\System32>py -3 -m timeit -s "x=151" "x**2"
1000000 loops, best of 5: 194 nsec per loop

Since the multiplication itself is cheap, overheads must account for it. Offhand, looks to me like the `x**2` spelling is actually doing 31 multiplications under the covers, although most of them are as cheap as Python-int multiplies get.

        for (i = Py_SIZE(b) - 1; i >= 0; --i) {
            digit bi = b->ob_digit[i];

            for (j = (digit)1 << (PyLong_SHIFT-1); j != 0; j >>= 1) {
                MULT(z, z, z);
                if (bi & j)
                    MULT(z, a, z);

Python ints on a 64-bit box are stored internally in base 2**30 (PyLong_SHIFT is 30). z starts life at 1. The first 28 trips through the loop are chewing up the 28 leading zero bits in exponent 2, so MULT(z, z, z) multiplies 1 by 1 to get 1, 28 times. Then again on the 29th iteration, but now "bi & j" is finally true (we finally found the leading one bit in exponent 2), so z is replaced by 1 times the base = the base. On the final, 30th, iteration, MULT(z, z, z) replaces z with its square, and we're done.

It would probably be worthwhile to add code special-casing the leading Python "digit" of the exponent, fast-forwarding without any multiplies to the leading one bit, and setting z directly to the base then.
Date User Action Args
2021-06-10 18:18:00tim.peterssetrecipients: + tim.peters, mark.dickinson, steven.daprano, serhiy.storchaka
2021-06-10 18:18:00tim.peterssetmessageid: <>
2021-06-10 18:18:00tim.peterslinkissue44376 messages
2021-06-10 18:18:00tim.peterscreate