This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Change long_pow() to sliding window algorithm
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.11
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: tim.peters Nosy List: tim.peters
Priority: normal Keywords: patch

Created on 2022-01-01 01:46 by tim.peters, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 30319 merged tim.peters, 2022-01-01 03:47
Messages (4)
msg409446 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2022-01-01 01:46
As discussed on python-dev, it would be nice to break long_pow()'s reliance on that the number of bits in a CPython long "digit" is a multiple of 5.

Moving to the sliding-window algorithm would do that (it has to be able to cross digit boundaries to fill the current window), and would be better on some other counts too: the precomputed table can be half the size (or we can add an extra bit to the window for the same table size), and it can - depending on exponent bit patterns - sometimes reduce the number of multiplies needed too.

So I intend to do that, and bump the window size from 5 to 6 (which would yield a significant, although modest, speed improvement for giant-exponent cases).
msg409480 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2022-01-01 22:33
Nope, boosting the window size to 6 doesn't appear to really help at all, at least not on my box. Regardless of the window size, we have to do a bigint square for every bit position in the exponent. I don't know of any way to speed that (short of speeding multiplication, and we already make a special case of squaring).

So that dominates. Even for an exponent of a million solid 1-bits, the overall timing difference appears insignificant.

So I cut the window size back to 5 bits again. That has the benefit of cutting the old table size in half, which also cuts the precomputation time to fill it about in half, which should have some minor benefit for saner (smaller) exponents.

As a sanity check, I also tried cutting the window size to 3 bits. That had an obvious bad timing impact on huge exponents of solid 1 bits.
msg409485 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2022-01-02 02:03
Since cutting the window size by 1 cut the size of the dynamic precomputed table in half, the overhead of the sliding window method was correspondingly reduced. It can pay off at exponents of half the bit length now, so that threshold was also changed.
msg409513 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2022-01-02 19:18
New changeset 863729e9c6f599286f98ec37c8716e982c4ca9dd by Tim Peters in branch 'main':
bpo-46218: Change long_pow() to sliding window algorithm (GH-30319)
https://github.com/python/cpython/commit/863729e9c6f599286f98ec37c8716e982c4ca9dd
History
Date User Action Args
2022-04-11 14:59:54adminsetgithub: 90376
2022-01-02 19:20:45tim.peterssetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2022-01-02 19:18:36tim.peterssetmessages: + msg409513
2022-01-02 02:03:32tim.peterssetmessages: + msg409485
2022-01-01 22:33:51tim.peterssetmessages: + msg409480
2022-01-01 03:47:32tim.peterssetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request28535
2022-01-01 01:46:39tim.peterscreate