This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Minor algorithmic improvements for math.isqrt
Type: performance Stage: resolved
Components: Extension Modules Versions: Python 3.11
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: mark.dickinson
Priority: normal Keywords: patch

Created on 2022-01-04 17:30 by mark.dickinson, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 30333 merged mark.dickinson, 2022-01-04 17:30
Messages (2)
msg409693 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2022-01-04 17:30
There are a couple of minor algorithmic improvements possible for the math.isqrt fast path (which is used for nonnegative integers smaller than 2**64). On my machine those improvements produce a little over a 10% speedup.

The current algorithm for values under 2**64 involves exactly four division instructions, corresponding to four Newton steps. The proposal is to:

- 1. Replace the first division with a table lookup. The necessary table is extremely small: 12 entries at one byte per entry.
- 2. Arrange for the return type of the _approximate_sqrt helper function to be uint32_t rather than uint64_t. That means that the correction step only involves a 32-bit-by-32-bit multiplication, not a 64-bit-by-64-bit multiplication.

The second part is a bit subtle: the input to _approximate_sqrt is a 64-bit integer `n` in the range [2**62, 2**64). Barring any overflow, the output `u` is guaranteed to satisfy `(u-1)**2 < n < (u+1)**2`. That implies that `(u-1)**2 < 2**64`, from which it follows that `u <= 2**32`. So the only possible case where `u` might overflow a `uint32_t` is when `u == 2**32`. But from the earlier inequality, that can only happen if `n > (2**32 - 1)**2`, and in that case the top 31 bits of `n` are completely determined and since the first steps of the algorithm only depend on the topmost bits of `n`, it's easy to follow through the algorithm and see that it's not possible for `u` to be `2**32` in that case. (We always get `u = 128` from the lookup, followed by `u = 255` after the first division, then `u = 65536` after the second.)
msg410635 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2022-01-15 09:58
New changeset d02c5e9b55a8651b7d396ac3f2bdedf1fc1780b5 by Mark Dickinson in branch 'main':
bpo-46258: Streamline isqrt fast path (#30333)
https://github.com/python/cpython/commit/d02c5e9b55a8651b7d396ac3f2bdedf1fc1780b5
History
Date User Action Args
2022-04-11 14:59:54adminsetgithub: 90416
2022-01-15 09:58:40mark.dickinsonsetstatus: open -> closed
stage: patch review -> resolved
2022-01-15 09:58:09mark.dickinsonsetmessages: + msg410635
2022-01-04 17:30:23mark.dickinsonsetkeywords: + patch
stage: commit review -> patch review
pull_requests: + pull_request28612
2022-01-04 17:30:08mark.dickinsoncreate