Title: Faster code for trial quotient in x_divrem
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.11
Status: closed Resolution: fixed
Assigned To: tim.peters Nosy List: gregory.p.smith, mark.dickinson, tim.peters
Created on 2022-01-24 18:46 by tim.peters, last changed 2022-04-11 14:59 by admin.

PR 30856 merged tim.peters, 2022-01-24 18:55
msg411505 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2022-01-24 18:46
x_divrem1() was recently (bpo-46406) changed to generate faster code for division, essentially nudging optimizing compilers into recognizing that modern processors compute the quotient and remainder with a single machine instruction.
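
For readers unfamiliar with the pattern, here is a minimal sketch of a single-digit division loop that writes the quotient and remainder as an adjacent `/` and `%` on the same operands. It is not the CPython source: the name `divrem1_sketch`, the `SHIFT` macro, and the `digit`/`twodigits` typedefs are stand-ins assumed for illustration (30-bit digits held in 32-bit words).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for CPython's internal types; 30-bit digits assumed. */
typedef uint32_t digit;
typedef uint64_t twodigits;
#define SHIFT 30

/* Divide the number pin[0..size-1] (least significant digit first) by the
   single digit n. The quotient digits go to pout; the remainder is returned. */
static digit divrem1_sketch(digit *pout, const digit *pin, size_t size, digit n)
{
    assert(n > 0 && n < ((digit)1 << SHIFT));
    digit rem = 0;
    while (size-- > 0) {
        /* rem < n, so dividend / n always fits in one digit. */
        twodigits dividend = ((twodigits)rem << SHIFT) | pin[size];
        /* Adjacent / and % on identical operands: an optimizing compiler
           can typically satisfy both with one hardware divide. */
        pout[size] = (digit)(dividend / n);
        rem = (digit)(dividend % n);
    }
    return rem;
}
```

Whether a given compiler actually fuses the two operations depends on the target and optimization level, so checking the generated assembly is the only reliable confirmation.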

The same can be done for x_divrem(), although it's less valuable there because the hardware division generally accounts for a much smaller percentage of its total runtime.

Still, it does cut a multiply and subtract out of the loop, and makes the code more obvious (since it brings x_divrem1() and x_divrem() back into sync).
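
The multiply and subtract mentioned above can be illustrated by comparing two ways of getting a trial quotient and its remainder. This is a hedged sketch rather than the code from the changeset: the function names, the `qr_pair` struct, and the typedefs are hypothetical, and the variable names `vv` and `wm1` are only meant to suggest a two-digit dividend and a one-digit divisor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for CPython's internal digit types. */
typedef uint32_t digit;
typedef uint64_t twodigits;

typedef struct {
    digit q;  /* trial quotient */
    digit r;  /* matching remainder */
} qr_pair;

/* Older style: divide once, then recover the remainder with a
   multiply and a subtract. Assumes the quotient fits in a digit. */
static qr_pair trial_mulsub(twodigits vv, digit wm1)
{
    assert(wm1 != 0);
    qr_pair out;
    out.q = (digit)(vv / wm1);
    out.r = (digit)(vv - (twodigits)wm1 * out.q);
    return out;
}

/* Newer style: express the remainder as vv % wm1, so the compiler can
   take both results from a single divide where the hardware allows it. */
static qr_pair trial_divmod(twodigits vv, digit wm1)
{
    assert(wm1 != 0);
    qr_pair out;
    out.q = (digit)(vv / wm1);
    out.r = (digit)(vv % wm1);
    return out;
}
```

Both functions return the same pair whenever the quotient fits in a `digit`; the difference is only in how much work the compiler is invited to emit.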
msg411542 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2022-01-25 01:06
New changeset 7c26472d09548905d8c158b26b6a2b12de6cdc32 by Tim Peters in branch 'main':
bpo-46504: faster code for trial quotient in x_divrem() (GH-30856)
