Message 82533 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	mark.dickinson
Recipients	christian.heimes, collinwinter, gregory.p.smith, jyasskin, loewis, mark.dickinson, pernici, pitrou, schuppenies, vstinner
Date	2009-02-20.16:01:35
SpamBayes Score	3.5090592e-10
Marked as misclassified	No
Message-id	<1235145713.49.0.825099294592.issue4258@psf.upfronthosting.co.za>
In-reply-to

Content
Here's an updated patch that includes the x_divrem fixes and optimizations. I've also updated the Rietveld issue with these fixes. I think this is (modulo any requested changes) the version that I'd like to get into py3k. After that we can look at the possibility of optimizing the multiplication algorithm; the discussion for this should probably go back to issue 3944. Here's a detailed list of the changes to x_divrem. 0. Most importantly, in the inner loop, we make sure that the multiplication is digit x digit -> twodigits; previously it was digit x twodigits -> twodigits, which is likely to expand to several instructions on a 32-bit machine. I suspect that this is the main cause of the slowdown that Victor was seeing. 1. Make all variables unsigned. This eliminates the need for Py_ARITHMETIC_RIGHT_SHIFT, and also removes the only essential use of stwodigits in the entire long object implementation. 2. Save an iteration of the outer loop when possible by comparing top digits. 3. Remove double tests in the main inner loop and correction loop. 4. Quotient digit correction step follows Knuth more closely, and uses fewer operations. The extra exit condition (r >= PyLong_BASE) will be true more than 50% of the time, and should be cheaper than the main test. 5. In quotient digit estimation step, remove unnecessary special case when vj == w->ob_digits[w_size-1]. Knuth only needs this special case because his base is the same as the wordsize. 6. There's no need to fill the eliminated digits of v with zero; instead, set Py_SIZE(v) at the end of the outer loop. 7. More comments; some extra asserts. There are many other optimizations possible; I've tried only to include those that don't significantly complicate the code. An obvious remaining inefficiency is that on every iteration of the outer loop we divide by the top word of w; on many machines I suspect that it would be more efficient to precompute an inverse and multiply by that instead.

Here's an updated patch that includes the x_divrem fixes and
optimizations.  I've also updated the Rietveld issue with these
fixes.

I think this is (modulo any requested changes) the version
that I'd like to get into py3k.  After that we can look
at the possibility of optimizing the multiplication algorithm;
the discussion for this should probably go back to issue 3944.

Here's a detailed list of the changes to x_divrem.

0. Most importantly, in the inner loop, we make sure that the
   multiplication is digit x digit -> twodigits; previously it was
   digit x twodigits -> twodigits, which is likely to expand to
   several instructions on a 32-bit machine.  I suspect that
   this is the main cause of the slowdown that Victor was
   seeing.

1. Make all variables unsigned.  This eliminates the need for
   Py_ARITHMETIC_RIGHT_SHIFT, and also removes the only essential use
   of stwodigits in the entire long object implementation.

2. Save an iteration of the outer loop when possible by comparing top
   digits.

3. Remove double tests in the main inner loop and correction loop.

4. Quotient digit correction step follows Knuth more closely, and uses
   fewer operations.  The extra exit condition (r >= PyLong_BASE) will
   be true more than 50% of the time, and should be cheaper than the
   main test.

5. In quotient digit estimation step, remove unnecessary special case
   when vj == w->ob_digits[w_size-1].  Knuth only needs this special 
case
   because his base is the same as the wordsize.

6. There's no need to fill the eliminated digits of v with zero;
   instead, set Py_SIZE(v) at the end of the outer loop.

7. More comments; some extra asserts.

There are many other optimizations possible;  I've tried only to include 
those that don't significantly complicate the code.  An obvious 
remaining inefficiency is that on every iteration of the outer loop we 
divide by the top word of w;  on many machines I suspect that it would 
be more efficient to precompute an inverse and multiply by that instead.

History
Date	User	Action	Args
2009-02-20 16:01:53	mark.dickinson	set	recipients: + mark.dickinson, loewis, collinwinter, gregory.p.smith, pitrou, pernici, vstinner, christian.heimes, jyasskin, schuppenies
2009-02-20 16:01:53	mark.dickinson	set	messageid: <1235145713.49.0.825099294592.issue4258@psf.upfronthosting.co.za>
2009-02-20 16:01:52	mark.dickinson	link	issue4258 messages
2009-02-20 16:01:51	mark.dickinson	create