classification
Title: Use divide-and-conquer for faster factorials
Type: performance Stage: patch review
Components: Extension Modules Versions: Python 3.2
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: mark.dickinson Nosy List: Alexander.Belopolsky, belopolsky, draghuram, mark.dickinson, rhettinger, stutzbach
Priority: normal Keywords: needs review, patch

Created on 2010-05-11 20:22 by stutzbach, last changed 2010-05-16 08:45 by mark.dickinson. This issue is now closed.

Files
File name Uploaded Description Edit
factorial4.py belopolsky, 2010-05-13 21:50
factorial-no-recursion.patch belopolsky, 2010-05-13 23:40
factorial-test.patch stutzbach, 2010-05-14 03:55 Improve unit tests for math.factorial
factorial-precompute-partials.patch belopolsky, 2010-05-14 04:05
factorial-speed.sh stutzbach, 2010-05-14 04:19 Script to test factorial speed for several n
factorial.py belopolsky, 2010-05-14 18:05
factorial.patch stutzbach, 2010-05-14 20:09 Divide-and-conquer factorial algorithm
factorial2.patch mark.dickinson, 2010-05-14 23:42
factorial3.patch mark.dickinson, 2010-05-15 10:31
factorial4.patch mark.dickinson, 2010-05-15 11:43
unnamed Alexander.Belopolsky, 2010-05-15 15:12
Messages (72)
msg105537 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-11 20:22
(Making an educated guess about who to add to the Nosy list)

Attached is a patch to improve math.factorial by using a divide-and-conquer algorithm.  The old algorithm performs n-1 multiplications, mostly on numbers with a very large number of digits.

The algorithm in this patch:
- implicitly factors out all powers of two and applies a left-shift at the end.
- performs roughly half as many multiplications (around n/2 + 2*lg n)
- groups the multiplications so most multiplications are on small numbers
- uses a lookup table for n <= 12

There are faster factorial algorithms available, but they're significantly more complicated and rely on a fast prime factorization function.  This one is around 125 lines of code in C (with comments).  I have a pure-Python version that's around 25 lines of code, if anyone is interested.

Here are some timing results for different values of n:

n : old algorithm : new algorithm
1       0.14 us      0.10 us
10      0.63 us      0.12 us
13      0.81 us      0.76 us
100    12.6  us      4.92 us
1k    576    us    118    us
10k    53.6  ms      8.16 ms
100k   12.1  s     443    ms
1M      27   min    23    s
10M    forget it    20    min

I tested that both algorithms return the same answer for all values of n up to 10,000.
msg105548 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-11 21:18
I've noticed that your patch changes 

>>> math.factorial(2.**63)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long

to

>>> math.factorial(2.**63)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: factorial() not defined for negative values


While the error message is wrong in both cases, I think OverflowError is a better exception in this case and there should not be a difference between math.factorial(2.**63) and math.factorial(2**63) behavior.
msg105551 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-11 21:39
On Tue, May 11, 2010 at 4:18 PM, Alexander Belopolsky
<report@bugs.python.org> wrote:
> While the error message is wrong in both cases, I think OverflowError is a better exception in this case and there should not be a difference between math.factorial(2.**63) and math.factorial(2**63) behavior.

Good catch!  I will fix that tomorrow.
msg105552 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-11 21:58
Yes, I'm interested in seeing the pure Python version.  It could go into test_math, and would be a useful form of documentation.

Are there sufficient tests already in test_math.py to exercise the code thoroughly, or are more needed?

I'll try to find time to do a thorough code review in the next few days.
msg105553 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-11 22:33
I also started to wonder if a tighter upper limit for an acceptable argument can be found.

In discussion of issue2138 I saw the following exchange:

"""
> Should there be some upper limit on the argument math.factorial would
take?

I'd say not.  Any such limit would be artificial, and an arbitrary
choice.  Just let the natural time and space requirements be the
limiting factor.
""" - msg62541 - Mark Dickinson -

Still, the original and proposed implementations bail out if n is larger than system LONG_MAX.  This is not a limitation because because the result for LONG_MAX! would exceed the number of digits that python long integer can hold.

It seems to me that the value of n for which number of digits will exceed sys.maxsize can be estimated fairly accurately using Stirling formula.  Only two values are relevant in practice - one for sys.maxsize = 2**63-1 and the other for sys.maxsize = 2**31-1.  These values can be hardcoded and factorial can quickly report the case when n! will exceed maxsize digits instead of hanging until memory is exhausted.
msg105557 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-11 23:20
On Tue, May 11, 2010 at 11:33 PM, Alexander Belopolsky
<report@bugs.python.org> wrote:
> It seems to me that the value of n for which number of digits will exceed sys.maxsize can be estimated fairly accurately using Stirling formula.  Only two values are relevant in practice - one for sys.maxsize = 2**63-1 and the other for sys.maxsize = 2**31-1.  These values can be hardcoded and factorial can quickly report the case when n! will exceed maxsize digits instead of hanging until memory is exhausted.

Sure, bailing out for ridiculously large arguments sounds fine to me.
On a 64-bit machine, there can be at most 2**61 4-byte digits, each
digit giving containing 30 bits of the long.  So the maximum
representable long (under the implausible assumption that someone
could actually find 2**63 bytes of storage) would be around
2**(30*2**61).  The following quick search gives me a value of around
1.18e18 for the first n such that n! exceeds this value:

from math import log, lgamma

def bisect(f, a, b):
    c = (a + b)/2.0
    while a != c and b != c:
        a, b = (a, c) if f(c) else (c, b)
        c = (a + b)/2.0
    return c

BOUND = 2**62*15*log(2)
print(bisect(lambda x: lgamma(x) > BOUND, 2.0, 1e30))
msg105559 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-11 23:42
On Tue, May 11, 2010 at 5:33 PM, Alexander Belopolsky
<report@bugs.python.org> wrote:
> It seems to me that the value of n for which number of digits will exceed sys.maxsize can be estimated fairly accurately using Stirling formula.  Only two values are relevant in practice - one for sys.maxsize = 2**63-1 and the other for sys.maxsize = 2**31-1.  These values can be hardcoded and factorial can quickly report the case when n! will exceed maxsize digits instead of hanging until memory is exhausted.

Isn't that adding an extra check in every case to speed up a
you-can't-seriously-expect-that-to-work corner case?
msg105561 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-12 00:12
On Tue, May 11, 2010 at 7:42 PM, Daniel Stutzbach
<report@bugs.python.org> wrote:
..
> Isn't that adding an extra check in every case to speed up a
> you-can't-seriously-expect-that-to-work corner case?
>

The check is cheap - just a machine integer comparison, so I would not
even take that cost into account.   In my view math.factorial() is
primarily of interest in educational settings where it is quite likely
that someone would be curious enough to pass sys.maxsize to it.

The main value in setting a theoretically justified limit is that
overflow exception can carry a meaningful message, e.g. "factorial
result would have too many digits", rather than an unhelpful "Python
int too large to convert to C long".
msg105562 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-12 00:38
On Tue, May 11, 2010 at 7:42 PM, Daniel Stutzbach
<report@bugs.python.org> wrote:
..
> Isn't that adding an extra check in every case ...

Speaking of micro-optimizations, did you consider a better than naive
algorithm for "Count the number of set bits in n" in your patch?
HAKMEM 169 comes to mind and being a divide and conquer too, it seems
like a good fit.   Certainly an overkill if used just for
math.factorial(), but this is probably a useful function to have
around.
msg105563 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-12 01:00
On Tue, May 11, 2010 at 7:15 PM, Alexander Belopolsky
<report@bugs.python.org> wrote:
> The main value in setting a theoretically justified limit is that
> overflow exception can carry a meaningful message, e.g. "factorial
> result would have too many digits", rather than an unhelpful "Python
> int too large to convert to C long".

I'm pretty sure this is an orthogonal issue to speeding up
math.factorial.  If you want to improve the error and/or impose a
tighter maximum limit on n, would you mind opening it as a separate
issue?

I like extra checks and I like speed, but I can't think about adding
extra checks and a speed patch at the same time. ;-)
msg105564 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-12 01:07
On Tue, May 11, 2010 at 7:38 PM, Alexander Belopolsky
<report@bugs.python.org> wrote:
> Speaking of micro-optimizations, did you consider a better than naive
> algorithm for "Count the number of set bits in n" in your patch?
> HAKMEM 169 comes to mind and being a divide and conquer too, it seems
> like a good fit.   Certainly an overkill if used just for
> math.factorial(), but this is probably a useful function to have
> around.

I considered it, but decided to stick with code readability and
portability.  Counting the number of set bits is only done once per
factorial, so it's not on the critical path.

FWIW, the following page has a pretty extensive summary of performance
comparisons of different solutions to the "count the set bits"
problem:
http://www.dalkescientific.com/writings/diary/archive/2008/07/03/hakmem_and_other_popcounts.html
msg105565 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-12 02:19
On Tue, May 11, 2010 at 9:07 PM, Daniel Stutzbach
<report@bugs.python.org> wrote:
>
> Daniel Stutzbach <daniel@stutzbachenterprises.com> added the comment:
>
> On Tue, May 11, 2010 at 7:38 PM, Alexander Belopolsky
> <report@bugs.python.org> wrote:
>> Speaking of micro-optimizations, did you consider a better than naive
>> algorithm for "Count the number of set bits in n" in your patch?
>> ..
> I considered it, but decided to stick with code readability and
> portability.

Speaking of readability, with a separate popcount() function, you can
simply write

nminusnumbits_ob = PyLong_FromLong(n - popcount(n))

and eliminate not only the loop, but also num_bits and tmp variables
from math_factorial()

The popcount function can be defined as a __builtin_popcount on GCC
and your loop otherwise.

>  Counting the number of set bits is only done once per
> factorial, so it's not on the critical path.
>

I agree, performance consideration are completely irrelevant here.

Similarly, while unlikely to improve performance, I would prefer not
to use any bit-trick implementation of ilog2 (in a separate function,
of course) instead of calling floating point log2.  In my head, an
assignment of floating point result to an integer variable always
raises a red flag.

Another readability nit:  for me k % 2 == 0 is a more readable check
for even number than (k & 1) != 1.  Performance-wise the two choices
are the same, and either can be improved by combining k = (n + m) / 2
and k & 1 into one ldiv call.

I have not tried to do it, but my gut feeling is that
factorial_part_product() can benefit from passing semi-open rather
than closed interval.  (Also renaming n and m to start and stop in
this case will help understanding.)
msg105568 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-12 03:44
On Tue, May 11, 2010 at 10:19 PM, Alexander Belopolsky
<report@bugs.python.org> wrote:
..
> Similarly, while unlikely to improve performance, I would prefer not
> to use any bit-trick implementation of ilog2 (in a separate function,
> of course) instead of calling floating point log2.  In my head, an
> assignment of floating point result to an integer variable always
> raises a red flag.
>

Searching for relevant past issues, I've come across a similar
sentiment from Mark:

"""
floor(log(n, 2)) is poor code.  This is not supposed to be a dramatic
statement, just a statement of fact.  Its correctness is dependent on
minute details of floating point.  It is poor code in exactly the same way
that "while x < 1.0: x += 0.1" is poor code---behaviour in boundary cases
is almost entirely unpredictable.
""" - msg78066 - Mark Dickinson -

I also noticed that the reference implementation does not require this
calculation because the loop is implemented recursively.  Did you find
recursive implementation to give worse performance?
msg105579 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-12 08:50
Some quick comments:

(1) Agree about the extra bound checks:  let's treat those as a separate issue and just concentrate on performance here.

(2) log2 isn't portable:  it's not part of C89 (though it is in C99) and I don't think it exists on Windows.  Which is a shame, since it probably *does* reliably work well for boundary cases on most platforms.  I'm embarrassed to read that snippet that Alexander found, but it's true that an alternative like log(n)/log(2) has problems in boundary cases, thanks to the usual floating-point issues.  There's a bit-counting method in the int.bit_length() implementation (in Objects/longobject.c) that could possibly be re-used here.  Alternatively, if a simple for-loop to count bits doesn't have any noticeable impact on speed, then we could use that.

(3) Is the 'count set bits' code a bottleneck?  If not, then it looks fine to me as it is.  Doesn't it just get called once per factorial computation?

(4) I wonder whether the recursion in factorial_part_product slows things down;  it might be interesting to compare with an iterative version (but still one that clumps together small pieces rather than doing lots of small*big multiplies).  It seems possible that the cost of the recursive calls is insignificant compared to the cost of the various Py* calls, though.

(5) Was there a reason for using long rather than unsigned long for the computations?  Using unsigned long would give you an easily computable multiply_cutoff, and save the need for that extra static variable (it could be a macro instead).
msg105590 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-12 14:39
On Tue, May 11, 2010 at 5:58 PM, Mark Dickinson <report@bugs.python.org> wrote:
> Yes, I'm interested in seeing the pure Python version.

Here is my translation of the reference implementation.

> It could go into test_math, and would be a useful form of documentation.

Note that I copied the reference implementation recursive logic rather
than that in the proposed patch.  It may be better for documentation
this way.

If we end up using something like this in documentation, I would
rename nminusnumofbits() to something more readable.  Maybe "ntwos" or
"count_trailing_zeros" with an explanation why number of factors of 2
in factorial(n) is n - popcount(n).
msg105592 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-12 15:27
On Tue, May 11, 2010 at 10:19 PM, Alexander Belopolsky
<report@bugs.python.org> wrote:
..
> Another readability nit:  for me k % 2 == 0 is a more readable check
> for even number than (k & 1) != 1.  Performance-wise the two choices
> are the same, and either can be improved by combining k = (n + m) / 2
> and k & 1 into one ldiv call.

Strike this comment.  For some reason I though GCC would optimize
division by 2 and inline ldiv, but apparently neither is true.

Still,

if ((k & 1) != 1)
          k = k - 1;

looks odd to me. Maybe k += (k & 1) - 1?
msg105593 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-12 15:36
Thank you both for the valuable feedback.  I'm working on a revised patch that incorporates your many suggestions.

I decided to use an iterative version for two reasons:
- the reference has a note at the bottom in a tiny font suggesting it
- I found it counter-intuitive to implement product() as a recursive function.  I had a much easier time matching the code up with the formula when it was implemented iteratively.

I agree that my use of log2() was an ugly hack. ;-)  I'll fix that.

Would it be worthwhile to create a pybits.h and .c that defines _Py_FindLastSetBit and _Py_CountSetBits? (with appropriate logic in the .h and configure.in to use system/compiler versions if available)

There are already two implementations of find-last-set-bit in Python: 
bits_in_digit() in Objects/longobject.c and hi0bits() in Python/dtoa.c.  It would be nice to consolidate them.

(hi0bits counts the leading 0 bits which is a trivial transformation of finding the highest set bit)

I don't know of anyplace else in Python that needs count-set-bits.
msg105594 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-12 16:00
On Wed, May 12, 2010 at 10:31 AM, Alexander Belopolsky
<report@bugs.python.org> wrote:
> if ((k & 1) != 1)
>          k = k - 1;
>
> looks odd to me. Maybe k += (k & 1) - 1?

If we change the logic slightly to put the odd entry on the right
instead of the left, we can do:

        k = (n + m) / 2;
        k |= 1; /* Ensure that k is odd */
        left = factorial_part_product(n, k-2);
        if (left == NULL) goto done;
        right = factorial_part_product(k, m);
        if (right == NULL) goto done;

That will split 1*3*5*7*9*11 into (1*3*5) * (7*9*11), just like the
old code.  It will split 1*3*5*7*9 into (1*3) * (5*7*9) while the old
code did (1*3*5) * (7*9), which is fine.

It's easier to read and fewer operations. :-)
msg105596 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-12 17:59
Now that I've looked at the patch properly:

I'm +1 on including some version of this patch.  At the time that the original math.factorial went in (see issue 2138) we were hurrying a bit to get it in before the first 2.6 beta, so ended up with a simple implementation, with the understanding (I think) that it could be improved later.

It looks like you and Alexander are doing a great job of hammering out the fine detail;  I only have a few comments at this stage.  I predict that you're not going to like the first one ;-).  The others are just technical issues.

(1) In the interests of simplicity, please could we lose the 'long' optimization in factorial_part_product?  That is, get rid of the if (m == n+2) branch, and just let that case recurse normally---which means that we end up calling PyNumber_Multiply in some cases instead of doing a C long by C long multiplication.  Then we can get rid of multiply_cutoff entirely.  I'm +1 on the improved algorithm, and I realize that the optimization does have an effect (some unscientific tests showed me a 18% speed increase in typical cases) but for me this optimization goes past the simplicity/speed tradeoff.  There's always the option of adding something like this back in later, once the new algorithm's gone in.

(2) You're missing a Py_DECREF(part) in factorial_loop, so factorial(n) leaks references (for n > 12).

(3) The line "k = (n + m) / 2;" in factorial_part_product invokes undefined behaviour (from signed overflow) if n and m are large.  We're not going to get meaningful results in this case anyway, but UB should be avoided if at all possible.  Perhaps rewrite this as "k = n + (m - n) / 2;"?

(4) And please do restore the PyLong_FromDouble line in the main routine, rather than using a C double-to-long cast.  The direct conversion again leads to undefined behaviour for large doubles (cf. C99 6.3.1.4,p2).
msg105597 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-12 18:03
> (cf. C99 6.3.1.4,p2).

Oops.  C99 6.3.1.4,p1.  That'll teach me not to cite chapter and verse.
msg105600 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-12 18:37
On Wed, May 12, 2010 at 12:59 PM, Mark Dickinson <report@bugs.python.org> wrote:
> (4) And please do restore the PyLong_FromDouble line in the main routine, rather than using a C double-to-long cast.  The direct conversion again leads to undefined behaviour for large doubles (cf. C99 6.3.1.4,p2).

I was planning to add a "if (dx > (double) LONG_MAX)" check.  Would
that be sufficient?
msg105602 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-12 19:07
>I was planning to add a "if (dx > (double) LONG_MAX)" check.  Would
> that be sufficient?

Hmm.  It's subtle.  On an LP64 machine, LONG_MAX will be 2**63-1, which isn't exactly representable as a double.  So (double) LONG_MAX would likely be 2.0**63 exactly (depending on rounding mode, but round-half-to-even is probably a safe assumption unless someone's deliberately messing around).  Then that check would fail for dx == 2.**63 exactly.

Turn it into '>=' rather than '>', and I *think* it's okay.
msg105603 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-12 19:24
On Wed, May 12, 2010 at 4:50 AM, Mark Dickinson  wrote:
..
> (4) I wonder whether the recursion in factorial_part_product slows things down;  it might be interesting to compare with an iterative version (but still one that clumps together small pieces rather than doing lots of small*big multiplies).  It seems possible that the cost of the recursive calls is insignificant compared to the cost of the various Py* calls, though.

I am attaching a little study of three different part_product
implementations in python: the recursive one, straight product, and
not-recursive binary division:

$ ./python.exe -m timeit -s "import factorial3 as fm;
fm.partial_product = fm.partial_product; f = fm.factorial " "f(10000)"
10 loops, best of 3: 66.1 msec per loop
$ ./python.exe -m timeit -s "import factorial3 as fm;
fm.partial_product = fm.partial_product1; f = fm.factorial "
"f(10000)"
10 loops, best of 3: 67.6 msec per loop
$ ./python.exe -m timeit -s "import factorial3 as fm;
fm.partial_product = fm.partial_product2; f = fm.factorial "
"f(10000)"
10 loops, best of 3: 43.4 msec per loop

The last one seems to b a clear winner, but I am not certain where the
gain comes from - no recursion or first by last instead of ith by
(i+1)st multiplication.   Also python recursion overhead is probably
different from C.
msg105604 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-12 19:47
Interesting---thanks for the analysis!

Realistically though, I don't see an iterative version of factorial_part_product as an option for the C patch, without a significant increase in complexity.  Daniel's current patch is remarkably clean and simple, and I'd like to keep it that way.

I did think about various evil schemes for an iterative version, but didn't come up with anything I'd want to see in the Python codebase.  (The worst of these schemes involved using a union of long and PyObject * to try to increase the possibilities for doing simple C long multiplication, and using the fact that you can easily tell the difference between an odd long and a (4-byte aligned) PyObject * just by looking at the last bit.  But I'm fairly sure that comes under the 'evil' heading.  :) )
msg105607 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-12 20:34
Here is one more datapoint.

$ ./python.exe -m timeit -s "import factorial4 as fm;
fm.partial_product = fm.partial_product; f = fm.factorial " "f(10000)"
10 loops, best of 3: 66.1 msec per loop
[32794 refs]
$ ./python.exe -m timeit -s "import factorial4 as fm;
fm.partial_product = fm.partial_product3; f = fm.factorial "
"f(10000)"
10 loops, best of 3: 63 msec per loop
[32794 refs]
$ ./python.exe -m timeit -s "import factorial4 as fm;
fm.partial_product = fm.partial_product2; f = fm.factorial "
"f(10000)"
10 loops, best of 3: 43.3 msec per loop

partial_product3 multiplies adjacent numbers instead of first by last.
 I am not sure it reproduces the order of multiplication in the
recursive version exactly, but it does show that the order of
multiplication matters a lot.

I wonder if one could write an elegant recursive version that would
multiply first by last in partial_product.
msg105610 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-12 21:26
On Wed, May 12, 2010 at 3:34 PM, Alexander Belopolsky
<report@bugs.python.org> wrote:
> I wonder if one could write an elegant recursive version that would
> multiply first by last in partial_product.

That could be done with a three-parameter partial_product, where the
third parameter designates how many numbers to multiply.  Something
like this:

part_product(9, 17, 5) -> 9*11*13*15*17
  part_product(9, 17, 2) -> 9*17
  part_product(11, 15, 3) -> 11*13*15
msg105653 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-13 19:35
After experimenting with changing the order of the multiplications and not having much luck, I went back and looked for other differences in Alexander's Python functions that might cause the speed difference.  I believe partial_product2 is fast because it performs all of its operations in-place using a single list, whereas partial_product3 creates a new list during each iteration.  Here's a version of partial_product3 that operates in-place and is just as fast as partial_product2:

def partial_product3(j, i):
    a = [l << 1 | 1 for l in range(j, i + 1)]
    n = len(a)
    while 1:
        if n == 1:
            return a[0]
        half = n//2
        for k in range(0,half):
            a[k] = a[k*2] * a[k*2+1]
        if n & 1:
            a[half] = a[n-1]
        n = half
msg105655 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-13 20:57
Daniel,

Your variant does not seem to work:

>>> def partial_product3(j, i):
...     a = [l << 1 | 1 for l in range(j, i + 1)]
...     n = len(a)
...     while 1:
...         if n == 1:
...             return a[0]
...         half = n//2
...         for k in range(0,half):
...             a[k] = a[k*2] * a[k*2+1]
...         if n & 1:
...             a[half] = a[n-1]
...         n = half

>>> partial_product3(4,6)
99
>>> 9 * 11 * 13
1287

but it looks like I posted a buggy version of partial_product2 as well.  Strange because I thought I had enough doctests to catch the errors.  I'll redo the testing.
msg105657 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-13 21:13
Isn't it amazing how fast one can make incorrect code? ;-)

Here is a fixed version of my partial_product3, but now it is no faster than partial_product.

def partial_product3(j, i):
    a = [l << 1 | 1 for l in range(j, i + 1)]
    n = len(a)
    while 1:
        if n == 1:
            return a[0]
        half = n//2
        for k in range(0,half):
            a[k] = a[k*2] * a[k*2+1]
        if n & 1:
            a[half] = a[n-1]
            n = half + 1
        else:
            n = half
msg105660 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-13 21:28
Does anyone feel like doing a speed comparison between Daniel's C patch and a version with a direct no-frills iterative version of factorial_part_product (i.e., just a simple  'for (i = n; i <= m; i += 2) { <multiply running product by i> }?  I have a sneaking suspicion that the iterative version will be faster even for quite large values of n, but I'd be happy to be proven wrong.
msg105661 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-13 21:31
And why are we trying to speed up the pure Python factorial code here?  I don't imagine that those speed differences are going to translate well to C.
msg105663 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-13 21:50
On Thu, May 13, 2010 at 5:31 PM, Mark Dickinson <report@bugs.python.org> wrote:
> And why are we trying to speed up the pure Python factorial code here?

I would expect that for large factorials the performance will be
determined by the number of long multiplications and the size of
multiplicands.

> I don't imagine that those speed differences are going to translate well to C.

The differences between recursive and non-recursive versions are not
likely to translate well, but the difference (if any) between the
order of multiplication most likely will.

In any case, I am attaching fixed version of factorial4.

$ ./python.exe -m timeit -s "from factorial4 import f0 as f" "f(10000)"
10 loops, best of 3: 65.5 msec per loop
$ ./python.exe -m timeit -s "from factorial4 import f1 as f" "f(10000)"
10 loops, best of 3: 66.9 msec per loop
$ ./python.exe -m timeit -s "from factorial4 import f2 as f" "f(10000)"
10 loops, best of 3: 56.5 msec per loop
$ ./python.exe -m timeit -s "from factorial4 import f3 as f" "f(10000)"
10 loops, best of 3: 63 msec per loop
msg105664 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-13 21:58
> I would expect that for large factorials the performance will be
> determined by the number of long multiplications and the size of
> multiplicands.

Okay, but I don't think we should care about the performance of *really* large factorials for Python.  People who care about every bit of speed in that situation should be using GMP or something similar.  An optimization that only makes a difference for (say) factorial(50000) or higher isn't going to make much difference to most Python users.  Optimizations that speed up, say, factorial(n) for n <= 1000 would seem more valuable.

> The differences between recursive and non-recursive versions are not
> likely to translate well, but the difference (if any) between the
> order of multiplication most likely will.

Perhaps.  But the differences between the various Python versions here are small enough that they could easily be swamped by other factors involved in the Python-to-C translation.

We already have a working C patch here (modulo minor issues), and I'd like to move forward with that patch;  I think this issue discussion is getting a bit side-tracked.

grumpily-yours...
msg105665 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-13 22:09
Speaking of getting side-tracked, I didn't see an answer to a question I asked earlier.  I'd like to get some feedback before I proceed with revising the patch.

For the find-last-set-bit (to replace log2) and count-set-bits operations, would it be worthwhile to create a pybits.h and .c that defines _Py_FindLastSetBit and _Py_CountSetBits? (with appropriate logic in the .h and configure.in to use system/compiler versions if available)

There are already two implementations of find-last-set-bit in Python:
bits_in_digit() in Objects/longobject.c and hi0bits() in Python/dtoa.c, which I could consolidate.

Alternately, I could just add static functions to mathmodule.c with the simplest possible implementation (they're only called once per factorial, so the performance impact is minimal).
msg105666 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-13 22:37
On Thu, May 13, 2010 at 6:09 PM, Daniel Stutzbach
<report@bugs.python.org> wrote:
..
> Speaking of getting side-tracked, I didn't see an answer to a question I asked
> earlier.  I'd like to get some feedback before I proceed with revising the patch.

I did not respond because I don't have an answer. :-)  Maybe it is
time to revisit Raymond's reasoning in msg63969 where he argued that
factorial should be an int method because it would benefit from access
to integer implementation details.

He also argued that
"""
Compared to  numbits() and isqrt(), a factorial() method is more basic
in that it is self explanatory and everyone knows what it means by the
time they are in middle school.
"""
nevertheless numbits() aka bit_length() has become an int method, but
factorial landed in math.

>
> For the find-last-set-bit (to replace log2) and count-set-bits operations, would it
> be worthwhile to create a pybits.h and .c that defines _Py_FindLastSetBit and >_Py_CountSetBits? (with appropriate logic in the .h and configure.in to use
> system/compiler versions if available)

Since it is unlikely that either factorial() or bit_length() will be
moved from their current location, I would be +1 on creating pybits.
I would give them different names, though.  Popcount seems to be the
most popular name for CountSetBits and instead of FindLastSetBit, a
more common function seems to be nlz, number of leading zeros.  On the
other hand, since  bit_length is already established, maybe
_Py_bit_length_long(long) and (if needed) _Py_bit_length_int(int)
would make sense.

>
> There are already two implementations of find-last-set-bit in Python:
> bits_in_digit() in Objects/longobject.c and hi0bits() in Python/dtoa.c,
> which I could consolidate.

How did you find it?!  I hope we'll end up with a better name than that.

>
> Alternately, I could just add static functions to mathmodule.c with the simplest
> possible implementation (they're only called once per factorial, so the
> performance impact is minimal).

In the scope of this issue I would say do that.  Pybits proposal seem
to deserve it's own issue.
msg105668 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-13 23:40
On Wed, May 12, 2010 at 3:47 PM, Mark Dickinson <report@bugs.python.org> wrote:
>...
> Realistically though, I don't see an iterative version of factorial_part_product as
> an option for the C patch, without a significant increase in complexity.  Daniel's
> current patch is remarkably clean and simple, and I'd like to keep it that way.
>

I am attaching an iterative version in C patch.  I don't think it
represents a dramatic increase in complexity ~ 40 lines over Daniel's
30.

> I did think about various evil schemes for an iterative version, ...

I would not say my patch is evil, maybe a bit naughty. :-) It can be
made less evil by resizing the list instead of filling its tail with
NULLs or more evil by using a tuple instead of list.

The performance  appears to be identical to Daniel's with no small
integer multiplication optimization.  The later gives about 2%
improvement.
msg105678 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-14 02:17
On Thu, May 13, 2010 at 5:58 PM, Mark Dickinson <report@bugs.python.org> wrote:
> Optimizations that speed up, say, factorial(n) for n <= 1000 would seem more valuable.

I am attaching a variant of my patch which precomputes partial
products that fit in 32 bit unsigned int.  This results in speed up
over Daniel's code which varies from 1.8x for 20! down to 7% for 100!
and no measurable improvement for 1000!.

This optimization is orthogonal to the choice of partial_product
algorithm and can be easily extended on platforms with long long to
precompute 64 bit products.
msg105679 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-14 03:02
That's a clever idea.  Do you have a Python script that generates the
precomputed values?
msg105680 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-14 03:12
It's a little too clever though.  It gives the wrong answer for 29!.

I'll have a revised version of my patch done sometime tomorrow.
msg105681 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-14 03:54
Oh, my!

How did that last term get into precomputed list?!

It should have been 

precomputed[] = {3, 15, 5, 35, 315, 63, 693, 9009, 1287, 19305, 328185, 36465, 692835,
                 14549535, 1322685, 30421755, 760543875, 58503375, 1579591125ul};

The next term is 36 bit
>>> product(i<<1|1 for i in range(7,15))
45808142625
>>> _.bit_length()
36

I'll replace the patch.
msg105682 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-14 03:55
Attached is a patch to improve the unit tests for the factorial function.

To compute the check value, it keeps a running total instead of recomputing the factorial from scratch inside the loop.  It checks up to range(999) and is quite fast.  The previous code checked range(10).

It also adds the following checks:
 * 0! == 1
 * (sys.maxsize+1)! => OverflowError
 * 10e100 => OverflowError
msg105683 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-14 04:19
Attached is a simple bash script to run math.factorial(n) through timeit for several values of n.  It makes comparing the speed of different builds MUCH easier.
msg105684 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-14 04:42
Mark> Does anyone feel like doing a speed comparison between Daniel's C patch and a version with a direct no-frills iterative version of factorial_part_product (i.e., just a simple  'for (i = n; i <= m; i += 2) { <multiply running product by i> }?

Not a direct answer to your question, but replacing a bisect with a no-frills algorithm in my precompute patch gives the following timings:

   n    bisect   no-frills
   100  38.9 us  38.5 us
  1000  .904 ms  1.08 ms
 10000  35.4 ms  50.3 ms


The no-frills product still takes 20 lines of C code though:

    n = last - first + 1;
    if (n <= 0)
        return PyLong_FromLong(1L);

    result = PyLong_FromUnsignedLong(ODD(first));
    if (result == NULL)
        return NULL;
    for  (i = 1; i < n; ++i) {
        x = PyLong_FromUnsignedLong(ODD(first + i));
        if (x == NULL)
            goto error;
        tmp = PyNumber_Multiply(result, x);
        Py_DECREF(x);
        if (tmp == NULL)
            goto error;
        Py_DECREF(result);
        result = tmp;
    }
    return result;
 error:
    Py_DECREF(result);
    return NULL;
msg105689 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-14 07:45
> Daniel Stutzbach <daniel@stutzbachenterprises.com> added the comment:
>
> Speaking of getting side-tracked, I didn't see an answer to a question I asked earlier.  I'd like to get some feedback before I proceed with revising the patch.
>
> For the find-last-set-bit (to replace log2) and count-set-bits operations, would it be worthwhile to create a pybits.h and .c that defines _Py_FindLastSetBit and _Py_CountSetBits? (with appropriate logic in the .h and configure.in to use system/compiler versions if available)

How about putting them in pymath.c and pymath.h?   Then there's no
need for new files.  There's the possible issue that the two
bit-counting methods operate on different types, though.

> There are already two implementations of find-last-set-bit in Python:
> bits_in_digit() in Objects/longobject.c and hi0bits() in Python/dtoa.c, which I could consolidate.

dtoa.c should be left alone, ideally:  it's currently almost
completely self-contained, and also very close to the original dtoa.c
from David Gay, which makes it easy to incorporate fixes from
upstream.  So it's just Objects/longobject.c that would share the
code.

> Alternately, I could just add static functions to mathmodule.c with the simplest possible implementation (they're only called once per factorial, so the performance impact is minimal).

That would work, too.
msg105691 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-14 08:04
> I am attaching an iterative version in C patch.

Thanks for doing this comparison.

> The performance  appears to be identical to Daniel's with no small
> integer multiplication optimization.

Okay, let's stick with the recursive version, then.  It has the advantage that it uses less space (no need to store the entire list of odd terms).
msg105715 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-14 14:58
Attached is an updated patch.

In addition to the code cleanup and bug fix suggestions, it includes a new base-case for factorial_partial_product.  Instead of:

if (n >= m)
    return n
if (n + 2 == m)
    return n*m
otherwise divide-and-conquer

It now does:

if (the_answer_will_fit_in_unsigned_long)
    compute_product_via_tight_loop()
    return answer
otherwise divide-and-conquer

It's around half the code of the previous factorial_partial_product(), if you don't count comments.  It's also much faster for small n and somewhat faster for moderate n.

original patch:
13: 0.848 usec
50: 2.4 usec
100: 4.68 usec
1000: 121 usec
10000: 7.68 msec
100000: 434 msec

new patch:
13: 0.5 usec
50: 1.2 usec
100: 2.28 usec
1000: 100 usec
10000: 7.32 msec
100000: 447 msec

I also experimented with adding Alexander's precompute logic to my factorial_loop.  However, it did not result in a significant speedup, since all of the cases that could leverage the precompute table now execute in the new fast base case.

The new patch includes the new unit tests I uploaded earlier.
msg105729 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-14 18:05
I agree, recursive version of partial_product is much simpler to follow.  While allocation of all numbers can be avoided in iterative version, doing so would further complicate code with little benefit.

I still believe, however that an iterative version can benefit from redefining partial_product to be product(2*i+1 for i in range(start, stop)).

Divide and conquer algorithm for that is simply

def partial_product(start, stop):
    length = stop - start
    .. handle length = 1 and 2 ..
    middle = start + (length >> 1)
    return partial_product(start, middle) * partial_product(middle, stop)

I would also reconsider the decision of using iterative outer loop.  Recursive outer loop matching redefined partial_product() can be written as

def loop(n):
    p = r = 1
    if n > 2:
        p, r = loop(n >> 1)
        p *= partial_product((n >> 2) + (n >> 1 & 1),
                             (n >> 1) + (n & 1))
        r *= p
    return p, r

which I believe is not harder to follow than the iterative alternative and does not require bit_length calculation.

I am replacing my python implementation, factorial.py, with the one that uses algorithms outlined above.

If there is any interest, I can convert it to C.
msg105731 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-14 18:06
> I still believe, however that an iterative version can benefit ..

s/iterative/recursive/
msg105754 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-14 20:09
I made a few minor updates to the patch.

It redefines partial_product to product(range(n, m, 2)), which saved me a few operations and is more Pythonic than what I had before. :-)

(Not quite what Alexander was after, but it's a step in the right direction.  His proposed defining of partial_product would have complicated my new base case.)
msg105756 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-14 20:25
The patch looks good.  Just one issue: I think the overflow check for num_operands * last_bit is bogus.  It's possible for the product to overflow and still end up being less than num_operands.  How about:

if (num_operands <= BITS_IN_LONG && num_operands * last_m_bit <= BITS_IN_LONG) ...

instead?  If num_operands <= BITS_IN_LONG then there's no danger of overflow in the multiplication, and if num_operands > BITS_IN_LONG then the second test will certainly fail.

Apart from that, and some very minor style issues (e.g., please put the body of an if statement on the following line;  I don't think this is codified in PEP7, but it seems to be the prevailing style in the Python codebase) I think the patch is good to go.

[Alexander]
"... benefit from redefining partial_product to be product(2*i+1 for i in range(start, stop))"

You mean rewriting to use half-open intervals [start, stop) rather than closed intervals?  In which case I agree it would look slightly cleaner that way, but I don't have particularly strong feelings on the issue.  I'll leave it up to Daniel whether he wants to change this or not.


"I would also reconsider the decision of using iterative outer loop."

I actually prefer the iterative version, perhaps out of a Python-bred aversion to recursion (in languages where the frames end up on the stack, anyway).  The bit_length calculation doesn't seem like a terrible price to pay to me.  Again, I leave this up to Daniel.


Marking this as accepted:  Daniel, if you like I can apply the overflow check change suggested above and the minor style fixes and apply this;  or if you want to go another round and produce a third patch, that's fine too.  Let me know.

(N.B.  If you do produce another patch, please name it something different and don't remove the earlier version---removing the earlier patch versions makes life harder for anyone trying to follow this issue.)
msg105757 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-14 20:29
Daniel, sorry:  I spent so long writing that last message that I didn't read your update until now.

The new patch looks fine;  same caveat about the overflow check as before.
Let me know when you want me to apply this.
msg105759 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-14 20:32
One other note:  I find the bit numbering in find_last_set_bit peculiar:  isn't the least significant bit usually bit 0?  (Well, okay some people number the msb 0, but that's just weird. :)  I know the ffs and fls functions also start their bit numbering at one, but this seems very unconventional to me.

Perhaps rename find_last_set_bit to bit_length?  i.e., it's the *size* of the small bitfield that can contain the given value, rather than the index of the highest bit in that bitfield.
msg105760 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-14 20:44
> it's the *size* of the small bitfield

*smallest
msg105770 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-14 21:38
On Fri, May 14, 2010 at 3:25 PM, Mark Dickinson <report@bugs.python.org> wrote:
> The patch looks good.  Just one issue: I think the overflow check for num_operands * last_bit is bogus.  It's possible for the product to overflow and still end up being less than num_operands.  How about:

You're right; I'm not sure what I was thinking. (actually, I'm pretty
sure I was thinking about addition ;) )

> if (num_operands <= BITS_IN_LONG && num_operands * last_m_bit <= BITS_IN_LONG) ...
>
> instead?  If num_operands <= BITS_IN_LONG then there's no danger of overflow in the multiplication, and if num_operands > BITS_IN_LONG then the second test will certainly fail.

Yes, that looks good.

> Marking this as accepted:  Daniel, if you like I can apply the overflow check change suggested above and the minor style fixes and apply this;  or if you want to go another round and produce a third patch, that's fine too.  Let me know.

That would be great. :)

With regards to find_last_set_bit, I understand where you're coming
from, but I'm following the convention of ffs/fls and also
bits_in_digit() from longobject.c.  I'd like to keep them
interchangeable, in case they're consolidated at some point in the
future.  If you want to rename the function, that's fine with me.

FWIW, I think the MSB=0th people are all electrical engineers.  If
you're serializing down to bits and the MSB is the first (0th) bit off
the wire, it makes sense in that context.

> (N.B.  If you do produce another patch, please name it something different and don't remove the earlier version---removing the earlier patch versions makes life harder for anyone trying to follow this issue.)

Good to know.  I'll keep that in mind going forward.

Thanks to both of you for the rigorous review.  This was a fun patch
to work on. :-)
msg105777 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-14 23:42
Okay, here's what I'm planning to commit (tomorrow), in case anyone wants to give it one last check.  I've added some comments half-explaining the algorithm used (just in case the referenced URL goes out of existence).  I made a couple of other minor changes to the algorithm:

(1) in factorial_partial_product, use 'k = (n + num_operands) | 1;' rather than 'k = (n + num_operands - 1) | 1;'.  This saves an operation, and means that when a range including an odd number of terms is split then the bottom half gets the extra term, which makes the partial products a teensy bit more balanced, since the bottom half consists of smaller numbers.

(2) in factorial_loop, I set the initial value of 'upper' to 3 rather than 1.  This avoids factorial_partial_product ever being called with a start of 1.

Apart from that, no significant changes.
msg105782 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-15 02:14
Looks good.
msg105783 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-15 02:18
Mark, It is always a pleasure to read your algorithm descriptions!

Just a few comments:

- It would be nice if a pure python implementation would make it
somewhere in the code base.  Maybe it can be placed in comments in C
file or added to the test module somehow.

- Please rename arguments to factorial_partial_product() from n, m to
start, stop.   It is confusing enough that n has a different meaning
loop() and partial_product(), but n, m coming in reverse alphabetical
order makes it really hard to follow the code.

- Using n as a variable in the fast loop saves one integer assignment
(likely to be optimized away) at the expense of clarity.    One has to
realize that there is an unconditional return after the loop before
concluding that n is the same in the recursive part.

- "We know that m is the largest number to be multiplied."  Shouldn't
that be m - 2?

- Is find_last_set_bit() the same as bit_length()?  If so, why isn't
it called the same?   I don't want to bikeshed, but seeing
find_last_set_bit makes me wonder if "last" means most or least
significant bit.

- There is a comment line /* r *= inner; */ in factorial_loop() but no
r variable.  Shouldn't that be outer *= inner?

I'll review math_factorial() after reading Daniel's message that just arrived.
msg105784 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-15 02:27
Some more ...

- Mark's notes talk about "odd-part-of-factorial".  It would be
clearer if r variable in math_factorial() was called odd_part.

- I would also rename nminusnumbits to ntwos or something else that
explains what it is rather than how it is computed.
msg105797 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-15 10:31
New patch (factorial3.patch) addressing all of Alexander's points except the one about including Python source somewhere.

I also expanded the lookup table to 20 entries on LP64 systems.
msg105801 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-15 11:43
And the same patch, but with a (deliberately simple) pure Python version of the algorithm in test_math.py.
msg105806 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-05-15 12:47
On Sat, May 15, 2010 at 6:32 AM, Mark Dickinson <report@bugs.python.org> wrote:
> New patch (factorial3.patch) addressing all of Alexander's points except the one about including Python source somewhere.

Thanks for making the changes.  I think we converged to a really neat
implementation.

>
> I also expanded the lookup table to 20 entries on LP64 systems.
>

It's a matter of taste, but I was taught that C allows trailing commas
in initializers specifically for the cases like this to avoid using a
leading comma.
msg105807 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-15 12:54
> It's a matter of taste, but I was taught that C allows trailing commas
> in initializers specifically for the cases like this to avoid using a
> leading comma.

Unfortunately, I think C89 doesn't allow for this, and there are tracker issues about Python compiles failing on some platforms (AIX?) thanks to extra trailing commas in the source.  I prefer the extra comma too, but it's not legal C89.
msg105808 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-15 13:03
Ah;  Alexander's right:  I was misremembering.  An extra comma in an *enum* list isn't allowed (cf. issue 5889);  an extra comma in an array initializer is.  I'll rewrite that bit.
msg105811 - (view) Author: Alexander Belopolsky (Alexander.Belopolsky) Date: 2010-05-15 13:45
There is one place in the notes still referring to  
factorial_part_product.
msg105812 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-15 14:14
The comment for bit_length is missing a space or two:
"Objects/longobject.c.Someday"
msg105813 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-15 14:27
> There is one place in the notes still referring to
> factorial_part_product.

Hmm.  I can't find it.  Can you be more specific?

I'll fix the spaces before 'Someday'.
msg105814 - (view) Author: Alexander Belopolsky (Alexander.Belopolsky) Date: 2010-05-15 15:12
Sorry for terseness. Sending it from my phone.

The line was in factorial4.patch:

+ * The factorial_partial_product function computes the product of all  
odd j in

>>
>
> Hmm.  I can't find it.  Can you be more specific?
>
>
msg105815 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-15 15:16
Okay, thanks.  I'm still not seeing what's wrong with this, though  (sorry for being slow :(  )
msg105816 - (view) Author: Alexander Belopolsky (Alexander.Belopolsky) Date: 2010-05-15 15:22
s/partial_product/odd_part/

It looks like you made this change in some places but not all.

On May 15, 2010, at 11:16 AM, Mark Dickinson <report@bugs.python.org>  
wrote:

>
> Mark Dickinson <dickinsm@gmail.com> added the comment:
>
> Okay, thanks.  I'm still not seeing what's wrong with this, though   
> (sorry for being slow :(  )
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue8692>
> _______________________________________
msg105817 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-15 15:26
But factorial_partial_product and factorial_odd_part both exist:  the former is just computing the product of all odd integers in the given interval, while the latter computes the odd part of factorial(n).

I've double checked the comments and they still look okay to me.  That particular reference really is to factorial_partial_product.
msg105818 - (view) Author: Alexander Belopolsky (Alexander.Belopolsky) Date: 2010-05-15 16:08
Sorry I didn't realize that ...
msg105851 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2010-05-16 08:45
Committed in r81196.  Thanks, everyone!
History
Date User Action Args
2010-05-16 08:45:30mark.dickinsonsetstatus: open -> closed
keywords: patch, patch, needs review
messages: + msg105851
2010-05-15 16:08:23Alexander.Belopolskysetmessages: + msg105818
2010-05-15 15:26:34mark.dickinsonsetkeywords: patch, patch, needs review

messages: + msg105817
2010-05-15 15:22:30Alexander.Belopolskysetmessages: + msg105816
2010-05-15 15:16:27mark.dickinsonsetkeywords: patch, patch, needs review

messages: + msg105815
2010-05-15 15:12:55Alexander.Belopolskysetfiles: + unnamed

messages: + msg105814
2010-05-15 14:27:03mark.dickinsonsetkeywords: patch, patch, needs review

messages: + msg105813
2010-05-15 14:14:09stutzbachsetmessages: + msg105812
2010-05-15 13:45:31Alexander.Belopolskysetnosy: + Alexander.Belopolsky
messages: + msg105811
2010-05-15 13:03:41mark.dickinsonsetkeywords: patch, patch, needs review

messages: + msg105808
2010-05-15 12:54:50mark.dickinsonsetkeywords: patch, patch, needs review

messages: + msg105807
2010-05-15 12:47:41belopolskysetmessages: + msg105806
2010-05-15 11:43:35mark.dickinsonsetkeywords: patch, patch, needs review
files: + factorial4.patch
messages: + msg105801
2010-05-15 10:32:05mark.dickinsonsetkeywords: patch, patch, needs review
files: + factorial3.patch
messages: + msg105797
2010-05-15 02:27:57belopolskysetmessages: + msg105784
2010-05-15 02:18:24belopolskysetmessages: + msg105783
2010-05-15 02:14:28stutzbachsetkeywords: patch, patch, needs review

messages: + msg105782
2010-05-14 23:42:14mark.dickinsonsetkeywords: patch, patch, needs review
assignee: stutzbach -> mark.dickinson
messages: + msg105777

files: + factorial2.patch
2010-05-14 21:38:51stutzbachsetmessages: + msg105770
2010-05-14 20:44:42mark.dickinsonsetkeywords: patch, patch, needs review

messages: + msg105760
2010-05-14 20:32:56mark.dickinsonsetkeywords: patch, patch, needs review

messages: + msg105759
2010-05-14 20:29:19mark.dickinsonsetkeywords: patch, patch, needs review

messages: + msg105757
2010-05-14 20:25:02mark.dickinsonsetkeywords: patch, patch, needs review
resolution: accepted
messages: + msg105756
2010-05-14 20:09:47stutzbachsetkeywords: patch, patch, needs review
files: + factorial.patch
messages: + msg105754
2010-05-14 19:59:31stutzbachsetfiles: - factorial.patch
2010-05-14 18:06:40belopolskysetkeywords: patch, patch, needs review

messages: + msg105731
2010-05-14 18:05:06belopolskysetkeywords: patch, patch, needs review
files: + factorial.py
messages: + msg105729
2010-05-14 17:45:53belopolskysetfiles: - factorial3.py
2010-05-14 17:45:48belopolskysetfiles: - factorial.py
2010-05-14 14:58:04stutzbachsetkeywords: patch, patch, needs review
files: + factorial.patch
messages: + msg105715
2010-05-14 14:35:07stutzbachsetfiles: - factorial.patch
2010-05-14 08:04:54mark.dickinsonsetkeywords: patch, patch, needs review

messages: + msg105691
2010-05-14 07:45:58mark.dickinsonsetmessages: + msg105689
2010-05-14 04:42:52belopolskysetkeywords: patch, patch, needs review

messages: + msg105684
2010-05-14 04:19:02stutzbachsetkeywords: patch, patch, needs review
files: + factorial-speed.sh
messages: + msg105683
2010-05-14 04:05:01belopolskysetkeywords: patch, patch, needs review
files: + factorial-precompute-partials.patch
2010-05-14 04:04:50belopolskysetfiles: - factorial-precompute-partials.patch
2010-05-14 04:04:45belopolskysetfiles: - factorial-precompute-partials.patch
2010-05-14 04:04:36belopolskysetkeywords: patch, patch, needs review
files: + factorial-precompute-partials.patch
2010-05-14 03:55:58stutzbachsetkeywords: patch, patch, needs review
files: + factorial-test.patch
messages: + msg105682
2010-05-14 03:54:42belopolskysetkeywords: patch, patch, needs review

messages: + msg105681
2010-05-14 03:12:21stutzbachsetmessages: + msg105680
2010-05-14 03:02:38stutzbachsetmessages: + msg105679
2010-05-14 02:17:00belopolskysetfiles: + factorial-precompute-partials.patch

messages: + msg105678
2010-05-13 23:40:20belopolskysetfiles: + factorial-no-recursion.patch

messages: + msg105668
2010-05-13 22:37:15belopolskysetmessages: + msg105666
2010-05-13 22:09:06stutzbachsetkeywords: patch, patch, needs review

messages: + msg105665
2010-05-13 21:58:46mark.dickinsonsetkeywords: patch, patch, needs review

messages: + msg105664
2010-05-13 21:57:53belopolskysetfiles: - factorial4.py
2010-05-13 21:50:45belopolskysetfiles: + factorial4.py

messages: + msg105663
2010-05-13 21:31:35mark.dickinsonsetkeywords: patch, patch, needs review

messages: + msg105661
2010-05-13 21:28:07mark.dickinsonsetkeywords: patch, patch, needs review

messages: + msg105660
2010-05-13 21:13:33stutzbachsetkeywords: patch, patch, needs review

messages: + msg105657
2010-05-13 20:57:02belopolskysetkeywords: patch, patch, needs review

messages: + msg105655
2010-05-13 19:35:28stutzbachsetkeywords: patch, patch, needs review

messages: + msg105653
2010-05-12 21:26:51stutzbachsetmessages: + msg105610
2010-05-12 20:34:31belopolskysetfiles: + factorial4.py

messages: + msg105607
2010-05-12 19:47:43mark.dickinsonsetkeywords: patch, patch, needs review

messages: + msg105604
2010-05-12 19:24:14belopolskysetfiles: + factorial3.py

messages: + msg105603
2010-05-12 19:07:59mark.dickinsonsetkeywords: patch, patch, needs review

messages: + msg105602
2010-05-12 18:37:49stutzbachsetmessages: + msg105600
2010-05-12 18:03:29mark.dickinsonsetkeywords: patch, patch, needs review

messages: + msg105597
2010-05-12 17:59:03mark.dickinsonsetkeywords: patch, patch, needs review

messages: + msg105596
2010-05-12 16:01:01stutzbachsetmessages: + msg105594
2010-05-12 15:36:36stutzbachsetkeywords: patch, patch, needs review

messages: + msg105593
2010-05-12 15:28:23belopolskysetmessages: + msg105592
2010-05-12 15:13:47draghuramsetkeywords: patch, patch, needs review
nosy: + draghuram
2010-05-12 14:41:53belopolskysetfiles: + factorial.py

messages: + msg105590
2010-05-12 08:50:06mark.dickinsonsetkeywords: patch, patch, needs review

messages: + msg105579
2010-05-12 03:44:58belopolskysetmessages: + msg105568
2010-05-12 02:19:42belopolskysetmessages: + msg105565
2010-05-12 01:07:54stutzbachsetmessages: + msg105564
2010-05-12 01:00:42stutzbachsetmessages: + msg105563
2010-05-12 00:38:40belopolskysetmessages: + msg105562
2010-05-12 00:13:50belopolskysetmessages: + msg105561
2010-05-11 23:42:26stutzbachsetmessages: + msg105559
2010-05-11 23:20:52mark.dickinsonsetmessages: + msg105557
2010-05-11 22:33:29belopolskysetkeywords: patch, patch, needs review

messages: + msg105553
2010-05-11 21:58:38mark.dickinsonsetkeywords: patch, patch, needs review

messages: + msg105552
2010-05-11 21:39:42stutzbachsetmessages: + msg105551
2010-05-11 21:18:00belopolskysetkeywords: patch, patch, needs review
nosy: + belopolsky
messages: + msg105548

2010-05-11 20:22:59stutzbachcreate