classification
Title: Search not needed in combinations_with_replacement
Type: performance Stage: resolved
Components: Extension Modules Versions: Python 3.5, Python 2.7
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: rhettinger, serhiy.storchaka, terry.reedy, tim.peters
Priority: low Keywords: easy

Created on 2013-05-07 21:38 by tim.peters, last changed 2013-09-02 21:57 by tim.peters. This issue is now closed.

Messages (5)
msg188686 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2013-05-07 21:38
Each time thru, CWR searches for the rightmost position not containing the maximum index.  But this is wholly determined by what happened the last time thru - search isn't really needed.  Here's Python code:

def cwr2(iterable, r):
    pool = tuple(iterable)
    n = len(pool)
    if not n and r:
        return
    indices = [0] * r
    yield tuple(pool[i] for i in indices)
    j = r-1 if n > 1 else -1
    while j >= 0:
        newval = indices[j] + 1
        indices[j:] = [newval] * (r - j)
        yield tuple(pool[i] for i in indices)
        j = r-1 if newval < n-1 else j-1

There `j` is the rightmost position not containing the maximum index.  A little thought suffices to see that the next j is either r-1 (if newval is not the maximum index) or j-1 (if newval is the maximum index:  since the indices vector is non-decreasing, if indices[j] was r-2 then indices[j-1] is also at most r-2).

I don't much care if this goes in, but Raymond should find it amusing so assigning it to him ;-)
msg188687 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2013-05-07 21:41
Oops!  Last part should read

"since the indices vector is non-decreasing, if indices[j] was n-2 then indices[j-1] is also at most n-2"

That is, the instances of "r-2" in the original should have been "n-2".
msg188735 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2013-05-08 20:48
There's another savings to be had when an index becomes the maximum:  in that case, all the indices to its right are already at the maximum, so no need to overwrite them.  This isn't as big a savings as skipping the search, but still buys about 10% more in the Python code.  Like so:

    def cwr3(iterable, r):
        pool = tuple(iterable)
        n = len(pool)
        if n == 0 and r:
            return
        indices = [0] * r
        yield tuple(pool[i] for i in indices)
        rmax, nmax = r-1, n-1
        j = rmax if n > 1 else -1
        while j >= 0:
            newval = indices[j] + 1
            if newval < nmax:
                indices[j:] = [newval] * (r - j)
                j = rmax
            else:
                assert newval == nmax
                indices[j] = newval
                j -= 1
            yield tuple(pool[i] for i in indices)
msg188897 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-05-11 06:51
Thanks Tim :-)
msg196815 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2013-09-02 21:57
I'm closing this.  While it makes a big difference for a cwr coded in Python, it turn out to be minor in C.  The extra complications (more state to remember and update across next() invocations) isn't worth the minor speedup in C.
History
Date User Action Args
2013-09-02 21:57:11tim.peterssetstatus: open -> closed
resolution: rejected
messages: + msg196815

stage: needs patch -> resolved
2013-08-15 22:17:00pitrousetnosy: + serhiy.storchaka
2013-05-11 06:51:28rhettingersetmessages: + msg188897
2013-05-10 19:29:25terry.reedysetnosy: + terry.reedy
2013-05-08 20:48:20tim.peterssetmessages: + msg188735
2013-05-07 21:41:49tim.peterssetmessages: + msg188687
2013-05-07 21:38:43tim.peterscreate