Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patch of itertools.{combinations,permutations} for empty combinations #49066

Closed
TFinley mannequin opened this issue Jan 3, 2009 · 22 comments
Closed

Patch of itertools.{combinations,permutations} for empty combinations #49066

TFinley mannequin opened this issue Jan 3, 2009 · 22 comments
Assignees
Labels
stdlib Python modules in the Lib dir tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error

Comments

@TFinley
Copy link
Mannequin

TFinley mannequin commented Jan 3, 2009

BPO 4816
Nosy @rhettinger, @mdickinson
Files
  • itertools-empty-combinations.diff
  • combperm4.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2009-01-08.05:21:20.892>
    created_at = <Date 2009-01-03.10:51:15.007>
    labels = ['tests', 'type-bug', 'library']
    title = 'Patch of itertools.{combinations,permutations} for empty combinations'
    updated_at = <Date 2009-01-08.05:21:20.890>
    user = 'https://bugs.python.org/TFinley'

    bugs.python.org fields:

    activity = <Date 2009-01-08.05:21:20.890>
    actor = 'rhettinger'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2009-01-08.05:21:20.892>
    closer = 'rhettinger'
    components = ['Library (Lib)', 'Tests']
    creation = <Date 2009-01-03.10:51:15.007>
    creator = 'TFinley'
    dependencies = []
    files = ['12563', '12647']
    hgrepos = []
    issue_num = 4816
    keywords = ['patch']
    message_count = 22.0
    messages = ['78943', '78948', '79111', '79116', '79308', '79366', '79367', '79369', '79370', '79371', '79372', '79373', '79374', '79379', '79380', '79381', '79382', '79383', '79386', '79393', '79394', '79399']
    nosy_count = 4.0
    nosy_names = ['rhettinger', 'mark.dickinson', 'LambertDW', 'TFinley']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue4816'
    versions = ['Python 2.6', 'Python 3.0', 'Python 3.1', 'Python 2.7']

    @TFinley
    Copy link
    Mannequin Author

    TFinley mannequin commented Jan 3, 2009

    This is a patch for the Python 3.1 build checked out from
    http://svn.python.org/projects/python/branches/py3k

    The current behavior of itertools.combinations(iterable,r) and
    itertools.permutations(iterable,r) is to throw a ValueError if iterable
    yields fewer than r items. This changes these functions so the
    generator instead yields 0 items.

    As for my argument for acceptance, while the original behavior is not a
    bug insofar as its implementation was deliberate, I'd argue this is
    undesirable on grounds of mathematical consistency and functionality.

    In mathematics the "choose" function is defined as "(n choose r)=0" for
    r>n, so itertools.combinations' behavior is inconsistent with its
    obvious combinatorial analogy. (Similar for permutations and the
    combinatorial "P" function.)

    For functionality I'll cite my own case as anecdote. In writing
    regression tests I wanted to ensure that a group of items obeyed a
    certain "mutually consistent" property between all triples. (Something
    akin to the triangle inequality.) So:

    all(triangle_respected(*triple) for triple in
    itertools.combinations(group, 3))

    If len(group)<=2, that's fine, since with no triangles there is no
    inconsistency, and all(())==True. However, the code failed with a
    ValueError when groups were that small. Working around this was fairly
    awkward, since I wanted to continue to fail noisily if my
    triangle_respected function threw a ValueError, and I wasn't sure at all
    that it wouldn't if my items were

    The patch modifies Modules/itertoolsmodule.c slightly, changing
    combinationsobject_new and permutationsobject_new. (Deleting the error
    throwing code, and have one line changes in both structures to handle
    the n>r case. One could handle this special case more efficiently than
    I do by not bothering to allocate certain structures in this case, but I
    didn't want to tempt fate.) The patch also changes the appropriate
    tests in Lib/test/test_itertools.py .

    Obviously, this would break code that depends upon Python throwing
    ValueError in the event of the combination or permutation sequence being
    empty. However, I hope given that combinations and permutations are a
    relative novelty that their behavior in this corner case is not yet
    etched in stone.

    Sorry if this belongs in a PEP -- it seems quite minor, but I don't
    quite have a feel what the threshold is.

    @TFinley TFinley mannequin added stdlib Python modules in the Lib dir tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error labels Jan 3, 2009
    @mdickinson
    Copy link
    Member

    I agree that the proposed behaviour seems more correct: the collection of
    all subsets of size 4 of range(3) is perfectly valid and well-defined; it
    just happens to be empty. I've also encountered this in practice (in code
    that was enumerating partitions of a list into a fixed number of pieces),
    but in my case it was easier to work around.

    I guess the counterargument is that the current ValueError might catch
    bugs early; I don't know how often this is true in practice.

    In any case, this is Raymond's call.

    @rhettinger
    Copy link
    Contributor

    Will spend a while mulling this over and taking it under advisement.

    Initially, I am disinclined for several reasons.

    1. For many use cases, r>n is an error condition that should not pass
      silently.

    2. While it's possible to make definitions of comb/perm that define the
      r>n case as having an empty result, many definitions do not. See the
      wikipedia article on permutations:

    """
    In general the number of permutations is denoted by P(n, r), nPr, or
    sometimes P_n^r, where:

    * n  is the number of elements available for selection, and
    * r  is the number of elements to be selected (0 ≤ r ≤ n).
    

    For the case where r = n it has just been shown that P(n, r) = n!. The
    general case is given by the formula:

    P(n, r) = n! / (n-r)!.
    

    """

    That discussion is typical and I think it important the number of items
    returned matches the formula (which fails when r>n because (n-r)! would
    call for a negative factorial).

    Also see the wikipedia article on combinations (the "choose" function)
    which also expresses a factorial formula that fails for r>n. No mention
    is made for special handling for r>n.

    1. For cases where you need a more inclusive definition that assigns a
      zero length result to cases where r>n, the workaround is easy.
      Moreover, in such cases, there is some advantage to being explicit about
      those cases being handled. In the example provided by the OP, the
      explicit workaround is:

      all(triangle_respected(*triple) for triple in
      itertools.combinations(group, 3) if len(group)>=3)

    or if you factor-out the unvarying constant expression in the inner-loop:

    len(group)>=3 or all(triangle_respected(*triple) for triple in
    itertools.combinations(group, 3))

    For other cases, it may be preferable to write your own wrapper:

       def myperm(pool, r=None):
          'custom version of perm returning an empty iterator when r>n'
          if r is not None and r > len(pool):
              return iter([])
          return itertools.permutations(pool, r)

    I like this because it is explicit about its intended behavior but
    doesn't slow-down the common case where some work actually gets done.

    1. Don't want to break any existing code that relies on the ValueError
      being thrown. It's too late to change this for 2.6 and 3.0 and possibly
      for 2.7 and 3.1 without having a deprecation period. While this hasn't
      been out long, it will have been once 2.7 is released. Code that relies
      on the ValueError may be hard to spot because it is implicitly relying
      on the ValueError aborting the program for invalid input (invalid in the
      sense of how the result is being used).

    2. I personally find the r>n case to be weird and would have a hard time
      explaining it to statistics students who are counting permutations and
      relating the result back to the factorial formulas.

    On the flip side, I do like Mark's thought that the r>n case being empty
    doesn't muck-up the notion of combinations as returning all "r-length
    subsequences of elements from the input iterable." Also, I can see a
    parallel in the string processing world where substring searches using
    __contains__ simply return False instead of raising an exception when
    the substring is longer that the string being searched.

    Those are my initial thoughts. Will ponder it a bit more this week.

    @rhettinger
    Copy link
    Contributor

    Quick notes so I don't forget:

    The two main pure python equivalents in the docs would need to be
    updated with the new special cases (currently they raise IndexError for
    r>n). The other two pure python equivalents would automatically handle
    the proposed new cases.

    The Mathematica definition of permutations does not discuss or allow for
    the r>n case, http://mathworld.wolfram.com/Permutation.html . One of
    the other tools that actually lists permutations and takes a second
    argument (most do not) is in Mathematica,
    http://reference.wolfram.com/mathematica/ref/Permutations.html . It is
    unclear from their docs whether the r>n case is supported as a
    zero-length output or as an error condition.

    @TFinley
    Copy link
    Mannequin Author

    TFinley mannequin commented Jan 7, 2009

    Hi Raymond, thanks for your well reasoned and thorough reply. Just a
    couple small thoughts... I'll borrow your numbers if you don't mind.

    1. I'd ask you to discount this argument. There are many situations in
      the Python library where empty results are possible return values.
      There are likewise many places in mathematics where a set as defined
      winds up being empty. And sure, absolutely, sometimes this empty result
      will be an error condition. However, right off the top of my head, I
      can't really think of any other place in the Python library where an
      empty returned iterator/list is automatically assumed to necessarily be
      an error.

    The closest I can think of right now might be key/index errors on dicts
    and sequences... but that's a user explicitly asking for exactly one
    element, in which case non-existence would be bad.

    In a larger sense, if a result computed by a function is sensible and
    well defined in itself, it doesn't seem even possible for a function to
    anticipate what answers might be considered error conditions by calling
    code. Potentially any return value for any function might be an error
    for some user code. So... the question then becomes, is the empty
    result a sensible and well defined answer. And here we differ again I
    see. :D

    1. OK on perms. If you don't mind, I'm going to talk about combinations
      since that was my real motivation.

      Also see the wikipedia article on combinations (the
      "choose" function) which also expresses a factorial
      formula that fails for r>n. No mention is made for
      special handling for r>n.

    Hmmm. Are you looking at the definition of "choose function" under
    Wikipedia?
    http://en.wikipedia.org/wiki/Choose_function
    Notice the material after the "and," after the first part of the
    definition, in case you missed it in your first reading. So, this is
    the def I was taught through high school and undergrad, perhaps the same
    for Mark judging from his reply. You were taught it was undefined? It
    does sort of complicate some other defs that rely on binomial
    coefficients though, I'd think, but I guess you could make those work
    with little effort. It's not as though subtly different but
    incompatible definitions of basic terms are that uncommon in math...
    never knew the binomial coef was in that category though. You teach
    stats I take it? I suppose you'd know then... never heard of it, but
    that doesn't mean much. Oh well. (You can perhaps feel my pain at
    seeing the current behavior, given my understanding.)

    1. For your "replacement" myperm (let's say mycomb), if you insert a
      "pool=tuple(iterable)" into the beginning, you'll have my fix, statement
      for statement. You advance this as desirable, while to my eye it was
      nasty enough to motivate me to submit my first patch to Python. So, you
      can imagine we have a slight difference of opinion. ;) Let's see if I
      can't present my view effectively...

    Saying the fix is more explicit suggests there's something implicit
    about the concise solution. If the choose function is defined for those
    values, then itertools.combinations is just unnecessarily forcing code
    to treat something as a special case even when it's not. So, I'd be
    careful to draw a distinction between "more verbose" and "more explicit."

    1. Can't really speak to that. It's probably worth changing
      documentation in any event, even if you reject the patch. The provided
      equiv code and the actual behavior in this "error case" differ anyway as
      you noted in your second reply. Also, this error condition isn't
      evident from the documentation without reading and parsing the source code.

    A third and extraneous comment, if you do wind up changing the docs
    anyway, perhaps a shorter recursive solution would be easier to
    understand? Maybe it's just me since I'm sort of a functional type of
    guy, but I found the equivalent code a little hard to parse.

    1. Yeah. Actually, I'm pretty sure that's why the choose function is
      defined piecewise, since negative factorials are undefined as you say.

    @rhettinger
    Copy link
    Contributor

    1. This is the most important issue. Is the r>n case typically an
      error in the user's program or thought process. If so, then the utility
      of a ValueError can be weighed against the utility of other cases where
      you want combs/perms to automatically handle the r>n cases with an empty
      output. These use cases are what I'm mulling over this week.

    2. I see your wikipedia link. The one I was looking at was:
      http://en.wikipedia.org/wiki/Combination

    3. I wasn't clear on the issue of explictness. The potential problem
      that it isn't immediately obvious to me what comb/perm does when r>n, so
      if I see code that isn't explictly handling that case, I have to lookup
      what comb/perm does do to make sure the code works.

    In the math module, it is a virtue that math.sqrt(-1) raises a
    ValueError. In the cmath module, it is a virtue that it does not. The
    latter case is distinguished because the programmer has explicitly
    requested a routine that can handle complex numbers -- that is a good
    clue that the surrounding code was designed to handle the complex result.

    1. Not too worried about this one. Essentially the thought is that code
      that wasn't designed with the r>n case in mind probably benefits from
      having a ValueError raised when that condition is encountered.

    2. This one bugs me a bit. It is nice to have all the factorial
      formulas just work and not have a piecewise definition.

    BTW, we don't really have a difference of opinion. My mind is open. I
    aspire to pick the option that is the best for most users including
    students and novice users. The trick in language design is to
    anticipate use cases and to understand that people have differing world
    views (i.e. it is a perfectly reasonable point-of-view that taking n
    things r at a time makes no sense when r is greater than n).

    In choosing, there is some bias toward sticking the API as it was
    released. Changing horses in midstream always causes a headache for
    some user somewhere.

    Am still thinking this one through and will let you know in a week or
    so. I also want to continue to research into how this is handled in
    other libraries and other languages.

    @rhettinger
    Copy link
    Contributor

    Another thought before I forget: The piecewise definition of the choose
    function or for binomial coefficients suggests that supporting the r>n
    case should be accompanied by supporting the r<0 case.

    @mdickinson
    Copy link
    Member

    1. This one bugs me a bit. It is nice to have all the factorial
      formulas just work and not have a piecewise definition.

    IIRC, in the 'Concrete Mathematics' book, Knuth and co use something
    like

    (n choose k) = (n-to-the-k-falling)/k!

    to get around this: this definition works for all k >= 0 and all
    integers (or even real numbers) n. Here x-to-the-k-falling means
    x * (x-1) * ... * (x-k+1). See:

    http://mathworld.wolfram.com/FallingFactorial.html

    Another thought before I forget: The piecewise definition of the
    choose function or for binomial coefficients suggests that
    supporting the r>n case should be accompanied by supporting
    the r<0 case.

    I'd say not: in the context of sets of combinations/permutations,
    (rather than just defining the nCr or nPr symbols) r has a
    definite meaning as a cardinality: "enumerate all subsets of
    iter of cardinality r" and so it makes sense to me to restrict to
    the case where r is nonnegative.

    (If we were talking about the integer-valued function nCr, then I'd
    agree.)

    My own guess would be that having combinations(range(n), r)
    give the empty iterator for r > n is the right thing in the
    vast majority of situations, even (especially?) where the
    programmer hasn't anticipated the possibility of r > n.
    I have yet to see a case where it's useful to raise ValueError.

    I have access to the Magma algebra package at work; I'll
    check tomorrow or Friday to see what that does.

    @lambertdw
    Copy link
    Mannequin

    lambertdw mannequin commented Jan 7, 2009

    Mathematica returns an empty list.

    In[1]:= Permutations[{1,2},{1}]

    Out[1]= {{1}, {2}}

    In[2]:= Permutations[{1,2},{4}]

    Out[2]= {}

    In[3]:=

    @rhettinger
    Copy link
    Contributor

    David, thanks for the data point. What does it return for

    In[1]:= Permutations[{a, b, c}, {-1}]

    @lambertdw
    Copy link
    Mannequin

    lambertdw mannequin commented Jan 7, 2009

    Mathematica indicates for the user to define it later. An error.

    In[3]:= Permutations[{1,2},{-2}]

    Permutations::nninfseq:
    Position 2 of Permutations[{1, 2}, {-2}]
    must be All, Infinity, a non-negative integer, or a List whose
    first
    element (required) is a non-negative integer, second element
    (optional)
    is a non-negative integer or Infinity, and third element (optional)
    is a
    nonzero integer.

    Out[4]= Permutations[{1, 2}, {-2}]

    @mdickinson
    Copy link
    Member

    Results from Magma, Sage and Gap:

    Magma provides functions "Subsets(S, k)" and "Permutations(S, k)". From
    the documentation:

    Subsets(S, k) : The set of subsets of the set S of size k. If k is
    larger than the cardinality of S then the result will be empty.

    Permutations(S, k) : The set of permutations (stored as sequences) of
    each of the subsets of the set S of cardinality k.

    GAP has the same behaviour: even if you've never encountered GAP before,
    it's fairly Pythonesque, so the following extract from a GAP 4 REPL
    means exactly what you think it does:

    gap> Combinations([1,2,3,4], 3);
    [ [ 1, 2, 3 ], [ 1, 2, 4 ], [ 1, 3, 4 ], [ 2, 3, 4 ] ]
    gap> Combinations([1,2,3,4], 5);
    [ ]

    Permutations work the same way (but the function is called
    Arrangements):

    gap> Arrangements([1,2,3,4], 3);
    [ [ 1, 2, 3 ], [ 1, 2, 4 ], [ 1, 3, 2 ], [ 1, 3, 4 ], [ 1, 4, 2 ],
    [ 1, 4, 3 ], [ 2, 1, 3 ], [ 2, 1, 4 ], [ 2, 3, 1 ], [ 2, 3, 4 ],
    [ 2, 4, 1 ], [ 2, 4, 3 ], [ 3, 1, 2 ], [ 3, 1, 4 ], [ 3, 2, 1 ],
    [ 3, 2, 4 ], [ 3, 4, 1 ], [ 3, 4, 2 ], [ 4, 1, 2 ], [ 4, 1, 3 ],
    [ 4, 2, 1 ], [ 4, 2, 3 ], [ 4, 3, 1 ], [ 4, 3, 2 ] ]
    gap> Arrangements([1,2,3,4], 5);
    [ ]

    Combinations([1,2,3,4], -1) gives an error. Interestingly,
    Arrangements([1,2,3,4], -1) returns the empty list.

    GAP also has a NrCombinations function returning the number of
    combinations:

    gap> NrCombinations([1,2,3,4], 5);
    0
    gap> NrCombinations([1,2,3,4], -1);
    0

    My Sage installation currently seems to be broken, but from the
    documentation it looks as though it just steals the functions from GAP.

    @mdickinson
    Copy link
    Member

    Got Sage working again. It also returns an empty list for r > n. For r
    negative, Combinations returns an empty list and Permutation gives an
    error.

    sage: Combinations(range(4), 6)
    Combinations of [0, 1, 2, 3] of length 6
    sage: Combinations(range(4), 6).list()
    []
    sage: Permutations(range(4), 6).list()
    []
    sage: Combinations(range(4), -1).list()
    []
    sage: Permutations(range(4), -1).list()
    ---------------------------------------------------------------------------

    IndexError                                Traceback (most recent call last)
    [... traceback snipped ...]

    @rhettinger
    Copy link
    Contributor

    Data point: Microsoft Excel's PERMUT() and COMBIN() return #NUM! for
    r>n or r<0.

    @lambertdw
    Copy link
    Mannequin

    lambertdw mannequin commented Jan 7, 2009

    I try to "not know" excel. Does it have any other means to represent an
    empty set?

    @rhettinger
    Copy link
    Contributor

    Excel is returning the count, not the set itself. So, it could have
    chosen to return zero.

    The HP32SII also has nCr and nPr functions that return INVALID DATA for
    r>n or r<0.

    The Combinatorics section of CRC Standard Mathematical Formulae (30th
    edition) defines nCr and nPr only for 0 <= r <= n. The same is true in
    Harris and Stocker's Handbook of Mathematics and Computational science.

    @rhettinger
    Copy link
    Contributor

    Summary of functions/definitions expected to return a number:

    r>n r<0 Source
    --------- ---------- ------------------------------
    error error MS Excel PERMUT() and COMBIN()
    error error HP32SII nPr and nCr
    undefined undefined CRC Combinatoric definitions
    undefined undefined Handbook of Mathematics and Computational Science
    undefined undefined Wolfram:
    http://mathworld.wolfram.com/Permutation.html
    zero zero http://en.wikipedia.org/wiki/Choose_function
    undefined undefined http://en.wikipedia.org/wiki/Combination
    undefined undefined http://en.wikipedia.org/wiki/Permutation
    zero undefined Knuth's choose function in Concrete Mathematics
    zero zero GAP's nrCombinations()

    Summary for tools that return the individual subsets:

    r>n r<0 Source
    --------- ---------- ------------------------------
    emptylist error Mathematica
    emptylist unknown Magma
    emptylist error GAP Combinations
    emptylist emptylist GAP Arrangements
    emptylist emptylist Sage Combinations
    emptylist error Sage Permutations

    @rhettinger
    Copy link
    Contributor

    David, I know the OPs position and Mark's opinion. What is your
    recommendation for the r>n case and the r<0 case?

    @rhettinger
    Copy link
    Contributor

    Attached an updated patch which expands the tests and docs.
    Handles the r>n case with an empty result.
    Leaves the r<0 case as a ValueError.

    Am thinking that if we do this, it should be treated as a bug and posted
    to 2.6 and 3.0.

    @lambertdw
    Copy link
    Mannequin

    lambertdw mannequin commented Jan 8, 2009

    I had thought highly of the "mull it over for a week" plan. After a
    week we'd decide to follow Stephen Wolfram's lead, which seems to be the
    current patch. I haven't yet used the python permutations iterator,
    although I used to have a script that solved word JUMBLE puzzles with a
    mathematica | spell pipeline. Now I look up words using a sorted
    anagram dictionary.

    @rhettinger
    Copy link
    Contributor

    Thanks David. Looks like we're all in agreement.
    Turns out, I did a weeks worth of mulling today ;-)
    Accepting the proposal with the following rationale:

    1. There exist some valid use cases where r>n.
    2. Returning an empty list is the expected outcome with two of the
      current pure python versions (the ones that filter results from
      product() based on their length, uniqueness and sort).
    3. It is consist with the notion of 'all subsets of length r ...'
    4. It corresponds to the piecewise definitions of the choose function
      and binomial coefficient.
    5. It matches the battle-tested APIs of all of the major mathematics
      tools we looked at (Mathematica, Magma, GAP, and Sage.
    6. The recipe for combinations_with_replacement() is well defined and
      useful in cases where r>n -- it is desirable that all the variants of
      this function keep their APIs in-sync.
    7. The three other posters here all agree on it :-)

    Weighing against it:

    1. The ValueError version has already been released.
    2. There may be use cases where r>n represents an error so that raising
      a ValueError would be helpful.
    3. The length of the comb/perm doesn't correspond to the non-piecewise,
      factorial based functions that are so ubiquitous (MS Excel, HP32SII, CRC
      handbook, Handbook of Mathematics, wikipedia's entries for Combination
      and Permutation, and GAP's nrCombinations() function).

    @rhettinger
    Copy link
    Contributor

    Fixed in r68394
    Will forward port in a day or two.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants