Unsupported provider

classification
Title: Patch of itertools.{combinations,permutations} for empty combinations
Type: behavior Stage:
Components: Library (Lib), Tests Versions: Python 3.0, Python 3.1, Python 2.7, Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: LambertDW, TFinley, mark.dickinson, rhettinger
Priority: normal Keywords: patch

Created on 2009-01-03 10:51 by TFinley, last changed 2009-01-08 05:21 by rhettinger. This issue is now closed.

Files
File name Uploaded Description Edit
itertools-empty-combinations.diff TFinley, 2009-01-03 10:51
combperm4.diff rhettinger, 2009-01-08 05:09
Messages (22)
msg78943 - (view) Author: Thomas Finley (TFinley) Date: 2009-01-03 10:51
This is a patch for the Python 3.1 build checked out from
http://svn.python.org/projects/python/branches/py3k

The current behavior of itertools.combinations(iterable,r) and
itertools.permutations(iterable,r) is to throw a ValueError if iterable
yields fewer than r items.  This changes these functions so the
generator instead yields 0 items.

As for my argument for acceptance, while the original behavior is not a
bug insofar as its implementation was deliberate, I'd argue this is
undesirable on grounds of mathematical consistency and functionality.

In mathematics the "choose" function is defined as "(n choose r)=0" for
r>n, so itertools.combinations' behavior is inconsistent with its
obvious combinatorial analogy.  (Similar for permutations and the
combinatorial "P" function.)

For functionality I'll cite my own case as anecdote.  In writing
regression tests I wanted to ensure that a group of items obeyed a
certain "mutually consistent" property between all triples.  (Something
akin to the triangle inequality.)  So:

all(triangle_respected(*triple) for triple in
itertools.combinations(group, 3))

If len(group)<=2, that's fine, since with no triangles there is no
inconsistency, and all(())==True.  However, the code failed with a
ValueError when groups were that small.  Working around this was fairly
awkward, since I wanted to continue to fail noisily if my
triangle_respected function threw a ValueError, and I wasn't sure at all
that it wouldn't if my items were 

The patch modifies Modules/itertoolsmodule.c slightly, changing
combinationsobject_new and permutationsobject_new.  (Deleting the error
throwing code, and have one line changes in both structures to handle
the n>r case.  One could handle this special case more efficiently than
I do by not bothering to allocate certain structures in this case, but I
didn't want to tempt fate.)  The patch also changes the appropriate
tests in Lib/test/test_itertools.py .

Obviously, this would break code that depends upon Python throwing
ValueError in the event of the combination or permutation sequence being
empty.  However, I hope given that combinations and permutations are a
relative novelty that their behavior in this corner case is not yet
etched in stone.

Sorry if this belongs in a PEP -- it seems quite minor, but I don't
quite have a feel what the threshold is.
msg78948 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-01-03 11:53
I agree that the proposed behaviour seems more correct:  the collection of 
all subsets of size 4 of range(3) is perfectly valid and well-defined;  it 
just happens to be empty.  I've also encountered this in practice (in code 
that was enumerating partitions of a list into a fixed number of pieces), 
but in my case it was easier to work around.

I guess the counterargument is that the current ValueError might catch 
bugs early;  I don't know how often this is true in practice.

In any case, this is Raymond's call.
msg79111 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-01-05 08:20
Will spend a while mulling this over and taking it under advisement.

Initially, I am disinclined for several reasons.

1.  For many use cases, r>n is an error condition that should not pass
silently.

2.  While it's possible to make definitions of comb/perm that define the
r>n case as having an empty result, many definitions do not.  See the
wikipedia article on permutations:

"""
In general the number of permutations is denoted by P(n, r), nPr, or
sometimes P_n^r, where:

    * n  is the number of elements available for selection, and
    * r  is the number of elements to be selected (0 ≤ r ≤ n).

For the case where r = n  it has just been shown that P(n, r) = n!. The
general case is given by the formula:

    P(n, r) = n! / (n-r)!.
"""

That discussion is typical and I think it important the number of items
returned matches the formula (which fails when r>n because (n-r)! would
call for a negative factorial).

Also see the wikipedia article on combinations (the "choose" function)
which also expresses a factorial formula that fails for r>n.  No mention
is made for special handling for r>n.

3. For cases where you need a more inclusive definition that assigns a
zero length result to cases where r>n, the workaround is easy. 
Moreover, in such cases, there is some advantage to being explicit about
those cases being handled.  In the example provided by the OP, the
explicit workaround is:

   all(triangle_respected(*triple) for triple in
itertools.combinations(group, 3) if len(group)>=3)

or if you factor-out the unvarying constant expression in the inner-loop:

   len(group)>=3 or all(triangle_respected(*triple) for triple in
itertools.combinations(group, 3))

For other cases, it may be preferable to write your own wrapper:

   def myperm(pool, r=None):
      'custom version of perm returning an empty iterator when r>n'
      if r is not None and r > len(pool):
          return iter([])
      return itertools.permutations(pool, r)

I like this because it is explicit about its intended behavior but
doesn't slow-down the common case where some work actually gets done.

4. Don't want to break any existing code that relies on the ValueError
being thrown.  It's too late to change this for 2.6 and 3.0 and possibly
for 2.7 and 3.1 without having a deprecation period.  While this hasn't
been out long, it will have been once 2.7 is released.  Code that relies
on the ValueError may be hard to spot because it is implicitly relying
on the ValueError aborting the program for invalid input (invalid in the
sense of how the result is being used).

5. I personally find the r>n case to be weird and would have a hard time
explaining it to statistics students who are counting permutations and
relating the result back to the factorial formulas.

On the flip side, I do like Mark's thought that the r>n case being empty
doesn't muck-up the notion of combinations as returning all "r-length
subsequences of elements from the input iterable."  Also, I can see a
parallel in the string processing world where substring searches using
__contains__ simply return False instead of raising an exception when
the substring is longer that the string being searched.

Those are my initial thoughts.  Will ponder it a bit more this week.
msg79116 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-01-05 09:54
Quick notes so I don't forget:

The two main pure python equivalents in the docs would need to be
updated with the new special cases (currently they raise IndexError for
r>n).  The other two pure python equivalents would automatically handle
the proposed new cases.

The Mathematica definition of permutations does not discuss or allow for
the r>n case, http://mathworld.wolfram.com/Permutation.html .  One of
the other tools that actually lists permutations and takes a second
argument (most do not) is in Mathematica,
http://reference.wolfram.com/mathematica/ref/Permutations.html .  It is
unclear from their docs whether the r>n case is supported as a
zero-length output or as an error condition.
msg79308 - (view) Author: Thomas Finley (TFinley) Date: 2009-01-07 09:13
Hi Raymond, thanks for your well reasoned and thorough reply.  Just a
couple small thoughts... I'll borrow your numbers if you don't mind.

1. I'd ask you to discount this argument.  There are many situations in
the Python library where empty results are possible return values. 
There are likewise many places in mathematics where a set as defined
winds up being empty.  And sure, absolutely, sometimes this empty result
will be an error condition.  However, right off the top of my head, I
can't really think of any other place in the Python library where an
empty returned iterator/list is automatically assumed to necessarily be
an error.

The closest I can think of right now might be key/index errors on dicts
and sequences... but that's a user explicitly asking for exactly one
element, in which case non-existence would be bad.

In a larger sense, if a result computed by a function is sensible and
well defined in itself, it doesn't seem even possible for a function to
anticipate what answers might be considered error conditions by calling
code.  Potentially any return value for any function might be an error
for some user code.  So... the question then becomes, is the empty
result a sensible and well defined answer.  And here we differ again I
see.  :D

2. OK on perms.  If you don't mind, I'm going to talk about combinations
since that was my real motivation.

    Also see the wikipedia article on combinations (the
    "choose" function) which also expresses a factorial
    formula that fails for r>n. No mention is made for
    special handling for r>n.

Hmmm.  Are you looking at the definition of "choose function" under
Wikipedia?
     http://en.wikipedia.org/wiki/Choose_function
Notice the material after the "and," after the first part of the
definition, in case you missed it in your first reading.  So, this is
the def I was taught through high school and undergrad, perhaps the same
for Mark judging from his reply.  You were taught it was undefined?  It
does sort of complicate some other defs that rely on binomial
coefficients though, I'd think, but I guess you could make those work
with little effort.  It's not as though subtly different but
incompatible definitions of basic terms are that uncommon in math...
never knew the binomial coef was in that category though.  You teach
stats I take it?  I suppose you'd know then... never heard of it, but
that doesn't mean much.  Oh well.  (You can perhaps feel my pain at
seeing the current behavior, given my understanding.)

3. For your "replacement" myperm (let's say mycomb), if you insert a
"pool=tuple(iterable)" into the beginning, you'll have my fix, statement
for statement.  You advance this as desirable, while to my eye it was
nasty enough to motivate me to submit my first patch to Python.  So, you
can imagine we have a slight difference of opinion.  ;)  Let's see if I
can't present my view effectively...

Saying the fix is more explicit suggests there's something implicit
about the concise solution.  If the choose function is defined for those
values, then itertools.combinations is just unnecessarily forcing code
to treat something as a special case even when it's not.  So, I'd be
careful to draw a distinction between "more verbose" and "more explicit."

4. Can't really speak to that.  It's probably worth changing
documentation in any event, even if you reject the patch.  The provided
equiv code and the actual behavior in this "error case" differ anyway as
you noted in your second reply.  Also, this error condition isn't
evident from the documentation without reading and parsing the source code.

A third and extraneous comment, if you do wind up changing the docs
anyway, perhaps a shorter recursive solution would be easier to
understand?  Maybe it's just me since I'm sort of a functional type of
guy, but I found the equivalent code a little hard to parse.

5. Yeah.  Actually, I'm pretty sure that's why the choose function is
defined piecewise, since negative factorials are undefined as you say.
msg79366 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-01-07 20:06
1.  This is the most important issue.  Is the r>n case typically an
error in the user's program or thought process.  If so, then the utility
of a ValueError can be weighed against the utility of other cases where
you want combs/perms to automatically handle the r>n cases with an empty
output.  These use cases are what I'm mulling over this week.

2. I see your wikipedia link.  The one I was looking at was:
   http://en.wikipedia.org/wiki/Combination

3. I wasn't clear on the issue of explictness.  The potential problem
that it isn't immediately obvious to me what comb/perm does when r>n, so
if I see code that isn't explictly handling that case, I have to lookup
what comb/perm does do to make sure the code works.

In the math module, it is a virtue that math.sqrt(-1) raises a
ValueError.  In the cmath module, it is a virtue that it does not.  The
latter case is distinguished because the programmer has explicitly
requested a routine that can handle complex numbers -- that is a good
clue that the surrounding code was designed to handle the complex result.

4. Not too worried about this one.  Essentially the thought is that code
that wasn't designed with the r>n case in mind probably benefits from
having a ValueError raised when that condition is encountered.
   
5. This one bugs me a bit.  It is nice to have all the factorial   
formulas just work and not have a piecewise definition.

BTW, we don't really have a difference of opinion.  My mind is open.  I
aspire to pick the option that is the best for most users including
students and novice users.  The trick in language design is to
anticipate use cases and to understand that people have differing world
views (i.e. it is a perfectly reasonable point-of-view that taking n
things r at a time makes no sense when r is greater than n).

In choosing, there is some bias toward sticking the API as it was
released.  Changing horses in midstream always causes a headache for
some user somewhere.

Am still thinking this one through and will let you know in a week or
so.  I also want to continue to research into how this is handled in
other libraries and other languages.
msg79367 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-01-07 20:20
Another thought before I forget:  The piecewise definition of the choose
function or for binomial coefficients suggests that supporting the r>n
case should be accompanied by supporting the r<0 case.
msg79369 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-01-07 20:34
> 5. This one bugs me a bit.  It is nice to have all the factorial
> formulas just work and not have a piecewise definition.

IIRC, in the 'Concrete Mathematics' book, Knuth and co use something
like

(n choose k) = (n-to-the-k-falling)/k!

to get around this:  this definition works for all k >= 0 and all
integers (or even real numbers) n.  Here x-to-the-k-falling means
x * (x-1) * ... * (x-k+1).  See:

http://mathworld.wolfram.com/FallingFactorial.html

> Another thought before I forget:  The piecewise definition of the
> choose function or for binomial coefficients suggests that
> supporting the r>n case should be accompanied by supporting 
> the r<0 case.

I'd say not:  in the context of sets of combinations/permutations,
(rather than just defining the nCr or nPr symbols) r has a
definite meaning as a cardinality:   "enumerate all subsets of
iter of cardinality r" and so it makes sense to me to restrict to
the case where r is nonnegative.

(If we were talking about the integer-valued function nCr, then I'd
agree.)

My own guess would be that having combinations(range(n), r)
give the empty iterator for r > n is the right thing in the
vast majority of situations, even (especially?) where the
programmer hasn't anticipated the possibility of r > n.
I have yet to see a case where it's useful to raise ValueError.

I have access to the Magma algebra package at work;  I'll
check tomorrow or Friday to see what that does.
msg79370 - (view) Author: David W. Lambert (LambertDW) Date: 2009-01-07 20:58
Mathematica returns an empty list.

In[1]:= Permutations[{1,2},{1}]

Out[1]= {{1}, {2}}

In[2]:= Permutations[{1,2},{4}]

Out[2]= {}

In[3]:=
msg79371 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-01-07 21:02
David, thanks for the data point.  What does it return for 

 In[1]:= Permutations[{a, b, c}, {-1}]
msg79372 - (view) Author: David W. Lambert (LambertDW) Date: 2009-01-07 21:08
Mathematica indicates for the user to define it later.  An error.

In[3]:= Permutations[{1,2},{-2}]

Permutations::nninfseq: 
   Position 2 of Permutations[{1, 2}, {-2}]
     must be All, Infinity, a non-negative integer, or a List whose 
first
     element (required) is a non-negative integer, second element 
(optional)
     is a non-negative integer or Infinity, and third element (optional) 
is a
     nonzero integer.

Out[4]= Permutations[{1, 2}, {-2}]
msg79373 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-01-07 21:10
Results from Magma, Sage and Gap:

Magma provides functions "Subsets(S, k)" and "Permutations(S, k)".  From 
the documentation:

Subsets(S, k) : The set of subsets of the set S of size k. If k is 
larger than the cardinality of S then the result will be empty.

Permutations(S, k) : The set of permutations (stored as sequences) of 
each of the subsets of the set S of cardinality k.

GAP has the same behaviour: even if you've never encountered GAP before, 
it's fairly Pythonesque, so the following extract from a GAP 4 REPL
means exactly what you think it does:

gap> Combinations([1,2,3,4], 3);
[ [ 1, 2, 3 ], [ 1, 2, 4 ], [ 1, 3, 4 ], [ 2, 3, 4 ] ]
gap> Combinations([1,2,3,4], 5);
[  ]

Permutations work the same way (but the function is called 
Arrangements):

gap> Arrangements([1,2,3,4], 3);
[ [ 1, 2, 3 ], [ 1, 2, 4 ], [ 1, 3, 2 ], [ 1, 3, 4 ], [ 1, 4, 2 ], 
  [ 1, 4, 3 ], [ 2, 1, 3 ], [ 2, 1, 4 ], [ 2, 3, 1 ], [ 2, 3, 4 ], 
  [ 2, 4, 1 ], [ 2, 4, 3 ], [ 3, 1, 2 ], [ 3, 1, 4 ], [ 3, 2, 1 ], 
  [ 3, 2, 4 ], [ 3, 4, 1 ], [ 3, 4, 2 ], [ 4, 1, 2 ], [ 4, 1, 3 ], 
  [ 4, 2, 1 ], [ 4, 2, 3 ], [ 4, 3, 1 ], [ 4, 3, 2 ] ]
gap> Arrangements([1,2,3,4], 5);
[  ]

Combinations([1,2,3,4], -1) gives an error.  Interestingly, 
Arrangements([1,2,3,4], -1) returns the empty list.

GAP also has a NrCombinations function returning the number of 
combinations:

gap> NrCombinations([1,2,3,4], 5);
0
gap> NrCombinations([1,2,3,4], -1);
0

My Sage installation currently seems to be broken, but from the 
documentation it looks as though it just steals the functions from GAP.
msg79374 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2009-01-07 21:16
Got Sage working again.  It also returns an empty list for r > n.  For r 
negative, Combinations returns an empty list and Permutation gives an 
error.

sage: Combinations(range(4), 6)
Combinations of [0, 1, 2, 3] of length 6
sage: Combinations(range(4), 6).list()
[]
sage: Permutations(range(4), 6).list()
[]
sage: Combinations(range(4), -1).list()
[]
sage: Permutations(range(4), -1).list()
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
[... traceback snipped ...]
msg79379 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-01-07 22:37
Data point:  Microsoft Excel's PERMUT() and COMBIN() return #NUM! for
r>n or r<0.
msg79380 - (view) Author: David W. Lambert (LambertDW) Date: 2009-01-07 22:40
I try to "not know" excel.  Does it have any other means to represent an 
empty set?
msg79381 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-01-07 22:53
Excel is returning the count, not the set itself.  So, it could have
chosen to return zero.

The HP32SII also has nCr and nPr functions that return INVALID DATA for
r>n or r<0.

The Combinatorics section of CRC Standard Mathematical Formulae (30th
edition) defines nCr and nPr only for 0 <= r <= n.  The same is true in
Harris and Stocker's Handbook of Mathematics and Computational science.
msg79382 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-01-07 23:26
Summary of functions/definitions expected to return a number:           

  r>n       r<0       Source
--------- ----------  ------------------------------
error      error      MS Excel PERMUT() and COMBIN() 
error      error      HP32SII nPr and nCr    
undefined  undefined  CRC Combinatoric definitions   
undefined  undefined  Handbook of Mathematics and Computational Science  
undefined  undefined  Wolfram:  
http://mathworld.wolfram.com/Permutation.html
zero       zero       http://en.wikipedia.org/wiki/Choose_function  
undefined  undefined  http://en.wikipedia.org/wiki/Combination   
undefined  undefined  http://en.wikipedia.org/wiki/Permutation   
zero       undefined  Knuth's choose function in Concrete Mathematics    
zero       zero       GAP's nrCombinations()


Summary for tools that return the individual subsets:           

  r>n       r<0       Source
--------- ----------  ------------------------------
emptylist  error      Mathematica
emptylist  unknown    Magma
emptylist  error      GAP Combinations
emptylist  emptylist  GAP Arrangements
emptylist  emptylist  Sage Combinations
emptylist  error      Sage Permutations
msg79383 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-01-07 23:35
David, I know the OPs position and Mark's opinion.  What is your
recommendation for the r>n case and the r<0 case?
msg79386 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-01-08 00:48
Attached an updated patch which expands the tests and docs.
Handles the r>n case with an empty result.
Leaves the r<0 case as a ValueError.

Am thinking that if we do this, it should be treated as a bug and posted
to 2.6 and 3.0.
msg79393 - (view) Author: David W. Lambert (LambertDW) Date: 2009-01-08 01:59
I had thought highly of the "mull it over for a week" plan.  After a 
week we'd decide to follow Stephen Wolfram's lead, which seems to be the 
current patch.  I haven't yet used the python permutations iterator, 
although I used to have a script that solved word JUMBLE puzzles with a 
mathematica | spell pipeline.  Now I look up words using a sorted 
anagram dictionary.
msg79394 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-01-08 02:20
Thanks David. Looks like we're all in agreement.
Turns out, I did a weeks worth of mulling today ;-)
Accepting the proposal with the following rationale:

1. There exist some valid use cases where r>n.
2. Returning an empty list is the expected outcome with two of the
current pure python versions (the ones that filter results from
product() based on their length, uniqueness and sort).
3. It is consist with the notion of 'all subsets of length r ...'
4. It corresponds to the piecewise definitions of the choose function
and binomial coefficient.
5. It matches the battle-tested APIs of all of the major mathematics
tools we looked at (Mathematica, Magma, GAP, and Sage.
6. The recipe for combinations_with_replacement() is well defined and
useful in cases where r>n -- it is desirable that all the variants of
this function keep their APIs in-sync.
7. The three other posters here all agree on it :-)

Weighing against it:
1. The ValueError version has already been released.
2. There may be use cases where r>n represents an error so that raising
a ValueError would be helpful.
3. The length of the comb/perm doesn't correspond to the non-piecewise,
factorial based functions that are so ubiquitous (MS Excel, HP32SII, CRC
handbook, Handbook of Mathematics, wikipedia's entries for Combination
and Permutation, and GAP's nrCombinations() function).
msg79399 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-01-08 05:21
Fixed in r68394 
Will forward port in a day or two.
History
Date User Action Args
2009-01-08 05:21:20rhettingersetstatus: open -> closed
resolution: accepted -> fixed
messages: + msg79399
2009-01-08 05:09:40rhettingersetfiles: - combperm3.diff
2009-01-08 05:09:34rhettingersetfiles: + combperm4.diff
2009-01-08 02:44:40rhettingersetfiles: - combperm2.diff
2009-01-08 02:44:35rhettingersetfiles: + combperm3.diff
2009-01-08 02:20:17rhettingersetresolution: accepted
messages: + msg79394
versions: + Python 2.6, Python 3.0, Python 2.7
2009-01-08 01:59:09LambertDWsetmessages: + msg79393
2009-01-08 01:47:31rhettingersetfiles: - combperm.diff
2009-01-08 01:47:26rhettingersetfiles: + combperm2.diff
2009-01-08 00:48:43rhettingersetfiles: + combperm.diff
messages: + msg79386
2009-01-07 23:35:15rhettingersetmessages: + msg79383
2009-01-07 23:26:41rhettingersetmessages: + msg79382
2009-01-07 22:53:05rhettingersetmessages: + msg79381
2009-01-07 22:40:35LambertDWsetmessages: + msg79380
2009-01-07 22:37:11rhettingersetmessages: + msg79379
2009-01-07 21:16:31mark.dickinsonsetmessages: + msg79374
2009-01-07 21:10:15mark.dickinsonsetmessages: + msg79373
2009-01-07 21:08:54LambertDWsetmessages: + msg79372
2009-01-07 21:02:39rhettingersetmessages: + msg79371
2009-01-07 20:58:15LambertDWsetnosy: + LambertDW
messages: + msg79370
2009-01-07 20:35:01mark.dickinsonsetmessages: + msg79369
2009-01-07 20:20:02rhettingersetmessages: + msg79367
2009-01-07 20:06:24rhettingersetmessages: + msg79366
2009-01-07 09:13:04TFinleysetmessages: + msg79308
2009-01-05 09:54:01rhettingersetmessages: + msg79116
2009-01-05 08:20:12rhettingersetmessages: + msg79111
2009-01-03 11:53:45mark.dickinsonsetassignee: rhettinger
messages: + msg78948
nosy: + mark.dickinson, rhettinger
2009-01-03 10:51:15TFinleycreate