classification
Title: itertools.product with infinite iterator cause MemoryError.
Type: behavior Stage: committed/rejected
Components: Library (Lib) Versions: Python 3.3, Python 3.2, Python 2.7
process
Status: closed Resolution: invalid
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Sumudu.Fernando, eric.araujo, falsetru, rhettinger, terry.reedy
Priority: normal Keywords:

Created on 2010-10-15 00:45 by falsetru, last changed 2012-01-19 03:44 by terry.reedy. This issue is now closed.

Files
File name Uploaded Description Edit
product.py Sumudu.Fernando, 2012-01-18 11:01 alt. implementation of cartesian product generator
Messages (6)
msg118735 - (view) Author: Jeong-Min Lee (falsetru) Date: 2010-10-15 00:44
According to the documentation, itertools.product is equivalent to nested for-loops in a generator expression.
But, itertools.product(itertools.count(2010)) is not.

>>> import itertools
>>> (year for year in itertools.count(2010))
<generator object <genexpr> at 0x026367D8>
>>> itertools.product(itertools.count(2010))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
msg118831 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-10-15 21:18
The input to itertools.product must be a finite sequence of finite iterables. itertools.count(startvalue) produces an infinite sequence of ints (which are not iterable). Passing the latter to the former causes the latter to run until memory is exhausted. You should get the same result with tuple(itertools.count()), which is what happens within itertools.product.

Your generator example does not show the same problem because it is not nested and never runs.

If you are unfamiliar with Python and its stdlib, please ask questions on python-list (mirror on comp.lang.python and gmane.comp.python.general) before reporting what you think might be an error.
msg151533 - (view) Author: Sumudu Fernando (Sumudu.Fernando) Date: 2012-01-18 11:01
I don't agree with the response to this.

It is true that as implemented (at least in 2.7, I don't have 3.x handy to check) itertools.product requires finite iterables.  However this seems to be simply a consequence of the implementation and not part of the "spirit" of the function, which as falsetru pointed out is stated to be "equivalent to nested for-loops in a generator expression".

Indeed, implementing product in Python (in a recursive way) doesn't have this problem.

Perhaps a more convincing set of testcases to show why this could be considered a problem:

>>> import itertools
>>> itertools.product(xrange(100))
<itertools.product object at 0xb7ed334c>
>>> itertools.product(xrange(1000000))
<itertools.product object at 0xb7ed620c>
>>> itertools.product(xrange(1000000000))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

Note that I'm not even using an infinite iterable, just a really big one.  The issue is that creating the iterator fails with a MemoryError, before I've even asked for any values.  Consider the following:

for (i, v) in enumerate(itertools.product(a, b, c)):
    if i < 1000:
        print v
    else:
        break

When a, b, and c are relatively small, finite iterables, this code works fine.  However, if *any* of them are too large (or infinite), we see a MemoryError before the loop even starts, even though only 1000 elements are required.  I think it's conceivable that we might want something like "a = itertools.cycle(xrange(5))", and even that will break this loop.

That said, in all such cases I could think of, we can always either truncate big iterators before passing them to product, or use zip/comprehensions to add their values into the tuple (or some combination of those).  So maybe it isn't a huge deal.

I've attached my implementation of product which deals with infinite iterators by leveraging enumerate and itertools.cycle, and is pretty much a direct translation of the "odometer" idea.  This doesn't support the "repeat" parameter (but probably could using itertools.tee).  One thing that should be changed is itertools.cycle shouldn't be called / doesn't need to be called on infinite iterators, but I couldn't figure out how to do that.  Maybe there is some way to handle it in the C implementation?)

In summary: the attached implementation of product can accept any mix of infinite / finite iterators, returning a generator intended for partial consumption.  The existing itertools.product doesn't work in this case.
msg151575 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-01-18 22:55
Proposing an expansion of the definition of product() is a *completely* different issue from the validity of count() as an input. I answered correctly given the current definition of product(): it is not valid input. It is also not valid input to your proposed revision:
>>> tuple(itertools.cycle(enumerate(it)) for it in itertools.count())
...
TypeError: 'int' object is not iterable
-- just as I said.

If you want to propose an enhancement, either open an new, enhancement issue or post to python-ideas. Since new features can only go in 3.3+, post 3.x code, not 2.x. And please do not quibble about the difference between 'infinite' and 'too large to fit in memory'.
msg151579 - (view) Author: Sumudu Fernando (Sumudu.Fernando) Date: 2012-01-18 23:13
>>> tuple(itertools.cycle(enumerate(it)) for it in itertools.count())
  ...
  TypeError: 'int' object is not iterable

That is not what happens in the function, though!  That would correspond to doing product(*itertools.count(2010)), but if you try that you won't even get past argument expansion (obviously).  Doing product(*xrange(10)) gives the error you're talking about, for example.

product(itertools.count(2010)) works perfectly well with the version I posted, though it is a bit silly to do it that way since it produces the same values as count itself (which is what "cartesian product" should do), while saving extra bookkeeping along the way.

Anyway, I'm pretty new to python and I don't think this is quite relevant enough to warrant opening a new ticket.  I'm happy to leave it here for the education of the next neophyte who stumbles across this idiosyncracy of itertools.product.
msg151605 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-01-19 03:44
A relatively simple change would be to allow the first iterable to be 'infinite', when repeat==1, by not calling tuple() on it. The reason for turning the iterables into concrete sequences is because they might not be reiterable. (cycle() does the same for the same reason.) But since the first iterable is only iterated once, this does not apply to it.

    if repeat == 1:
        pools = [args[0:1]].extend(tuple(pool) for pool in args[1:])
    else:
        pools = [tuple(pool) for pool in args] * repeat

The counter argument to this or any generalized proposal is that one can expand the product() into enough loops to avoid infinite (or very large) args. For example, the following produces '1AA', '1AB', ..., '1EE', '2AA', ... indefinitely.

naa=(''.join((str(n),)+s) for n in itertools.count(1)
     for s in itertools.product(string.ascii_uppercase[0:5], repeat=2))

RAYMOND: Do you think the doc should specify that each iterable must be finite, and that explicit loops are the alternative if not?
History
Date User Action Args
2012-01-19 03:44:25terry.reedysetmessages: + msg151605
2012-01-18 23:13:38Sumudu.Fernandosetmessages: + msg151579
2012-01-18 22:55:13terry.reedysetstage: committed/rejected
messages: + msg151575
versions: + Python 3.3, - Python 3.1
2012-01-18 11:01:33Sumudu.Fernandosetfiles: + product.py
nosy: + Sumudu.Fernando
messages: + msg151533

2010-10-15 21:18:43terry.reedysetstatus: open -> closed
versions: - Python 2.6
nosy: + terry.reedy

messages: + msg118831

resolution: invalid
2010-10-15 14:19:44eric.araujosetnosy: + eric.araujo
2010-10-15 01:25:41benjamin.petersonsetassignee: rhettinger

nosy: + rhettinger
2010-10-15 00:45:00falsetrucreate