Issue10109
Created on 2010-10-15 00:45 by falsetru, last changed 2012-01-19 03:44 by terry.reedy. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| product.py | Sumudu.Fernando, 2012-01-18 11:01 | alt. implementation of cartesian product generator | ||
| Messages (6) | |||
|---|---|---|---|
| msg118735 - (view) | Author: Jeong-Min Lee (falsetru) | Date: 2010-10-15 00:44 | |
According to the documentation, itertools.product is equivalent to nested for-loops in a generator expression. But, itertools.product(itertools.count(2010)) is not. >>> import itertools >>> (year for year in itertools.count(2010)) <generator object <genexpr> at 0x026367D8> >>> itertools.product(itertools.count(2010)) Traceback (most recent call last): File "<stdin>", line 1, in <module> MemoryError |
|||
| msg118831 - (view) | Author: Terry J. Reedy (terry.reedy) * ![]() |
Date: 2010-10-15 21:18 | |
The input to itertools.product must be a finite sequence of finite iterables. itertools.count(startvalue) produces an infinite sequence of ints (which are not iterable). Passing the latter to the former causes the latter to run until memory is exhausted. You should get the same result with tuple(itertools.count()), which is what happens within itertools.product. Your generator example does not show the same problem because it is not nested and never runs. If you are unfamiliar with Python and its stdlib, please ask questions on python-list (mirror on comp.lang.python and gmane.comp.python.general) before reporting what you think might be an error. |
|||
| msg151533 - (view) | Author: Sumudu Fernando (Sumudu.Fernando) | Date: 2012-01-18 11:01 | |
I don't agree with the response to this.
It is true that as implemented (at least in 2.7, I don't have 3.x handy to check) itertools.product requires finite iterables. However this seems to be simply a consequence of the implementation and not part of the "spirit" of the function, which as falsetru pointed out is stated to be "equivalent to nested for-loops in a generator expression".
Indeed, implementing product in Python (in a recursive way) doesn't have this problem.
Perhaps a more convincing set of testcases to show why this could be considered a problem:
>>> import itertools
>>> itertools.product(xrange(100))
<itertools.product object at 0xb7ed334c>
>>> itertools.product(xrange(1000000))
<itertools.product object at 0xb7ed620c>
>>> itertools.product(xrange(1000000000))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
MemoryError
Note that I'm not even using an infinite iterable, just a really big one. The issue is that creating the iterator fails with a MemoryError, before I've even asked for any values. Consider the following:
for (i, v) in enumerate(itertools.product(a, b, c)):
if i < 1000:
print v
else:
break
When a, b, and c are relatively small, finite iterables, this code works fine. However, if *any* of them are too large (or infinite), we see a MemoryError before the loop even starts, even though only 1000 elements are required. I think it's conceivable that we might want something like "a = itertools.cycle(xrange(5))", and even that will break this loop.
That said, in all such cases I could think of, we can always either truncate big iterators before passing them to product, or use zip/comprehensions to add their values into the tuple (or some combination of those). So maybe it isn't a huge deal.
I've attached my implementation of product which deals with infinite iterators by leveraging enumerate and itertools.cycle, and is pretty much a direct translation of the "odometer" idea. This doesn't support the "repeat" parameter (but probably could using itertools.tee). One thing that should be changed is itertools.cycle shouldn't be called / doesn't need to be called on infinite iterators, but I couldn't figure out how to do that. Maybe there is some way to handle it in the C implementation?)
In summary: the attached implementation of product can accept any mix of infinite / finite iterators, returning a generator intended for partial consumption. The existing itertools.product doesn't work in this case.
|
|||
| msg151575 - (view) | Author: Terry J. Reedy (terry.reedy) * ![]() |
Date: 2012-01-18 22:55 | |
Proposing an expansion of the definition of product() is a *completely* different issue from the validity of count() as an input. I answered correctly given the current definition of product(): it is not valid input. It is also not valid input to your proposed revision: >>> tuple(itertools.cycle(enumerate(it)) for it in itertools.count()) ... TypeError: 'int' object is not iterable -- just as I said. If you want to propose an enhancement, either open an new, enhancement issue or post to python-ideas. Since new features can only go in 3.3+, post 3.x code, not 2.x. And please do not quibble about the difference between 'infinite' and 'too large to fit in memory'. |
|||
| msg151579 - (view) | Author: Sumudu Fernando (Sumudu.Fernando) | Date: 2012-01-18 23:13 | |
>>> tuple(itertools.cycle(enumerate(it)) for it in itertools.count()) ... TypeError: 'int' object is not iterable That is not what happens in the function, though! That would correspond to doing product(*itertools.count(2010)), but if you try that you won't even get past argument expansion (obviously). Doing product(*xrange(10)) gives the error you're talking about, for example. product(itertools.count(2010)) works perfectly well with the version I posted, though it is a bit silly to do it that way since it produces the same values as count itself (which is what "cartesian product" should do), while saving extra bookkeeping along the way. Anyway, I'm pretty new to python and I don't think this is quite relevant enough to warrant opening a new ticket. I'm happy to leave it here for the education of the next neophyte who stumbles across this idiosyncracy of itertools.product. |
|||
| msg151605 - (view) | Author: Terry J. Reedy (terry.reedy) * ![]() |
Date: 2012-01-19 03:44 | |
A relatively simple change would be to allow the first iterable to be 'infinite', when repeat==1, by not calling tuple() on it. The reason for turning the iterables into concrete sequences is because they might not be reiterable. (cycle() does the same for the same reason.) But since the first iterable is only iterated once, this does not apply to it.
if repeat == 1:
pools = [args[0:1]].extend(tuple(pool) for pool in args[1:])
else:
pools = [tuple(pool) for pool in args] * repeat
The counter argument to this or any generalized proposal is that one can expand the product() into enough loops to avoid infinite (or very large) args. For example, the following produces '1AA', '1AB', ..., '1EE', '2AA', ... indefinitely.
naa=(''.join((str(n),)+s) for n in itertools.count(1)
for s in itertools.product(string.ascii_uppercase[0:5], repeat=2))
RAYMOND: Do you think the doc should specify that each iterable must be finite, and that explicit loops are the alternative if not?
|
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2012-01-19 03:44:25 | terry.reedy | set | messages: + msg151605 |
| 2012-01-18 23:13:38 | Sumudu.Fernando | set | messages: + msg151579 |
| 2012-01-18 22:55:13 | terry.reedy | set | stage: committed/rejected messages: + msg151575 versions: + Python 3.3, - Python 3.1 |
| 2012-01-18 11:01:33 | Sumudu.Fernando | set | files:
+ product.py nosy: + Sumudu.Fernando messages: + msg151533 |
| 2010-10-15 21:18:43 | terry.reedy | set | status: open -> closed versions: - Python 2.6 nosy: + terry.reedy messages: + msg118831 resolution: invalid |
| 2010-10-15 14:19:44 | eric.araujo | set | nosy:
+ eric.araujo |
| 2010-10-15 01:25:41 | benjamin.peterson | set | assignee: rhettinger nosy: + rhettinger |
| 2010-10-15 00:45:00 | falsetru | create | |
