classification
Title: re-usable generators / generator expressions should return iterables
Type: enhancement Stage:
Components: Versions:
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: Jae, r.david.murray, svenrahmann
Priority: normal Keywords:

Created on 2009-05-08 21:26 by svenrahmann, last changed 2009-06-30 02:15 by Jae. This issue is now closed.

Files
File name Uploaded Description Edit
reusable_generators.py svenrahmann, 2009-05-08 21:26
Messages (3)
msg87473 - Author: Sven Rahmann (svenrahmann) Date: 2009-05-08 21:26
The syntax of generator expressions suggests that they can be used
similarly to lists (at least when iterated over).
However, as was pointed out to me, the resulting generators are
iterators and can be used only once.
This is inconvenient in situations where some function expects an
iterable argument but needs to iterate over it more than once.

Consider the following function (see also the attached file
reusable_generators.py for a complete example):

def secondmax(iterable):
    """return the second largest value in iterable"""
    m = max(iterable)
    return max(i for i in iterable if i<m)

It works fine when passed a list or other iterable container, but
consider the following situation. We have a huge matrix A (list of
lists) and want to pass a column to the function.

Using a list works fine, but requires copying the column's values and
needs additional memory:

col2_list = [a[2] for a in A]  # new list created from column 2

There is no reason why we shouldn't be able to create an iterable object
that returns, one by one, the values from the column:

col2_gen  = (a[2] for a in A) 

The problem is that secondmax(col2_gen) does not work; try the attached
file: col2_gen can be iterated over only once.
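
A minimal sketch of the failure, assuming a small 3x3 stand-in for the
matrix A (the values are made up for illustration). The first max()
call consumes the generator, so the second pass sees an empty iterator
and max() raises ValueError:

```python
# Hypothetical small matrix standing in for the "huge matrix A".
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]


def secondmax(iterable):
    """Return the second largest value in iterable."""
    m = max(iterable)
    return max(i for i in iterable if i < m)


col2_list = [a[2] for a in A]  # a list can be iterated repeatedly
col2_gen = (a[2] for a in A)   # a generator can be iterated only once

list_result = secondmax(col2_list)

gen_error = None
try:
    secondmax(col2_gen)
except ValueError as exc:
    # The first max() exhausted col2_gen; the second max() got nothing.
    gen_error = exc

leftover = list(col2_gen)  # the generator stays exhausted
```

With the list, secondmax returns 6; with the generator, the call ends in
the ValueError captured above.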

I can imagine many situations where I need or want to iterate over such
a "view" object several times; I don't see a reason why it shouldn't be
possible or why it would be unwanted.

We can do the following, but it is not elegant: Wrap the generator
expression into a closure and a class.

class ReusableGenerator():
    def __init__(self,g): self.g = g
    def __iter__(self):   return self.g()

col2_re = ReusableGenerator(lambda: (a[2] for a in A)) # I want this!

This works, but it is not a generator object (e.g., it doesn't have a
next method). We also need the lambda detour for this to work.
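
To illustrate the distinction, here is a sketch that checks the wrapper
against the abstract base classes in collections.abc. It assumes a
modern Python 3 (where the iterator method is spelled __next__ and
collections.abc exists) and the same made-up 3x3 matrix as a stand-in
for A:

```python
from collections.abc import Iterable, Iterator


class ReusableGenerator:
    """Wrap a zero-argument callable that returns a fresh iterator."""

    def __init__(self, g):
        self.g = g

    def __iter__(self):
        # Each call builds a new generator, so every pass starts over.
        return self.g()


# Hypothetical small matrix standing in for the "huge matrix A".
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

col2_re = ReusableGenerator(lambda: (a[2] for a in A))
col2_gen = (a[2] for a in A)

first_pass = list(col2_re)
second_pass = list(col2_re)  # works again: a new generator is created

wrapper_is_iterable = isinstance(col2_re, Iterable)   # has __iter__
wrapper_is_iterator = isinstance(col2_re, Iterator)   # lacks __next__
gen_is_iterator = isinstance(col2_gen, Iterator)
```

The wrapper is an iterable but not an iterator, whereas the plain
generator expression is an iterator; both passes over the wrapper yield
the same column values.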

Note that in some situations, the "problem" I describe does not occur or
can be easily circumvented. For example instead of writing

col2 = (a[2] for a in A) 
for x in col2: foo(x)
for x in col2: foo(x) # doesn't work

we could just repeat the generator expression (and create a new iterator
whenever we need it):

for x in (a[2] for a in A): foo(x)
for x in (a[2] for a in A): foo(x) # works fine

But exactly this is impossible if I want to pass the generator
expression or generator function to another function (such as secondmax()). 

I believe this contradicts the Python philosophy that functions can be
passed around just like other objects.


My proposal is probably unrealistic, but I would like to see generator
functions and generator expressions change in a way that they return not
iterators, but iterables, so the problem described here does not occur,
and wrapper classes are unnecessary.

In Java that distinction is very clear, in Python less so I think (which
is good because iterators are a pain to use in Java).


Admittedly, I have no idea why generator functions and expressions are
implemented as they are; there are probably lots of good reasons, and it
may not be possible to change this any time soon or at all.
However, I think the change would make Python a more consistent language.
msg87503 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-05-09 18:14
You might be interested to read about this package:

http://www.fiber-space.de/generator_tools/doc/generator_tools.html

For anything to happen in this area you'd need to get some consensus on
python-ideas first.  If you do that, you can open a new ticket
referencing the python-ideas thread (or even reopen this one if that
seems appropriate).
msg89898 - Author: Jae Kwon (Jae) Date: 2009-06-30 02:15
I second this feature request, and will try to get consensus in
python-ideas. 

Meanwhile, here's a sample workaround.

>>> def gen2iterable(genfunc):
...     def wrapper(*args, **kwargs):
...         class _iterable(object):
...             def __iter__(self):
...                 return genfunc(*args, **kwargs)
...         return _iterable()
...     return wrapper
... 
>>> 
>>> @gen2iterable
... def foo():
...   for i in range(10):
...     yield i
... 
>>> a = foo()
>>> 
>>> max(a)
9
>>> max(a)
9
>>> def secondmax(iterable):
...     """return the second largest value in iterable"""
...     m = max(iterable)
...     return max(i for i in iterable if i<m)
... 
>>> secondmax(a)
8
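
The same workaround can be sketched as a plain script, which also shows
that arguments passed to the decorated function are replayed on every
pass. The countdown generator and its n parameter are hypothetical names
used only for illustration, and functools.wraps is an addition that is
not part of the session above:

```python
import functools


def gen2iterable(genfunc):
    """Turn a generator function into one that returns a re-iterable object."""

    @functools.wraps(genfunc)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        class _Iterable(object):
            def __iter__(self):
                # Call the generator function again for every new pass.
                return genfunc(*args, **kwargs)

        return _Iterable()

    return wrapper


@gen2iterable
def countdown(n):
    """Yield n, n-1, ..., 1 (hypothetical example generator)."""
    while n > 0:
        yield n
        n -= 1


c = countdown(3)
first = list(c)
second = list(c)  # a second pass restarts from n == 3
```

Note that each pass re-runs the generator function from scratch, so any
side effects or expensive setup inside it are repeated as well.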
History
Date User Action Args
2009-06-30 02:15:15  Jae  set  nosy: + Jae; messages: + msg89898
2009-05-09 18:14:26  r.david.murray  set  status: open -> closed; versions: - Python 3.1; nosy: + r.david.murray; messages: + msg87503; resolution: rejected
2009-05-08 21:26:18  svenrahmann  create