Classification
Title: generator expression implementation
Type: Stage:
Components: Interpreter Core Versions:
Process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: Nosy List: arigo, gvanrossum, hyeshik.chang, jiwon, mwh, quiver, rhettinger
Priority: normal Keywords: patch

Created on 2004-01-07 12:10 by jiwon, last changed 2004-05-21 08:05 by arigo. This issue is now closed.

Files
File name Uploaded Description Edit
genexpr-doc.diff hyeshik.chang, 2004-03-17 05:37 documentation draft for generator expression
gexp.diff.capture jiwon, 2004-03-27 15:56 variable capture version. (without any precomputation of iterables)
gexp.diff.lazy_bind jiwon, 2004-03-27 15:58 patch without any precomputation or early binding.
gexp.diff.precompute-outmost jiwon, 2004-05-15 17:24 lazy-binding + outmost iterable precomputation version
genexpr-doc.diff jiwon, 2004-05-20 16:51 documentation for generator expression. (including precomputation issues)
Messages (62)
msg45146 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-01-07 12:10
Since I was interested in PEP 289 (generator
expressions), I dabbled with it and implemented a
working version. I'm not sure I did everything right,
but maybe someone who knows better can fix it up.

1. The Grammar has two changes:
 a. atom: '(' [testlist] ')' | '[' [listmaker] ']' | ...

     changes to 

    atom: '(' [testgenexpr] ')' | '[' [listmaker] ']' | ...

     where testgenexpr is defined like this:
 
    testgenexpr: test ( gen_for | (',' test)* [','] )

 b. argument: [test '='] test
 
     changes to

    argument: [test '='] test [gen_for]

 (gen_for, gen_if, and gen_iter are similar to list_for,
list_if, and list_iter, respectively.)

2. Instead of changing the arglist rule in the Grammar to
accept generator expressions, I changed the argument rule
as in 1.b. This means the Grammar accepts a generator
expression without parentheses in a function call even
when there are several arguments, like

reduce(operator.add, (x for x in range(10))) 

This is against what the PEP requires, so I made
parsermodule.c and compile.c catch that case and raise an
error when there is more than one argument in the
function call. I did it that way to get around an
ambiguity that adding generator expressions to arglist
would introduce in the grammar.


3. I implemented a generator expression as equivalent to
a call to a function that contains variable-capturing code
and generator code. For example,

x = 1
g = (x for i in range(10))

is equivalent to

x = 1
def __gen_wrapper():
    _[x] = x # variable capture here
    def __gen():
        for i in range(10):
            yield _[x]
    return __gen()

g = __gen_wrapper()

4. Since I implemented generator expressions as equivalent
to a function call, the symbol table should enter a new scope
when it meets a generator expression. So I ended up
adding the following code to symtable.c:

PyObject *
PySymtableEntry_New(struct symtable *st, char *name,
int type, int lineno)
{

...
        switch (type) {
        case funcdef:
        case lambdef:
!       case testgenexpr: /* generator expression */
!       case argument:    /* generator expression */
                ste->ste_type = TYPE_FUNCTION;
                break;
...

so that a generator expression can be treated as a function
type. This is kind of clumsy, but I couldn't find another
easy way to do it, so I just left it like that.

5. This patch does not include a diff of
src/Lib/symbol.py, so you must run python Lib/symbol.py
to regenerate it.
msg45147 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-01-07 12:25
Logged In: YES 
user_id=55188

Assigned to me.
The submitter is my friend and I have much interest in
this. :-)
msg45148 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-01-08 04:50
Logged In: YES 
user_id=595483

I added diff of Lib/symbol.py, so no need to run python
Lib/symbol.py now.
msg45149 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-01-08 05:44
Okay. I verified that the basic characteristics mentioned in the PEP
are working.
I have started reviewing the implementation.
msg45150 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-01-08 18:50
Logged In: YES 
user_id=4771

We may not need two levels of nested anonymous functions.  It seems to me that the following equivalent code would do, because it captures the variable in an argument instead of via nested scopes:

x = 1
def __gen(x):
  for i in range(10):
    yield x
g = __gen(x)

I don't know though if this is easy to implement in compile.c.  Alternatively:

x = 1
def __gen(x=x):
  for i in range(10):
    yield x
g = __gen()
msg45151 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-01-09 01:49
What arigo wrote sounds reasonable, and it's not very difficult
to implement. I'll try to do it that way.
msg45152 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-01-12 03:26
OK. I've implemented the variable-capturing part as arigo
suggested; the file genexpr-capture-vars-in-args.diff contains it.
If you look at the code, you'll find that the code for
generator expressions is much shorter, and there are fewer
modifications to the existing code.
msg45153 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-01-19 11:10
OK. I modified the patch so that it evaluates the iterator
expressions at generator expression creation time. That means

g = (x for x in range(10))

is equivalent to

def __gen(iter1):
    for x in iter1:
        yield x
g = __gen(range(10))

So, evaluation of range(10) is not deferred anymore.

I also changed testgenexpr to testlist_gexp in 
Grammar/Grammar, since Hye-Shik Chang advised doing so.
I'm willing to take any advice about this patch, so please comment. 
msg45154 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-01-21 14:14
Okay. My review is done. The revised patch "gexp.diff" looks fine 
to me.
There are a few minor tweaks still needed before it can go into CVS:
- Some error exit cases are ignored. You need to provide safe exit 
paths for MemoryError (e.g. the PyList_GetSlice usage in
Python/compile.c).
- A patch for the 'compiler' pure-Python package. (There's an old 
implementation on SF #795947.)
msg45155 - (view) Author: Michael Hudson (mwh) (Python committer) Date: 2004-01-21 18:13
Logged In: YES 
user_id=6656

Documentation would be nice!
msg45156 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-01-25 23:44
BTW, is the unavailability of yield (genexpr) and return
(genexpr) intended behavior?
msg45157 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-01-26 06:17
Please ignore the previous comment.
It was a result of an old revision of the patches. It works with
the recent one.
msg45158 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-01-26 15:15
Logged In: YES 
user_id=4771

>>> (a for d in a)
Fatal Python error: unknown scope for a in <generator
expression>(1) in <stdin>
symbols: {'a': 2048, '_[1]': 4, 'd': 2}
locals: {'_[1]': 0, 'd': 1}
globals: {}
msg45159 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-01-27 07:44
I fixed the patch for the bug that arigo mentioned, and for
what perky mentioned (PyList_GetSlice error handling).

Now I'll tackle the pure-Python compiler package :)
msg45160 - (view) Author: George Yoshida (quiver) (Python committer) Date: 2004-01-29 11:48
Logged In: YES 
user_id=671362

Yet another Fatal Python error:

>>> (a for a in (b for b in (0,1)))
<generator object at 0x40170f8c>
>>> (a for a in (b for b in range(2)))
Fatal Python error: unknown scope for range in ?(0) in <stdin>
symbols: {'range': 8}
locals: {}
globals: {}
 
Aborted

# applied patch as of 2004-01-27
msg45161 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-01-30 03:01
OK, I've fixed the bug quiver reported, and added a related
test to test_grammar.py.
msg45162 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-01-30 05:09
Yet another Fatal Python error:

>>> (a for a in (b for b in (a for a in 'xxx' if True) if
False) if True)
lookup 'True' in ? 3 -1
freevars of <generator expression>: ('True',)
Fatal Python error: com_make_closure()
zsh: abort (core dumped)  ./python

# applied patch as of 2004-01-30
msg45163 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-02-02 08:29
OK. I fixed another bug reported by perky, and added a related
test to test_grammar.py.
msg45164 - (view) Author: George Yoshida (quiver) (Python committer) Date: 2004-02-02 11:47
I am not sure if I should call this a bug, but here goes:

>>> lst = [lambda x:x+y for y in range(3)]
>>> for f in lst:print f(0)
...
2
2
2
>>> gen = (lambda x:x+y for y in range(3))
>>> for f in gen:print f(0)
...
0
1
2

Is this the intended behavior?
msg45165 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-02-02 12:37
I think it's right, given the namespace difference between listcomps
and genexprs.
msg45166 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-02-02 12:58
The behavior is indeed the one described by the PEP but it is still surprising.  As far as I'm concerned it is yet another reason why free variable bindings ("nested scopes") are done wrong in Python currently :-(

If you use the older trick "lambda x,y=y:x+y" instead, then both cases will agree (with the sensible result in my opinion).  A few more issues and I'll start a campaign for definition-time bindings of free variables :-(
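The default-argument trick mentioned above is easy to check; a small sketch (in modern Python syntax) contrasting the two bindings:

```python
# Late binding: every lambda looks up y when it is called,
# after the loop has finished, so all of them see the final y.
late = [f(0) for f in list(lambda x: x + y for y in range(3))]

# Early binding: "y=y" freezes the current value of y into each lambda.
early = [f(0) for f in list(lambda x, y=y: x + y for y in range(3))]

print(late)   # [2, 2, 2]
print(early)  # [0, 1, 2]
```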
msg45167 - (view) Author: George Yoshida (quiver) (Python committer) Date: 2004-02-03 09:07
Thanks, Arigo and Perky.

Hmm, maybe I should read the PEP and the thread about 
namespaces more carefully, not just skim the surface.
Anyway, the PEP has not been updated since last October, so I 
think it's a good time to merge the recent genexpr progress and 
update PEP 289.
msg45168 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-02-03 11:46
After thinking a bit more on the issue, note that the generator version of your example is equivalent to:

def g():
    for y in range(3):
        yield lambda x: x+y

which means that it can suffer from the same problem as the first example, but more subtly (and even more confusingly):

for f in g(): print f(0)         # 0, 1, 2
for f in list(g()): print f(0)   # 2, 2, 2

This is because due to Python's nested scope rules the above generator is equivalent to:

def g():
    result = lambda x: x+y
    for y in range(3):
        yield result

at which point I think it gets pretty confusing...
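The surprise described here can be verified directly; a sketch (in modern Python syntax) showing that consumption order changes the results:

```python
def g():
    for y in range(3):
        yield lambda x: x + y

# Calling each closure while g is suspended at the yield sees the current y.
immediate = [f(0) for f in g()]

# Exhausting g first leaves every closure reading the final value of y.
delayed = [f(0) for f in list(g())]

print(immediate)  # [0, 1, 2]
print(delayed)    # [2, 2, 2]
```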
msg45169 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-02-04 06:13
Again, I fixed the patch so that '(x for x in None)' raises an
error immediately. (Up to now, the error did not occur
until the generator expression was evaluated.)
msg45170 - (view) Author: George Yoshida (quiver) (Python committer) Date: 2004-02-04 10:45
What about this code?
In the current implementation, a loop variable inside a list 
comprehension is not visible outside it if you use the
comprehension inside a generator expression.
For example:

>>> (a*b for a in [b for b in range(5)])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'b' is not defined

Its list comprehension counterpart is:

>>> [a*b for a in [b for b in range(5)]]
[0, 4, 8, 12, 16]
msg45171 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-02-04 13:53
As mentioned in the PEP, even the listcomp's leaking of its loop
variable will be dropped in Python 3. I'm sorry, but I don't see
that usage remaining solid in the future.
msg45172 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-02-23 16:34
Whoa! I finally patched the pure-Python compiler package. (I
was a bit busy recently :)
I ran the regression test with 'Tools/compiler/regrtest.py'
before submitting this patch, and there was one failure
(test_dis.py). I'm not sure whether that's a problem, so I'm
just submitting it now.

Next, I think I'll refactor things a bit!
msg45173 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-03-13 21:17
Logged In: YES 
user_id=80475

Any recent progress?
msg45174 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-03-14 02:59
I'm writing docs for the tutorial and the language reference. They'll
be completed in a day or two.
msg45175 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-03-16 04:16
Another bug in the implementation.

>>> list((x, y) for x in 'abcd' for y in 'abcd')
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'd')]

Expected behavior may be:
>>> [(x, y) for x in 'abcd' for y in 'abcd']
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'a'),
('b', 'b'), ('b', 'c'), ('b', 'd'), ('c', 'a'), ('c', 'b'),
('c', 'c'), ('c', 'd'), ('d', 'a'), ('d', 'b'), ('d', 'c'),
('d', 'd')]
msg45176 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-03-16 16:24
Ah, the preceding bug was due to the patch I uploaded
2004-02-04 15:13.

In order to get an error instantly from an expression like
"g=(x for x in None)", I made it equivalent to,

def __gen(_[1]):
    for x in _[1]:
        yield x
g=__gen(iter(None)) 
                 ^^^^

But when there are two or more iterator expressions with the
same symbol (or string), that patch produces an error, because
currently "((x, y) for x in 'abcd' for y in 'abcd')" is
equivalent to,

def __gen(_[1]):
    for x in _[1]:
        for y in _[1]:
            yield (x,y)

g = __gen(iter("abcd")) # passing only one parameter

I could make it pass each iterator expression separately,
even when they are the same symbol (or string, ...), but for
now I just dropped that patch (the patch of 2004-02-04), since
that's the simplest thing to do. If somebody thinks getting an
instant error for cases like "(x for x in None)" is
important, I'll make another patch for it.

Btw, I added more test cases covering what perky mentioned.
msg45177 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-03-17 05:37
Okay. Here's my draft of the documentation.
Any review/fix/enhancement is very welcome!
I think it's ready to be put into CVS now.
msg45178 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-03-17 06:21
I'll give it a second review.
msg45179 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-03-17 06:28
The compiler package patch currently has some problems in its
compiler/ast.py.
jiwon said he regenerated it using Tools/compiler/astgen.py,
but that made some incompatibilities with ast.py in current
CVS. For example, the Expression class.
msg45180 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-03-17 08:45
To fix the bug that perky picked out, I hand-made ast.py
instead of using the auto-generated code. But I still don't
understand why Tools/compiler/regrtest.py didn't tell me about
it. (Maybe I didn't look at the output carefully enough.)
Ah, and it would be nice if somebody could tell me how to run
test_grammar.py (only) with the Python compiler package.
msg45181 - (view) Author: George Yoshida (quiver) (Python committer) Date: 2004-03-19 13:37
Hi, Jiwon. Here is another bug candidate.
If you use a genexp with iterators, it doesn't work well.

# test Perky's previous post using iterators
>>> list((x,y) for x in iter('abcd') for y in iter('abcd'))
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'd')]

Thanks.
msg45182 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-03-21 16:59
Hi, quiver. I don't think we can easily get around this problem 
if we have to capture iterators in the generator expression.
If you run the following, you'll see what I mean.

>>> a = iter("abcd")
>>> b = iter("abcd")
>>> [(x, y) for x in a for y in b]
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'd')]

I think one possible solution could be: instead of passing the 
evaluated iterators into the generator expression, make a list 
from each iterator and then pass it as the argument. For instance,

g = (x for x in iter("abcd"))

will be equivalent to,

def __gen(_[1]):
    for x in _[1]:
        yield x
g = __gen(list(iter("abcd"))) # see 'list'

- instead of g = __gen(iter("abcd")).

I'm not sure I'm in a position to decide whether to do it that 
way. If the current reviewer (rhettinger) approves it, I'll do it 
that way. Otherwise, I think I will post it on the mailing list.
msg45183 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-03-21 17:28
You need a solution other than list().  A key to the success
of genexps is never to eat memory by expanding an
iterator into a container.

For behavior questions, your reference point is the list
comprehension.  list(genexp) should always produce the same
result as a list comprehension.
msg45184 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-03-21 17:38
Agreed. I'd prefer the current implementation's behavior
over the one defined in PEP 289. Yes, it's incompatible with
list comprehensions, but generator expressions are already
quite different from them. :)

How about changing PEP 289's genexpr semantics to this?

g = (x**2 for x in range(10))
print g.next()

is equivalent to:

def __gen(_i1):
    for x in _i1:
        yield x**2
g = __gen(range(10))
print g.next()
msg45185 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-03-21 17:42
Aah. But I'm not insisting on that because it's incompatible
even with plain nested for-loops.
msg45186 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-03-21 17:55
This inconsistency may not be fixed by the list(iter())
workaround, either.

>>> def counter():
...     counter.count += 1
...     return range(counter.count)
...
>>> counter.count = 0
>>> [y for x in 'abc' for y in counter()]
[0, 0, 1, 0, 1, 2]
>>> counter.count = 0
>>> list(y for x in 'abc' for y in counter())
[0, 0, 0]
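For what it's worth, under the outermost-only precomputation that was eventually adopted, counter() sits in an inner for-clause and is still called lazily, so the genexp and the list comprehension do agree here; a check in modern Python:

```python
def counter():
    counter.count += 1
    return range(counter.count)

counter.count = 0
via_listcomp = [y for x in 'abc' for y in counter()]

counter.count = 0
# Only 'abc' (the outermost iterable) is evaluated up front;
# counter() is re-evaluated on each outer iteration.
via_genexp = list(y for x in 'abc' for y in counter())

print(via_listcomp)  # [0, 0, 1, 0, 1, 2]
print(via_genexp)    # [0, 0, 1, 0, 1, 2]
```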
msg45187 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-03-21 23:40
Oh ho. Am I right in fearing that the idea of precomputing the iterables was broken in the first place? We just cannot do this. The things we are looping over may depend on other stuff from the generator expression itself. Example:

>>> [x for l in [[1,2],[3,4]] for x in l]
[1, 2, 3, 4]
>>> (y for m in [[1,2],[3,4]] for y in m)
NameError: name 'm' is not defined
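This example is exactly what ruled out precomputing every iterable. With precomputation restricted to the first (outermost) iterable, as eventually adopted, the inner iterable expression is evaluated inside the generator and may refer to the outer loop variable; a check in modern Python:

```python
# Only [[1, 2], [3, 4]] is evaluated immediately; "m" is resolved
# inside the generator, one outer iteration at a time.
flat = list(y for m in [[1, 2], [3, 4]] for y in m)
print(flat)  # [1, 2, 3, 4]

# Matches the list comprehension:
assert flat == [x for l in [[1, 2], [3, 4]] for x in l]
```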
msg45188 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-03-22 01:57
Ah... Maybe we need to drop the precomputation of iterables.
I'll post it on the mailing list so that people can think
about it.
msg45189 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-03-22 19:39
Logged In: YES 
user_id=6380

Only the outermost (last) iterable should be precomputed.

While

  (f(x,y) for x in A(y) for y in B)

is not the same as

  ((f(x,y) for x in A(y)) for y in B)

I think the scoping should be done similarly.
msg45190 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-03-22 23:00
Sorry, I meant the first, not the last. I was confused about 
how the 'for' clauses are nested, but the outermost one is 
the first.

So the nesting remark is probably just confusing and wrong; 
ignore it.

What I meant is:

(f(x,y) for x in A() for y in B(x))

should IMO precompute A(), but not B(x). I guess the 
equivalent generator function would be:

def __gen(__outer__=A(), f=f, B=B):
  for x in __outer__:
    for y in B(x):
      yield f(x,y)

In general the value of every free variable used anywhere 
except in the outer expression should be captured; the 
*value* of the outer expression should be captured. This 
should give the least surprising semantics in a variaty of use 
cases.
msg45191 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-03-27 15:56
I'm submitting two versions now. One is the variable-capture
version (but without any iterable precomputation), and the
other is the lazy-binding version. I'll be in military camp for
about one month (4/6~5/2), so I will not be able to fix or patch
anything during that period; thus I'm submitting this now. :)
If I have time and the need arises (before 4/6), I'll also
add the outmost-iterable-precomputation version.
msg45192 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-05-12 07:32
The lazy_bind version has survived over a month of my 
testing and multiple read-throughs of the code :-)

I'm no expert on graminit.h, but I did wonder if testlist_gexp
could have a high-numbered code so that the other codes would 
not have to be renumbered.

When you get back from military camp, please load the 
version that precomputes the outermost iterable.  I believe 
that is Guido's currently favored approach.  Currently, it fails 
this test:

>>> x = 10
>>> g = (i*i for i in range(x))
>>> x = 5
>>> list(g)    # should be ten items long
[0, 1, 4, 9, 16] 

Tim summarized it as:

"""
 ... Guido favored mostly-late binding -- which is late 
binding, except that the iterable in the leftmost for-clause is 
evaluated at once.  So

    ... (e1 for stuff in e2 for morestuff in e3 ...) ...

is effectively replaced by

    def __g(primary):
        for stuff in primary:
           for morestuff in e3:
              ...
                 yield e1

    ... __g(e2) ...

"""

msg45193 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-05-12 08:30
I don't think that "lazy-binding + leftmost iterable 
precomputation" makes sense. Just adding another for loop 
to the example shows the reason.

>>> x = 10
>>> g = ((i,j) for i in range(x) for j in range(x))
>>> x = 5
>>> list(g)

Here, if j iterates only to 5 while i iterates to 10, I think 
it would confuse users.

I think "early binding + leftmost iterable precomputation" 
makes sense, but "lazy binding + leftmost iterable 
precomputation" does not, IMHO.

About renumbering testlist_gexp in graminit.h: that's not 
something I did by hand; the file is auto-generated by pgen 
(Parser/pgenmain.c). Although it makes another file (symbol.py) 
need to be changed (and thus makes the CVS log a bit dirty), 
it's a tool-generated file, so I did it like that. If you think 
it's better to do it without the renumbering, let me know, or 
you could do it yourself.  ;^)
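The combination questioned here is, in fact, what was adopted: the leftmost iterable is precomputed while everything else is late-bound, so the two range(x) calls can indeed see different values of x. A sketch in modern Python:

```python
x = 10
g = ((i, j) for i in range(x) for j in range(x))  # outer range(x) evaluated now
x = 5
pairs = list(g)  # inner range(x) evaluated during iteration, with x == 5

print(len(pairs))                # 50 pairs: 10 outer values * 5 inner values
print(max(i for i, _ in pairs))  # 9
print(max(j for _, j in pairs))  # 4
```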
msg45194 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-05-12 14:03
You're just showing that saving genexprs up for later use is
a bad idea. We should place clear warnings in the docs that
a genexpr should be used up right away, and that otherwise one
should write an explicit generator function.
msg45195 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-05-12 15:26
Which is exactly why I cannot understand why you are so keen on one binding model rather than the other, if it doesn't matter.  It looks like we are going to add a feature that doesn't scale to advanced usages.  Moreover it seems really obvious that non-experts *will* get surprises, because they *will* try to replace all their listcomps with genexprs -- they shouldn't do that, but of course they will do it anyway, because it appears to work -- until the occasional strange bug creeps in, and one that is difficult to actually relate to the genexpr in the first place.
msg45196 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-05-12 17:36
Read back what I wrote on python-dev about capturing free
variables as a new concept with which I am uncomfortable.
There should not be such "advanced" usages of genexprs --
advanced usages are better served by explicit generator
functions.
msg45197 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-05-12 18:54
Jeremy:  I reviewed the lazy-binding version of the patch as
much as I could.  Can you take a brief look at just the
changes to the grammar and at the new functions in
compile.c?  The functions are clearly written and neatly
parallel the list comps, but I wondered whether it is possible
and desirable to factor out the commonalities.

Jiwon:  the graminit.h renumbering is fine (all of the
automatically generated files should be left as-is).

Guido:  I was clear on your rationale for preferring late
binding but did not find wording for the reason behind the
precomputation of the outermost iterable. I speculate that the
semantics reflect that this is an expression and that all
expressions (except and/or) are immediate. 
msg45198 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-05-13 03:58
On why the first expr should be evaluated immediately:

Consider sum(x for x in foo()). Now suppose there's a bug in
foo() that raises an exception, and a bug in sum() that
raises an exception before it starts iterating over its
argument. Which exception would you expect to see? I'd be
surprised if the one in sum() was raised rather the one in
foo(), since the call to foo()  is part of the argument to
sum(), and I expect arguments to be processed before the
function is called.

OTOH, in sum(bar(x) for x in foo()), where sum() and foo()
are bugfree, but bar() raises an exception, we have no
choice but to delay the call to bar() until sum() starts
iterating -- that's part of the contract of generators.
(They do nothing until their next() method is first called.)

Ditto for subsequent iterables if nested for loops are used:
in sum(x*y for x in range(10) for y in bar(x)), there are 10
calls to bar(), with arguments supplied by the first (outer)
for loop, and we have no choice but to delay even the first
call to bar() until sum() calls next() on its argument.

I know this isn't the easiest thing to implement, but I
still think it's the right thing. The generator function for
the last example would be something like

def G(arg):
    for x in arg:
        for y in bar(x):
            yield x*y

and the call to sum() would be

sum(G(range(10)))
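Guido's argument-evaluation rationale is observable behavior; a sketch (with hypothetical helpers broken() and bar(), not from the patch) showing which errors surface at creation time versus at iteration time:

```python
def broken():
    raise ValueError("bug in the outermost iterable expression")

def bar(x):
    raise ValueError("bug in an inner call")

# The outermost iterable is evaluated (and iter() applied) immediately,
# so the error surfaces while the genexp itself is being created.
try:
    g = (x for x in broken())
except ValueError as exc:
    print("at creation time:", exc)

# Inner calls are deferred until the generator is actually iterated.
g = (bar(x) for x in range(3))  # no error yet
try:
    next(g)
except ValueError as exc:
    print("at first next():", exc)
```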
msg45199 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-05-15 00:31
Jiwon, do you see how to proceed from here?

Note that given (for x in exp1 if exp2 for y in exp3 if
exp4), only exp1 is evaluated immediately; exp2 is deferred.
msg45200 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-05-15 06:08
OK, I see.

I've almost finished adding it (outmost-iterable 
precomputation) to CPython, and I'm now trying to do the 
same in the Python compiler package. I'll probably submit 
the patch by tomorrow.
msg45201 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-05-15 17:24
OK. I added the outmost-iterable-precomputation feature to the
lazy-binding version of the patch, and also added related
tests to test_grammar.py.
msg45202 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-05-19 08:20
Thank you for the superb contribution.

It passes all of my tests without exception (see the new
file, test_genexps).  I do not see any issues with the coding.

I am accepting and applying the patch, except for the
documentation.  

Please update the documentation part of the patch and I will
go through and clarify where necessary.  Also, I will update
the PEP to reflect Guido's rationale for the design
decisions on binding behavior.

msg45203 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-05-20 16:51
Thank you, and I'm happy to contribute. =)

I updated the documentation so that it covers the
precomputation issue. But I don't know how much I should
write, or where, so please review/revise it.
msg45204 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-05-20 17:08
Jiwon, while working through the doc patch, I noticed a code
generation nit.  The code g=(i for i in range(10)) gets
translated to:

def __gen(exp):
   for i in exp:
       yield i
g = __gen(iter(range(10)))
del __gen

On the next to last line, the call to iter() may be
superfluous.  Do 
you see a need for it?  If so, I need to include that in the
spec.  If not, then it should be removed (line 1822 in
compile.c).

In terms of byte codes, the difference shows up in a
comparison to an equivalent generator:

>>> dis(compile('g(range(10))', '', 'eval'))
  0           0 LOAD_NAME                0 (g)
              3 LOAD_NAME                1 (range)
              6 LOAD_CONST               0 (10)
              9 CALL_FUNCTION            1
             12 CALL_FUNCTION            1
             15 RETURN_VALUE 
       
>>> dis(compile('(i for i in range(10))', '', 'eval'))
  0           0 LOAD_CONST               0 (<code object
<generator expression> at 00A4B220, file "", line 1>)
              3 MAKE_FUNCTION            0
              6 LOAD_NAME                0 (range)
              9 LOAD_CONST               1 (10)
             12 CALL_FUNCTION            1
             15 GET_ITER            
             16 CALL_FUNCTION            1
             19 RETURN_VALUE  

The GET_ITER at code position 15 is what is in question.

msg45205 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-05-20 17:32
FWIW, here is a patch that moves the GET_ITER back inside
the generator so that the implementation will more closely
match the spec and match the operation of regular generators.

*** compile.c   19 May 2004 08:20:15 -0000      2.302
--- compile.c   20 May 2004 17:29:35 -0000
***************
*** 1628,1633 ****
--- 1628,1634 ----

        if (is_outmost) {
                com_addop_varname(c, VAR_LOAD,
"[outmost-iterable]");
+               com_addbyte(c, GET_ITER);
                com_push(c, 1);
        }
        else {
***************
*** 1819,1825 ****
                        com_addoparg(c, MAKE_FUNCTION, 0);

                com_test(c, CHILD(CHILD(n, 1), 3));
-               com_addbyte(c, GET_ITER);
                com_addoparg(c, CALL_FUNCTION, 1);
                com_pop(c, 1);

--- 1820,1825 ----
msg45206 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-05-20 23:07
Please disregard the last two posts.  Guido and I agreed
that the current behavior is correct.  I've updated the docs
accordingly and added a test.
msg45207 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-05-21 08:05
Note that FOR_ITER segfaults if it is passed a non-iterable.  This gives us a dangerous code object without invoking the 'new' module.  A compiler-produced code object should probably not allow us to do the following:

>>> def f(): (a for b in c)
>>> f.func_code.co_consts
(None, <code object <generator expression> at xxx>, ...)
>>> co = f.func_code.co_consts[1]
>>> dis.dis(co)
  1         0 SETUP_LOOP         17 (to 20)
            3 LOAD_FAST           0 ([outmost-iterable])
       >>   6 FOR_ITER           10 (to 20)
              ...
>>> f.func_code = co
>>> f(5).next()
Segmentation fault
History
Date User Action Args
2004-01-07 12:10:43 jiwon create