Classification
Title: generator expression implementation
Type: Stage:
Components: Interpreter Core Versions:
Process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: Nosy List: arigo, gvanrossum, hyeshik.chang, jiwon, mwh, quiver, rhettinger
Priority: normal Keywords: patch

Created on 2004-01-07 12:10 by jiwon, last changed 2004-05-21 08:05 by arigo. This issue is now closed.

Files
File name Uploaded Description Edit
genexpr-doc.diff hyeshik.chang, 2004-03-17 05:37 documentation draft for generator expression
gexp.diff.capture jiwon, 2004-03-27 15:56 variable capture version. (without any precomputation of iterables)
gexp.diff.lazy_bind jiwon, 2004-03-27 15:58 patch without any precomputation or early binding.
gexp.diff.precompute-outmost jiwon, 2004-05-15 17:24 lazy-binding + outmost iterable precomputation version
genexpr-doc.diff jiwon, 2004-05-20 16:51 documentation for generator expression. (including precomputation issues)
Messages (62)
msg45146 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-01-07 12:10
Since I was interested in PEP 289 (generator
expressions), I dabbled with it and implemented a
working version. I'm not sure I did everything right,
but maybe someone who knows better can fix it up.

1. The Grammar has two changes:
 a. atom: '(' [testlist] ')' | '[' [listmaker] ']' | ...

     changes to 

    atom: '(' [testgenexpr] ')' | '[' [listmaker] ']' | ...

     where testgenexpr is defined like this:
 
    testgenexpr: test ( gen_for | (',' test)* [','] )

 b. argument: [test '='] test
 
     changes to

    argument: [test '='] test [gen_for]

 (gen_for, gen_if, and gen_iter are similar to list_for,
list_if, and list_iter, respectively.)

2. Instead of changing the arglist rule in the Grammar to
accept generator expressions, I changed the argument rule
as in 1.b. This means the Grammar accepts a generator
expression without parentheses in a function call even
when there are several arguments, like

reduce(operator.add, (x for x in range(10))) 

This is against what the PEP requires, so I made
parsermodule.c and compile.c catch that case and raise an
error when there is more than one argument in the
function call. I did it that way to get around an
ambiguity that adding generator expressions to arglist
would introduce in the grammar.


3. I implemented a generator expression as equivalent to
a call to a function that contains variable-capturing code
and generator code. For example,

x = 1
g = (x for i in range(10))

is equivalent to

x = 1
def __gen_wrapper():
    _[x] = x # variable capture here
    def __gen():
        for i in range(10):
            yield _[x]
    return __gen()

g = __gen_wrapper()

4. Since I implemented generator expressions as equivalent
to a function call, the symbol table should enter a new scope
when it meets a generator expression. So I ended up
adding the following code to symtable.c:

PyObject *
PySymtableEntry_New(struct symtable *st, char *name,
int type, int lineno)
{

...
        switch (type) {
        case funcdef:
        case lambdef:
!       case testgenexpr: /* generator expression */
!       case argument:    /* generator expression */
                ste->ste_type = TYPE_FUNCTION;
                break;
...

so that a generator expression can be treated as a function
type. This is kind of clumsy, but I couldn't find another
easy way to do it, so I just left it like that.

5. This patch does not include a diff of
src/Lib/symbol.py, so you must run python Lib/symbol.py
to regenerate it.
msg45147 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-01-07 12:25
Logged In: YES 
user_id=55188

Assigned to me.
The submitter is my friend and I have much interest in
this. :-)
msg45148 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-01-08 04:50
Logged In: YES 
user_id=595483

I added diff of Lib/symbol.py, so no need to run python
Lib/symbol.py now.
msg45149 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-01-08 05:44
Okay. I verified that the basic characteristics mentioned in the PEP
are working.
I have started reviewing the implementation.
msg45150 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-01-08 18:50
Logged In: YES 
user_id=4771

We may not need two levels of nested anonymous functions.  It seems to me that the following equivalent code would do, because it captures the variable in an argument instead of via nested scopes:

x = 1
def __gen(x):
  for i in range(10):
    yield x
g = __gen(x)

I don't know though if this is easy to implement in compile.c.  Alternatively:

x = 1
def __gen(x=x):
  for i in range(10):
    yield x
g = __gen()
msg45151 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-01-09 01:49
What arigo wrote sounds reasonable, and it's not very difficult
to implement. I'll try to do it that way.
msg45152 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-01-12 03:26
OK. I've implemented the variable-capturing part as arigo
suggested; the file genexpr-capture-vars-in-args.diff contains it.
If you look at the code, you'll find that the code for
generator expressions is much shorter, and there are fewer
modifications to the existing code.
msg45153 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-01-19 11:10
OK. I modified the patch so that it evaluates the iterator
expressions at generator expression creation time. That means

g = (x for x in range(10))

is equivalent to

def __gen(iter1):
    for x in iter1:
        yield x
g = __gen(range(10))

So, evaluation of range(10) is not deferred anymore.

I also changed testgenexpr to testlist_gexp in 
Grammar/Grammar, since Hye-Shik Chang advised doing so.
I'm willing to take any advice about this patch, so please comment. 
msg45154 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-01-21 14:14
Okay. My review is done. The revised patch "gexp.diff" looks fine 
to me.
There are a few minor tweaks still needed before it can go into CVS:
- Some error exit cases are ignored. You need to provide safe exit 
paths for MemoryError (e.g. the PyList_GetSlice usage in
Python/compile.c).
- A patch for the 'compiler' pure-Python package. (There's an old 
implementation on SF #795947.)
msg45155 - (view) Author: Michael Hudson (mwh) (Python committer) Date: 2004-01-21 18:13
Logged In: YES 
user_id=6656

Documentation would be nice!
msg45156 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-01-25 23:44
BTW, is the unavailability of yield (genexpr) and return
(genexpr) intended behavior?
msg45157 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-01-26 06:17
Please ignore the previous comment.
It was a result of an old revision of the patches. It works with
the recent one.
msg45158 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-01-26 15:15
Logged In: YES 
user_id=4771

>>> (a for d in a)
Fatal Python error: unknown scope for a in <generator
expression>(1) in <stdin>
symbols: {'a': 2048, '_[1]': 4, 'd': 2}
locals: {'_[1]': 0, 'd': 1}
globals: {}
msg45159 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-01-27 07:44
I fixed the patch for the bug that arigo mentioned, and for
what perky mentioned (PyList_GetSlice error handling).

Now I'll tackle the pure-Python compiler package :)
msg45160 - (view) Author: George Yoshida (quiver) (Python committer) Date: 2004-01-29 11:48
Logged In: YES 
user_id=671362

Yet another Fatal Python error:

>>> (a for a in (b for b in (0,1)))
<generator object at 0x40170f8c>
>>> (a for a in (b for b in range(2)))
Fatal Python error: unknown scope for range in ?(0) in <stdin>
symbols: {'range': 8}
locals: {}
globals: {}
 
Aborted

# applied patch as of 2004-01-27
msg45161 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-01-30 03:01
OK, I've fixed the bug quiver reported, and added a related
test to test_grammar.py.
msg45162 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-01-30 05:09
Yet another Fatal Python error:

>>> (a for a in (b for b in (a for a in 'xxx' if True) if
False) if True)
lookup 'True' in ? 3 -1
freevars of <generator expression>: ('True',)
Fatal Python error: com_make_closure()
zsh: abort (core dumped)  ./python

# applied patch as of 2004-01-30
msg45163 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-02-02 08:29
OK. I fixed another bug reported by perky, and added a related
test to test_grammar.py.
msg45164 - (view) Author: George Yoshida (quiver) (Python committer) Date: 2004-02-02 11:47
I am not sure if I should call this a bug, but here goes:

>>> lst = [lambda x:x+y for y in range(3)]
>>> for f in lst:print f(0)
...
2
2
2
>>> gen = (lambda x:x+y for y in range(3))
>>> for f in gen:print f(0)
...
0
1
2

Is this the intended behavior?
msg45165 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-02-02 12:37
I think it's right, given the namespace difference between listcomps
and genexprs.
msg45166 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-02-02 12:58
The behavior is indeed the one described by the PEP but it is still surprising.  As far as I'm concerned it is yet another reason why free variable bindings ("nested scopes") are done wrong in Python currently :-(

If you use the older trick "lambda x,y=y:x+y" instead, then both cases will agree (with the sensible result in my opinion).  A few more issues and I'll start a campaign for definition-time bindings of free variables :-(
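The default-argument trick mentioned above is easy to check; a small sketch (in modern Python syntax) contrasting the two bindings:

```python
# Late binding: every lambda looks up y when it is called,
# after the loop has finished, so all of them see the final y.
late = [f(0) for f in list(lambda x: x + y for y in range(3))]

# Early binding: "y=y" freezes the current value of y into each lambda.
early = [f(0) for f in list(lambda x, y=y: x + y for y in range(3))]

print(late)   # [2, 2, 2]
print(early)  # [0, 1, 2]
```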
msg45167 - (view) Author: George Yoshida (quiver) (Python committer) Date: 2004-02-03 09:07
Thanks, Arigo and Perky.

Hmm, maybe I should read the PEP and the thread about 
namespaces more carefully, not just skim the surface.
Anyway, the PEP has not been updated since last October, so I 
think it's a good time to merge the recent genexpr progress and 
update PEP 289.
msg45168 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-02-03 11:46
After thinking a bit more on the issue, note that the generator version of your example is equivalent to:

def g():
    for y in range(3):
        yield lambda x: x+y

which means that it can suffer from the same problem as the first example, but more subtly (and even more confusingly):

for f in g(): print f(0)         # 0, 1, 2
for f in list(g()): print f(0)   # 2, 2, 2

This is because due to Python's nested scope rules the above generator is equivalent to:

def g():
    result = lambda x: x+y
    for y in range(3):
        yield result

at which point I think it gets pretty confusing...
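The surprise described here can be verified directly; a sketch (in modern Python syntax) showing that consumption order changes the results:

```python
def g():
    for y in range(3):
        yield lambda x: x + y

# Calling each closure while g is suspended at the yield sees the current y.
immediate = [f(0) for f in g()]

# Exhausting g first leaves every closure reading the final value of y.
delayed = [f(0) for f in list(g())]

print(immediate)  # [0, 1, 2]
print(delayed)    # [2, 2, 2]
```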
msg45169 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-02-04 06:13
Again, I fixed the patch so that '(x for x in None)' raises an
error immediately. (Up to now, the error did not occur
until the generator expression was evaluated.)
msg45170 - (view) Author: George Yoshida (quiver) (Python committer) Date: 2004-02-04 10:45
What about this code?
In the current implementation, a loop variable inside a list 
comprehension is not visible outside it if you use the
comprehension inside a generator expression.
For example:

>>> (a*b for a in [b for b in range(5)])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
NameError: name 'b' is not defined

Its list comprehension counterpart is:

>>> [a*b for a in [b for b in range(5)]]
[0, 4, 8, 12, 16]
msg45171 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-02-04 13:53
As mentioned in the PEP, even the listcomp's leaking of its loop
variable will be dropped in Python 3. I'm sorry, but I don't see
that usage remaining solid in the future.
msg45172 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-02-23 16:34
Whoa! I finally patched the pure-Python compiler package. (I
was a bit busy recently :)
I ran the regression test with 'Tools/compiler/regrtest.py'
before submitting this patch, and there was one failure
(test_dis.py). I'm not sure whether that's a problem, so I'm
just submitting it now.

Next, I think I'll refactor things a bit!
msg45173 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-03-13 21:17
Logged In: YES 
user_id=80475

Any recent progress?
msg45174 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-03-14 02:59
I'm writing docs for the tutorial and the language reference. They'll
be completed in a day or two.
msg45175 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-03-16 04:16
Another bug in the implementation.

>>> list((x, y) for x in 'abcd' for y in 'abcd')
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'd')]

Expected behavior may be:
>>> [(x, y) for x in 'abcd' for y in 'abcd']
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'd'), ('b', 'a'),
('b', 'b'), ('b', 'c'), ('b', 'd'), ('c', 'a'), ('c', 'b'),
('c', 'c'), ('c', 'd'), ('d', 'a'), ('d', 'b'), ('d', 'c'),
('d', 'd')]
msg45176 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-03-16 16:24
Ah, the preceding bug was due to the patch I uploaded
2004-02-04 15:13.

In order to get an error instantly from an expression like
"g=(x for x in None)", I made it equivalent to,

def __gen(_[1]):
    for x in _[1]:
        yield x
g=__gen(iter(None)) 
                 ^^^^

But when there are two or more iterator expressions with the
same symbol (or string), that patch produces an error, because
currently "((x, y) for x in 'abcd' for y in 'abcd')" is
equivalent to,

def __gen(_[1]):
    for x in _[1]:
        for y in _[1]:
            yield (x,y)

g = __gen(iter("abcd")) # passing only one parameter

I could make it pass each iterator expression separately,
even when they are the same symbol (or string, ...), but for
now I just dropped that patch (the patch of 2004-02-04), since
that's the simplest thing to do. If somebody thinks getting an
instant error for cases like "(x for x in None)" is
important, I'll make another patch for it.

Btw, I added more test cases covering what perky mentioned.
msg45177 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-03-17 05:37
Okay. Here's my draft of the documentation.
Any review/fix/enhancement is very welcome!
I think it's ready to be put into CVS now.
msg45178 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-03-17 06:21
I'll give it a second review.
msg45179 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-03-17 06:28
The compiler package patch currently has some problems in its
compiler/ast.py.
jiwon said he regenerated it using Tools/compiler/astgen.py,
but that made some incompatibilities with ast.py in current
CVS. For example, the Expression class.
msg45180 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-03-17 08:45
To fix the bug that perky picked out, I hand-made ast.py
instead of using the auto-generated code. But I still don't
understand why Tools/compiler/regrtest.py didn't tell me about
it. (Maybe I didn't look at the output carefully enough.)
Ah, and it would be nice if somebody could tell me how to run
test_grammar.py (only) with the Python compiler package.
msg45181 - (view) Author: George Yoshida (quiver) (Python committer) Date: 2004-03-19 13:37
Hi, Jiwon. Here is another bug candidate.
If you use a genexp with iterators, it doesn't work well.

# test Perky's previous post using iterators
>>> list((x,y) for x in iter('abcd') for y in iter('abcd'))
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'd')]

Thanks.
msg45182 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-03-21 16:59
Hi, quiver. I don't think we can easily get around this problem 
if we have to capture iterators in the generator expression.
If you run the following, you'll see what I mean.

>>> a = iter("abcd")
>>> b = iter("abcd")
>>> [(x, y) for x in a for y in b]
[('a', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'd')]

I think one possible solution could be: instead of passing the 
evaluated iterators into the generator expression, make a list 
from each iterator and then pass it as the argument. For instance,

g = (x for x in iter("abcd"))

will be equivalent to,

def __gen(_[1]):
    for x in _[1]:
        yield x
g = __gen(list(iter("abcd"))) # see 'list'

- instead of g = __gen(iter("abcd")).

I'm not sure I'm in a position to decide whether to do it that 
way. If the current reviewer (rhettinger) approves it, I'll do it 
that way. Otherwise, I think I will post it on the mailing list.
msg45183 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-03-21 17:28
You need a solution other than list().  A key to the success
of genexps is never to eat memory by expanding an
iterator into a container.

For behavior questions, your reference point is the list
comprehension.  list(genexp) should always produce the same
result as a list comprehension.
msg45184 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-03-21 17:38
Agreed. I'd prefer the current implementation's behavior
over the one defined in PEP 289. Yes, it's incompatible with
list comprehensions, but generator expressions are already
quite different from them. :)

How about changing PEP 289's genexpr semantics to this?

g = (x**2 for x in range(10))
print g.next()

is equivalent to:

def __gen(_i1):
    for x in _i1:
        yield x**2
g = __gen(range(10))
print g.next()
msg45185 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-03-21 17:42
Aah. But I'm not insisting on that because it's incompatible
even with plain nested for-loops.
msg45186 - (view) Author: Hyeshik Chang (hyeshik.chang) * (Python committer) Date: 2004-03-21 17:55
This inconsistency may not be fixed by the list(iter())
workaround, either.

>>> def counter():
...     counter.count += 1
...     return range(counter.count)
...
>>> counter.count = 0
>>> [y for x in 'abc' for y in counter()]
[0, 0, 1, 0, 1, 2]
>>> counter.count = 0
>>> list(y for x in 'abc' for y in counter())
[0, 0, 0]
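For what it's worth, under the outermost-only precomputation that was eventually adopted, counter() sits in an inner for-clause and is still called lazily, so the genexp and the list comprehension do agree here; a check in modern Python:

```python
def counter():
    counter.count += 1
    return range(counter.count)

counter.count = 0
via_listcomp = [y for x in 'abc' for y in counter()]

counter.count = 0
# Only 'abc' (the outermost iterable) is evaluated up front;
# counter() is re-evaluated on each outer iteration.
via_genexp = list(y for x in 'abc' for y in counter())

print(via_listcomp)  # [0, 0, 1, 0, 1, 2]
print(via_genexp)    # [0, 0, 1, 0, 1, 2]
```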
msg45187 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-03-21 23:40
Oh ho. Am I right in fearing that the idea of precomputing the iterables was broken in the first place? We just cannot do this. The things we are looping over may depend on other stuff from the generator expression itself. Example:

>>> [x for l in [[1,2],[3,4]] for x in l]
[1, 2, 3, 4]
>>> (y for m in [[1,2],[3,4]] for y in m)
NameError: name 'm' is not defined
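This example is exactly what ruled out precomputing every iterable. With precomputation restricted to the first (outermost) iterable, as eventually adopted, the inner iterable expression is evaluated inside the generator and may refer to the outer loop variable; a check in modern Python:

```python
# Only [[1, 2], [3, 4]] is evaluated immediately; "m" is resolved
# inside the generator, one outer iteration at a time.
flat = list(y for m in [[1, 2], [3, 4]] for y in m)
print(flat)  # [1, 2, 3, 4]

# Matches the list comprehension:
assert flat == [x for l in [[1, 2], [3, 4]] for x in l]
```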
msg45188 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-03-22 01:57
Ah... Maybe we need to drop the precomputation of iterables.
I'll post it on the mailing list so that people can think
about it.
msg45189 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-03-22 19:39
Logged In: YES 
user_id=6380

Only the outermost (last) iterable should be precomputed.

While

  (f(x,y) for x in A(y) for y in B)

is not the same as

  ((f(x,y) for x in A(y)) for y in B)

I think the scoping should be done similarly.
msg45190 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-03-22 23:00
Sorry, I meant the first, not the last. I was confused about 
how the 'for' clauses are nested, but the outermost one is 
the first.

So the nesting remark is probably just confusing and wrong; 
ignore it.

What I meant is:

(f(x,y) for x in A() for y in B(x))

should IMO precompute A(), but not B(x). I guess the 
equivalent generator function would be:

def __gen(__outer__=A(), f=f, B=B):
  for x in __outer__:
    for y in B(x):
      yield f(x,y)

In general the value of every free variable used anywhere 
except in the outer expression should be captured; the 
*value* of the outer expression should be captured. This 
should give the least surprising semantics in a variaty of use 
cases.
msg45191 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-03-27 15:56
I'm submitting two versions now. One is the variable-capture
version (but without any iterable precomputation), and the
other is the lazy-binding version. I'll be in military camp for
about one month (4/6~5/2), so I will not be able to fix or patch
anything during that period; thus I'm submitting this now. :)
If I have time and the need arises (before 4/6), I'll also
add the outmost-iterable-precomputation version.
msg45192 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-05-12 07:32
The lazy_bind version has survived over a month of my 
testing and multiple read-throughs of the code :-)

I'm no expert on graminit.h, but I did wonder if testlist_gexp
could have a high-numbered code so that the other codes would 
not have to be renumbered.

When you get back from military camp, please load the 
version that precomputes the outermost iterable.  I believe 
that is Guido's currently favored approach.  Currently, it fails 
this test:

>>> x = 10
>>> g = (i*i for i in range(x))
>>> x = 5
>>> list(g)    # should be ten items long
[0, 1, 4, 9, 16] 

Tim summarized it as:

"""
 ... Guido favored mostly-late binding -- which is late 
binding, except that the iterable in the leftmost for-clause is 
evaluated at once.  So

    ... (e1 for stuff in e2 for morestuff in e3 ...) ...

is effectively replaced by

    def __g(primary):
        for stuff in primary:
           for morestuff in e3:
              ...
                 yield e1

    ... __g(e2) ...

"""

msg45193 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-05-12 08:30
I don't think that "lazy-binding + leftmost iterable 
precomputation" makes sense. Just adding another for loop 
to the example shows the reason.

>>> x = 10
>>> g = ((i,j) for i in range(x) for j in range(x))
>>> x = 5
>>> list(g)

Here, if j iterates only to 5 while i iterates to 10, I think 
it would confuse users.

I think "early binding + leftmost iterable precomputation" 
makes sense, but "lazy binding + leftmost iterable 
precomputation" does not, IMHO.

About renumbering testlist_gexp in graminit.h: that's not 
something I did by hand; the file is auto-generated by pgen 
(Parser/pgenmain.c). Although it makes another file (symbol.py) 
need to be changed (and thus makes the CVS log a bit dirty), 
it's a tool-generated file, so I did it like that. If you think 
it's better to do it without the renumbering, let me know, or 
you could do it yourself.  ;^)
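The combination questioned here is, in fact, what was adopted: the leftmost iterable is precomputed while everything else is late-bound, so the two range(x) calls can indeed see different values of x. A sketch in modern Python:

```python
x = 10
g = ((i, j) for i in range(x) for j in range(x))  # outer range(x) evaluated now
x = 5
pairs = list(g)  # inner range(x) evaluated during iteration, with x == 5

print(len(pairs))                # 50 pairs: 10 outer values * 5 inner values
print(max(i for i, _ in pairs))  # 9
print(max(j for _, j in pairs))  # 4
```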
msg45194 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-05-12 14:03
You're just showing that saving genexprs up for later use is
a bad idea. We should place clear warnings in the docs that
a genexpr should be used up right away, and that otherwise one
should write an explicit generator function.
msg45195 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-05-12 15:26
Which is exactly why I cannot understand why you are so keen on one binding model rather than the other, if it doesn't matter.  It looks like we are going to add a feature that doesn't scale to advanced usages.  Moreover it seems really obvious that non-experts *will* get surprises, because they *will* try to replace all their listcomps with genexprs -- they shouldn't do that, but of course they will do it anyway, because it appears to work -- until the occasional strange bug creeps in, and one that is difficult to actually relate to the genexpr in the first place.
msg45196 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-05-12 17:36
Read back what I wrote on python-dev about capturing free
variables as a new concept with which I am uncomfortable.
There should not be such "advanced" usages of genexprs --
advanced usages are better served by explicit generator
functions.
msg45197 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-05-12 18:54
Jeremy:  I reviewed the lazy-binding version of the patch as
much as I could.  Can you take a brief look at just the
changes to the grammar and at the new functions in
compile.c?  The functions are clearly written and neatly
parallel the list comps, but I wondered whether it is possible
and desirable to factor out the commonalities.

Jiwon:  the graminit.h renumbering is fine (all of the
automatically generated files should be left as-is).

Guido:  I was clear on your rationale for preferring late
binding but did not find wording for the reason behind the
precomputation of the outermost iterable. I speculate that the
semantics reflect that this is an expression and that all
expressions (except and/or) are immediate. 
msg45198 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2004-05-13 03:58
On why the first expr should be evaluated immediately:

Consider sum(x for x in foo()). Now suppose there's a bug in
foo() that raises an exception, and a bug in sum() that
raises an exception before it starts iterating over its
argument. Which exception would you expect to see? I'd be
surprised if the one in sum() was raised rather the one in
foo(), since the call to foo()  is part of the argument to
sum(), and I expect arguments to be processed before the
function is called.

OTOH, in sum(bar(x) for x in foo()), where sum() and foo()
are bugfree, but bar() raises an exception, we have no
choice but to delay the call to bar() until sum() starts
iterating -- that's part of the contract of generators.
(They do nothing until their next() method is first called.)

Ditto for subsequent iterables if nested for loops are used:
in sum(x*y for x in range(10) for y in bar(x)), there are 10
calls to bar(), with arguments supplied by the first (outer)
for loop, and we have no choice but to delay even the first
call to bar() until sum() calls next() on its argument.

I know this isn't the easiest thing to implement, but I
still think it's the right thing. The generator function for
the last example would be something like

def G(arg):
    for x in arg:
        for y in bar(x):
            yield x*y

and the call to sum() would be

sum(G(range(10)))
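Guido's argument-evaluation rationale is observable behavior; a sketch (with hypothetical helpers broken() and bar(), not from the patch) showing which errors surface at creation time versus at iteration time:

```python
def broken():
    raise ValueError("bug in the outermost iterable expression")

def bar(x):
    raise ValueError("bug in an inner call")

# The outermost iterable is evaluated (and iter() applied) immediately,
# so the error surfaces while the genexp itself is being created.
try:
    g = (x for x in broken())
except ValueError as exc:
    print("at creation time:", exc)

# Inner calls are deferred until the generator is actually iterated.
g = (bar(x) for x in range(3))  # no error yet
try:
    next(g)
except ValueError as exc:
    print("at first next():", exc)
```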
msg45199 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-05-15 00:31
Jiwon, do you see how to proceed from here?

Note that given (for x in exp1 if exp2 for y in exp3 if
exp4), only exp1 is evaluated immediately; exp2 is deferred.
msg45200 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-05-15 06:08
OK, I see.

I've almost finished adding it (outmost-iterable 
precomputation) to CPython, and I'm now trying to do the 
same in the Python compiler package. I'll probably submit 
the patch by tomorrow.
msg45201 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-05-15 17:24
OK. I added the outmost-iterable-precomputation feature to the
lazy-binding version of the patch, and also added related
tests to test_grammar.py.
msg45202 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-05-19 08:20
Thank you for the superb contribution.

It passes all of my tests without exception (see the new
file, test_genexps).  I do not see any issues with the coding.

I am accepting and applying the patch, except for the
documentation.  

Please update the documentation part of the patch and I will
go through and clarify where necessary.  Also, I will update
the PEP to reflect Guido's rationale for the design
decisions on binding behavior.

msg45203 - (view) Author: Jiwon Seo (jiwon) * Date: 2004-05-20 16:51
Thank you, and I'm happy to contribute. =)

I updated the documentation so that it covers the
precomputation issue. But I don't know how much I should
write, or where, so please review/revise it.
msg45204 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-05-20 17:08
Jiwon, while working through the doc patch, I noticed a code
generation nit.  The code g=(i for i in range(10)) gets
translated to:

def __gen(exp):
   for i in exp:
       yield i
g = __gen(iter(range(10)))
del __gen

On the next to last line, the call to iter() may be
superfluous.  Do 
you see a need for it?  If so, I need to include that in the
spec.  If not, then it should be removed (line 1822 in
compile.c).

In terms of byte codes, the difference shows up in a
comparison to an equivalent generator:

>>> dis(compile('g(range(10))', '', 'eval'))
  0           0 LOAD_NAME                0 (g)
              3 LOAD_NAME                1 (range)
              6 LOAD_CONST               0 (10)
              9 CALL_FUNCTION            1
             12 CALL_FUNCTION            1
             15 RETURN_VALUE 
       
>>> dis(compile('(i for i in range(10))', '', 'eval'))
  0           0 LOAD_CONST               0 (<code object
<generator expression> at 00A4B220, file "", line 1>)
              3 MAKE_FUNCTION            0
              6 LOAD_NAME                0 (range)
              9 LOAD_CONST               1 (10)
             12 CALL_FUNCTION            1
             15 GET_ITER            
             16 CALL_FUNCTION            1
             19 RETURN_VALUE  

The GET_ITER at code position 15 is what is in question.

msg45205 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-05-20 17:32
FWIW, here is a patch that moves the GET_ITER back inside
the generator so that the implementation will more closely
match the spec and match the operation of regular generators.

*** compile.c   19 May 2004 08:20:15 -0000      2.302
--- compile.c   20 May 2004 17:29:35 -0000
***************
*** 1628,1633 ****
--- 1628,1634 ----

        if (is_outmost) {
                com_addop_varname(c, VAR_LOAD,
"[outmost-iterable]");
+               com_addbyte(c, GET_ITER);
                com_push(c, 1);
        }
        else {
***************
*** 1819,1825 ****
                        com_addoparg(c, MAKE_FUNCTION, 0);

                com_test(c, CHILD(CHILD(n, 1), 3));
-               com_addbyte(c, GET_ITER);
                com_addoparg(c, CALL_FUNCTION, 1);
                com_pop(c, 1);

--- 1820,1825 ----
msg45206 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-05-20 23:07
Please disregard the last two posts.  Guido and I agreed
that the current behavior is correct.  I've updated the docs
accordingly and added a test.
msg45207 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-05-21 08:05
Note that FOR_ITER segfaults if it is passed a non-iterable.  This gives us a dangerous code object without invoking the 'new' module.  A compiler-produced code object should probably not allow us to do the following:

>>> def f(): (a for b in c)
>>> f.func_code.co_consts
(None, <code object <generator expression> at xxx>, ...)
>>> co = f.func_code.co_consts[1]
>>> dis.dis(co)
  1         0 SETUP_LOOP         17 (to 20)
            3 LOAD_FAST           0 ([outmost-iterable])
       >>   6 FOR_ITER           10 (to 20)
              ...
>>> f.func_code = co
>>> f(5).next()
Segmentation fault
History
Date User Action Args
2004-01-07 12:10:43 jiwon create