classification
Title: Tails of generator get lost under zip()
Type: enhancement Stage: needs patch
Components: Documentation Versions: Python 2.6
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: daniel.urban, docs@python, eric.araujo, ezio.melotti, georg.brandl, gpk-kochanski, rhettinger
Priority: low Keywords:

Created on 2011-02-19 10:50 by gpk-kochanski, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
bug312.py gpk-kochanski, 2011-02-19 10:50 Example bug, demonstrated under python 2.6.6
Messages (8)
msg128842 - (view) Author: Greg Kochanski (gpk-kochanski) Date: 2011-02-19 10:50
When you have a generator as an argument to zip(), code after the last yield statement may not get executed.  The problem is that zip() stops after it gets _one_ exception, i.e. when just one of the generators has finished.

As a result, any important clean-up code at the end of a generator will not be executed. Caches may not get flushed, et cetera.

At the least, this is a documentation bug that needs to be pointed out in both zip() and the definition of a generator().  More realistically, it is a severe wart on the language, because it violates the programmer's reasonable expectation that a generator executes until it falls off the end of the function.  It means that a generator becomes conceptually nasty: you cannot predict what it will do based just on an inspection of the code and the code it calls.

Likely, the same behavior happens in itertools, too.
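
A minimal sketch of the behaviour reported above, written for Python 3. It is not the attached bug312.py; the names `counted` and `log` are hypothetical and chosen only for illustration. Both generators yield four items, yet only the first one reaches the code after its last yield, because zip() stops as soon as the first argument is exhausted.

```python
def counted(name, n, log):
    """Yield 0..n-1, then record that the tail of the generator ran."""
    for i in range(n):
        yield i
    # "Clean-up" code placed after the last yield.
    log.append(name + " finished")


log = []
pairs = list(zip(counted("first", 4, log), counted("second", 4, log)))

print(pairs)
print(log)  # only the first generator's tail has run
```

The second generator is left suspended at its last yield: zip() asks the first generator for a fifth item, sees it is exhausted, and never resumes the second one.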
msg128843 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-02-19 10:55
This behavior is documented[0]:
"""The returned list is truncated in length to the length of the shortest argument sequence."""

You can use izip_longest instead[1].

[0]: http://docs.python.org/library/functions.html#zip
[1]: http://docs.python.org/library/itertools.html#itertools.izip_longest
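
A short sketch of the suggested alternative, assuming Python 3, where the function is named itertools.zip_longest (izip_longest is the Python 2 name). The helper `counted` and the list `log` are hypothetical. Because zip_longest keeps going until every argument is exhausted, both generators run past their last yield.

```python
from itertools import zip_longest


def counted(name, n, log):
    """Yield 0..n-1, then record that the tail of the generator ran."""
    for i in range(n):
        yield i
    log.append(name + " finished")


log = []
# Missing values from the shorter argument are filled with None by default.
pairs = list(zip_longest(counted("first", 2, log), counted("second", 4, log)))

print(pairs)
print(log)
```

The cost is that the caller now has to deal with the fill values, so this changes the result and is not a drop-in replacement for zip().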
msg128845 - (view) Author: Greg Kochanski (gpk-kochanski) Date: 2011-02-19 11:44
(a) It is not documented for the symmetric (4, 4) case where the two generators are of equal length.

(b) Even for the asymmetric case, it is not documented in such a way that people are likely to see the implications.

(c) Documented or not, it's still a wart.
msg128851 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-02-19 15:49
a) that's true, even if the behavior makes sense (when the first generator ends there's no reason to see what's next in the second). Georg, do you think it should be documented?

b) if you want to be sure that some clean-up is executed you should use a try/finally in the generator. Relying on the number of elements of another generator used together with zip() seems very fragile to determine when/if a clean-up should be done (what if the other generator has a different number of elements? what if an exception is raised before it's fully consumed? what if you use the generator in a for/while loop and break the loop before reaching the end? ...).

c) even if you consider it as a wart, changing it for zip() will break compatibility and it's against the language moratorium. This behavior is also useful if e.g. you have the generators g1 that yields 1 2 3, g2 that yields 4 5 6, and g3 that yields a b c d e f and you want to first zip(g1, g3) and get 1a 2b 3c and then continue with zip(g2, g3) and get 4d 5e 6f. Checking in the first zip() if g3 reached its end or not would mean consuming the 'd', and that would be a worse wart imho.
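
The g1/g2/g3 scenario in point c) can be sketched as follows, assuming Python 3. The helper `letters` is hypothetical. The sketch also shows that argument order matters: when the longer generator comes first, zip() has already pulled one extra item from it before it notices that the shorter argument is exhausted.

```python
def letters():
    """Yield the six letters a..f one at a time."""
    for ch in "abcdef":
        yield ch


g1 = (n for n in (1, 2, 3))
g2 = (n for n in (4, 5, 6))
g3 = letters()

# g1 runs out first, so zip() never asks g3 for a fourth item.
first = list(zip(g1, g3))
# g3 resumes exactly where the first zip() left it.
second = list(zip(g2, g3))

# With the longer generator first, 'd' is fetched and then discarded.
g3_again = letters()
swapped = list(zip(g3_again, (1, 2, 3)))
next_letter = next(g3_again)

print(first, second, swapped, next_letter)
```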
msg128866 - (view) Author: Greg Kochanski (gpk-kochanski) Date: 2011-02-19 17:51
Yes, the current behaviour makes sense from a language designer's viewpoint, and maybe even from the user's viewpoint (if the user thinks about it carefully).

But, that's not the point of a computer language.   The whole reason we program in languages like python instead of asm is to match the behaviour of the silicon to human capabilities and expectations.   So, documentation needs to go beyond the minimum from which an expert could deduce the system behaviour.  It needs to point out unexpected things that a competent programmer might miss, even if they could potentially have deduced that unexpected behaviour.

The trouble here is that the syntax of a generator is so much like a function that it's easy to think of it as being as safe and simple as a function.  It's not: the "yield" statement lets a lot of external complexity leak in that's not relevant to a function (unless you're writing multithreaded code).  So, the documentation needs to help the user avoid such problems.
msg128884 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-02-19 23:26
This is not a bug.  It is an implementation specific detail and is not guaranteed behavior.  The submitted "example bug" is horrible code that makes unwarranted assumptions about the implementation -- it is an anti-pattern to write generators that assume that their consumers will run them to exhaustion so that cleanup code will be executed -- a number of tools violate this assumption.  If you're relying on this technique for cleanup, you're doing it wrong.

I'll look at this again after the 3.2 release.  When it was discussed before, the outcome was to introduce itertools.zip_longest() and to not over-document non-guaranteed implementation specific details (lest people rely on them and write code even worse than the OP's example).
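
One way to write clean-up that does not depend on the consumer running the generator to exhaustion is the try/finally approach suggested earlier in the thread. The sketch below assumes Python 3; `records` and the log lists are hypothetical names. Calling close() explicitly, or wrapping the generator in contextlib.closing, makes the finally block run at a known point instead of whenever the generator happens to be garbage collected.

```python
from contextlib import closing


def records(log):
    """Yield ten items; the finally block stands in for flushing a cache."""
    try:
        for i in range(10):
            yield i
    finally:
        log.append("flushed")


log = []
gen = records(log)
pairs = list(zip(range(3), gen))  # gen is left suspended after three items
flushed_before_close = list(log)  # snapshot: nothing flushed yet
gen.close()                       # raises GeneratorExit inside gen
flushed_after_close = list(log)   # snapshot: the finally block has run

# The same idea with a context manager that calls close() on exit.
log2 = []
with closing(records(log2)) as gen2:
    pairs2 = list(zip(range(3), gen2))

print(flushed_before_close, flushed_after_close, log2)
```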
msg128889 - (view) Author: Greg Kochanski (gpk-kochanski) Date: 2011-02-20 01:41
The code (bug312.py) was not submitted as a "pattern", but rather as an example of a trap into which it is easy to fall, at least for the 99% of programmers who are users of the language rather than its implementers.  


The basic difference is that while one can write a function that is guaranteed to execute to the end of its body[*], one cannot do that with a generator function. This point ought to be made in the documentation.
[* Neglecting SIGKILL and perhaps a few abnormal cases.]

The current documentation emphasizes the analogy to functions (which can be misleading) and (in section 6.8) explicitly says that the normal behaviour of a generator function is to run all the way to completion.
msg128892 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-02-20 05:32
I've looked at the docs again and think they're fine.  And in the 3.x docs, the iterator version of zip() specifies its implementation with a pure python equivalent that makes it clear that the iterator is not run to exhaustion.  
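
For reference, a sketch along the lines of the pure Python equivalent mentioned above, assuming Python 3. The name `zip_sketch` is hypothetical, and the code is an illustration of the idea, not a verbatim copy of the documentation or of the C implementation. The early return is what leaves the remaining iterators unexhausted.

```python
def zip_sketch(*iterables):
    """Yield tuples until any one of the input iterators is exhausted."""
    sentinel = object()
    iterators = [iter(it) for it in iterables]
    while iterators:
        result = []
        for it in iterators:
            elem = next(it, sentinel)
            if elem is sentinel:
                # Stop at once; later iterators are not advanced and
                # items already collected in this round are dropped.
                return
            result.append(elem)
        yield tuple(result)


source = iter("ABCD")
pairs = list(zip_sketch(source, "xy"))
rest = list(source)  # 'C' was fetched and dropped, so only 'D' remains

print(pairs, rest)
```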

Note that zip() has existed since 2.0 and iterators/generators since 2.2.  The docs for them have worked just fine, so I wouldn't worry too much about their being a "trap for 99% of programmers who aren't implementers."

GPK, I can see that you're wound-up about this, but you seem to have a profound misunderstanding about iterators/generators and have incorrectly assumed an implied contract for consumer functions to completely consume their inputs.  Sorry, but I'm going to close this one.  For further discussion, I recommend the python tutor mailing list.
History
Date User Action Args
2022-04-11 14:57:13 admin set github: 55457
2011-02-20 05:32:53 rhettinger set status: open -> closed; messages: + msg128892; resolution: not a bug; nosy: georg.brandl, rhettinger, ezio.melotti, eric.araujo, daniel.urban, docs@python, gpk-kochanski
2011-02-20 01:41:43 gpk-kochanski set nosy: georg.brandl, rhettinger, ezio.melotti, eric.araujo, daniel.urban, docs@python, gpk-kochanski; messages: + msg128889
2011-02-19 23:26:07 rhettinger set priority: normal -> low; nosy: + rhettinger; messages: + msg128884; assignee: docs@python -> rhettinger; type: behavior -> enhancement
2011-02-19 21:34:37 daniel.urban set nosy: + daniel.urban
2011-02-19 19:16:44 georg.brandl set assignee: docs@python; nosy: + docs@python; components: + Documentation, - None; stage: resolved -> needs patch
2011-02-19 17:55:36 gpk-kochanski set status: closed -> open; nosy: georg.brandl, ezio.melotti, eric.araujo, gpk-kochanski; resolution: not a bug -> (no value)
2011-02-19 17:51:27 gpk-kochanski set nosy: georg.brandl, ezio.melotti, eric.araujo, gpk-kochanski; messages: + msg128866
2011-02-19 15:49:47 ezio.melotti set nosy: + georg.brandl, eric.araujo; messages: + msg128851
2011-02-19 11:44:24 gpk-kochanski set messages: + msg128845
2011-02-19 10:55:25 ezio.melotti set status: open -> closed; nosy: + ezio.melotti; messages: + msg128843; resolution: not a bug; stage: resolved
2011-02-19 10:50:44 gpk-kochanski create