classification
Title: Tutorial offers dangerous advice about iterators: “__iter__() can just return self”
Type: behavior Stage:
Components: Documentation Versions: Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Usui, amaury.forgeotdarc, andersk, georg.brandl, merwok, pitrou, r.david.murray, rhettinger, terry.reedy, ysj.ray
Priority: low Keywords: patch

Created on 2010-04-12 06:35 by andersk, last changed 2017-07-13 12:29 by r.david.murray.

Files
File name Uploaded Description Edit
issue8376.patch Usui, 2016-10-14 09:04 review
Messages (16)
msg102918 - (view) Author: Anders Kaseorg (andersk) Date: 2010-04-12 06:35
The Python tutorial offers some dangerous advice about adding iterator behavior to a class:
http://docs.python.org/tutorial/classes.html#iterators
“By now you have probably noticed that most container objects can be looped over using a for statement:
…
Having seen the mechanics behind the iterator protocol, it is easy to add iterator behavior to your classes. Define a __iter__() method which returns an object with a next() method. If the class defines next(), then __iter__() can just return self:”

This is reasonable advice for writing an iterator class, but terrible advice for writing a container class, because it encourages you to associate a single iterator with the container, which breaks nested iteration and leads to hard-to-find bugs.  (One of those bugs recently made its way into the code handout for a problem set in MIT’s introductory CS course, 6.00.)

A container class’s __iter__() should return a generator or an instance of a separate iterator class, not self.  The tutorial should make this clearer.
msg102951 - (view) Author: ysj.ray (ysj.ray) Date: 2010-04-12 15:09
I think there is not much ploblem. Please notice this sentence in the previous paragraph in the toturial :

"Behind the scenes, the for statement calls iter() on the container object. The function returns an iterator object that defines the method next() which accesses elements in the container one at a time."

I think it's clear enough that the container object is different from the iterator object, what exactly has the iterator behavior is the iterator object, not the container object. And this sentence you mentioned:

"If the class defines next(), then __iter__() can just return self:"

is to teach you define a iterator, not a container. So by reading it carefully, you won't define something that is both a container and a iterator, will you? 

If you really define a next() on a class which you want it be a container, and then define a __iter__() on it and return self, then your container is exactly a generator, not a container. And a generator can certainly iterate only once.
msg102955 - (view) Author: Éric Araujo (merwok) * (Python committer) Date: 2010-04-12 15:47
Hello

I thought about two bits in your message: “I think it's clear enough” 
and “So by reading it carefully”. They show that the doc is a bit 
ambiguous here, and while I agree that people should take the time to 
read and understand, I think that documentation can always be made 
clearer. The doc about iterables could tell to return an instance of 
another class or use a generator, and then a separate paragraph would 
explain about using next to define iterators, explaining that iterators 
may or may not be iterable.

Does that sound good?

Regards
msg102961 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-04-12 17:24
It might be worth mentioning in the tutorial the specific problem the OP points out: that if an object returns itself in __iter__, then that object can have only one iteration state.  And that therefor only an object that is itself a unique iterator (as opposed to a collection wanting to allow iteration) should return itself as __iter__.

Also note that in the example in the tutorial it says "easy to add the behavior", which *in context* is wrong; pedagogically the better thing to do would be to say "it is easy to write a class that implements an iterator".  As Éric said, splitting the text so that returning an iterator from a collection and constructing an iterator class are distinct topics would greatly improve the exposition.

I'm an experienced Python coder, and I've made this mistake myself.  Often it doesn't break because the code only iterates the object once, but it would be better to have the tutorial encourage doing this correctly by providing good examples and explanations.
msg102963 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-04-12 18:07
I think the docs are fine as-is.  At that point in the tutorial, we're trying to teach what the iterator protocol is.

The considerations of what makes a good design can be discussed elsewhere.  It is not a one-liner.  For example, files are self iterators but lists are not -- there are good reasons for either choice.  Explaining those design choices is not a good way to *introduce* the iterator protocol.

Recommend closing this report.
msg102966 - (view) Author: Anders Kaseorg (andersk) Date: 2010-04-12 18:44
As an experienced Python programmer I am obviously aware that the tutorial is trying to teach how to make an iterator class, not how to make a container class.

But the tutorial doesn’t make that *clear*.  It should be much more explicit about what it is explaining to avoid confusing those concepts in the minds of beginners.  (Or even the staff of the MIT introductory CS course.)

One way to fix this confusion would be to explain what one should do for both container classes and iterator classes:

"""
Having seen the mechanics behind the iterator protocol, it is easy to add iterator behavior to your container classes.  Define a :meth:`__iter__` method which returns an object of a separate iterator class.  The iterator class should have a :meth:`next` method and an :meth:`__iter__` method (the :meth:`__iter__` method of the iterator class should just return ``self``)::

   class ReverseList(object):
       "Container that lets you iterate over the items backwards"
       def __init__(self, data):
           self.data = data
       def __iter__(self):
           return ReverseIterator(self.data)

   class ReverseIterator(object):
       "Iterator for looping over a sequence backwards"
       def __init__(self, data):
           self.data = data
           self.index = len(data)
       def __iter__(self):
           return self
       def next(self):
           if self.index == 0:
               raise StopIteration
           self.index = self.index - 1
           return self.data[self.index]

   >>> for char in ReverseIterator('spam'):
   ...     print char
   ...
   m
   a
   p
   s
   >>> for char in ReverseList([1,2,3]):
   ...     print char
   ...
   3
   2
   1
"""
msg111155 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-07-22 08:16
The two example classes are used exactly the same way, and don't show the differences between them. How about a simple change like:

Replace: 
"""If the class defines
:meth:`next`, then :meth:`__iter__` can just return ``self``"""
with
"""It may be convenient to define :meth:`next` in the class, and have :meth:`__iter__` just return ``self``;  In this case though, the object can be iterated only once."""
msg111196 - (view) Author: Anders Kaseorg (andersk) Date: 2010-07-22 17:26
I don’t think that small change is good enough, if it is still the case that the only provided example is the dangerous one.

It would be easy to clarify the differences between the classes:

>>> rl = test.ReverseList('spam')
>>> [c for c in rl]
['m', 'a', 'p', 's']
>>> [c for c in rl]
['m', 'a', 'p', 's']
>>> ri = iter(rl)
>>> ri
<test.ReverseIterator object at 0x7fa5cbeaec50>
>>> [c for c in ri]
['m', 'a', 'p', 's']
>>> [c for c in ri]
[]
msg111197 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-22 17:33
At least equally useful would be the mention that __iter__ can be a generator, eliminating the need for intermediate objects:

>>> class C:
...     def __iter__(self):
...         yield 1
...         yield 2
... 
>>> list(C())
[1, 2]
msg111198 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-07-22 17:40
Why do you call this "dangerous"? because the object can be iterated only once? But reversed() does the same thing.
And all iterators objects must also implement the __iter__ method that return themselves, otherwise they cannot be used in a for loop.
Better start with such an object.
msg111201 - (view) Author: Anders Kaseorg (andersk) Date: 2010-07-22 17:53
Antoine: That’s true.

Amaury: See my original bug description (“This is reasonable advice for writing an iterator class, but terrible advice for writing a container class…”), and my other comments.

There is nothing wrong with explaining how to write an iterator, but the explanation needs to make clear that this is _not_ how you write a container.  Currently the section opens with a misleading motivation (“By now you have probably noticed that most container objects can be looped over using a for statement”), but it does actually not explain how to write a container at all.  So I proposed some language and an example to fix that.
msg278638 - (view) Author: loic rowe (Usui) Date: 2016-10-14 09:04
My proposal for this documentation point is to get rid off the word "most" and to replace it with "built-in". Since it will remove the misleading idea that this paragraph can explain how you can write a containers.
msg278640 - (view) Author: Anders Kaseorg (andersk) Date: 2016-10-14 09:35
Usui, this is a tutorial intended for beginners.  Even if the change from “most” to “built-in” were a relevant one (and I don’t see how it is), beginners cannot possibly be expected to parse that level of meaning out of a single word.

The difference between iterators and containers deserves at least a couple of sentences and preferably an example that includes both, as I proposed in http://bugs.python.org/issue8376#msg102966.  Do you disapprove of that proposal?
msg278641 - (view) Author: loic rowe (Usui) Date: 2016-10-14 09:58
I don't disapprove the proposal on the need to have an explanation on the difference between a container and an iterator.

But I was unable to found in the documentation a proper definition of a container.
msg278805 - (view) Author: loic rowe (Usui) Date: 2016-10-17 13:14
A new proposal is to add at the end of the paragraph on the iterator those point:
"""
As you should have noticed this Reverse let you iterate over the data only once::

   >>> rev = Reverse('spam')
   >>> for char in rev:
   ...     print(char)
   ...
   m
   a
   p
   s
   >>> for char in rev:
   ...     print(char)
   ...
   >>>

So now let's complete our example to have a container that allow you to iterate over his items backwards::

  class ReverseList(object):
       "Container that lets you iterate over the items backwards"
       def __init__(self, data):
           self.data = data
       def __iter__(self):
           return ReverseIterator(self.data)

   class ReverseIterator(object):
       "Iterator for looping over a sequence backwards"
       def __init__(self, data):
           self.data = data
           self.index = len(data)
       def __iter__(self):
           return self
       def next(self):
           if self.index == 0:
               raise StopIteration
           self.index = self.index - 1
           return self.data[self.index]

   >>> rev = ReverseList([1,2,3])
   >>> for char in rev:
   ...     print (char)
   ...
   3
   2
   1
   >>> for char in rev:
   ...     print (char)
   ...
   3
   2
   1
"""
msg298280 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-07-13 12:29
I just had a colleague get confused about the container returning self, and he was referring to the iterator protocol docs at https://docs.python.org/3.6/library/stdtypes.html#iterator.__iter__.  If you don't read that section with your thinking cap firmly in place it is easy to get confused about what "the iterator itself" means there.  I think it would be beneficial to add a sentence about the iterator state needing to be independent for each iterator returned by the *container's* __iter__.  A similar sentence in the tutorial might be all that is needed as well.
History
Date User Action Args
2017-07-13 12:29:41r.david.murraysetmessages: + msg298280
2016-10-17 13:14:25Usuisetmessages: + msg278805
2016-10-14 09:58:30Usuisetmessages: + msg278641
2016-10-14 09:35:05andersksetmessages: + msg278640
2016-10-14 09:04:37Usuisetfiles: + issue8376.patch

nosy: + Usui
messages: + msg278638

keywords: + patch
2016-10-13 14:53:21matrixisesetversions: + Python 3.6, Python 3.7, - Python 2.7, Python 3.2, Python 3.3
2012-05-16 22:29:47terry.reedysetnosy: + terry.reedy

versions: + Python 3.3, - Python 2.6, Python 3.1
2010-10-29 10:07:21adminsetassignee: georg.brandl -> docs@python
2010-07-22 17:53:31andersksetmessages: + msg111201
2010-07-22 17:40:52amaury.forgeotdarcsetmessages: + msg111198
2010-07-22 17:33:33pitrousetnosy: + pitrou
messages: + msg111197
2010-07-22 17:26:55andersksetmessages: + msg111196
2010-07-22 08:16:48amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg111155
2010-04-12 18:44:21andersksetmessages: + msg102966
2010-04-12 18:07:58rhettingersetpriority: normal -> low
nosy: + rhettinger
messages: + msg102963

2010-04-12 17:24:03r.david.murraysetpriority: normal
versions: + Python 2.6, Python 3.1, Python 2.7, Python 3.2
nosy: + r.david.murray

messages: + msg102961

type: behavior
2010-04-12 15:47:19merwoksetmessages: + msg102955
2010-04-12 15:09:27ysj.raysetmessages: + msg102951
2010-04-12 14:36:23ysj.raysetnosy: + ysj.ray
2010-04-12 09:02:33merwoksetnosy: + merwok
2010-04-12 06:35:57anderskcreate