classification
Title: Iterable glossary entry needs clarification
Type: behavior Stage: needs patch
Components: Documentation Versions: Python 3.7, Python 3.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Zero, docs@python, r.david.murray, rhettinger, terry.reedy, veky
Priority: normal Keywords:

Created on 2013-07-25 23:57 by Zero, last changed 2017-07-17 19:43 by terry.reedy.

Messages (16)
msg193723 - Author: Stephen Paul Chappell (Zero) Date: 2013-07-25 23:57
The following interactive session shows that iterables are not detected properly by the `collections.abc.Iterable` class.

    >>> class IsIterable:
    ...     def __init__(self, data):
    ...         self.data = data
    ...     def __getitem__(self, key):
    ...         return self.data[key]
    ...
    >>> is_iterable = IsIterable(range(5))
    >>> for value in is_iterable:
    ...     value
    ...
    0
    1
    2
    3
    4
    >>> from collections.abc import Iterable
    >>> isinstance(is_iterable, Iterable)
    False
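For anyone who wants to rerun this outside the REPL, here is a minimal script version of the same session (a sketch; the variable names are illustrative). It shows the two checks disagreeing: the for loop falls back to the old __getitem__ protocol, while isinstance() with the ABC only looks for __iter__ on the class.

```python
from collections.abc import Iterable


class IsIterable:
    """Same class as in the session above: __getitem__ only, no __iter__."""

    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        return self.data[key]


is_iterable = IsIterable(range(5))

# Iteration uses the old protocol: __getitem__(0), __getitem__(1), ...
# until the underlying range raises IndexError.
values = [value for value in is_iterable]

# The ABC check never calls __getitem__; it looks for __iter__ on the class.
has_iter = hasattr(IsIterable, "__iter__")
abc_says_iterable = isinstance(is_iterable, Iterable)

print(values)             # [0, 1, 2, 3, 4]
print(has_iter)           # False
print(abc_says_iterable)  # False
```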
msg193724 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-07-26 01:25
The definition of an Iterable is a class that defines an __iter__ method.  Your class does not, so the behavior you cite is correct.

The glossary entry for 'iterable' could use a little clarification.  A class that defines __getitem__ is an iterable if and only if it returns results when passed integers.  Since the documentation for Iterable references that glossary entry, it should probably also be explicit that defining __getitem__ does not (because of the foregoing limitation) cause isinstance(x, Iterable) to be True.  For a class that does not define __iter__, you must explicitly register it with Iterable.

To see why this must be so, consider this:

  >>> y = IsIterable({'a': 'b', 'c': 'd'})
  >>> [x for x in y]
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "<stdin>", line 1, in <listcomp>
    File "<stdin>", line 5, in __getitem__
  KeyError: 0
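The failure only shows up on the first step of iteration, not when the iterator is created. A small sketch (reusing the IsIterable class from the first message) that separates the two steps:

```python
class IsIterable:
    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        return self.data[key]


y = IsIterable({'a': 'b', 'c': 'd'})

# iter() succeeds: it only checks that __getitem__ exists, without calling it.
it = iter(y)

# The first next() calls y.__getitem__(0), and the dict has no key 0.
try:
    next(it)
    first_error = None
except KeyError as exc:
    first_error = exc

print(first_error.args)  # (0,)
```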
msg193764 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-07-26 23:28
Stephen, your class, or rather instances thereof when initialized with a sequence, follow the old iteration protocol. You might call them iterators in the generic sense, though I cannot remember whether we used 'iterator' much before the introduction of the new and now dominant iteration protocol. I am sure 'iterable' was introduced with the new protocol for objects with .__iter__ methods that return iterators, which in this context means an object with a .__next__ method and excludes .__getitem__ objects.

It would have been less confusing if we had disabled the old protocol in 3.0, but aside from the predictable confusion, it seemed better to keep it.
msg194104 - Author: Stephen Paul Chappell (Zero) Date: 2013-08-01 19:05
If my program needed to know if an object is iterable, it would be tempting to define and call the following function instead of using collections.abc.Iterable:

    def iterable(obj):
        try:
            iter(obj)
        except TypeError:
            return False
        return True

Something tells me that is not what the author of collections.abc.Iterable intended.
msg194113 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-08-01 20:26
That would give you a false positive, though.  It would return True for the 'y' in my example, which is not iterable.  So Iterable's behavior here is an example of the Python design rule "resist the temptation to guess".
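To make the false positive concrete, here is the iterable() helper from the previous message applied to two IsIterable instances (a sketch; seq_backed and map_backed are illustrative names). Both pass the check, but only one actually iterates.

```python
def iterable(obj):
    # The helper proposed above: "does iter() accept it?"
    try:
        iter(obj)
    except TypeError:
        return False
    return True


class IsIterable:
    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        return self.data[key]


seq_backed = IsIterable([1, 2, 3])
map_backed = IsIterable({'a': 'b'})

print(iterable(seq_backed))  # True, and list(seq_backed) == [1, 2, 3]
print(iterable(map_backed))  # True, yet list(map_backed) raises KeyError
print(iterable(42))          # False: int has neither __iter__ nor __getitem__
```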

As Terry said, new classes should implement an __iter__ method.  The __getitem__ iteration support is for backward compatibility.
msg194122 - Author: Stephen Paul Chappell (Zero) Date: 2013-08-01 21:46
Maybe this would have been more appropriate as a question on StackOverflow:

What is the proper way of asking if an object is iterable if it does not support the iterator protocol but does support the old getitem protocol? One might argue that it is better to ask for forgiveness rather than permission, but that does not really answer the question.

My impression of collections.abc.Iterable is that programmers can use it to ask if an object is iterable. Some argue that it is better to ask for forgiveness rather than permission and would suggest pretending that an object is iterable until it is proven otherwise. However, what do we use collections.abc.Iterable for, then?

The true question is really, “What is the proper way of asking if an object is iterable if it does not support the iterator protocol but does support the old getitem protocol?” More generically, how can you ask an object if it supports ANY iteration protocol? The query probably should have been posted on StackOverflow and not here.

This may not be a problem with collections.abc.Iterable, and thus the issue should be closed. However, the previous question remains, and it is apparent that it cannot be answered with the abstract class as it currently is. Maybe the solution is to just ask for forgiveness where appropriate.
msg194135 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-08-01 23:29
“What is the proper way of asking if an object is iterable if it does not support the iterator protocol but does support the old getitem protocol?”

The *only* answer to that question is to try to iterate it, and see if you get a KeyError on "0".  Since this results in obtaining the first element if it *is* iterable, and in the general case you cannot "reset" an iterable, there is no way to look before you leap.  You have to catch the error after it occurs.
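A minimal sketch of that "catch it after it happens" approach, under two assumptions: the caller is willing to use a replacement iterator afterwards, and a KeyError or TypeError on the first step is the failure worth treating as "not iterable". try_iterate is a hypothetical name, not a stdlib function. The consumed first element is put back with itertools.chain.

```python
from itertools import chain

_MISSING = object()  # sentinel distinguishing "empty" from "failed"


def try_iterate(obj):
    """Hypothetical helper: return an iterator over obj, or None if obj
    turns out not to be iterable.

    The first element is consumed to find out, so callers must use the
    returned iterator rather than iterating obj a second time.
    """
    try:
        it = iter(obj)
    except TypeError:
        return None
    try:
        first = next(it, _MISSING)
    except (KeyError, TypeError):
        # e.g. a mapping-backed __getitem__ rejecting the index 0
        return None
    if first is _MISSING:
        return iter(())  # genuinely iterable, just empty
    return chain([first], it)


class IsIterable:
    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        return self.data[key]
```

With that, list(try_iterate(IsIterable([1, 2, 3]))) gives [1, 2, 3], while try_iterate(IsIterable({'a': 'b'})) and try_iterate(42) both return None. A later failure (on index 1 or beyond) is still not caught; the check only covers the first step, which is exactly the limitation described above.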

This question and answer probably do belong on Stack Overflow or python-list, but the glossary entry still needs improvement, since the Iterable docs reference it :)
msg298464 - Author: Vedran Čačić (veky) * Date: 2017-07-17 08:19
I think this is backwards. "Refusing the temptation to guess" in this case can mean returning True for is_iterable. After all, we can always have something like

    class Deceptive:
        def __iter__(self):
            raise TypeError("I'm not really iterable")

and it's not the business of instancecheck to actually iterate (either via __iter__, or __getitem__). Its task is to check whether it has a corresponding attribute (not set to None, per the new convention of explicitly disabling protocols).
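A quick sketch of both halves of that point (Deceptive is from above; OptedOut and iter_fails are hypothetical names). The ABC trusts a declared __iter__ without calling it, and, from Python 3.6 on, setting __iter__ to None opts a class out of both the ABC check and iter(), even if the class also has __getitem__.

```python
from collections.abc import Iterable


class Deceptive:
    # Declares __iter__, so the ABC accepts it without ever calling it.
    def __iter__(self):
        raise TypeError("I'm not really iterable")


class OptedOut:
    # Explicitly disables iteration; iter() will not fall back to __getitem__.
    __iter__ = None

    def __getitem__(self, key):
        return [1, 2, 3][key]


def iter_fails(obj):
    """Return True if iter(obj) raises TypeError."""
    try:
        iter(obj)
    except TypeError:
        return True
    return False


print(isinstance(Deceptive(), Iterable), iter_fails(Deceptive()))  # True True
print(isinstance(OptedOut(), Iterable), iter_fails(OptedOut()))    # False True
```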

It could be different if the "old __getitem__ iteration" was deprecated, or at least scheduled to be deprecated, but as far as I can tell, it isn't. (It really should be documented if it were so.)

_At least_, the documentation of https://docs.python.org/3/library/collections.abc.html#collections.abc.Iterable should be more precise in saying (instead of just "See also the definition of iterable.") something like "Note that the definition of iterable in the glossary is more general than what this method checks, by design / omission / backward compatibility / apathy / whatever."

(Ok, the last part might be too much. But it's essential to point out the things are different, and whether it's meant to stay that way.)
msg298495 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-07-17 12:51
No, refusing to guess in this case is to believe the class's declaration that it is an iterable if (and only if) it defines __iter__, which is the modern definition of iterable.  If that doesn't work when the object is iterated, that's a bug in the class claiming to be an iterable when it isn't.

The confusion here is the existence of the older iteration protocol.  As you say, the documentation can use some improvement.  Eventually someone will submit a proposal in the form of a PR and we can hammer out the exact wording.
msg298502 - Author: Vedran Čačić (veky) * Date: 2017-07-17 13:05
Of course. The Deceptive class was just a reductio ad absurdum. I'm all for believing the class through the attributes it exposes. We agree there.

Where we don't agree, is _what_ attributes constitute the iteration protocol. You, the source code and the documentation of the collections.abc.Iterable say one thing (__iter__), while I, the current version of Python (at least CPython, but I think other implementations do the same) and the glossary say another thing (__iter__ or __getitem__).

[It's not the only protocol consisting of two attributes... e.g. bool protocol also consists of two attributes, __bool__ and __len__ (though it is not optional, so we don't have collections.abc.Boolable).]
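For comparison, a small sketch of that two-method truth protocol (OnlyLen and Neither are hypothetical classes): when __bool__ is absent, bool() falls back to __len__, and an object with neither method is simply true.

```python
class OnlyLen:
    # No __bool__: truth testing falls back to __len__() != 0.
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n


class Neither:
    pass


print(bool(OnlyLen(0)))  # False
print(bool(OnlyLen(3)))  # True
print(bool(Neither()))   # True: objects are true by default
```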

You seem to say that only the glossary needs fixing. But then we'll be in an even more weird position, where we must say some objects can be iterated, but are not iterables. I'm pretty sure you don't want that. The whole point of "Xable" words (e.g. "callable", as opposed to "function") is that it encompasses everything that can be Xed, not only the first thing that comes to mind (e.g. classes can also be called).

Or are you saying that after the glossary is fixed, then we should fix Python by (at least deprecating, if not) forbidding __getitem__ iteration? I'm not sure that this is the consensus. Are you?
msg298530 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-07-17 14:30
The word "iterable" just means "can be looped over".  There are many ways to implement this capability (two-arg form of iter(), the __iter__ method, generators, __getitem__ with integer indexing, etc).
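A sketch of the four routes listed above, each producing the values 1, 2, 3 (all class and function names are illustrative). Only the __getitem__ route is missed by isinstance(obj, Iterable); the generator and the two-argument iter() result are iterators, which do have __iter__.

```python
from collections.abc import Iterable
from itertools import count


class WithIter:
    def __iter__(self):
        return iter([1, 2, 3])


class WithGetitem:
    def __getitem__(self, index):
        return [1, 2, 3][index]  # IndexError at index 3 ends the loop


def gen():
    yield from [1, 2, 3]


_counter = count(1)


def next_value():
    return next(_counter)


sources = {
    "__iter__ method": WithIter(),
    "__getitem__ with integer indexing": WithGetitem(),
    "generator": gen(),
    "two-arg iter()": iter(next_value, 4),  # call next_value() until it returns 4
}

# For each route: the values it yields, and what the ABC says about it.
results = {
    label: (list(obj), isinstance(obj, Iterable))
    for label, obj in sources.items()
}

for label, (values, abc_ok) in results.items():
    print(label, values, abc_ok)
```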

collections.abc.Iterable is more limited and that is okay.  There is nothing that compels us to break an API that has been around and successful for 26+ years.  That clearly wasn't Guido's intention when he added collections.abc.Iterable, which is just a building block for more complex ABCs.

I recommend closing this.  We're not going to kill a useful API and break tons of code because of an overly pedantic reading of what is allowed to be iterable.

However we can make a minor amendment to the glossary entry to mention that there are multiple ways of becoming iterable.

Stephen, the try/except is a reasonable way to recognize an iterable.  The ABCs are intended to recognize only things that implement a particular interface or that are registered.  They are not more encompassing or normative than that.
msg298534 - Author: Vedran Čačić (veky) * Date: 2017-07-17 14:40
Raymond, I think you didn't understand the issue. The glossary already _has_ the amendment you mention (at least for __getitem__ - I'm not sure any of the other examples you mention are counterexamples to that interpretation: callable_iterators and generators _do_ have an __iter__ attribute, and they are correctly detected as instances of collections.abc.Iterable).

I wanted to push in the _opposite_ direction, to fully bless __getitem__ as a way to declare iterability, so it could be recognized by Iterable's instancecheck. Because it seems to me that whoever wrote that instancecheck, didn't have the _intention_ to exclude __getitem__ iteration.

Or at least, if we cannot do that because of backward compatibility:-(, to explicitly document that Iterable ABC _does not_ fully encompass what we mean by "being iterable".
msg298535 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-07-17 15:05
> Or at least, if we cannot do that because of backward
> compatibility:-(, to explicitly document that Iterable ABC
> _does not_ fully encompass what we mean by "being iterable".

That would be a reasonable amendment to collections.abc.Iterable docs.

I don't think it is either desirable or possible for collections.abc.Iterable to recognize iterables with __getitem__.  We cannot know in advance whether a __getitem__ implements mapping or sequence semantics.  IIRC, that particular problem was the motivation for creating the ABCs. Without a user registering a class as Iterable or without inheriting from Iterable, there is really no way to know.
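The inheriting route is often the least effort: collections.abc.Sequence requires only __getitem__ and __len__ and supplies __iter__ (plus __contains__ and __reversed__) as mixin methods built on __getitem__, so instances pass isinstance(..., Iterable) without a register() call. A sketch with a hypothetical Squares class:

```python
from collections.abc import Iterable, Sequence


class Squares(Sequence):
    """Hypothetical read-only sequence of the first n square numbers."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        if index < 0:
            index += self.n
        if not 0 <= index < self.n:
            raise IndexError(index)  # sequence semantics: IndexError ends iteration
        return index * index


sq = Squares(4)
print(list(sq))                        # [0, 1, 4, 9]
print(isinstance(sq, Iterable))        # True
print("__iter__" in Squares.__dict__)  # False: __iter__ comes from Sequence
```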
msg298540 - Author: Vedran Čačić (veky) * Date: 2017-07-17 15:47
Yes, the mapping/sequence distinction was (at least declaratively) the reason the ABCs were introduced, but that isn't an obstacle here: whether a mapping or a sequence, it _is_ iterable, right?

---

In case anybody is interested, here's how I came to this problem: at a programming competition, I set a problem where contestants had to write some function, and I declared that "the function must work for arbitrary iterable (with some properties that currently don't matter)".

Then a big discussion ensued, with one big group of people thinking that classes with __getitem__ but no __iter__ don't qualify (giving collections.abc.Iterable as an argument), and another big group of people thinking they do (giving EAFP as an argument: "look, I tried iterating, and succeeded").

Of course, it's an incredibly technical detail, but I don't like such gray areas. To me, things with __getitem__ are clearly iterable - the glossary says so:-). Iterable's instancecheck is simply buggy ("incomplete", if you want). There might be valid reasons for keeping it buggy, but they should be documented.
msg298544 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-07-17 17:26
"things with __getitem__ are clearly iterable"

This is false.  IMO it should be fixed in the glossary.  It should say "or __getitem__ method implementing sequence semantics".  That plus the addition to the Iterable docs will close this issue.
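What "sequence semantics" means for the old protocol can be observed directly. In this sketch (Recorder is a hypothetical class), the loop asks for keys 0, 1, 2, ... and stops only when __getitem__ raises IndexError; any other exception, such as the KeyError in the dict example earlier in this thread, propagates instead.

```python
class Recorder:
    """Hypothetical __getitem__-only class that logs the keys it is asked for."""

    def __init__(self, items):
        self.items = list(items)
        self.requested = []

    def __getitem__(self, key):
        self.requested.append(key)
        return self.items[key]  # a list raises IndexError past the end


r = Recorder("abc")
print(list(r))      # ['a', 'b', 'c']
print(r.requested)  # [0, 1, 2, 3]: the final key 3 produced the IndexError
```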
msg298551 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-07-17 19:43
The problem with the Iterable ABC is that 'iterable' and 'iterator' are *dynamically* defined, with a possibly infinite time required to possibly destructively check either definition.  In general, an algorithmic *static* check can only guess whether an object is iterable, though humans analyzing enough code can potentially get it right.  Therefore, using isinstance(ob, Iterable) is not 100% reliable, and in my opinion *should not be used* as the definition of lower-case 'iterable'.

Definition: Object ob is iterable if 'iter(ob)' returns an iterator. For the reasons given above, iter may return an object that does not actually behave as an iterator, but it will return a genuine iterator if ob correctly implements either the old or new iterator protocol.  If ob has .__iter__, iter returns ob.__iter__().  If ob has .__getitem__, iter returns iterator(ob), where iterator is a hidden internal class that embodies the old iterator protocol by defining a .__next__ method that calls .__getitem__.  In both cases, iter does the best it can by assuming that the methods are correctly written as per one of the two protocols.
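In CPython the hidden class is visible from Python, which makes the description above easy to check. A short sketch (again using an IsIterable-style class) showing that iter() wraps a __getitem__-only object in a built-in type named iterator, which has __next__ and therefore counts as a collections.abc.Iterator:

```python
from collections.abc import Iterator


class IsIterable:
    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        return self.data[key]


it = iter(IsIterable("xy"))

print(type(it).__name__)         # iterator (CPython's internal sequence iterator)
print(isinstance(it, Iterator))  # True
print(next(it), next(it))        # x y
# A third next(it) raises StopIteration, translated from the IndexError of "xy"[2].
```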

Loose definition: Object 'it' is iterable if it can be looped over.
Python definition: Object 'it' is an iterator if repeated 'next(it)' calls either return an object or raise StopIteration.  This means that

    try:
        while True:
            next(it)
    except StopIteration:
        pass

runs, possibly forever, without raising.
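Since that loop may never finish, any practical check has to cap it. A hypothetical bounded version (next_loop_ok and Broken are illustrative names) that reports whether up to `limit` next() calls behaved as the definition requires; reaching the limit only means no violation was seen, not that none exists.

```python
from itertools import count


def next_loop_ok(it, limit=10_000):
    """Run the next(it) loop above at most `limit` times.

    Returns False as soon as a call raises anything other than StopIteration
    (including TypeError when `it` is not an iterator at all).
    """
    try:
        for _ in range(limit):
            next(it)
    except StopIteration:
        return True
    except Exception:
        return False
    return True  # limit reached without a violation; not a proof


class Broken:
    """Hypothetical iterator whose second next() call fails."""

    def __init__(self):
        self.calls = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.calls += 1
        if self.calls > 1:
            raise ValueError("broken iterator")
        return self.calls


print(next_loop_ok(iter([1, 2, 3])))  # True: ends with StopIteration
print(next_loop_ok(count()))          # True: infinite, stopped at the limit
print(next_loop_ok(Broken()))         # False: ValueError on the second call
print(next_loop_ok([1, 2, 3]))        # False: a list is iterable, not an iterator
```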

As Raymond noted, an iterator can be created multiple ways: IteratorClass(), iter(ob), iter(func, sentinel), generator_func().
---

Iterable versus iter with respect to classes with __getitem__:

Iter was added in 2.2.  Built-in iterables were only gradually converted from old to new protocol, by adding a new .__iter__.  So even ignoring user classes, iter *had* to respect .__getitem__.  Even today, though only a small fraction of classes with .__getitem__ are iterable, people do not generally call iter() on random objects.  

Iterable (added 2.6) is documented as the "ABC for classes that provide the __iter__() method."  In other words, isinstance(ob, Iterable) replaces hasattr(ob, '__iter__').  Except that the former is more than that: it also recognizes classes registered with Iterable.  The magic word 'register' does not appear in the collections.abc doc, and I think that this is the omission to be remedied.

"ABC for classes that provide the __iter__() method, or that provide a __getitem__ method that implements the old iterator protocol and register themselves as Iterable."

An example could be given using a patched version of IsIterable.

If one adds two lines of code

    from collections.abc import Iterable
    ...
    Iterable.register(IsIterable)

then isinstance(IsIterable(3), Iterable) is True, except that this is a lie in the other direction.

    Traceback (most recent call last):
      File "F:\Python\mypy\tem.py", line 17, in <module>
        for i in it2:
      File "F:\Python\mypy\tem.py", line 7, in __getitem__
        return self.data[key]
    TypeError: 'int' object is not subscriptable

Either IsIterable.__init__ must check that data itself has .__getitem__ or IsIterable.__getitem__ must capture exceptions and raise IndexError instead.

        def __getitem__(self, key):
            try:
                return self.data[key]
            except Exception:
                raise IndexError
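Putting the two pieces together, a sketch of the patched class (registration plus the IndexError translation from the snippet above; `from None` is only a cosmetic choice to hide the original exception). It also shows the trade-off: a mapping-backed or non-subscriptable instance no longer raises but silently iterates as empty, which may hide bugs.

```python
from collections.abc import Iterable


class IsIterable:
    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        try:
            return self.data[key]
        except Exception:
            # Any lookup failure becomes the old protocol's "stop" signal.
            raise IndexError(key) from None


Iterable.register(IsIterable)

print(isinstance(IsIterable(3), Iterable))  # True (by registration)
print(list(IsIterable(range(3))))           # [0, 1, 2]
print(list(IsIterable(3)))                  # [] instead of a TypeError
print(list(IsIterable({'a': 'b'})))         # [] instead of a KeyError
```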
History
Date                 User            Action  Args
2017-07-17 19:43:02  terry.reedy     set     messages: + msg298551
2017-07-17 17:26:34  r.david.murray  set     messages: + msg298544
                                             versions: + Python 3.6
2017-07-17 15:47:49  veky            set     messages: + msg298540
2017-07-17 15:05:53  rhettinger      set     messages: + msg298535
2017-07-17 14:40:33  veky            set     messages: + msg298534
2017-07-17 14:30:24  rhettinger      set     versions: + Python 3.7, - Python 3.3, Python 3.4
                                             nosy: + rhettinger
                                             messages: + msg298530
                                             assignee: docs@python -> rhettinger
2017-07-17 13:05:27  veky            set     messages: + msg298502
2017-07-17 12:51:48  r.david.murray  set     messages: + msg298495
2017-07-17 08:19:55  veky            set     nosy: + veky
                                             messages: + msg298464
2013-08-01 23:29:18  r.david.murray  set     messages: + msg194135
2013-08-01 21:46:11  Zero            set     messages: + msg194122
2013-08-01 20:26:32  r.david.murray  set     messages: + msg194113
2013-08-01 19:05:23  Zero            set     messages: + msg194104
2013-07-26 23:28:20  terry.reedy     set     nosy: + terry.reedy
                                             messages: + msg193764
2013-07-26 01:25:40  r.david.murray  set     assignee: docs@python
                                             components: + Documentation, - Library (Lib)
                                             title: Iterables not detected correctly -> Iterable glossary entry needs clarification
                                             nosy: + docs@python, r.david.murray
                                             versions: + Python 3.4
                                             messages: + msg193724
                                             stage: needs patch
2013-07-25 23:57:31  Zero            create