classification
Title: Add __len__ to map, everything in itertools
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.6
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: flying sheep, r.david.murray, rhettinger, steven.daprano
Priority: low Keywords:

Created on 2015-08-12 10:47 by flying sheep, last changed 2015-08-14 20:06 by rhettinger. This issue is now closed.

Messages (10)
msg248451 - (view) Author: (flying sheep) * Date: 2015-08-12 10:47
Things like progressbars want len() to work on iterated objects.

It’s possible to define __len__ for many of the iterables returned by itertools.

some arguments have to be iterated to find the len(): of course we have to check if those are reentrant, and raise a TypeError if they are non-reentrant. (signified by “(r)→”)

for the predicate functions, it’s questionable if we should offer it, since they might take a long time and “len” is a property-like function that feels like it should return fast.

map(func, iterable) → len(iterable)

count(), cycle(), repeat()  → infinty, but because len() returns integers, and there’s only float infinity, that’s impossible

accumulate(iterable)  →  len(iterable)
chain(*iterables)  →  sum(len(it) for it in iterables)
chain.from_iterable(iterables)  (r)→  like the above
compress(data, selectors)  (r)→  sum(1 for s in selectors if s)
dropwhile(pred, iterable)  (r)→  for skip, r in enumerate(map(pred, iterable)): if r: return len(iterable) - skip
filterfalse(pred, iterable)  (r)→  sum(1 for r in map(pred, iterable) if r)
groupby(iterable[, keyfunc])  (r)→  no way but to actually execute it all
islice(seq, [start,] stop [, step])  →  calculatable if len(seq) is possible
starmap(function, iterables)  →  len(iterables)
takewhile(pred, iterable)  (r)→  for skip, r in enumerate(map(pred, iterable)): if not r: return skip
tee(iterable[, n])  →  n
zip_longest(*iterables[, fillvalue])  (r)→  max(len(it) for it in iterables)


product(), permutations(), combinations(), combinations_with_replacement()  →  there’s math for that.
msg248456 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-08-12 12:11
No, you may not iterate the iterator in order to compute the len, because then the iterator would be exhausted.  In addition, the point of itertools is to *lazily* do operations on iterables of indefinite length, so to offer __len__ if and only if the arguments supported len (for cases where that would work) would be essentially false advertising :)
msg248457 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2015-08-12 12:16
Unfortunately, this fails because there is no way to tell how long an arbitrary iterable is, or whether it is reentrant or not. Consider:

def gen():
    while True:
        if random.random() < 0.5:
            return random.random()

Not only is it not reentrant, but you cannot tell in advance how long it will be.

There's also the problem that not all iterables need to have a defined length. The iterator protocol, for example, does not demand that iterators define a length, and we should not put that burden on the programmer.

There's one more serious problem with the idea of giving iterators a length. Consider this case:

it = iter([1, 2, 3, 4, 5])
next(it)
next(it)
print(len(it))

What should be printed? 5, the length of the underlying list, or 3, the number of items still remaining to be seen? Whichever answer you give, it will be misleading and a bug magnet under certain circumstances.

I don't believe it is worth giving iterators like map, zip etc. a length depending on the nature of what they are iterating over. That can only lead to confusion. Programmers just have to understand that sequences have lengths, but arbitrary iterables may not.
msg248474 - (view) Author: (flying sheep) * Date: 2015-08-12 21:23
Hi, and sorry David, but I think you haven’t understood what I was proposing.

Maybe that was too much text and detail to read at once, while skipping the relevant details:

Python has iterators and iterables. iterators are non-reentrant iterables: once they are exhausted, they are useless.

But there are also iterables that create new, iterators whenever iter(iterable) is called (e.g. implicitly in a for loop). They are reentrant. This is why you can loop sequences such as lists more than once.

———————————————————————

One of those reentrant iterables is range(), whose __iter__ functions creates new lazy iterables, which has a __len__, and so on. It even has random access just like a sequence.

Now it’s always entirely possible to *lazily* determine len(chain(range(200), [1,2,5])), which is of course len(range(200)) + len([1,2,5]) = 200 + 3 = 203. No reentrant iterables are necessary here, only iterables with a __len__. (Simply calling len() on them all is sufficient, as it could only create a TypeError which would propagate upwards)

———————————————————————

To reiterate:

1. Lazy doesn’t mean non-reentrant, just like range() demonstrates.
2. I didn’t propose that this works on arbitrary iterables, only that it works if you supply iterables with suitable properties (and throws ValueError otherwise, just like len(some_generator_function()) already does)
3. I know what I’m doing, please trust me and read my proposal carefully ;)
msg248475 - (view) Author: (flying sheep) * Date: 2015-08-12 21:31
To elaborate more on my second point (“No reentrant iterables are necessary here, only iterables with a __len__”)

What i meant here is that inside a call of chain(*iterables), such as chain(foo, bar, *baz_generator()), the paramter “iterables” is always a tuple, i.e. a sequence.

So it is always possible to just call len() on each element of “iterables” and either get a ValueError or a collection of summable integers.

With other itertools functions, we’d need to determine beforehand if we have reentrant iterables or not. This might be a problem, and for some too un-lazy (e.g. groupby)

But at the very very least, we could implement this for everything where i didn’t write “(r)”: map, accumulate, chain, islice, starmap, tee, product, permutations, combinations, combinations_with_replacement
msg248477 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-08-12 21:42
No, I guessed that despite saying "some arguments have to be iterated" that you were really talking about arguments that had __len__.  That's why I added the sentence about it not being appropriate even if you only did it when the inputs had __len__.

But I'll let Raymond re-close this.  Who knows, maybe I'll be surprised and it will turn out that he's interested :)
msg248492 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2015-08-13 00:47
On Wed, Aug 12, 2015 at 09:23:26PM +0000, flying sheep wrote:

> Python has iterators and iterables. iterators are non-reentrant 
> iterables: once they are exhausted, they are useless.

Correct.

> But there are also iterables that create new, iterators whenever 
> iter(iterable) is called (e.g. implicitly in a for loop). They are 
> reentrant. This is why you can loop sequences such as lists more than 
> once.

The *iterable* itself may be reentrant, but the iterator formed from 
iter(iterable) is not. So by your previous comment, giving the iterator 
form a length is not appropriate.

Do you know of any non-iterator iterables which do not have a length 
when they could? With the exception of tee, all the functions in 
itertools return iterators.

> One of those reentrant iterables is range(), whose __iter__ functions 
> creates new lazy iterables, which has a __len__, and so on. It even 
> has random access just like a sequence.

You are misinterpreting what you are seeing. range objects already 
are sequences with a length, and nothing needs be done with them. But 
iter(range) are not sequences, they are iterators, and then are not 
sized and have no __len__ method:

py> it = iter(range(10))
py> len(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'range_iterator' has no len()

If range_iterator objects were given a length, what would it be? Should 
it be the length of the underlying range object, which is easy to 
calculate but wrong? That's what you suggest below (your comments about 
chain). Or the length of how many items are yet to be seen, which is 
surprising in other ways?

> Now it’s always entirely possible to *lazily* determine 
> len(chain(range(200), [1,2,5])), 

Sure. But chain doesn't just accept range objects and lists as 
arguments, it accepts *arbitrary iterables* which you accept cannot be 
sized. So len(chain_obj) *may or may not* raise TypeError. Since you 
can't rely on it having a length, you have to program as if it doesn't. 
So in practice, I believe this will just add complication.

> which is of course len(range(200)) + 
> len([1,2,5]) = 200 + 3 = 203. No reentrant iterables are necessary 
> here, only iterables with a __len__. (Simply calling len() on them all 
> is sufficient, as it could only create a TypeError which would 
> propagate upwards)

That would be wrong. Consider:

it = chain("ab", "cd")
throw_away = next(it)
assert len(it) == 2 + 2  # call len() on the sequences
assert len(list(it)) == len(it)  # fails since 3 != 4
msg248496 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-08-13 04:44
I had explored this idea previously at some length (no pun intended) but it was mostly a dead-end.  The best we ended-up with has having __length_hint__ to indicate size to list().   

There were several issues some of which at detailed in the comment at the top of https://hg.python.org/cpython/file/tip/Lib/test/test_iterlen.py .  Another *big* issue was that Guido was adamantly opposed to iterators having a length because it changed their boolean value from always-true and it broke some of his published code that depended on iterators never being false, even when empty.
msg248502 - (view) Author: (flying sheep) * Date: 2015-08-13 08:07
> The *iterable* itself may be reentrant, but the iterator formed 
> from iter(iterable) is not. So by your previous comment, giving
> the iterator form a length is not appropriate.

> With the exception of tee, all the functions in itertools return
> iterators.

ah, so your gripe is that the itertools functions return iterators, not (possibly) reentrant objects like range(). and changing that would break backwards compatibility, since the documentation says “iterator”, not “iterable” (i.e. people can expect e.g. next(groupby(...))) to work.

that’s probably the end of this :(

the only thing i can imagine that adds reentrant properties (and an useful len()) to iterators would be an optional function (maybe __uniter__ :D) that returns an iterable whose __iter__ function creates a restarted iterator copy, or an optional function that directly returns such a copy. probably too much to ask for :/

> Since you can't rely on it having a length, you have to program as if
> it doesn't. So in practice, I believe this will just add complication.

I don’t agree here. If something accepts iterables and expects to sometimes be called on iterators and sometimes on sequences/len()gthy objects, it will already try/catch len(iterable) and do something useful if that succeeds.

> The best we ended-up with has having __length_hint__ to indicate size to list().

Just out of interest, how does my __uniter__ compare?

> because it changed their boolean value from always-true

it does? is it forbidden to define methods so that int(bool(o)) != len(o)?
msg248610 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-08-14 20:06
[flying sheep]
> that’s probably the end of this :(

Yes, I think so.
History
Date User Action Args
2016-07-16 21:22:53r.david.murraylinkissue27532 superseder
2015-08-14 20:06:40rhettingersetstatus: open -> closed
resolution: rejected
messages: + msg248610
2015-08-13 08:07:23flying sheepsetmessages: + msg248502
2015-08-13 04:44:20rhettingersetpriority: normal -> low

messages: + msg248496
2015-08-13 00:47:09steven.dapranosetmessages: + msg248492
2015-08-12 21:45:17serhiy.storchakasetassignee: rhettinger
2015-08-12 21:42:08r.david.murraysetnosy: + rhettinger
messages: + msg248477
2015-08-12 21:31:37flying sheepsetmessages: + msg248475
2015-08-12 21:23:26flying sheepsetstatus: closed -> open
resolution: rejected -> (no value)
messages: + msg248474
2015-08-12 12:16:45steven.dapranosetnosy: + steven.daprano
messages: + msg248457
2015-08-12 12:11:58r.david.murraysetstatus: open -> closed

nosy: + r.david.murray
messages: + msg248456

resolution: rejected
stage: resolved
2015-08-12 10:47:16flying sheepcreate