Issue24849
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2015-08-12 10:47 by flying sheep, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Messages (10) | |||
---|---|---|---|
msg248451 - (view) | Author: (flying sheep) * | Date: 2015-08-12 10:47 | |
Things like progressbars want len() to work on iterated objects. It’s possible to define __len__ for many of the iterables returned by itertools. some arguments have to be iterated to find the len(): of course we have to check if those are reentrant, and raise a TypeError if they are non-reentrant. (signified by “(r)→”) for the predicate functions, it’s questionable if we should offer it, since they might take a long time and “len” is a property-like function that feels like it should return fast. map(func, iterable) → len(iterable) count(), cycle(), repeat() → infinty, but because len() returns integers, and there’s only float infinity, that’s impossible accumulate(iterable) → len(iterable) chain(*iterables) → sum(len(it) for it in iterables) chain.from_iterable(iterables) (r)→ like the above compress(data, selectors) (r)→ sum(1 for s in selectors if s) dropwhile(pred, iterable) (r)→ for skip, r in enumerate(map(pred, iterable)): if r: return len(iterable) - skip filterfalse(pred, iterable) (r)→ sum(1 for r in map(pred, iterable) if r) groupby(iterable[, keyfunc]) (r)→ no way but to actually execute it all islice(seq, [start,] stop [, step]) → calculatable if len(seq) is possible starmap(function, iterables) → len(iterables) takewhile(pred, iterable) (r)→ for skip, r in enumerate(map(pred, iterable)): if not r: return skip tee(iterable[, n]) → n zip_longest(*iterables[, fillvalue]) (r)→ max(len(it) for it in iterables) product(), permutations(), combinations(), combinations_with_replacement() → there’s math for that. |
|||
msg248456 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2015-08-12 12:11 | |
No, you may not iterate the iterator in order to compute the len, because then the iterator would be exhausted. In addition, the point of itertools is to *lazily* do operations on iterables of indefinite length, so to offer __len__ if and only if the arguments supported len (for cases where that would work) would be essentially false advertising :) |
|||
msg248457 - (view) | Author: Steven D'Aprano (steven.daprano) * | Date: 2015-08-12 12:16 | |
Unfortunately, this fails because there is no way to tell how long an arbitrary iterable is, or whether it is reentrant or not. Consider: def gen(): while True: if random.random() < 0.5: return random.random() Not only is it not reentrant, but you cannot tell in advance how long it will be. There's also the problem that not all iterables need to have a defined length. The iterator protocol, for example, does not demand that iterators define a length, and we should not put that burden on the programmer. There's one more serious problem with the idea of giving iterators a length. Consider this case: it = iter([1, 2, 3, 4, 5]) next(it) next(it) print(len(it)) What should be printed? 5, the length of the underlying list, or 3, the number of items still remaining to be seen? Whichever answer you give, it will be misleading and a bug magnet under certain circumstances. I don't believe it is worth giving iterators like map, zip etc. a length depending on the nature of what they are iterating over. That can only lead to confusion. Programmers just have to understand that sequences have lengths, but arbitrary iterables may not. |
|||
msg248474 - (view) | Author: (flying sheep) * | Date: 2015-08-12 21:23 | |
Hi, and sorry David, but I think you haven’t understood what I was proposing. Maybe that was too much text and detail to read at once, while skipping the relevant details: Python has iterators and iterables. iterators are non-reentrant iterables: once they are exhausted, they are useless. But there are also iterables that create new, iterators whenever iter(iterable) is called (e.g. implicitly in a for loop). They are reentrant. This is why you can loop sequences such as lists more than once. ——————————————————————— One of those reentrant iterables is range(), whose __iter__ functions creates new lazy iterables, which has a __len__, and so on. It even has random access just like a sequence. Now it’s always entirely possible to *lazily* determine len(chain(range(200), [1,2,5])), which is of course len(range(200)) + len([1,2,5]) = 200 + 3 = 203. No reentrant iterables are necessary here, only iterables with a __len__. (Simply calling len() on them all is sufficient, as it could only create a TypeError which would propagate upwards) ——————————————————————— To reiterate: 1. Lazy doesn’t mean non-reentrant, just like range() demonstrates. 2. I didn’t propose that this works on arbitrary iterables, only that it works if you supply iterables with suitable properties (and throws ValueError otherwise, just like len(some_generator_function()) already does) 3. I know what I’m doing, please trust me and read my proposal carefully ;) |
|||
msg248475 - (view) | Author: (flying sheep) * | Date: 2015-08-12 21:31 | |
To elaborate more on my second point (“No reentrant iterables are necessary here, only iterables with a __len__”) What i meant here is that inside a call of chain(*iterables), such as chain(foo, bar, *baz_generator()), the paramter “iterables” is always a tuple, i.e. a sequence. So it is always possible to just call len() on each element of “iterables” and either get a ValueError or a collection of summable integers. With other itertools functions, we’d need to determine beforehand if we have reentrant iterables or not. This might be a problem, and for some too un-lazy (e.g. groupby) But at the very very least, we could implement this for everything where i didn’t write “(r)”: map, accumulate, chain, islice, starmap, tee, product, permutations, combinations, combinations_with_replacement |
|||
msg248477 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2015-08-12 21:42 | |
No, I guessed that despite saying "some arguments have to be iterated" that you were really talking about arguments that had __len__. That's why I added the sentence about it not being appropriate even if you only did it when the inputs had __len__. But I'll let Raymond re-close this. Who knows, maybe I'll be surprised and it will turn out that he's interested :) |
|||
msg248492 - (view) | Author: Steven D'Aprano (steven.daprano) * | Date: 2015-08-13 00:47 | |
On Wed, Aug 12, 2015 at 09:23:26PM +0000, flying sheep wrote: > Python has iterators and iterables. iterators are non-reentrant > iterables: once they are exhausted, they are useless. Correct. > But there are also iterables that create new, iterators whenever > iter(iterable) is called (e.g. implicitly in a for loop). They are > reentrant. This is why you can loop sequences such as lists more than > once. The *iterable* itself may be reentrant, but the iterator formed from iter(iterable) is not. So by your previous comment, giving the iterator form a length is not appropriate. Do you know of any non-iterator iterables which do not have a length when they could? With the exception of tee, all the functions in itertools return iterators. > One of those reentrant iterables is range(), whose __iter__ functions > creates new lazy iterables, which has a __len__, and so on. It even > has random access just like a sequence. You are misinterpreting what you are seeing. range objects already are sequences with a length, and nothing needs be done with them. But iter(range) are not sequences, they are iterators, and then are not sized and have no __len__ method: py> it = iter(range(10)) py> len(it) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: object of type 'range_iterator' has no len() If range_iterator objects were given a length, what would it be? Should it be the length of the underlying range object, which is easy to calculate but wrong? That's what you suggest below (your comments about chain). Or the length of how many items are yet to be seen, which is surprising in other ways? > Now it’s always entirely possible to *lazily* determine > len(chain(range(200), [1,2,5])), Sure. But chain doesn't just accept range objects and lists as arguments, it accepts *arbitrary iterables* which you accept cannot be sized. So len(chain_obj) *may or may not* raise TypeError. Since you can't rely on it having a length, you have to program as if it doesn't. So in practice, I believe this will just add complication. > which is of course len(range(200)) + > len([1,2,5]) = 200 + 3 = 203. No reentrant iterables are necessary > here, only iterables with a __len__. (Simply calling len() on them all > is sufficient, as it could only create a TypeError which would > propagate upwards) That would be wrong. Consider: it = chain("ab", "cd") throw_away = next(it) assert len(it) == 2 + 2 # call len() on the sequences assert len(list(it)) == len(it) # fails since 3 != 4 |
|||
msg248496 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2015-08-13 04:44 | |
I had explored this idea previously at some length (no pun intended) but it was mostly a dead-end. The best we ended-up with has having __length_hint__ to indicate size to list(). There were several issues some of which at detailed in the comment at the top of https://hg.python.org/cpython/file/tip/Lib/test/test_iterlen.py . Another *big* issue was that Guido was adamantly opposed to iterators having a length because it changed their boolean value from always-true and it broke some of his published code that depended on iterators never being false, even when empty. |
|||
msg248502 - (view) | Author: (flying sheep) * | Date: 2015-08-13 08:07 | |
> The *iterable* itself may be reentrant, but the iterator formed > from iter(iterable) is not. So by your previous comment, giving > the iterator form a length is not appropriate. > With the exception of tee, all the functions in itertools return > iterators. ah, so your gripe is that the itertools functions return iterators, not (possibly) reentrant objects like range(). and changing that would break backwards compatibility, since the documentation says “iterator”, not “iterable” (i.e. people can expect e.g. next(groupby(...))) to work. that’s probably the end of this :( the only thing i can imagine that adds reentrant properties (and an useful len()) to iterators would be an optional function (maybe __uniter__ :D) that returns an iterable whose __iter__ function creates a restarted iterator copy, or an optional function that directly returns such a copy. probably too much to ask for :/ > Since you can't rely on it having a length, you have to program as if > it doesn't. So in practice, I believe this will just add complication. I don’t agree here. If something accepts iterables and expects to sometimes be called on iterators and sometimes on sequences/len()gthy objects, it will already try/catch len(iterable) and do something useful if that succeeds. > The best we ended-up with has having __length_hint__ to indicate size to list(). Just out of interest, how does my __uniter__ compare? > because it changed their boolean value from always-true it does? is it forbidden to define methods so that int(bool(o)) != len(o)? |
|||
msg248610 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2015-08-14 20:06 | |
[flying sheep] > that’s probably the end of this :( Yes, I think so. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:58:19 | admin | set | github: 69037 |
2020-02-05 20:22:46 | mark.dickinson | link | issue39558 superseder |
2016-07-16 21:22:53 | r.david.murray | link | issue27532 superseder |
2015-08-14 20:06:40 | rhettinger | set | status: open -> closed resolution: rejected messages: + msg248610 |
2015-08-13 08:07:23 | flying sheep | set | messages: + msg248502 |
2015-08-13 04:44:20 | rhettinger | set | priority: normal -> low messages: + msg248496 |
2015-08-13 00:47:09 | steven.daprano | set | messages: + msg248492 |
2015-08-12 21:45:17 | serhiy.storchaka | set | assignee: rhettinger |
2015-08-12 21:42:08 | r.david.murray | set | nosy:
+ rhettinger messages: + msg248477 |
2015-08-12 21:31:37 | flying sheep | set | messages: + msg248475 |
2015-08-12 21:23:26 | flying sheep | set | status: closed -> open resolution: rejected -> (no value) messages: + msg248474 |
2015-08-12 12:16:45 | steven.daprano | set | nosy:
+ steven.daprano messages: + msg248457 |
2015-08-12 12:11:58 | r.david.murray | set | status: open -> closed nosy: + r.david.murray messages: + msg248456 resolution: rejected stage: resolved |
2015-08-12 10:47:16 | flying sheep | create |