classification
Title: Provide a robust O(1) mechanism to check for infinite iterators
Type: enhancement Stage: needs patch
Components: Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: bbayles, erik.bray, jdemeyer, koos.zevenhoven, ncoghlan, pitrou, rhettinger, serhiy.storchaka
Priority: normal Keywords:

Created on 2018-06-22 12:48 by ncoghlan, last changed 2018-09-04 01:06 by koos.zevenhoven.

Messages (19)
msg320229 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-06-22 12:48
This is a simpler proposal born out of https://bugs.python.org/issue31815: adding __length_hint__ implementations to some known-infinite iterators in itertools that always raise TypeError.

This means that iterator consumers that use the operator.length_hint() API will throw an error when asked to consume them, rather than entering an uninterruptible infinite loop.

The proposed methods added by this proposal would be:

* itertools.count.__length_hint__
* itertools.cycle.__length_hint__
* itertools.repeat.__length_hint__

Each of these would raise TypeError (to indicate an explicitly non-finite length), rather than returning the default of 0 (which length hint API consumers typically interpret as indicating an unknown-but-finite length)
msg320232 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2018-06-22 13:02
Why TypeError? Wouldn't OverflowError be more suitable?
msg320234 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-06-22 13:25
OverflowError covers cases like greater-than-sys.maxsize range instances, where the object itself is conceptually finite, but the length can't be represented as a C integer.

This case is different: it's a category error where the question being asked doesn't even make sense for the affected type.

However, unlike len(), where a missing __len__() implementation inherently raises TypeError, backwards compatibility requires operator.length_hint() to handle a missing implementation __length_hint__ implementation as equivalent to an implementation that returns a length hint of 0.
msg320235 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2018-06-22 13:41
I'm certainly in favor of adding a way for infinite iterators to state that they are infinite. I just feel like TypeError is way overused in Python. I also disagree that it's a category error: it's an iterable, so it does make sense to ask for its length.

You should also consider that there may be classes of iterables whose instances are sometimes finite and sometimes infinite. Then it's certainly not a category error.

So I would like a specific answer for "I'm infinite". If you don't like OverflowError, other options are ArithmeticError, IndexError, ValueError or simply returning math.inf
msg320237 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2018-06-22 13:54
By the way, there is a precedent for OverflowError on infinities:

>>> import math
>>> int(math.inf)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: cannot convert float infinity to integer
msg320239 - (view) Author: Erik Bray (erik.bray) * (Python triager) Date: 2018-06-22 14:11
Per the discussion in https://trac.sagemath.org/ticket/24757, it would also be nice if some other common iterables like map and filter implemented a similar __length_hint__ which would interrogate whatever iterables were give as their argument(s).

To summarize the discussion in that ticket, it would be nice to at least know if I can expect some iterable to be finite without actually knowing its length.  Sure, if I passed in some arbitrary generator there's no way for the intepreter to know a priori if it will terminate.  But I would at least like to be able to manually mark whether I expect, as an intelligent developer, that under normal conditions the generator *should* terminate.  And this information should filter up to other iterators that I pass my generator to.  That way I can code defensively around whether or not I at least expect an iterator to be finite (even if not of a known length).
msg320241 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-06-22 15:25
Using OverflowError wouldn't mean "I'm infinite", it would only mean "I'm larger than sys.maxsize" (the same way it does for range objects).

However, it may be that that's fine, since the information we really want to convey is:

1. Trying to store this iterable in memory would be a really bad idea
2. Even trying to iterate over this iterable to the end would probably also be a bad idea

And OverflowError conveys those pragmatic consequences pretty well.

(Returning math.inf isn't an option, since __length_hint__ is specified as returning an integer: https://www.python.org/dev/peps/pep-0424/)
msg320297 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-06-23 07:30
See also a meta-issue issue29833.

If use OverflowError as a sign of infinite iterator in __length_hint__, this should be documented as a legitimate use case for OverflowError.

itertools.repeat.__length_hint__() and reversed.__length_hint__() currently raise a TypeError for infinite iterator. reversed.__length_hint__() returns NotImplemented (which raises a TypeError when convert to a C integer). Both TypeError and NotImplemented are handled by the consumer of __length_hint__: in PyObject_LengthHint(). An OverflowError is treated as error.
msg320323 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-06-23 19:18
Perhaps an object can set an attribute, __infinite_iterator__ = True.

That would provide an unequivocal way to communicate to consumer code that the producer is known to generate a non-terminating stream.
msg320350 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-06-24 03:50
(I've updated the issue title to state the design requirement, rather than one potential solution to the design requirement)

I like the declarative `__infinite_iterator__` suggestion, as that's:

1. really easy to implement when defining a custom iterator type (whether in Python or in an extension module

2. relatively easy to add to generator-iterators as a read-write property that defaults to False (which could then be set to True via an "itertools.infinite_generator" decorator)

3. relatively easy to support in iterators like map, filter, etc via a property that delegates the response to the underlying iterator

4. even more complex iterators like chain, product, and combinations would be able to define a property based on "any(itr.__infinite__iterator__ in underyling_iterators)"
msg320352 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-06-24 04:12
Thinking about the generator-iterator case a bit more, "False" would be a bad default for that. Allowing "NotImplemented" to indicate the ambiguous "Might be finite, might be infinite" state, and using that as the default for generator-iterators, would probably make sense.
msg320554 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-06-27 08:23
Downside of __infinite_iterator__: another additional attribute to look up, and its functionality overlaps with __length_hint__.  I'd rather have a special return value for __length_hint__ (e.g. -1, -2, whatever).
msg320567 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-06-27 09:52
After more thought, I'm no longer convinced this would be useful.  Infinite is just a special case of "bigger or slower than convenient".  In a way, min(count()) is no more interesting than min(range(1_000_000_000_000)) or min(slow_data_source()).
msg320568 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-06-27 09:58
> Infinite is just a special case of "bigger or slower than convenient".

I have the same opinion.
msg320580 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-06-27 14:05
If C level iterator implementations in the standard library natively handled Ctrl-C (to cope with naive third party C level iterator consumers), and so did C level iterator consumers in the standard library (to cope with with naive third party C level iterators), I agree this wouldn't be that useful.

However, if folks are going to actively oppose making it possible for users to interrupt tight loops over non-Python iterators that are "bigger or slower than convenient" on the grounds of it making the C code more complicated to solve a problem they personally consider to be purely theoretical (despite other core developers telling them otherwise), then special casing known-infinite iterators is better than nothing.
msg320581 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-06-27 14:06
I think being able to handle Ctrl-C would be the right solution here.  Not handling Ctrl-C is a problem also for very long but non-infinite iterators.
msg320589 - (view) Author: Erik Bray (erik.bray) * (Python triager) Date: 2018-06-27 14:50
> Thinking about the generator-iterator case a bit more, "False" would be a bad default for that. Allowing "NotImplemented" to indicate the ambiguous "Might be finite, might be infinite" state, and using that as the default for generator-iterators, would probably make sense.

This is why I suggested the converse--something like __finite_iterator__ (nevermind bikeshedding over the name or "yet another dunder method).  This is something one would use to mark as iterator as "this is definitely expected to terminate at some point, assuming it is correctly implemented".  

If __finite_iterator__ == False, which should be the default, it doesn't necessarily mean it is infinite either, it just may or may not be finite, so there's no guarantee.

I think that __finite_iterator__ == True is more or less equivalent to returning a non-zero value from __length_hint__, whereas __finite_iterator__ == False is equivalent to raising NotImplemented for __length_hint__.  Either way it means adding __length_hint__ to all iterators, and also (as Nick suggested) having a decorator for generators to set the appropriate hint as well.
msg323666 - (view) Author: Koos Zevenhoven (koos.zevenhoven) * Date: 2018-08-17 18:42
I'd say there are three different problems related to this:

(1) The inability of infinite iterators/iterables to say anything about their length

(2) The inability to interrupt C-level loops

(3) The horrible consequences of things like list(itertools.count()) that fill up the memory


The problems overlap only partially. Unfortunately, fixing any single one of these problems does not eliminate the two others. For example, (2) and (3) may happen just as well without the iterator protocol being involved. And (1) may just prevent you from checking if the iterable has enough elements for whatever you're doing.
msg324548 - (view) Author: Koos Zevenhoven (koos.zevenhoven) * Date: 2018-09-04 01:06
That said, I agree with Antoine that being able to Ctrl-C would be the most useful of these fixes. But it seems clear that people are experiencing these issues differently (if at all).
History
Date User Action Args
2018-09-04 01:06:44koos.zevenhovensetmessages: + msg324548
2018-08-17 18:42:10koos.zevenhovensetnosy: + koos.zevenhoven
messages: + msg323666
2018-06-27 14:50:04erik.braysetmessages: + msg320589
2018-06-27 14:06:26pitrousetmessages: + msg320581
2018-06-27 14:05:01ncoghlansetmessages: + msg320580
2018-06-27 09:58:28serhiy.storchakasetmessages: + msg320568
2018-06-27 09:52:40rhettingersetmessages: + msg320567
2018-06-27 08:23:41pitrousetnosy: + pitrou
messages: + msg320554
2018-06-24 04:12:08ncoghlansetmessages: + msg320352
2018-06-24 03:50:59ncoghlansetmessages: + msg320350
title: Raise OverflowError in __length_hint__ for consistently infinite iterators -> Provide a robust O(1) mechanism to check for infinite iterators
2018-06-23 19:18:14rhettingersetmessages: + msg320323
2018-06-23 07:30:30serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg320297
2018-06-23 02:18:53bbaylessetnosy: + bbayles
2018-06-22 15:25:16ncoghlansetmessages: + msg320241
title: Raise TypeError in __length_hint__ for consistently infinite iterators -> Raise OverflowError in __length_hint__ for consistently infinite iterators
2018-06-22 14:11:52erik.braysetnosy: + erik.bray
messages: + msg320239
2018-06-22 13:54:02jdemeyersetmessages: + msg320237
2018-06-22 13:41:01jdemeyersetmessages: + msg320235
2018-06-22 13:25:45ncoghlansetmessages: + msg320234
2018-06-22 13:02:19jdemeyersetnosy: + jdemeyer
messages: + msg320232
2018-06-22 12:48:14ncoghlancreate