This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Make itertools iterators interruptible
Type: enhancement Stage: resolved
Components: Extension Modules Versions: Python 3.8
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: koos.zevenhoven, ncoghlan, rhettinger, serhiy.storchaka, tim.peters
Priority: normal Keywords: patch

Created on 2017-10-18 16:43 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked
PR 4038 closed serhiy.storchaka, 2017-10-18 16:50
Messages (17)
msg304588 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-18 16:43
The proposed PR makes tight C loops over itertools iterators interruptible with Ctrl-C. It adds checks for keyboard interrupts in iterators that can produce long sequences without advancing other iterators. For performance, the check is performed only on every 0x1000th item. If producing a new value requires advancing another iterator, the responsibility for checking for keyboard interrupts is delegated to that iterator.

This would solve the problem discussed on Python-ideas:

https://mail.python.org/pipermail/python-ideas/2017-October/047412.html
http://permalink.gmane.org/gmane.comp.python.ideas/47429

Example:

>>> import itertools
>>> it = itertools.count()
>>> it in it
^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyboardInterrupt
>>>
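
(Illustration only, not part of the PR: the PR adds the check at the C level inside the iterators themselves. From the consumer side, the same hang can already be avoided by routing items through any pure-Python step, since executing Python bytecode lets the interpreter deliver a pending KeyboardInterrupt; the hang above occurs only when both producer and consumer stay entirely in C. The helper name below is hypothetical.)

    from itertools import count

    def pass_through(iterable):
        # Pure-Python pass-through: each item goes through Python bytecode,
        # so a pending Ctrl-C is delivered as KeyboardInterrupt.
        for item in iterable:
            yield item

    # sum(count()) hangs uninterruptibly without the PR;
    # sum(pass_through(count())) can be stopped with Ctrl-C.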
msg304590 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-10-18 17:34
When I have time, I would like to re-launch a python-dev discussion on this.  It is my feeling that this solves an invented problem.  In my experience, it only ever happens to people who are intentionally trying to create this effect.

Adding this kind of "junk" throughout the code base adds complexity and more internal operations, but won't help *any* existing, deployed code.  We're making everyone pay for a problem that almost no one has.

Also, if we do care about interruptibility, it is unclear whether the responsibility should lie with the consumer or the producer of the iterator.
msg304591 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-18 18:11
Microbenchmark results:

$ ./python -m perf timeit --compare-to=../cpython-release/python -s 'from itertools import repeat' 'list(repeat(None, 1000000))'
/home/serhiy/py/cpython-release/python: ..................... 3.79 ms +- 0.09 ms
/home/serhiy/py/cpython-iter/python: ..................... 4.14 ms +- 0.07 ms

Mean +- std dev: [/home/serhiy/py/cpython-release/python] 3.79 ms +- 0.09 ms -> [/home/serhiy/py/cpython-iter/python] 4.14 ms +- 0.07 ms: 1.09x slower (+9%)


$ ./python -m perf timeit --compare-to=../cpython-release/python -s 'from itertools import cycle, islice' 'list(islice(cycle(range(1000)), 1000000))'
/home/serhiy/py/cpython-release/python: ..................... 6.88 ms +- 0.30 ms
/home/serhiy/py/cpython-iter/python: ..................... 6.87 ms +- 0.26 ms

Mean +- std dev: [/home/serhiy/py/cpython-release/python] 6.88 ms +- 0.30 ms -> [/home/serhiy/py/cpython-iter/python] 6.87 ms +- 0.26 ms: 1.00x faster (-0%)
Not significant!


$ ./python -m perf timeit --compare-to=../cpython-release/python -s 'from itertools import count, islice' 'list(islice(count(), 1000000))'
/home/serhiy/py/cpython-release/python: ..................... 26.1 ms +- 0.6 ms
/home/serhiy/py/cpython-iter/python: ..................... 26.3 ms +- 0.6 ms

Mean +- std dev: [/home/serhiy/py/cpython-release/python] 26.1 ms +- 0.6 ms -> [/home/serhiy/py/cpython-iter/python] 26.3 ms +- 0.6 ms: 1.01x slower (+1%)
Not significant!


$ ./python -m perf timeit --compare-to=../cpython-release/python -s 'from itertools import product' 'list(product(range(100), repeat=3))'
/home/serhiy/py/cpython-release/python: ..................... 80.2 ms +- 3.2 ms
/home/serhiy/py/cpython-iter/python: ..................... 80.2 ms +- 1.7 ms

Mean +- std dev: [/home/serhiy/py/cpython-release/python] 80.2 ms +- 3.2 ms -> [/home/serhiy/py/cpython-iter/python] 80.2 ms +- 1.7 ms: 1.00x faster (-0%)
Not significant!


$ ./python -m perf timeit --compare-to=../cpython-release/python -s 'from itertools import combinations' 'list(combinations(range(23), 10))'
/home/serhiy/py/cpython-release/python: ..................... 177 ms +- 14 ms
/home/serhiy/py/cpython-iter/python: ..................... 169 ms +- 4 ms

Mean +- std dev: [/home/serhiy/py/cpython-release/python] 177 ms +- 14 ms -> [/home/serhiy/py/cpython-iter/python] 169 ms +- 4 ms: 1.05x faster (-4%)


The only significant slowdown is for repeat(). But it is possible to optimize this case by reusing an existing counter.
msg304592 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-18 18:28
With optimized repeat():

$ ./python -m perf timeit --compare-to=../cpython-release/python -s 'from itertools import repeat' 'list(repeat(None, 1000000))'
/home/serhiy/py/cpython-release/python: ..................... 3.77 ms +- 0.06 ms
/home/serhiy/py/cpython-iter/python: ..................... 3.77 ms +- 0.05 ms

Mean +- std dev: [/home/serhiy/py/cpython-release/python] 3.77 ms +- 0.06 ms -> [/home/serhiy/py/cpython-iter/python] 3.77 ms +- 0.05 ms: 1.00x faster (-0%)
Not significant!
msg304593 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-10-18 18:58
I concur with Raymond. I made the same arguments in the discussion on Python-ideas. But the other solution suggested in that discussion would add more complexity and can't cover all cases.
msg304595 - (view) Author: Koos Zevenhoven (koos.zevenhoven) * Date: 2017-10-18 20:05
To repeat one of my points in the linked threads, I'm not convinced that infinite iterators are the most common case for the problem of long uninterruptible loops. A general mechanism that can be easily used in many places with minimal maintenance burden would be nice. It could be used even in third-party extension modules.
msg304600 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-10-19 01:24
Defensive coding and the complications it brings are a fact of life when providing a widely used platform.

Sure, we're free to say "We don't care about minor user experience irritations like Ctrl-C not always being reliable, users should just suck it up and cope".

I think "It's your own fault for typing that, just restart your session from scratch" is setting the bar too low for ourselves.
msg304601 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-10-19 01:55
To put this another way: I see an uninterruptible infinite loop as a data loss bug on par with a segfault, since there's no graceful way to terminate the process and allow cleanup code to run.

For segfaults, we're willing to tolerate them, but we expect the reproducers to involve arcane coding contortions, not simple expressions like "sum(itertools.count())".

Now, the producer side check that Serhiy posted here only addresses part of the problem - there's also the question of making the consumption loops more robust by having them check for signals, and adding a ThreadExit equivalent to allow the interpreter to request shutdown of non-daemon threads other than the main thread.

But as long as we think it's a-OK for us to hang a user's session, causing them to lose all their unsaved/uncached data, then we're going to resist the extra code complexity required to account for these usability concerns. (And I realise they're not new concerns - they're just longstanding problems that folks have gotten used to tolerating and excusing)
msg304602 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-10-19 05:34
I respectfully disagree that this just happens to people accidentally -- Every single day, I work with either Python professionals or Python students and never see this situation occur, nor have I had a single report of it from one of my clients, ever.  In my experience, someone has to be trying to produce exactly this effect.  

They have to go out of their way to import a high-performance module, select one of the tools specifically documented to be infinite, specifically reach for one of the very few tools like repeat() or count() that don't make any pure python callbacks, and then separately reach for a high-performance consumer that makes no pure python callbacks.  People don't just write ``sum(itertools.count())`` to do something useful, they do it just to see if they can produce exactly this effect.

We have a number of areas where we're comfortable saying "just don't do that" (i.e. the repr of a large number or of a large container, repeated exponentiation, bytecode hacks, ill-formed ctypes, etc).

I would like to draw a line in the sand for itertools to not go down this path unless we actually see this happening in the wild to people not trying to do it on purpose.  It is much more likely that a user accidentally types ">>> 'x' * 1000000000" and gets the same effect.

On a side note, I have a fear (possibly rational, possibly not) that introducing signal handling into formerly atomic operations will open up new classes of bugs and usability problems (i.e. Issue #14976 showed that when GC gained the ability to trigger calls to __del__, it created queue reentrancy deadlock problems that could not be solved with pure python code).

One last thought -- the various core devs seem to be charging in opposite directions.  On the one hand, there seems to be no limit to the coding atrocities being considered to save under a millisecond of startup time and for various other questionable micro-optimizations.  And on the other hand, there seems to be a great deal of willingness to inject almost-never-needed error checks or signal handling into otherwise tight, high-volume code paths.  One group likes to refactor code to make it clean, easy to maintain, and true to its business purpose, while another group is comfortable garbaging-up code in order to achieve some other benefit that may not be in line with the module designer's intent.
msg304607 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-10-19 07:50
I'd personally be happy enough if the infinite iterators implemented __length_hint__() as always raising TypeError so the machine-breaking cases of incremental consumption of ever-increasing amounts of memory were blocked - I was suggesting on python-ideas that enabling pervasive signal checking would be too intrusive for anyone to be willing to implement it.

However, Serhiy's patch showed me that it isn't particularly intrusive at all, and the risk of surprising consumers is low, since __next__() methods can already raise arbitrary exceptions.
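
(A minimal sketch of the __length_hint__ idea, with hypothetical names, not actual CPython code: an infinite iterator could advertise itself by raising TypeError from __length_hint__(), and a cooperating consumer could check the hint before trying to materialize its input.)

    class forever:                      # stand-in for count()/cycle()/repeat()
        def __iter__(self):
            return self
        def __next__(self):
            return 1
        def __length_hint__(self):
            raise TypeError("infinite iterator")

    def materialize(iterable):
        # A consumer that asks for the hint itself and refuses infinite inputs.
        hint = getattr(type(iterable), "__length_hint__", None)
        if hint is not None:
            try:
                hint(iterable)
            except TypeError:
                raise ValueError("refusing to consume an apparently infinite iterator")
        return list(iterable)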
msg304633 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2017-10-19 18:28
Segfaults are different:  they usually expose an error in CPython's implementation.  We prioritize them not because the user may have to restart their program (who cares? <0.5 wink>), but because they demonstrate that the language implementation is accessing memory wildly.  That in turn can result in anything, from arbitrarily wrong program results, through file corruption, to massive security holes.  It's far more a "correctness" than a "usability" concern.

If a user provokes a segfault by (ab)using low-level facilities (say, ctypes), we don't care - that's on them.  But most segfaults have pointed to legitimate corner-case errors in CPython itself.

There's no correctness issue in whether iterators are always interruptible - it doesn't merit the same concern.
msg304736 - (view) Author: Koos Zevenhoven (koos.zevenhoven) * Date: 2017-10-22 09:13
For someone who uses an interactive environment such as the REPL or a Jupyter notebook, the situation is a little different from "CPython as a programming language runtime".

The docs say a KeyboardInterrupt is "Raised when the user hits the interrupt key (normally Control-C or Delete). During execution, a check for interrupts is made regularly."  I suppose there's some ambiguity in what "regularly" means there ;).

But regardless of whether anyone bothers to read that part of the docs, Ctrl-C or an interrupt button not working can feel like a correctness issue for someone who's using an interactive Python environment *as an application* in daily work. Python gives you the impression that you can always interrupt anything if it turns out to take too much time. And I remember that being one of the points that made me move away from matlab, which at that time had problems with interrupting computations.
msg320230 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-06-22 12:51
Note: I've filed the "raise TypeError in __length_hint__" suggestion for infinite iterators separately in https://bugs.python.org/issue33939
msg320351 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-06-24 04:03
As a potential stepping stone towards possible future changes in the default behaviour here, what if itertools were to offer an opt-in "check_signals(itr, *, iterations=100_000)" helper function that was essentially a more efficient version of::

    from itertools import islice

    def check_signals(itr, *, iterations=100_000):
        while True:
            count = 0
            next_slice = islice(itr, iterations)
            for count, item in enumerate(next_slice, 1):
                yield item
            if count < iterations:
                # underlying iterator exhausted; return rather than raising
                # StopIteration inside a generator (PEP 479)
                return

This would:

1. Provide a straightforward way for folks to explicitly opt-in to periodic signal checks
2. Provide a way to check for potential compatibility issues with other libraries and components to better assess the risks of switching the default behaviour
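
(Hypothetical usage of the sketch above, assuming such a helper were added to itertools: wrapping an infinite producer gives the interpreter a periodic chance to deliver Ctrl-C even though the consumer never runs Python bytecode per item.)

    from itertools import count

    # With the proposed helper, this is interruptible with Ctrl-C;
    # plain sum(count()) never returns control to check for signals.
    total = sum(check_signals(count()))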
msg320353 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-06-24 05:19
> What if itertools were to offer an opt-in ...

This doesn't make sense to me.  As far as I can tell, the only time this issue has ever arisen in the past 15 or 16 years is when someone was trying to create an unbreakable infinite loop on purpose.  In a way, it is no more interesting than intentionally triggering a segfault with ctypes or bytecode hacks.  Likewise, it isn't even unique to itertools -- it shows up in any potentially long-running C code such as numpy/scipy calls.

I would like to close this issue and instead go down the path of issue 33939, which would allow consumers to detect when an input is expected to be infinite.  The consumers can then decide whether they want to make periodic Ctrl-C checks.
msg320354 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-06-24 05:39
The purpose would be two-fold:

1. The presence of the `check_signals()` wrapper provides a way to more explicitly document that the other itertools iterators *don't* implicitly check for signals, so if you want to combine them with consumers that also don't check for signals, then you're going to need to wrap the iterator.

2. As a helper for integration code that's dealing with consumers that don't check for signals, but want to make those loops interruptible. Doing that in Python (as in my example) is inefficient, since you end up running Python bytecode on every iteration, and also don't have as much control over exactly when the signals get checked.

Given a solution to issue 33939, I'd drop the priority on this issue to low, but I don't think it would make it redundant.
msg320356 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-06-24 05:54
I don't think a new public API should be introduced.  This is at best an implementation detail.  

Also, I really don't want to garbage-up the inner-loop code for the itertools.  I've spent a good deal of time micro-optimizing this code and don't want to throw it away for something that is of nearly zero value and imo not a real issue that affects real users.

Marking this as closed for now.  We can discuss it more at the sprints (Python 3.8 is still a long way away).
History
Date User Action Args
2022-04-11 14:58:53  admin             set     github: 75996
2019-05-31 20:30:23  terry.reedy       link    issue37040 superseder
2018-06-24 05:54:58  rhettinger        set     status: open -> closed; resolution: rejected; messages: + msg320356; stage: patch review -> resolved
2018-06-24 05:39:24  ncoghlan          set     messages: + msg320354
2018-06-24 05:19:18  rhettinger        set     messages: + msg320353; versions: + Python 3.8, - Python 3.7
2018-06-24 04:03:37  ncoghlan          set     messages: + msg320351
2018-06-22 12:51:03  ncoghlan          set     messages: + msg320230
2017-10-22 09:13:17  koos.zevenhoven   set     messages: + msg304736
2017-10-19 18:28:45  tim.peters        set     nosy: + tim.peters; messages: + msg304633
2017-10-19 07:50:58  ncoghlan          set     messages: + msg304607
2017-10-19 05:34:07  rhettinger        set     messages: + msg304602
2017-10-19 01:55:31  ncoghlan          set     messages: + msg304601
2017-10-19 01:24:37  ncoghlan          set     messages: + msg304600
2017-10-18 20:05:50  koos.zevenhoven   set     nosy: + koos.zevenhoven; messages: + msg304595
2017-10-18 18:58:12  serhiy.storchaka  set     messages: + msg304593
2017-10-18 18:28:55  serhiy.storchaka  set     messages: + msg304592
2017-10-18 18:11:40  serhiy.storchaka  set     messages: + msg304591
2017-10-18 17:34:47  rhettinger        set     messages: + msg304590
2017-10-18 17:29:11  rhettinger        set     assignee: rhettinger
2017-10-18 16:51:05  serhiy.storchaka  set     title: Make itertools iterators interrable -> Make itertools iterators interruptible
2017-10-18 16:50:34  serhiy.storchaka  set     keywords: + patch; pull_requests: + pull_request4011
2017-10-18 16:43:01  serhiy.storchaka  create