Classification
Title: Occasionally check for Ctrl-C in long-running operations like sum
Type: behavior Stage:
Components: Interpreter Core Versions: Python 3.7
Process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: gslavin, haypo, ncoghlan, rhettinger, steven.daprano, terry.reedy, trey, wim.glenn
Priority: low Keywords: patch

Created on 2016-02-12 17:47 by steven.daprano, last changed 2016-09-30 14:54 by trey.

Files
File name Uploaded Description Edit
KeyboardInterrupt.patch gslavin, 2016-09-20 01:16 Allows interruption of builtin routines using ctrl-c review
test_sig_int_builtins.py gslavin, 2016-09-20 01:18
Messages (15)
msg260189 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2016-02-12 17:47
There are a few operations, such as summing or unpacking infinite iterators, where the interpreter can become unresponsive and ignore Ctrl-C (KeyboardInterrupt).  Guido suggests that such places should occasionally check for signals:

https://mail.python.org/pipermail/python-ideas/2016-February/038426.html
msg260238 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-02-13 12:41
At least list() and potentially other container constructors are also affected.

While it's mentioned in the thread, I'll explicitly note here that the problem is specifically with iterators implemented in C, like itertools.count().

Iterators implemented in Python already evaluate Python code on each iteration, which means Ctrl-C gets detected and converted to KeyboardInterrupt.
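Nick's distinction also suggests a pure-Python workaround that illustrates the mechanism: re-yielding items from a Python-level generator frame puts Python bytecode back in the loop, where pending signals are checked between instructions and raised as KeyboardInterrupt. A minimal sketch (the `interruptible` name is hypothetical, not part of any patch):

```python
import itertools

def interruptible(iterable):
    """Hypothetical workaround: re-yield items through a Python-level
    generator frame, where pending signals (e.g. Ctrl-C) are checked
    between bytecode instructions and raised as KeyboardInterrupt."""
    for item in iterable:
        yield item

# sum(itertools.count()) hangs uninterruptibly in current interpreters;
# the wrapped form can be stopped with Ctrl-C:
#     sum(interruptible(itertools.count()))
# A finite run shows the wrapper is otherwise transparent:
total = sum(interruptible(range(100)))
```

The wrapper trades per-item generator overhead for interruptibility, which is exactly the cost/benefit question debated below.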
msg260535 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-02-19 23:53
Great idea.  In Windows, closing the window with [x] will kill the process, at the cost of losing its contents.  In IDLE's Shell, Restart Shell will do the same without killing IDLE, but it is easy to not know of or forget that option in a moment of panic.
msg260536 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-02-19 23:59
> Great idea.  In Windows, closing the window with [x] will kill the process, at the cost of losing its contents.

Note: after a few years, I heard that Windows supports something like SIGKILL: CTRL+Pause kills the current process ;-) You lose the process, but you don't have to close the terminal, confirm and reopen a new terminal, go back to your working directory, etc.

Note 2: Even more off-topic: type <enter>~. in an SSH session to kill it; again, this avoids having to reopen a terminal window ;-)
msg260556 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-02-20 03:59
I verified that ^Break (Pause) works.  It even causes ^C to be printed.
msg276995 - (view) Author: George Slavin (gslavin) * Date: 2016-09-20 01:16
I have a patch that checks for KeyboardInterrupt during builtin operations.  This allows sum, max, min, list, dict, set, and tuple calls to be interrupted when they are working on infinite iterators.

I've attached the patch, and a test I wrote to show that you can ctrl-c out of all the above calls.

This is my first attempt at a patch, so I would appreciate any feedback on what I need to fix to allow it to be submitted :)
msg276996 - (view) Author: George Slavin (gslavin) * Date: 2016-09-20 01:18
I've attached the test for this patch (I couldn't figure out how to upload two files with one comment).
msg277002 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-09-20 03:57
I think this needs more discussion on python-dev before going down this path.

In reality, we have many places that have "long running" C code when fed extreme arguments.  In practice, we almost never have a problem with these except for cute toy bug reports.   To "fix" this, we would need to alter all possible long running data consumers or alter all possible long running data producers.  This kind of change is hard to test, gums up the code, and hinders future maintainability for near zero benefit to ordinary users.

    min(range(100000000000))    # There are lots of places like this

The proposed patch is indelicate about the addition of signal checking.  It checks on every single iteration, right in the middle of the most highly optimized, tightest, most speed-critical loops in Python, making every use pay a cost for something almost no one will ever benefit from.
msg277094 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-09-21 05:55
While I agree with Raymond regarding the performance implications if this isn't handled carefully, I think we're also getting to a point where better accounting for signal handling latency in inner loops is something that could be considered for 3.7.  The benchmarking infrastructure being built out to measure performance optimisations would also allow overhead tuning of a "batched iteration" idiom, where signals are checked either every N thousand iterations, periodically based on elapsed time, or some combination of the two.

Benchmarking to measure the speed impact is going to be essential, though - this is a case where changing the behaviour is clearly possible, so the key questions are whether or not the resulting runtime overhead can be made low enough to be deemed acceptable, and whether or not it can be done in a way that doesn't make the affected parts of the code base effectively unreadable.
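The "batched iteration" idiom above can be sketched in Python (all names here are hypothetical; a real fix would live in the C loops themselves): decrement a counter per item, and only at batch boundaries consult the clock and run the signal check, so the common-case per-item cost is a single decrement and compare.

```python
import time

def batched_check_loop(iterable, consume, check, batch=4096, interval=1.0):
    """Hypothetical sketch of batched signal checking: run `check` at
    most once per `batch` items and once per `interval` seconds.
    Returns the number of times the check actually fired."""
    countdown = batch
    last_check = time.monotonic()
    checks = 0
    for item in iterable:
        consume(item)
        countdown -= 1
        if countdown == 0:          # cheap counter test on the hot path
            countdown = batch
            now = time.monotonic()  # clock consulted only per batch
            if now - last_check >= interval:
                last_check = now
                check()             # in CPython C code this would be
                checks += 1         # a PyErr_CheckSignals() call
    return checks
```

With `interval=0` the check fires at every batch boundary, which is the configuration a worst-case overhead benchmark would exercise.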
msg277096 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-09-21 06:04
As far as the "What's the benefit to users?" question goes, I think the main intended beneficiaries would be children and other folks playing at the command prompt and trying out different things.

The "no segfaults from normal Python code" rule aims to make that kind of exploration a significantly more positive experience than it is in a language like C - you're far more likely to get a traceback than you are to have the interpreter fall over completely. Tracebacks can be intimidating to new users, but they still give them new information to work with.

Infinite loops at the Python level are similarly about as friendly to ad hoc exploration as we can possibly make them: Ctrl-C will break you out of them with a traceback.

Implementation-level infinite (or near-infinite, or finite-but-eating-all-of-RAM) loops, by contrast, are much closer to their traditional C-level counterparts: your only way out is via destructive termination of the entire process.

So that's why I think this is an idea worth exploring further, even though it may still turn out to be impractical for code readability or runtime speed reasons.
msg277098 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-09-21 06:51
Can someone work on a patch? Then we can benchmark it to make a decision ;-) Maybe we might expose signalmodule.c internals to have a cheaper check?
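For reference, the C-level entry point such a patch would call is PyErr_CheckSignals(), which runs any pending Python signal handlers and returns 0, or -1 if one of them raised an exception. On CPython it can be poked from Python via ctypes, purely as an illustration (not something a real patch would do):

```python
import ctypes

# PyErr_CheckSignals() is part of CPython's public C API: it runs any
# pending signal handlers and returns 0, or -1 if a handler raised.
# Calling it through ctypes only works on CPython.
check_signals = ctypes.pythonapi.PyErr_CheckSignals
check_signals.restype = ctypes.c_int

status = check_signals()  # no signal is pending here
```

Victor's suggestion is essentially to make an inlined, cheaper variant of this check available to the builtin loops, so they can afford to call it more often.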
msg277185 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-09-21 21:42
The request is to check 'occasionally'.  To me this means perhaps once a second, which is to say, ^C should generally interrupt within a second, and within half a second on average.  The following takes just under a second: "sum(i for i in range(10000000))", whereas "for i in range(10000000): pass" takes a quarter of that.
msg277190 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-09-22 05:14
George's initial patch that naively checks for signals on every iteration could be used to get an upper bound on the likely benchmark impact.

I do think this is a case where we'll want a dedicated microbenchmark to complement the macrobenchmark suite, though - Raymond's right that these functions are frequently used to optimise sections of code that have already been identified as performance bottlenecks in a particular application, so we need to be really careful with changes that might make them slower (even if the current macrobenchmarks say things are still broadly OK).
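A dedicated microbenchmark along these lines might simply time the affected builtins over C-level iterables with timeit, so any per-iteration signal-check overhead shows up undiluted. This harness is a hypothetical sketch, not part of the benchmark suite Nick mentions:

```python
import timeit

def bench(stmt, repeat=3, number=100):
    """Hypothetical microbenchmark helper: best-of-N wall time for a
    statement, in seconds.  min() is the usual choice because it is the
    least noisy estimate of the true cost."""
    return min(timeit.repeat(stmt, repeat=repeat, number=number))

# The tight builtin loops the patch touches, fed C-level iterators:
baseline_sum = bench("sum(range(10000))")
baseline_list = bench("list(range(10000))")
```

Comparing these numbers before and after a patch would bound the per-iteration cost of the added check.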
msg277220 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-09-22 11:33
You should try https://github.com/python/performance to get reliable benchmark results ;-)
msg277757 - (view) Author: Trey Hunner (trey) Date: 2016-09-30 14:54
This is a problem I experience occasionally while teaching and while developing teaching curriculum.

I tend to close problem windows quickly enough to avoid a computer crash in front of a live audience, but it's still an annoyance to get the REPL state back to the way I had it before killing the process.

This mostly happens when I'm teaching iterators or itertools, but occasionally I hit a "Ctrl-C free zone" in other ways.  I believe this memory-filling snippet wouldn't respond to Ctrl-C either: x=[0]*2**30

I hadn't thought to report a bug on this because my use seemed niche.  I mostly get this error while demonstrating concepts via weird/incorrect code at the REPL.
History
Date User Action Args
2016-09-30 14:54:40  trey              set     nosy: + trey; messages: + msg277757
2016-09-22 11:33:38  haypo             set     messages: + msg277220
2016-09-22 05:14:46  ncoghlan          set     messages: + msg277190
2016-09-21 21:42:39  terry.reedy       set     messages: + msg277185
2016-09-21 06:51:45  haypo             set     messages: + msg277098
2016-09-21 06:04:39  ncoghlan          set     messages: + msg277096
2016-09-21 05:55:25  ncoghlan          set     messages: + msg277094
2016-09-20 03:57:30  rhettinger        set     priority: normal -> low; messages: + msg277002; versions: + Python 3.7, - Python 3.6
2016-09-20 01:18:28  gslavin           set     files: + test_sig_int_builtins.py; messages: + msg276996
2016-09-20 01:16:48  gslavin           set     files: + KeyboardInterrupt.patch; nosy: + gslavin; messages: + msg276995; keywords: + patch
2016-03-30 22:11:23  wim.glenn         set     nosy: + wim.glenn
2016-02-20 03:59:47  terry.reedy       set     messages: + msg260556
2016-02-19 23:59:36  haypo             set     messages: + msg260536
2016-02-19 23:53:19  terry.reedy       set     nosy: + terry.reedy; messages: + msg260535
2016-02-14 04:13:54  rhettinger        set     messages: - msg260255
2016-02-13 23:25:06  rhettinger        set     messages: + msg260255
2016-02-13 12:56:57  serhiy.storchaka  set     nosy: + rhettinger
2016-02-13 12:41:11  ncoghlan          set     nosy: + ncoghlan; messages: + msg260238
2016-02-12 17:48:32  haypo             set     nosy: + haypo
2016-02-12 17:47:32  steven.daprano    create