classification
Title: signal.signal/signal.alarm not working as expected
Type: behavior Stage: needs patch
Components: Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: alanwilter, georg.brandl, jdemeyer, martin.panter, neologix, nirai, pitrou, ronaldoussoren, vstinner
Priority: normal Keywords: patch

Created on 2010-08-04 08:33 by alanwilter, last changed 2015-11-28 10:48 by ronaldoussoren.

Files
File name Uploaded Description Edit
io_signal.diff neologix, 2011-01-09 18:36
Messages (14)
msg112771 - (view) Author: Alan Wilter (alanwilter) Date: 2010-08-04 08:33
I have this example code to illustrate a problem I am having with python3. It works fine with python 2.6 and 2.7 but does not with python 3.1.

from __future__ import print_function
import os, subprocess, signal

def signal_handler( signum, frame ):

    print( "PID: %s" % pid )
    print( "Timed out! Process %s killed, max exec time (%ss) exceeded" % (pid, timeTol ) )
    os.kill( int( pid ), 15 )
    raise Exception( "Taking too long to finish... aborting!" )

if __name__ == '__main__':

    timeTol = 5

    cmd = 'find /'

    signal.signal(signal.SIGALRM, signal_handler)
    signal.alarm(timeTol)

    p = subprocess.Popen(cmd, shell=True, stderr = subprocess.STDOUT, stdout = subprocess.PIPE)
    pid = p.pid

    out = str( p.communicate()[0].decode() )
    print(out)


Executing it:

time python3.1 timout.py
PID: 27687
Timed out! Process 27687 killed, max exec time (5s) exceeded
Traceback (most recent call last):
  File "timout.py", line 23, in <module>
    out = str( p.communicate()[0].decode() )
  File "/sw/lib/python3.1/subprocess.py", line 719, in communicate
    stdout = self.stdout.read()
  File "timout.py", line 9, in signal_handler
    raise Exception( "Taking too long to finish... aborting!" )
Exception: Taking too long to finish... aborting!
python3.1 timout.py  0.52s user 3.88s system 19% cpu 22.841 total

#### It prints essentially the same thing with a *very* *big* difference it takes 22 seconds and actually the alarm only works when the whole task ('find /') is finished.
msg125815 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-01-09 00:26
It's due to the way the python interpreter handles signals: when the signal is received, python runs a stub signal handler that just sets a flag indicating that the signal has been received: the actual handler is executed later, synchronously, mainly from the main evaluation loop.
The problem here is that since the main process is performing a read until EOF is reached (i.e. until the subprocess exits), the C code loops around the read(2) call without giving a chance for the handler to run.
A patch for is similar issue has been applied, see http://bugs.python.org/issue9617#

But it only fixed writes, not reads.
I think _bufferedreader_read_all and _bufferedreader_read_generic should be fixed too to call PyErr_CheckSignals().
msg125830 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2011-01-09 07:43
Antoine?
msg125845 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-09 15:13
Charles-François' analysis seems to be right. Note that the actual issue here is that read() always succeeds, returning a partial result (because you're executing a command, 'find /', which outputs a lot of data). If read() were interrupted before anything could be read, it would return EINTR and the handler would get executed immediately.

Anyone wants to propose a patch + tests?
msg125851 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-01-09 18:36
> Antoine Pitrou <pitrou@free.fr> added the comment:
>
> Charles-François' analysis seems to be right. Note that the actual issue here is that read() always succeeds, returning a partial result (because you're executing a command, 'find /', which outputs a lot of data). If read() were interrupted before anything could be read, it would return EINTR and the handler would get executed immediately.
>
> Anyone wants to propose a patch + tests?
>

Attached is a tentative patch: it checks for pending signals inside
_bufferedreader_read_generic, fileio_readall, and rawiobase_readall.
I tested quickly on regular files and on pipes, and handlers are now
called on time. As a an obvious bonus, it's also easier to interrupt a
script doing I/O with a KeyboardInterrupt (SIGINT).
If nothing seems blatantly wrong about this patch, I will go ahead and
try to write some tests for this.

Charles

> ----------
> versions: +Python 3.2
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue9504>
> _______________________________________
>
msg125863 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-09 20:59
> Attached is a tentative patch: it checks for pending signals inside
> _bufferedreader_read_generic, fileio_readall, and rawiobase_readall.
> I tested quickly on regular files and on pipes, and handlers are now
> called on time. As a an obvious bonus, it's also easier to interrupt a
> script doing I/O with a KeyboardInterrupt (SIGINT).
> If nothing seems blatantly wrong about this patch, I will go ahead and
> try to write some tests for this.

A potential downside is that the bytes which have been read simply get
lost. But since a signal can happen at any moment I guess nothing can be
done to protect from it in the general case, anyway.

As for the tests, the SignalsTest class in test_io.py is a natural
recipient.
msg127960 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-05 01:05
It seems that my patch for issue10956 also fixes this issue (although it doesn't try to fix FileIO). It also has tests and fixes read() as well as write().
I'm sorry for not noticing that the two issues were so similar. Suggest closing this issue as duplicate.
msg127964 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-02-05 01:12
Sorry, bad analysis. The patch in issue10956 only seemed to work because my system is too fast and "find /" in the script returned in less than 5 seconds. Replacing it with "yes" shows that the two issued are actually independent.
msg254946 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-20 01:53
With Python’s current technique for signal handling, I think there is always going to be a window where a signal can arrive at the low level after checking the Python flag and before starting a blocking call. If you need to robustly handle a signal maybe it is better to use set_signal_fd() and select() or something.

I haven’t looked at the patch closely, but it sounds like it could reduce the size of the window. This would make interactive signals like SIGINT more reliable, so it may be worthwhile. It probably needs properly reviewing in light of the recent EINTR (PEP 457) changes though.
msg254948 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-20 01:59
Sorry, EINTR changes were PEP 475.
msg255484 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2015-11-27 20:35
Just a comment: if you need really robust signal handling, you just cannot do it with pure Python. I would recommend using Cython, where one has complete control over when signals are checked.
msg255499 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-11-27 22:44
I don't understand this issue.

I would expect that a system call fails with EINTR if it is interrupted by a signal. Python explicitly configures the C library to get EINTR errors: automatic retry is disabled when Python sets its signal handlers.

If a system call doesn't fail, what shoud we do with the result, especially when reading data (read/recv)?

Yes, we _can_ checks to check regulary if we got a C signal by adding checks, but it's unclear to me if we must raise the exception immediatly, or later. Immedialty if I understand correctly the initial message. So should we check *all* system calls, or only system calls well known to block on I/O, like read/write, send/recv, select, etc.?

Do we try to workaround design choices by the Linux kernel and the GNU libc?

Right now, my favorite choice would be to do nothing. Yes, sometimes signals are delayed. Sometimes, a signal doesn't interrupt immediatly a blocked program. It's not perfect. But it's already very complex, I'm not sure that I want to make the code even more complex to check for signals in more places.
msg255520 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-28 03:04
Okay, I tend to agree, and don’t have a problem with rejecting this.
msg255536 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2015-11-28 10:48
Victor: read(2) and recv(2) are defined as returning all available data when interrupted by a signal and might therefore not return an error. The same is true for write(2): it returns the number of bytes written when there are any bytes written (instead of setting errno to SIGINT and returning an error).

I'd expect the C implementation of readall to behave the same as the Python implementation: a signal handler that raises an exception can cause data loss due to discarding the buffer that's used to collect the data.  The data loss in the script isn't worse than it currently is, AFAIK it would currently lose data due to raising the exception just after the call to readall in the next loop through the eval loop.

If I read the issue correctly the signal handling issue mentioned here is basically similar to the one in this scriptlet:

# BEGIN
import signal, sys
import subprocess

def handler(sig, frames):
    raise RuntimeError("Alarm!")

signal.signal(signal.SIGALRM, handler)
signal.alarm(5)
b = sys.stdin.buffer.raw.readall()
print(len(b))
signal.alarm(0)
#END

When you run this with a large file on stdin (I've used /dev/zero) the script will continue to run indefinitely and cannot be cleanly interrupted at all (on my system its eventually killed by the system due to running out of its rlimit limitations, otherwise it can only be killed by sending it an uncatchable signal like SIGKILL or SIGABRT).
History
Date User Action Args
2015-11-28 10:48:14ronaldoussorensetnosy: + ronaldoussoren
messages: + msg255536
2015-11-28 03:04:18martin.pantersetmessages: + msg255520
2015-11-27 22:44:51vstinnersetmessages: + msg255499
2015-11-27 20:35:53jdemeyersetnosy: + jdemeyer
messages: + msg255484
2015-11-20 14:45:14vstinnersetnosy: + vstinner
2015-11-20 01:59:53martin.pantersetmessages: + msg254948
2015-11-20 01:53:04martin.pantersetnosy: + martin.panter
messages: + msg254946
2011-06-30 17:58:24pitrousetassignee: pitrou ->
stage: patch review -> needs patch
versions: + Python 3.3, - Python 3.1
2011-05-10 02:30:36niraisetnosy: + nirai
2011-02-05 01:12:58pitrousetversions: + Python 2.7
nosy: georg.brandl, pitrou, neologix, alanwilter
messages: + msg127964
resolution: duplicate ->

superseder: file.write and file.read don't handle EINTR ->
stage: patch review
2011-02-05 01:05:51pitrousetnosy: georg.brandl, pitrou, neologix, alanwilter
resolution: duplicate
superseder: file.write and file.read don't handle EINTR
messages: + msg127960
2011-01-09 20:59:24pitrousetnosy: georg.brandl, pitrou, neologix, alanwilter
messages: + msg125863
2011-01-09 18:36:40neologixsetfiles: + io_signal.diff

messages: + msg125851
keywords: + patch
nosy: georg.brandl, pitrou, neologix, alanwilter
2011-01-09 15:13:57pitrousetnosy: georg.brandl, pitrou, neologix, alanwilter
messages: + msg125845
versions: + Python 3.2
2011-01-09 07:43:29georg.brandlsetassignee: pitrou

messages: + msg125830
nosy: + georg.brandl
2011-01-09 01:41:53eric.araujosetnosy: + pitrou
2011-01-09 00:26:56neologixsetnosy: + neologix
messages: + msg125815
2010-08-04 08:33:55alanwiltercreate