classification
Title: asyncio: limit EINTR occurrences with SA_RESTART
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: aba, gregory.p.smith, gvanrossum, haypo, neologix, python-dev
Priority: normal Keywords: patch

Created on 2013-12-01 12:09 by neologix, last changed 2013-12-06 06:31 by neologix. This issue is now closed.

Files
File name Uploaded Description Edit
asyncio_sa_restart.diff neologix, 2013-12-01 12:09 review
Messages (18)
msg204915 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-12-01 12:09
asyncio makes heavy use of SIGCHLD for subprocesses.
A consequence of this if that a lot of syscalls can fail with EINTR (see e.g. issue #18885).

The attached patch uses SA_RESTART (through signal.siginterrupt()) to limit EINTR occurrences, e.g. :
"""
$ ./python -c "import os; from signal import *; signal(SIGALRM, lambda *args: None); alarm(1); os.read(0, 1)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
InterruptedError: [Errno 4] Interrupted system call
"""

With siginterrupt(False):
"""
$ ./python -c "import os; from signal import *; signal(SIGALRM, lambda *args: None); siginterrupt(SIGALRM, False); alarm(1); os.read(0, 1)"
[manual interruption]
^CTraceback (most recent call last):
  File "<string>", line 1, in <module>
KeyboardInterrupt
"""

It doesn't come with test because it's hard to come up with a syscall that's guaranteed to be restarted on al OS (and also because I'm having a hard time with all those mock-based asyncio tests ;-), but at least it doesn't break the test suite :-)

See https://groups.google.com/d/topic/python-tulip/9T2_tGWe0Sc/discussion for more background.
msg204944 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-12-01 16:56
Do you haven an example of a program using asyncio that fails because of this?

Adding gps because he seems to agree that EINTR must die.
msg204950 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2013-12-01 19:40
It sounds like doing this is fine (as Glyph suggests in the email thread) but it isn't magically going to solve all EINTR problems, just reduce the number of calls that could encounter them.

http://man7.org/linux/man-pages/man7/signal.7.html see the "Interruption of system calls and library functions" section.

Note that with this SA_RESTART flag set via siginterrupt you _may_ be making a trade off between being able to process python signal handlers during long reads or writes vs having to wait until the entire thing has finished.

given that, I'm ambivalent about this patch being worthy.

(re: Explicitly testing this, it is hard. for the broader Python test suite as a whole I've been pondering a regrtest mode that'll continually pester all of the test processes with signals to try and expose EINTR issues.  Low on my TODO ideas list, I haven't written code to do it.)
msg204961 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-12-01 21:09
> It sounds like doing this is fine (as Glyph suggests in the email thread) but it isn't magically going to solve all EINTR problems, just reduce the number of calls that could encounter them.

Indeed, hence "*limit* EINTR occurrences" :-)

> Note that with this SA_RESTART flag set via siginterrupt you _may_ be making a trade off between being able to process python signal handlers during long reads or writes vs having to wait until the entire thing has finished.

asyncio uses a wakeup FD, registered in the main event loop: so as
soon as a signal is received, the main loop will wake up and run the
signal handler. So this would only be a problem if you were doing a
blocking syscall from the main loop thread, which would be a really
bad idea anyway.
msg204997 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-12-02 11:07
asyncio_sa_restart.diff looks good to me, it cannot make the situation worse.

signal.siginterrupt() looks to be available on all non-Windows buildbots and working correctly: test_signal tests it and the test pass.
msg205028 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-12-02 16:54
I have a question. Is there actually any need for this with asyncio? The selector already handles EINTR, and all the I/O done by asyncio's transports already handles it too (there are "except (BlockingIOError, InterruptedError)" clauses all over the place).

Any *other* I/O syscalls (unless a program violates the asyncio rules against doing your own blocking I/O) would either be disk file I/O, which IIUC cannot elicit EINTR, or run in a separate thread, where presumably it wouldn't be interrupted by a signal handler, since SIGCHLD is delivered to the main thread.  (It's actually the last part I am not 100% sure of.)
msg205029 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-12-02 16:58
> I have a question. Is there actually any need for this with asyncio?

I believe that the libc and the kernel knows better than Python how to
restart a syscalls, than Python. I expect more reliable timeout, or
the kernel may avoid context switches (don't wake up the process).

In practice, you should not see any difference. Or maybe only on some
corner cases. But do you want to handle these corner cases while there
is already a portable flag for them? :-)
msg205031 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-12-02 17:04
That's just rhetoric -- I am beginning to believe that nobody has any data.  Here's some opposite rhetoric.  If it ain't broke, don't fix it.  Plus, if it's so much better, why isn't it the default?  If you say "for backward compatibility" I will say "exactly!"
msg205057 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2013-12-02 21:46
>
> I believe that the libc and the kernel knows better than Python how to
> restart a syscalls, than Python. I expect more reliable timeout, or
> the kernel may avoid context switches (don't wake up the process).
>

See the man page i linked to, calls like select and poll with timeouts are
not auto-restarted.
msg205060 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-12-02 22:25
> Guido van Rossum added the comment:
>
> That's just rhetoric -- I am beginning to believe that nobody has any data.  Here's some opposite rhetoric.  If it ain't broke, don't fix it.  Plus, if it's so much better, why isn't it the default?  If you say "for backward compatibility" I will say "exactly!"

Well, it's the default on BSD and Linux, but Python deliberately disables it:
"""
PyOS_setsig(int sig, PyOS_sighandler_t handler)
{
#ifdef HAVE_SIGACTION
    /* Some code in Modules/signalmodule.c depends on sigaction() being
     * used here if HAVE_SIGACTION is defined.  Fix that if this code
     * changes to invalidate that assumption.
     */
    struct sigaction context, ocontext;
    context.sa_handler = handler;
    sigemptyset(&context.sa_mask);
    context.sa_flags = 0;
    if (sigaction(sig, &context, &ocontext) == -1)
        return SIG_ERR;
    return ocontext.sa_handler;
#else
    PyOS_sighandler_t oldhandler;
    oldhandler = signal(sig, handler);
#ifdef HAVE_SIGINTERRUPT
    siginterrupt(sig, 1);
#endif
    return oldhandler;
#endif
}
"""

It's done because we want syscalls to return with EINTR to be able to
run signal handlers, but since asyncio uses a wakeup file descriptor,
it's unnecessary here.

> Any *other* I/O syscalls (unless a program violates the asyncio rules against
> doing your own blocking I/O) would either be disk file I/O, which IIUC cannot
> elicit EINTR, or run in a separate thread, where presumably it wouldn't be
> interrupted by a signal handler, since SIGCHLD is delivered to the main thread.
>  (It's actually the last part I am not 100% sure of.)

In order:
- Linux avoids EINTR on file I/O, but that's not necessarily the case
on other OS (e.g. on NFS)
- Many syscalls can return EINTR, not only I/O related, e.g. waitpid().
- As for threads, there's absolutely no guarantee that the signal will
be delivered to the main thread.
msg205084 - (view) Author: Anthony Baire (aba) Date: 2013-12-03 08:36
The patch is fine, but it is hard to rely on it to prevent bugs from happening because that requires cooperation from all modules registering signal handlers.

Anyway it facilitates reusing code that was not written for an event-driven context (and many will do that through .run_in_executor()). If the patch is accepted, it would be wise to write a note in .run_in_executor()'s doc saying that asyncio uses SA_RESTART by default in all its handler and that EINTR is prevented *as long as* no other handlers are registered elsewhere.
msg205096 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-12-03 09:31
SA_RESTART doesn't need to be enforced. It's better to use it, but
selectors and asyncio modules already handle EINTR error.
msg205121 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-12-03 15:39
Please answer this question. Under what circumstances can a signal handler interrupt a blocking system call in a thread that is not the main thread?
msg205127 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-12-03 16:07
2013/12/3 Guido van Rossum <report@bugs.python.org>:
> Please answer this question. Under what circumstances can a signal handler interrupt a blocking system call in a thread that is not the main thread?

There is no guarantee that the signal handler is called in the main
thread. On FreeBSD, if I remember correctly, it is called in a random
thread.

You can control which thread gets the signal using
signal.pthread_sigmask() (block signals in other threads) and
signal.pthread_kill() (send a signal a specific thread).
msg205140 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-12-03 19:43
> The patch is fine, but it is hard to rely on it to prevent bugs from happening because that requires cooperation from all modules registering signal handlers.

Once again, that's why the bug report says "*limit* EINTR occurrences".
We all know this won't prevent all occurrences.

> Anyway it facilitates reusing code that was not written for an event-driven context (and many will do that through .run_in_executor()). If the patch is accepted, it would be wise to write a note in .run_in_executor()'s doc saying that asyncio uses SA_RESTART by default in all its handler and that EINTR is prevented *as long as* no other handlers are registered elsewhere.

The single most common cause of signals is SIGCHLD (in a "normal"
code). Since asyncio setups up the SIGCHLD handler, this should cover
most of the cases (BTW, just set up a SIGCHLD handler in any Python
process spawning subprocesses, and it becomes unusable since EINTR
isn't handled in the stdlib).

> Please answer this question. Under what circumstances can a signal handler interrupt a blocking system call in a thread that is not the main thread?

I answered in my prevous message: POSIX makes no guarantee whatsoever
as to which thread will receive the signal (except of course in case
of synchronous signales like SIGSEGV/SIGFPE). The Linux kernel makes
it best to deliver it to the main thread when possible, but in
certains cases it can't, and other OS just don't bother. See e.g.
issue #19857 for an occurrence on FreeBSD. Also, even the main thread
can sometimes make blocking calls subject to EINTR (e.g. acquiring a
lock).

So the possibility of breakage are real, but if people prefer to wait
& see, that's fine :-)
msg205141 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-12-03 19:46
OK, I've harassed you enough. Sorry. Go ahead and commit this.
msg205322 - (view) Author: Roundup Robot (python-dev) Date: 2013-12-05 21:54
New changeset 546cad3627e2 by Charles-François Natali in branch 'default':
Issue #19850: asyncio: Set SA_RESTART when registering a signal handler to
http://hg.python.org/cpython/rev/546cad3627e2
msg205354 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-12-06 06:31
Thanks!
History
Date User Action Args
2013-12-06 06:31:44neologixsetstatus: open -> closed
resolution: fixed
messages: + msg205354

stage: patch review -> resolved
2013-12-05 21:54:05python-devsetnosy: + python-dev
messages: + msg205322
2013-12-03 19:46:15gvanrossumsetmessages: + msg205141
2013-12-03 19:43:04neologixsetmessages: + msg205140
2013-12-03 16:07:17hayposetmessages: + msg205127
2013-12-03 15:39:38gvanrossumsetmessages: + msg205121
2013-12-03 09:31:01hayposetmessages: + msg205096
2013-12-03 08:36:41abasetnosy: + aba
messages: + msg205084
2013-12-02 22:25:47neologixsetmessages: + msg205060
2013-12-02 21:46:16gregory.p.smithsetmessages: + msg205057
2013-12-02 17:04:17gvanrossumsetmessages: + msg205031
2013-12-02 16:58:13hayposetmessages: + msg205029
2013-12-02 16:54:27gvanrossumsetmessages: + msg205028
2013-12-02 11:07:10hayposetnosy: + haypo
messages: + msg204997
2013-12-01 21:09:57neologixsetmessages: + msg204961
2013-12-01 19:40:23gregory.p.smithsetmessages: + msg204950
2013-12-01 16:56:13gvanrossumsetnosy: + gregory.p.smith
messages: + msg204944
2013-12-01 12:09:15neologixcreate