classification
Title: SocketServer doesn't handle syscall interruption
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Christophe Simonis, Jerzy.Kozera, Yaniv.Aknin, bda, exarkun, giampaolo.rodola, gregory.p.smith, haypo, kchen, neologix, nvetoshkin, pitrou, python-dev, spiv
Priority: normal Keywords: patch

Created on 2010-02-21 19:14 by nvetoshkin, last changed 2012-04-08 23:48 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
SocketServer_eintr.diff neologix, 2010-04-04 11:18 handle EINTR patch
eintr_safety.py Yaniv.Aknin, 2010-04-08 06:06 Toy 3.x program to play with select() and EINTR
sockServe.py bda, 2010-04-28 19:58 example
socketserver_eintr_py3k_updated.diff Yaniv.Aknin, 2010-05-13 00:32 review
socketserver_eintr_20120406.diff Jerzy.Kozera, 2012-04-06 21:42 review
Messages (32)
msg99676 - (view) Author: Vetoshkin Nikita (nvetoshkin) Date: 2010-02-21 19:14
SocketServer's handle_request function uses "select" call to handle io, but sending POSIX signal will result in 'Interrupted system call' exception raised. After that Paste (http://pythonpaste.org/) http server will crash.
I suppose EINTR must be handled properly (i.e. syscall must be restarted silently) on SocketServer's side. That must be pretty easy task.
msg100704 - (view) Author: Vetoshkin Nikita (nvetoshkin) Date: 2010-03-09 07:35
Wrapping select in (taken from twisted sources) can help:
def untilConcludes(f, *a, **kw):
    while True:
        try:
            return f(*a, **kw)
        except (IOError, OSError), e:
            if e.args[0] == errno.EINTR:
                continue
            raise
msg102331 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010-04-04 11:18
A patch to make select calls EINTR-safe is attached, but:
- there are many modules that suffer from the same problem, so maybe a fix at select level would be better
- if not, it could be a good idea to add this EINTR-retry handler to a given module (signal, os?) so that every module doesn't have to reimplement it (I know that at least subprocess does have one).

Thoughts ?
msg102333 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-04 11:46
What kind of signals can be received in real-life?
(I'm assuming that most signals are of the kind that you only receive if someone else deliberately sends it to you, in which case you are supposed to be prepared to handle it)

Also, does your patch still allow Python to handle the signal properly (invoke a registered signal handler, or perhaps raise KeyboardInterrupt)?

You could raise the issue on the python-dev mailing-list for more advice.
msg102336 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010-04-04 12:27
In real life, you can receive for example SIGSTOP (strace, gdb, shell), but mostly SIGCHLD (any process spawning children), etc. The attached patch just restarts calls when EINTR is received, as is done in subprocess module. The mailing list is a good idea.
msg102337 - (view) Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2010-04-04 12:31
> What kind of signals can be received in real-life?

There are lots of possible answers.  Here's one.  You launch a child process with os.fork() (and perhaps os.exec*() as well).  Eventually you'll get a SIGCHLD in the parent when the child exits.  If you've also installed a Python-level SIGCHLD handler with signal.signal (and not flagged the handler with SA_RESTART using signal.siginterrupt) then if you're in select.select() when the SIGCHLD is received, select.select() will fail with select.error EINTR.

If your Python SIGCHLD handler deals with the SIGCHLD completely before returning, then this lets you support child processes and use SocketServer at the same time.

Changing the behavior of select.select() ("a fix at select level would be better") should be *very* carefully considered, and probably rejected after such consideration.  There are completely legitimate use cases which require select.select() to fail with EINTR, and these should remain supported (not to mention that existing code relying on this behavior shouldn't be broken by a Python upgrade in a quite insidious way).
msg102363 - (view) Author: Vetoshkin Nikita (nvetoshkin) Date: 2010-04-05 05:36
> What kind of signals can be received in real-life?

We use SIGUSR1 to reopen log files after rotation. sighandler works just fine, but after that Paste crashes.

I suppose that implementing silent syscall restart at select.select() level is a bad idea (explanation is given already), but some helper like untilConcludes in socket module might be useful.
msg102591 - (view) Author: Yaniv Aknin (Yaniv.Aknin) Date: 2010-04-08 06:06
First, let me cast my vote in favour of leaving select() itself alone, it's excessive to fix this issue with changing select().

Second, while exarkun's comments are accurate, I understand where pitrou is coming from. EINTR is not a pathological case, but it's not commonplace, either. I believe that on modern Unices (I tried Linux and OpenSolaris), the default handler for SIGCHLD will restart interrupted calls. By default, SIGUSR1 and most sane others, will kill the process anyway. So I can understand the purity of the argument that if a process sets a signal handler, it might just as well set SA_RESTART and be done with it.

That said, Imagine I write a forking server with SocketServer which sets a signal handler for SIGCHLD, and I don't set SA_RESTART because I don't think I need to. I believe this is a valid real-world scenario where (a) I don't expect an EINTER-related exception to be raised from SocketServer, (b) SocketServer can actually do something about it, and (c) I doubt the old EINTR behaviour is relied upon by anyone (now I'm /certain/ someone relying on this will pop-up...). I think its a safer bet on behalf of SocketServer (and subprocess, etc) to use something like untilConcludes.

If we agree so far, I believe that an implementation of untilConcludes *should* be added to stdlib ("signal.restartable_call", anyone?). If people agree, I'd be happy to produce the patch (trunk+py3k, doc+test). Other than SocketServer and subprocess, anything else that should use "untilConcludes"?

Finally, to answer pitrou's second question, I'm pretty sure untilConcludes as presented will not hurt other signal handlers. The interrupt handler will be executed before returning to untilConcludes' loop and manually restarting the call, so KeyboardInterrupt (or anything else signalistic) will behave as expected.

I'm attaching a toy 3.x program to play with select() and EINTR, I used it in writing this comment. Gee, it even has an OptionParser CLI to save you some commenting-out.
msg102592 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2010-04-08 06:28
wrapping select in an eintr handler that loops like SocketServer_eintr.diff is fine.
msg102594 - (view) Author: Vetoshkin Nikita (nvetoshkin) Date: 2010-04-08 06:41
According to "man 7 signal" select must be explicitely restarted, regardless of the SA_RESTART flag.
msg102596 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010-04-08 07:19
> If we agree so far, I believe that an implementation of untilConcludes *should* be added to stdlib ("signal.restartable_call", anyone?).

Definitely, it's better to have this handler written once and correct than having various implementations scattered around libraries.

> If people agree, I'd be happy to produce the patch (trunk+py3k, doc+test).

Go ahead :-)

> According to "man 7 signal" select must be explicitely restarted, regardless of the SA_RESTART flag.

man sigaction states this:

    SA_RESTART 

Provide behaviour compatible with BSD signal semantics by making _certain_ system calls restartable across signals. 

Indeed: you can't really rely on SA_RESTART, since certain syscalls are not restartable, and this flag isn't really portable.
select is even specific since the file descriptor sets can be modified, and the timeout too.

That's why it's definitely better to take the EINTR-wrapper approach.
msg102660 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-09 00:44
Ok, I think the wrapper approach is a good one. Now remains to decide whether it should stay private to socketserver or be exposed as a generic method (for example in the "os" module as proposed).

By the way, adding a test would be nice, too. I'm not sure it what form it should be done; perhaps register a signal handler and trigger the signal from a sub-thread while the main thread is iterating in a select loop?
msg102680 - (view) Author: Vetoshkin Nikita (nvetoshkin) Date: 2010-04-09 05:57
I vote for making a global wrapper. As neologix pointed - many other modules can be (or are) affected. Futher found bugs could be fixed using that global wrapper.
msg102690 - (view) Author: Andrew Bennetts (spiv) Date: 2010-04-09 07:01
Note that a trivial untilConcludes isn't correct for select if a timeout was passed.  If a select(..., 60) was interrupted after 59 seconds, you probably want to restart it with a timeout of 1 second, not 60.

The SocketServer_eintr.diff patch has this flaw.
msg102704 - (view) Author: Yaniv Aknin (Yaniv.Aknin) Date: 2010-04-09 09:24
pitrou, re. test code: I actually started with the test code, so that part is done.

I opted to use a forked SocketServer rather than threads. I'm not an expert on the low-level details of a multi-threaded process receiving threads, but it seems to me like there's a chicken and egg issue here: suppose we have threadA (blocking on select()) and threadB (managing the test and doing something like os.kill(os.getpid(), SIGFOO). I think by definition the thread running when the signal is received is threadB (since it sent the signal), and I'm not sure threadA will be interrupted properly that way (threads are sneaky, now aren't they). So I cleaned up test_socketserver.py a bit, split it to a base test case and concrete test cases, and added a separate test case which uses os.fork() rather than threads and tests that way.

The attached patch is against recent py3k and includes both neologix's patch (adapted to py3k) and the added test code. If this looks good, I'll easily backport it to 2.x/trunk.

spiv, re. handling timeout in select: You're very right, and I'm not sure how we should handle it if we want to take the "untilConcludes" code that is currently shared between subprocess.py and socketserver.py and move it outside. For subprocess' and socketserver's specific cases, the naive implementation which doesn't handle timeouts is OK, because they don't use the timeout parameter anyway.

Moving it outside warrants more discussion. Remember this isn't just about select() - it's a general user-space-restarter for any of the myriad of "slow" system calls that might be interrupted. There is some tension between elegance and even feasibility of correct implementation vs. the wrapper hiding functionality of some of these system calls. You could argue that if the user of the wrapper is unhappy about the lost functionality, they can make their own wrapper. 

Unless we can reach a resolution quickly, my gut feeling is that the current patch should be applied as-is and this issue be closed. A new issue along the lines of "Extract duplicate interruption-restart code to a relevant module and review stdlib for unprotected select() calls" should be opened instead. Your thoughts?
msg104455 - (view) Author: Bryce Allen (bda) Date: 2010-04-28 19:58
I encountered this issue when trying to exit cleanly on SIGTERM, which I use to terminate background daemons running serve_forever.

In BaseServer, a threading.Event is used in shutdown, so it can block until server_forever is finished (after checking __serving). Since the SIGTERM interrupts the select system call, the event set is never reached, and shutdown hangs waiting on the event.

I've attached an example of the pattern I was trying to use in my server. There are several ways around the issue, but looking at the API it seems like this _should_ work, and in my experience all servers have clean-up code so it's a very common case.
msg104459 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-28 20:06
> In BaseServer, a threading.Event is used in shutdown, so it can block
> until server_forever is finished (after checking __serving). Since the
> SIGTERM interrupts the select system call, the event set is never
> reached, and shutdown hangs waiting on the event.

This precise use case is already fixed (in SVN trunk and in the 2.6
branch) since the select() loop is now wrapped in a try..finally.
I just ran your test case and killing -TERM works ok.
msg105617 - (view) Author: Yaniv Aknin (Yaniv.Aknin) Date: 2010-05-13 00:32
While bda's usecase is indeed fixed is recent versions, I want to point out the issue still exists in a recent py3k checkout. I've updated my patch to apply cleanly against recent py3k (and made less structural changes to test_socketserver.py; I still think it has an archaic unittest structure but I want this patch to remain relevant a bit longer, maybe fix the structure at a later time).

Again, I argue that resolving the general case (a generic untilConcludes to be exposed in stdlib) is hard and that we have a patch review worthy (IMO) solution at hand. Let's apply it (and possibly open a separate ticket for the generic case, if that's deemed interesting).
msg120898 - (view) Author: Vetoshkin Nikita (nvetoshkin) Date: 2010-11-09 20:28
Any news on this? Could we possibly apply patch as is? If I'm not mistaken timeout issue is the only one left unresolved.

About timeout. The most elegant way, would be to use select's syscall timeout parameter, but man 2 select says: "On  Linux,  select()  modifies  timeout to reflect the amount of time not slept; most other implementations do not do this. (POSIX.1-2001 permits either behavior.)  This causes problems both when Linux code which reads timeout is ported to other operating systems, and when code is ported to Linux that reuses a struct timeval for  multiple select()s in a loop without reinitializing it. Consider timeout to be undefined after select() returns."

I'm ready to adapt Yaniv's patch to 2.7
msg125234 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2011-01-03 20:43
I've added some code comments on http://bugs.python.org/review/7978/show

overall I think the patch is right, I pointed out one thing to clean up and I think the unittest can be greatly simplified by using stubbed out mock select.select() instead of fork+signals+sleeping.
msg137361 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-31 14:55
Using signalfd() (require Linux 2.6.22+), specified signals are written in a file and don't interrupt system calls (select). Bonus: we can wait for a signal using select!

Using pthread_sigmask(), you can also block signals before calling select, but it's not atomic. Or you can use pselect() which is pthread_sigmask()+select() in an atomic fashion.

pthread_sigmask() is already in Python 3.3, I plan to add signalfd() in a few weeks into Python 3.3. For pselect(), I don't know if we need it.

pthread_sigmask() is available on most POSIX systems, not on Windows. Can select() be interrupted by CTRL+c on Windows?

I don't say that we should not restart manually select() on EINTR, just that better solutions do exist today.
msg137362 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2011-05-31 14:55
I think that #12224 is a duplicate of this issue.
msg137414 - (view) Author: Vetoshkin Nikita (nvetoshkin) Date: 2011-06-01 04:06
I don't think we should block signals - we can sleep on select for about forever.
msg157695 - (view) Author: Jerzy Kozera (Jerzy.Kozera) Date: 2012-04-06 21:42
I've updated the patch according to suggestions from Gregory P. Smith. Thanks to a change from #12555 (PEP 3151) now just checking for OSError is enough.

(I've decided to use mocked select() instead of calling alarm() to avoid depending on timing.)
msg157725 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-07 10:45
Jerzy's latest patch looks ok to me.
This is a slight behaviour change so I'm not sure it should go in 3.2/2.7.
msg157814 - (view) Author: Roundup Robot (python-dev) Date: 2012-04-08 22:59
New changeset 5493d44b56d8 by Antoine Pitrou in branch '3.2':
Issue #7978: socketserver now restarts the select() call when EINTR is returned.
http://hg.python.org/cpython/rev/5493d44b56d8

New changeset 97a0c6230ece by Antoine Pitrou in branch 'default':
Issue #7978: socketserver now restarts the select() call when EINTR is returned.
http://hg.python.org/cpython/rev/97a0c6230ece
msg157815 - (view) Author: Roundup Robot (python-dev) Date: 2012-04-08 23:12
New changeset d941d1fcc6e6 by Antoine Pitrou in branch '2.7':
Issue #7978: socketserver now restarts the select() call when EINTR is returned.
http://hg.python.org/cpython/rev/d941d1fcc6e6
msg157817 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2012-04-08 23:21
You must recompute the timeout when select() is interrupted: see issue #12338.
msg157818 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-08 23:25
> You must recompute the timeout when select() is interrupted: see issue 
> #12338.

Ah, right. It's a minor issue, though. I would suggest opening a separate issue for it. The patch is now committed to all 3 branches (I'm finally convinced this is a bug worth fixing), so I'm closing this issue.
msg157822 - (view) Author: Jerzy Kozera (Jerzy.Kozera) Date: 2012-04-08 23:38
I forgot to mention my patch is 3.3-only, sorry - it depends on changes from #12555 (http://hg.python.org/cpython/rev/41a1de81ef2b#l18.21 to be precise). To support 3.2 and 2.7:
 (1) select.error must be caught as in the original patch,
 (2) e.args[0] must be used - select.error doesn't have 'errno' attribute.
Should I prepare the patch for 3.2 and 2.7?

Regarding not updating the timeout, it was already mentioned above. Though as an afterthought, it might be worrying that if the process is receiving repeated signals with interval between them less than timeout, we might fall into infinite loop of select() when it should timeout, but that is probably very obscure case.
msg157824 - (view) Author: Roundup Robot (python-dev) Date: 2012-04-08 23:47
New changeset f8e7fcd581ff by Antoine Pitrou in branch '3.2':
Fix the patch for issue #7978: select() raises select.error before 3.3, not OSError.
http://hg.python.org/cpython/rev/f8e7fcd581ff

New changeset 4298d6e79ecb by Antoine Pitrou in branch '2.7':
Fix the patch for issue #7978: select() raises select.error before 3.3, not OSError.
http://hg.python.org/cpython/rev/4298d6e79ecb
msg157825 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-08 23:48
>  To support 3.2 and 2.7:
>  (1) select.error must be caught as in the original patch,
>  (2) e.args[0] must be used - select.error doesn't have 'errno'
> attribute.
> Should I prepare the patch for 3.2 and 2.7?

It shouldn't be necessary, I've just committed fixes.
Thanks for noticing!
History
Date User Action Args
2012-04-08 23:48:11pitrousetmessages: + msg157825
2012-04-08 23:47:13python-devsetmessages: + msg157824
2012-04-08 23:38:42Jerzy.Kozerasetmessages: + msg157822
2012-04-08 23:25:25pitrousettype: enhancement -> behavior
2012-04-08 23:25:19pitrousetstatus: open -> closed

assignee: gregory.p.smith ->
versions: + Python 2.7
messages: + msg157818
type: behavior -> enhancement
resolution: fixed
stage: patch review -> resolved
2012-04-08 23:21:19hayposetmessages: + msg157817
2012-04-08 23:12:46python-devsetmessages: + msg157815
2012-04-08 22:59:33python-devsetnosy: + python-dev
messages: + msg157814
2012-04-07 10:45:24pitrousetstage: patch review
messages: + msg157725
versions: + Python 3.3, - Python 3.1, Python 2.7
2012-04-06 21:42:21Jerzy.Kozerasetfiles: + socketserver_eintr_20120406.diff
nosy: + Jerzy.Kozera
messages: + msg157695

2012-03-14 21:24:38giampaolo.rodolasetnosy: + giampaolo.rodola
2011-06-01 04:06:08nvetoshkinsetmessages: + msg137414
2011-05-31 14:55:56hayposetmessages: + msg137362
2011-05-31 14:55:26hayposetnosy: + haypo
messages: + msg137361
2011-01-03 20:43:15gregory.p.smithsetnosy: gregory.p.smith, spiv, exarkun, pitrou, Christophe Simonis, nvetoshkin, neologix, Yaniv.Aknin, bda, kchen
messages: + msg125234
2010-11-09 20:28:37nvetoshkinsetmessages: + msg120898
2010-10-26 22:21:58kchensetnosy: + kchen
2010-08-04 21:31:32terry.reedysetversions: - Python 2.6, Python 2.5
2010-05-13 00:32:54Yaniv.Akninsetfiles: + socketserver_eintr_py3k_updated.diff

messages: + msg105617
2010-05-13 00:31:54Yaniv.Akninsetfiles: - socketserver_eintr_py3k.diff
2010-05-11 15:42:11Christophe Simonissetnosy: + Christophe Simonis
2010-04-28 20:06:17pitrousetmessages: + msg104459
2010-04-28 19:58:20bdasetfiles: + sockServe.py
nosy: + bda
messages: + msg104455

2010-04-09 09:24:31Yaniv.Akninsetfiles: + socketserver_eintr_py3k.diff

messages: + msg102704
2010-04-09 07:01:06spivsetnosy: + spiv
messages: + msg102690
2010-04-09 05:57:14nvetoshkinsetmessages: + msg102680
2010-04-09 00:44:58pitrousetmessages: - msg102659
2010-04-09 00:44:50pitrousetmessages: + msg102660
2010-04-09 00:44:44pitrousetmessages: + msg102659
2010-04-08 07:19:52neologixsetmessages: + msg102596
2010-04-08 06:41:18nvetoshkinsetmessages: + msg102594
2010-04-08 06:28:54gregory.p.smithsetassignee: gregory.p.smith

messages: + msg102592
nosy: + gregory.p.smith
2010-04-08 06:06:14Yaniv.Akninsetfiles: + eintr_safety.py
nosy: + Yaniv.Aknin
messages: + msg102591

2010-04-05 05:36:59nvetoshkinsetmessages: + msg102363
2010-04-04 12:31:34exarkunsetmessages: + msg102337
2010-04-04 12:27:07neologixsetmessages: + msg102336
2010-04-04 11:47:00pitrousetnosy: + exarkun, pitrou
messages: + msg102333
2010-04-04 11:18:19neologixsetfiles: + SocketServer_eintr.diff

nosy: + neologix
messages: + msg102331

keywords: + patch
2010-03-09 07:35:37nvetoshkinsetmessages: + msg100704
2010-02-21 19:22:43nvetoshkinsettype: crash -> behavior
2010-02-21 19:14:53nvetoshkincreate