Issue 18885: handle EINTR in the stdlib


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/63085

classification

Title:	handle EINTR in the stdlib
Type:	enhancement	Stage:	patch review
Components:		Versions:	Python 3.5

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	arigo, fossilet, giampaolo.rodola, gregory.p.smith, gvanrossum, koobs, larry, martin.panter, neologix, piotr.dobrogost, pitrou, sbt, vstinner
Priority:	normal	Keywords:	needs review, patch

Created on 2013-08-30 15:02 by neologix, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
select_eintr.diff	neologix, 2013-11-30 15:09		review

Messages (28)
msg196555 - (view)	Author: Charles-François Natali (neologix) *	Date: 2013-08-30 15:02
As discussed in http://mail.python.org/pipermail/python-dev/2013-August/128204.html, I think that we shouldn't let EINTR leak to Python code: it should be handled properly by the C code, so that users (and the Python part of the stdlib) don't have to worry about this low-level historical nuisance.

For code that doesn't release the GIL, we could simply use this glibc macro:

    # define TEMP_FAILURE_RETRY(expression) \
      (__extension__ \
        ({ long int __result; \
           do __result = (long int) (expression); \
           while (__result == -1L && errno == EINTR); \
           __result; }))

Now, I'm not sure about how to best handle this for code that releases the GIL. Basically:

    Py_BEGIN_ALLOW_THREADS
    pid = waitpid(pid, &status, options);
    Py_END_ALLOW_THREADS

should become

    begin_handle_eintr:
        Py_BEGIN_ALLOW_THREADS
        pid = waitpid(pid, &status, options);
        Py_END_ALLOW_THREADS

        if (pid < 0 && errno == EINTR) {
            if (PyErr_CheckSignals())
                return NULL;
            goto begin_handle_eintr;
        }

Should we do this with a macro? If yes, should it be a new one that should be placed around Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS (like BEGIN_SELECT_LOOP in selectmodule.c) or could we have a single macro that would do both (i.e. release the GIL / reacquire the GIL, and try again in case of EINTR, unless a signal handler raised an exception)?

From a cursory look, the main files affected would be:

    Modules/fcntlmodule.c
    Modules/ossaudiodev.c
    Modules/posixmodule.c
    Modules/selectmodule.c
    Modules/signalmodule.c
    Modules/socketmodule.c
    Modules/syslogmodule.c
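For readers who think at the Python level, the retry loop that TEMP_FAILURE_RETRY expresses in C can be sketched roughly as follows. This is an illustration only, not code from any patch attached to this issue: the helper name `retry_on_eintr` is hypothetical, and the sketch assumes Python 3.3 or later, where an EINTR failure surfaces as InterruptedError. It is only appropriate for calls that can safely be repeated with the same arguments, which excludes calls that take a timeout.

```python
def retry_on_eintr(func, *args, **kwargs):
    """Call func(*args, **kwargs), retrying for as long as it fails with EINTR.

    Hypothetical helper, for illustration only. A Python-level signal
    handler that raises will still abort the loop, because its exception
    is not an InterruptedError and therefore is not swallowed here.
    """
    while True:
        try:
            return func(*args, **kwargs)
        except InterruptedError:
            # The system call was interrupted by a signal: try again.
            continue
```

A call such as `retry_on_eintr(os.read, fd, 4096)` would then hide the interruption from the caller.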
msg196646 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2013-08-31 16:44
FYI - use the changes made in http://bugs.python.org/issue12268 as a guide for how to deal with EINTR properly at the C level. See the _PyIO_trap_eintr() function for example. See also _eintr_retry_call() in Lib/subprocess.py.

FWIW, there are times when we want the interrupted system call to return control to Python rather than retrying the call. If someone is making a Python equivalent of the low level system call such as select() or poll(), the EINTR should be exposed for Python code to handle.

Things like time.sleep() are documented as sleeping for less time when a signal has arrived even though an exception may not be raised. People have written code which depends on this behavior, so adding an EINTR retry for the remaining sleep time would break some programs.

Getting an EINTR errno does not mean you can simply retry the system call with the exact same arguments. I.e.: if you did that with the select() call within time.sleep(), it'd be trivial to make the process sleep forever by sending it signals at intervals shorter than the sleep time.
msg196647 - (view)	Author: Charles-François Natali (neologix) *	Date: 2013-08-31 16:56
Gregory, thanks, that's what I was planning to do. But since the recent discussions (mainly on selectors), there are points I obviously don't - and won't - agree with (such as select() returning EINTR or returning early, same for sleep()), I'm not interested in this anymore. Anyone interested can pick this up, though. (BTW, as for applications relying on EINTR being returned, I'm positive way more applications will break because of the recent change making file descriptors close-on-exec by default).
msg196648 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2013-08-31 17:00
> FWIW, there are times when we want the interrupted system call to
> return control to Python rather than retrying the call.

I'm a bit curious, do you know of any use cases?

> If someone is making a Python equivalent of the low level system call
> such as select() or poll(), the EINTR should be exposed for Python
> code to handle.

As mentioned in another issue, you would use a special wakeup fd to wakeup select() or poll() calls.

> Getting an EINTR errno does not mean you can simply retry the system
> calls with the exact same arguments. ie: If you did that with the
> select() call within time.sleep it'd be trivial to make the process
> sleep forever by sending it signals with a frequency less than the
> sleep time.

Indeed. That's already done in e.g. socketmodule.c : take a look at the BEGIN_SELECT_LOOP / END_SELECT_LOOP macros.
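The wakeup fd approach mentioned above might look roughly like this in Python. It is a sketch under stated assumptions, not code from this issue: the function names `install_wakeup_pipe` and `wait_ready` are hypothetical, it assumes a Unix platform and Python 3.5 or later (for `os.set_blocking`), and it relies on `signal.set_wakeup_fd()`, which only fires for signals that also have a Python-level handler installed with `signal.signal()`.

```python
import os
import select
import signal


def install_wakeup_pipe():
    """Create a non-blocking pipe and register its write end as the wakeup fd.

    Hypothetical helper; must be called from the main thread.
    Returns (read_fd, write_fd).
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    os.set_blocking(write_fd, False)
    signal.set_wakeup_fd(write_fd)
    return read_fd, write_fd


def wait_ready(read_fds, wakeup_fd, timeout=None):
    """select() on read_fds plus the wakeup fd.

    Returns (ready, woken): the ready descriptors from read_fds, and
    whether at least one byte had been written to the wakeup pipe.
    """
    readable, _, _ = select.select(list(read_fds) + [wakeup_fd], [], [], timeout)
    woken = wakeup_fd in readable
    if woken:
        try:
            os.read(wakeup_fd, 4096)  # drain the pending signal bytes
        except BlockingIOError:
            pass
    return [fd for fd in readable if fd != wakeup_fd], woken
```

With this arrangement a signal makes the pipe readable, so select() returns normally and the caller can tell a wakeup apart from a timeout.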
msg196653 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2013-08-31 17:19
On Sat, Aug 31, 2013 at 9:56 AM, Charles-François Natali <report@bugs.python.org> wrote:
> Gregory, thanks, that's what I was planning to do.
>
> But since the recent discussions (mainly on selectors), there are points I obviously don't - and won't - agree with (such as select() returning EINTR or returning early, same for sleep()), I'm not interested in this anymore.

Whoa. Maybe you're overreacting a bit? I personally see a big divide here between system calls whose functionality includes sleeping (e.g. sleep(), poll(), select()) and those that just want some I/O to complete (e.g. recv(), send(), read(), write()). The former are almost always used in a context that can handle premature returns just fine, since the return value for a premature return is the same as for hitting the deadline, and the timeout is often used just as a hint anyway. It's the latter category (recv() etc.) where the EINTR return is problematic, and I think for many of those the automatic retry (after the Python-level signal handler has been run and conditional on it not raising an exception) will be a big improvement.

> Anyone interested can pick this up, though.
>
> (BTW, as for applications relying on EINTR being returned, I'm positive way more applications will break because of the recent change making file descriptors close-on-exec by default).

Again, I'd make a distinction: I agree for send(), recv() etc., but I don't think there are many buggy uses of select()/poll() timeouts around. (And even if there are, I still think it's better to fix these by correcting the retry logic in the framework or the application, since it may have other considerations.)
msg196661 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2013-08-31 18:09
I wrote too many words. In short:

time.sleep()'s behavior should remain as it is today given how it is documented to behave. If you disagree, consider adding an optional interruptable=False parameter so that both behavior options exist.

ALL IO calls and wait* should handle EINTR transparently for the user and never expose it to the Python application.

select(), poll() and equivalents. If you want to transparently handle EINTR on these, just make sure you deal with the timeouts properly. While I suspect a few people wanted to see the signal interruption on those I agree: very uncommon and undesirable for most. If people need a specific signal interruption they should define a signal handler that raises.
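A signal handler that raises, as suggested in the last point, could be sketched like this. The names `SignalInterrupt`, `raise_on_signal` and `install_raising_handler` are hypothetical and chosen for illustration; only `signal.signal()` is an actual stdlib call, and it has to be invoked from the main thread.

```python
import signal


class SignalInterrupt(Exception):
    """Hypothetical exception used to break out of a blocking call."""

    def __init__(self, signum):
        super().__init__(signum)
        self.signum = signum


def raise_on_signal(signum, frame):
    # Raising from the handler makes the interrupted call propagate an
    # exception instead of being retried or returning early.
    raise SignalInterrupt(signum)


def install_raising_handler(signum):
    """Install the raising handler and return the previous handler."""
    return signal.signal(signum, raise_on_signal)
```

Code that wants to be interruptible by, say, SIGUSR1 would call `install_raising_handler(signal.SIGUSR1)` once at startup and catch `SignalInterrupt` around its blocking call.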
msg198681 - (view)	Author: Charles-François Natali (neologix) *	Date: 2013-09-30 07:11
(replying to Guido's post in another thread)

> Charles-Francois, sorry to add you back to the bug, but (a) I thought you had agreed to a compromise patch that restarts signals in most cases but not for select(), poll() etc.; (b) I may have found a flaw in the idea.
> The flaw (if it is one) is related to Py_AddPendingCall(). This "schedules" a pending callback, mostly for signals, but doesn't AFAICT interrupt the main thread in any way. (TBH, I only understand the code for Python 2.7, and in that version I'm sure it doesn't.)
>
> So is this a flaw? I'm not sure. Can you think about it?

I don't think that's a problem: the way I was planning to tackle signals is to call PyErr_CheckSignals() before retrying upon EINTR. This runs signal handlers, and returns a nonzero value if an exception occurred (e.g. KeyboardInterrupt); if that's the case, then we simply break out of the loop, and let the exception bubble up.

See e.g. http://hg.python.org/cpython/file/default/Modules/socketmodule.c#l3397
msg204816 - (view)	Author: Charles-François Natali (neologix) *	Date: 2013-11-30 15:09
Alright, here's a first step: select/poll/epoll/etc now return empty lists/tuples upon EINTR. This comes with tests (note that all those tests could probably be factored, but that's another story).
msg204855 - (view)	Author: Armin Rigo (arigo) *	Date: 2013-11-30 22:58
Am I correct in thinking that you're simply replacing the OSError(EINTR) with returning empty lists? This is bound to subtly break code, e.g. the code that expects reasonably that a return value of three empty lists means the timeout really ran out (i.e. the version of the code that is already the most careful). Shouldn't you restart the poll with the remaining time until timeout?
msg204858 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2013-11-30 23:20
I wouldn't call that "being the most careful". I've always had an implicit understanding that calls with timeouts may, for whatever reason, return sooner than requested (or later!), and the most careful approach is to re-check the clock again.
msg204863 - (view)	Author: Richard Oudkerk (sbt) *	Date: 2013-12-01 00:21
> I've always had an implicit understanding that calls with timeouts may,
> for whatever reason, return sooner than requested (or later!), and the
> most careful approach is to re-check the clock again.

I've always had the implicit understanding that if I use an infinite timeout then the call will not timeout.
msg204865 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2013-12-01 00:31
> > I've always had an implicit understanding that calls with timeouts may,
> > for whatever reason, return sooner than requested (or later!), and the
> > most careful approach is to re-check the clock again.
>
> I've always had the implicit understanding that if I use an infinite
> timeout then the call will not timeout.

Wow, that's a good point. select() and friends are not documented to exhibit successful spurious wakeups. It would be a pretty strong compatibility breach if they started doing so.

If we don't want select() to silently retry on EINTR, then I think we should leave it alone.

Speaking of which, I see that SelectSelector.select() returns an empty list when interrupted, but this is nowhere documented.
msg204868 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2013-12-01 01:10
> I've always had an implicit understanding that calls with timeouts may, for whatever reason, return sooner than requested (or later!), and the most careful approach is to re-check the clock again.

Exactly. At the system call level you can be interrupted. Re-checking the clock is the right thing to do if the elapsed time actually matters.

> If we don't want select() to silently retry on EINTR, then I think we should leave it alone.

We should go ahead and retry for the user for select/poll/epoll/kqueue. If they care about being able to break out of that low level call due to a signal, they should set a signal handler which raises an exception. I have never seen code intentionally get an EINTR exception from a select or poll call and have often seen code tripped up because it or a library it was using forgot to handle it.

We're a high level language: let's be sane by default and do the most desirable thing for the user. Retry the call internally with a safely adjusted timeout:

    # min() guards against the system clock having changed
    new_timeout = min(original_timeout, original_timeout - (time_now - start_time))
    if new_timeout <= 0:
        return an empty list
    retry the call with new_timeout
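The retry-with-adjusted-timeout idea can be sketched in Python as follows. This is an illustrative sketch, not the patch attached to this issue: the function name `select_retry` and the `_select` / `_clock` parameters are hypothetical (the latter two exist only so the logic can be exercised without real signals). It uses `time.monotonic()` so that system clock changes do not affect the deadline, and it treats `timeout=None` as an infinite wait that is simply repeated.

```python
import select
import time


def select_retry(rlist, wlist, xlist, timeout=None,
                 _select=select.select, _clock=time.monotonic):
    """select() that retries on EINTR with the remaining timeout.

    Hypothetical helper, for illustration only.
    """
    deadline = None if timeout is None else _clock() + timeout
    while True:
        try:
            return _select(rlist, wlist, xlist, timeout)
        except InterruptedError:
            if deadline is None:
                continue  # infinite timeout: just wait again
            timeout = deadline - _clock()
            if timeout <= 0:
                # The deadline passed while the signal was being handled,
                # so report an ordinary timeout.
                return [], [], []
```

Because the deadline is fixed once, before the first call, a steady stream of signals cannot extend the total wait beyond the requested timeout.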
msg204872 - (view)	Author: Guido van Rossum (gvanrossum) *	Date: 2013-12-01 01:48
We went through this whole discussion before. Returning immediately with three empty lists is better than raising InterruptedError. Retrying is not always better.
msg204875 - (view)	Author: Armin Rigo (arigo) *	Date: 2013-12-01 02:15
Modules/socketmodule.c is using a simple style to implement socket timeouts using select(). If I were to naively copy this style over to pure Python, it would work in current Pythons; I'd occasionally get an OSError(EINTR), which I would presumably have been annoyed with and am now catching properly. Now if my working code was made to run with a select() modified as proposed, an EINTR would instead cause the program to fail more obscurely: its sockets occasionally -- and apparently without reason -- time out much earlier. In that situation I would have a hard time finding the reason, particularly if running on an OS where the system select() doesn't spuriously return early with a timeout ("man select" on Linux guarantees this, for example).

Similarly, an existing program might rely on select() with an infinite timeout to only return when one of the descriptors is ready, particularly if called with only one or two descriptors.

Overall, I would far prefer the status quo over a change in the logic from one slightly-subtle situation to another differently slightly-subtle one. I believe this would end up with programs that need to take special care about both kinds of subtleties just to run on two versions of Python. I may be wrong; in that case, sorry to take your time. :-)
msg204878 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2013-12-01 02:47
Guido's point was that it is already a bug in code to not check the elapsed time after a select call returns rather than assuming the full timeout time has elapsed.

Correct code today already needs to deal with both situations (OSError(EINTR) and select returning an empty set before the desired time has elapsed) because both can happen on existing systems today. So correct code in the future wishing to be compatible with older Pythons will need to continue to do so.

As for "presumably have been annoyed by the occasional OSError(EINTR) and fix that bug": that isn't always true. EINTRs are not guaranteed to happen and are likely to crop up on different systems (production systems) long after you've deployed and successfully run your code, as they are something that happens due to things _outside_ of the control of your deployed program: signals. That's what has gotten me on a kick to hide EINTR from Python developers when at all possible.

For the record: I am perfectly fine with select and friends returning an empty set early on EINTR (as Guido seems to prefer). If this worries some people, let's at least highlight this in the documentation as part of this change. What I don't want is to ever see OSError(EINTR) in the future.
msg204890 - (view)	Author: Charles-François Natali (neologix) *	Date: 2013-12-01 08:14
Just for the record, I was initially in favor of recomputing the timeout and retrying upon EINTR, but Guido prefers to return empty lists, and since that's a better compromise than the current situation (I've seen many people complaining on EINTR popping-up at random points in the code, including myself), I went ahead and implemented it.

AFAICT, an early return for calls such as poll()/epoll() etc is something which is definitely acceptable: if you have a look at e.g. Tornado, Twisted & Co, they all return empty lists on EINTR.

> I've always had the implicit understanding that if I use an infinite timeout then
> the call will not timeout.

Well, I've always assumed that time.sleep(n) would sleep n seconds, but:

"""
static int
floatsleep(double secs)
[...]
    Py_BEGIN_ALLOW_THREADS
    err = select(0, (fd_set *)0, (fd_set *)0, (fd_set *)0, &t);
    Py_END_ALLOW_THREADS
    if (err != 0) {
#ifdef EINTR
        if (errno == EINTR) {
            if (PyErr_CheckSignals())
                return -1;
        }
        else
#endif
        {
            PyErr_SetFromErrno(PyExc_IOError);
            return -1;
        }
    }
[...]
"""

So really, I'm like Gregory: I don't care which solution we choose, but I just don't want to have to let the user handle EINTR.
msg204906 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2013-12-01 10:44
> Guido's point was that it is already a bug in code to not check the elapsed
> time after a select call returns rather than assuming the full timeout time
> has elapsed.

I don't understand how it's a bug. You're assuming select() has unreliable timing, but it doesn't (if you are using the same clock).
msg204907 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2013-12-01 10:46
On Sun., 2013-12-01 at 08:14 +0000, Charles-François Natali wrote:
> So really, I'm like Gregory: I don't care which solution we chose, but
> I just don't want to have to let the user handle EINTR.

Well, this is wishful thinking, since by returning an empty list you force the user to handle EINTR - just in a different way.
msg204912 - (view)	Author: Charles-François Natali (neologix) *	Date: 2013-12-01 11:33
> Well this is wishing thinking, since by returning an empty list you
> force the user to handle EINTR - just in a different way.

I know that returning an empty list changes the semantics: I just think that's better - or not as bad - than the current possibility of having any single piece of code possibly die upon EINTR.

If you want to implement retry with timeout re-computation, I'm not the one who must be convinced :-)

(BTW, if we go this way, then time.sleep() should probably also be fixed to retry upon EINTR).
msg204913 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2013-12-01 11:35
> I know that returning an empty list changes the semantics: I just
> think that's better - or not as bad - than the current possibility of
> having any single piece of code possibly die upon EINTR.
>
> If you want to implement retry with timeout re-computation, I'm not
> the one to who must be convinced :-)

Or, since we now have the selectors module, we could let select() live with the current semantics.

By the way, it's already too late for 3.4, which is in feature freeze.
msg204949 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2013-12-01 19:14
I do not consider this a feature; that EINTR is exposed as an exception from the API is a bug. But Larry is the only one who can actually make that decision as the 3.4 release manager (+nosy'd).

> by returning an empty list you force the user to handle EINTR -
> just in a different way.

The user now only has one thing to deal with instead of two: an empty list being returned; something they should already have been dealing with. Gone will be the OSError(EINTR) exception as a rare, often never tested for, alternate form of the same "retry needed" indication.

I never see code intentionally wanting to receive and handle an OSError(EINTR) exception, but I constantly run into code that is buggy due to some library it is using not getting this right... where the code exhibiting the problem cannot fix it, because the only place to fix it is within the library it uses, which is outside of that code's control.

We've got the opportunity to fix this nit once and for all here; let's do it.
msg204953 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2013-12-01 20:03
> I do not consider this a feature; that EINTR is exposed as an
> exception from the API is a bug.

select() currently works as specified; you are proposing a compatibility-breaking change to the API, not a bugfix. We're left with the fact that the API is inconvenient: but we now have the selectors module and can advocate that instead of breaking existing code during a feature freeze period.

(or we can retry on EINTR, which has the benefit of not creating new situations to deal with in existing code)

> The user now only has one thing to deal with instead of two: an empty
> list being returned; something they should already have been dealing
> with.

Returning an empty list when no timeout has been passed has never been a feature of select(), which is why users are not expected to be dealing with it.
msg204962 - (view)	Author: Larry Hastings (larry) *	Date: 2013-12-01 21:13
I don't want this checked in to 3.4. (Congratulations, this is my first "no" as a release manager!)
msg224339 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-07-30 18:49
FYI, Charles-François and I are working on a PEP to address this issue: PEP 475. The PEP is not ready for review yet.
msg235543 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-02-08 03:36
See also Issue 23285 for the PEP
msg244892 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-06-06 05:05
With PEP 475 now implemented (see Issue 23648), perhaps this could be closed? Or is there something else to be done?
msg245055 - (view)	Author: STINNER Victor (vstinner) *	Date: 2015-06-09 10:04
> With PEP 475 now implemented (see Issue 23648), perhaps this could be closed? Or is there something else to be done?

Yes, this issue was fully fixed by the implementation of PEP 475 in Python 3.5.

History
Date	User	Action	Args
2022-04-11 14:57:50	admin	set	github: 63085
2015-06-09 10:04:36	vstinner	set	status: open -> closed resolution: fixed messages: + msg245055
2015-06-06 05:05:08	martin.panter	set	messages: + msg244892
2015-02-08 03:36:56	martin.panter	set	messages: + msg235543
2015-02-08 03:05:30	martin.panter	set	nosy: + martin.panter
2014-07-30 18:49:34	vstinner	set	messages: + msg224339
2014-07-30 16:51:34	piotr.dobrogost	set	nosy: + piotr.dobrogost
2014-07-24 16:36:25	vstinner	link	issue11266 superseder
2014-07-22 20:16:28	neologix	link	issue21772 superseder
2014-07-22 20:16:05	neologix	link	issue22007 superseder
2013-12-10 13:48:02	fossilet	set	nosy: + fossilet
2013-12-01 21:28:22	vstinner	set	versions: + Python 3.5, - Python 3.4
2013-12-01 21:13:02	larry	set	messages: + msg204962
2013-12-01 20:03:01	pitrou	set	messages: + msg204953
2013-12-01 19:14:01	gregory.p.smith	set	nosy: + larry messages: + msg204949
2013-12-01 11:35:41	pitrou	set	messages: + msg204913
2013-12-01 11:33:45	neologix	set	messages: + msg204912
2013-12-01 10:46:59	pitrou	set	messages: + msg204907
2013-12-01 10:44:59	pitrou	set	messages: + msg204906
2013-12-01 08:14:58	neologix	set	messages: + msg204890
2013-12-01 02:47:08	gregory.p.smith	set	messages: + msg204878
2013-12-01 02:15:02	arigo	set	messages: + msg204875
2013-12-01 01:48:36	gvanrossum	set	messages: + msg204872
2013-12-01 01:10:14	gregory.p.smith	set	messages: + msg204868
2013-12-01 00:31:51	pitrou	set	messages: + msg204865
2013-12-01 00:21:46	sbt	set	messages: + msg204863
2013-11-30 23:20:49	gvanrossum	set	messages: + msg204858
2013-11-30 22:58:33	arigo	set	messages: + msg204855
2013-11-30 16:31:34	koobs	set	nosy: + koobs
2013-11-30 15:09:57	neologix	set	keywords: + patch, needs review files: + select_eintr.diff messages: + msg204816 stage: needs patch -> patch review
2013-09-30 07:11:02	neologix	set	nosy: + neologix messages: + msg198681
2013-09-01 12:56:26	arigo	set	nosy: + arigo
2013-08-31 18:47:37	giampaolo.rodola	set	nosy: + giampaolo.rodola
2013-08-31 18:09:31	gregory.p.smith	set	messages: + msg196661
2013-08-31 17:19:22	gvanrossum	set	messages: + msg196653
2013-08-31 17:00:14	pitrou	set	messages: + msg196648
2013-08-31 16:57:57	neologix	set	nosy: - neologix
2013-08-31 16:56:29	neologix	set	nosy: gvanrossum, gregory.p.smith, pitrou, vstinner, neologix, sbt messages: + msg196647
2013-08-31 16:48:30	gvanrossum	set	nosy: + gvanrossum
2013-08-31 16:44:13	gregory.p.smith	set	nosy: + gregory.p.smith messages: + msg196646
2013-08-30 15:02:58	neologix	set	nosy: + pitrou, vstinner, sbt
2013-08-30 15:02:35	neologix	create