classification
Title: selectors: towards uniform EBADF handling
Type: enhancement Stage: needs patch
Components: asyncio Versions: Python 3.4
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: giampaolo.rodola, glyph, gvanrossum, neologix, pitrou, sbt, vstinner, yselivanov
Priority: normal Keywords:

Created on 2013-09-14 11:01 by neologix, last changed 2015-02-04 16:48 by vstinner.

Messages (19)
msg197701 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-09-14 11:01
Currently, in case of invalid file descriptor (which can happen easily e.g. if the user closes the FD before unregistering it), selector.select() will react differently depending on the backend:

SelectSelector fails with EBADF
PollSelector returns EVENT_READ/EVENT_WRITE for the FD (translated from poll()'s POLLNVAL)
EpollSelector returns nothing (epoll silently removes the FD from the set of monitored FD, there's no EPOLLNVAL)
KqueueSelector behaves like EpollSelector

One possibility would be for PollSelector.select() to raise EBADF in case of POLLNVAL. That would be consistent with select, but not with epoll and kqueue.

Another possibility would be to silently fix those, i.e. in case of EBADF/POLLNVAL in select/poll, remove the offending FD from the list of monitored FDs. That's what libev does (see e.g. http://cvs.schmorp.de/libev/ev_poll.c?revision=1.39&view=markup and http://cvs.schmorp.de/libev/ev.c?revision=1.457&view=markup), and that's also what twisted does, at least for select (see http://twistedmatrix.com/trac/browser/trunk/twisted/internet/selectreactor.py#L77).

One advantage is that once we've detected a bad FD, we remove it so we're sure we won't loop endlessly: e.g. for poll, if the user doesn't try to use the FD no exception will be raised and selector.select() keep reporting the FD ready.

(Note that performing this cleanup is easy for poll since it's flagged with POLLNVAL, but for select this requires looping over the registered FDs, but since that's only called in case of error, it's not performance critical, see Twisted's code)

I personally favor the later solution.

Second point: if we opt for the later solution, we should probably update {EpollSelector,KqueueSelector}.unregister() to silently catch an error if the FD wasn't registered in the underlying epoll/kqueue.
That's what libev does, and amusingly that's what the select module already does:
http://hg.python.org/cpython/file/c9644c1f9c94/Modules/selectmodule.c#l1329

Third point: should the current select behavior (ignoring epoll.unregister() error) be kept?

Thoughts?
msg197702 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-09-14 11:11
> Another possibility would be to silently fix those, i.e. in case of
> EBADF/POLLNVAL in select/poll, remove the offending FD from the list of
> monitored FDs.

Or we could remove the fd *and* still raise an error the first time, or at least a warning. It sounds like a programming error, so I'm not sure it's a good idea to silence it.

Note that Twisted still logs an error message. Also, Tornado lets errors through:
https://github.com/facebook/tornado/blob/master/tornado/ioloop.py#L652
https://github.com/facebook/tornado/blob/master/tornado/platform/select.py#L61
msg197703 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-09-14 11:20
> Or we could remove the fd *and* still raise an error the first time, or at least a warning. It sounds like a programming error, so I'm not sure it's a good idea to silence it.

Yeah, sure. By "silently" I meant "without user intervention", sorry
for the confusion.

Now I'm not sure between raising an error or just logging it.
msg197704 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-09-14 11:21
> Now I'm not sure between raising an error or just logging it.

If "logged", it should be a Warning rather than an actual logging call,
IMO.
msg197718 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-09-14 16:27
Would it be possible to emit an event "invalid fd" with the fd? So the user
can decide what to do on such case: ignore, log, raise an exception, ...
msg197917 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-09-16 16:33
Interesting issue.  ISTM that closing the FD before unregistering it is a programmer's mistake that shouldn't pass silently.  And closing it in a separate thread while the selector is active sounds like an even bigger bug.

Could we report an new event type for this situation?  E.g. EVENT_CLOSED.  The proper response would be to unregister the FD.  (And yes, unregistering the FD when it was previously registered should not be an error, even if it has been closed.)

I'm not sure I care that there will be an infinite loop if the caller doesn't take action -- it's the same for any event, if you don't read from an FD when you get an EVENT_READ event, you'll get an infinite loop too.

Note that this is not exactly the same as the "invalid fd" that Victor proposes -- although they aren't always distinguishable, the logic errors in the app are different: in one case, closing a FD (that was valid and registered) without unregistering, in the other case, registering an FD that isn't valid.  (Ideally the register() call should fail, but I suppose we can't detect that without an extra syscall -- although in some cases there's a syscall in register() anyway, and then we can use that to reject the register() call.
msg197929 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-09-16 18:41
> Interesting issue.  ISTM that closing the FD before unregistering it is a programmer's mistake that shouldn't pass silently.  And closing it in a separate thread while the selector is active sounds like an even bigger bug.

Agreed.

> Could we report an new event type for this situation?  E.g. EVENT_CLOSED.  The proper response would be to unregister the FD.  (And yes, unregistering the FD when it was previously registered should not be an error, even if it has been closed.)

The problem is that for epoll (and kqueue I think) the FD is
automagically removed from the backend, which means that we won't get
any notification for this FD, hence we're unable to report it as
closed.
msg197932 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-09-16 19:04
> The problem is that for epoll (and kqueue I think) the FD is
> automagically removed from the backend, which means that we won't get
> any notification for this FD, hence we're unable to report it as
> closed.

That makes it sound like it will be hard to respond the same way on all platforms. :-(

Even silently removing the registration for select and poll still leaves the behavior different: if you register a bad FD with select or poll, the register() call will accept it but the select() call will silently remove it; however if you register a bad FD with epoll or kqueue, register() will raise EBADF.

I would prefer to see an EBADF exception in all four cases, even if it happens in register() for some platforms and in select() for others.

There is also the issue of trying to unregister (or modify?) an FD that has gone bad after successful registration -- this will also have different behavior on the different platform, right?

Maybe we'll get better design guidance if we look at this from the POV of a server loop that doesn't fully control the code that may be registering and unregistering FDs.  It seems a good idea to make it so that if some bad piece of code tries to register a bad FD or close a FD without unregistering doesn't bring down the entire server, but the bad code always sees an exception.  Maybe we should be silent in select() but cause unregister() to fail?
msg198674 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-09-30 03:48
Charles-Francois, sorry to add you back to the bug, but (a) I thought you had agreed to a compromise patch that restarts signals in most cases but not for select(), poll() etc.; (b) I may have found a flaw in the idea.

The flaw (if it is one) is related to Py_AddPendingCall(). This "schedules" a pending callback, mostly for signals, but doesn't AFAICT interrupt the mainthread in any way. (TBH, I only understand the code for Python 2.7, and in that version I'm sure it doesn't.)

So is this a flaw? I'm nor sure. Can you think about it?
msg202122 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-04 12:54
What is the status of this issue?
msg202139 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-11-04 14:06
> What is the status of this issue?

AFAICT, we haven't reached a consensus yet on the best way to handle EBADF.
msg202146 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-11-04 15:48
> AFAICT, we haven't reached a consensus yet on the best way to handle
EBADF.

By which you mean that you still don't agree with my proposal? Which is to
fix it for most syscalls but not for select(), poll() and similar (anything
that the new selectors module interfaces to).
msg202147 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-11-04 16:16
> By which you mean that you still don't agree with my proposal? Which is to
> fix it for most syscalls but not for select(), poll() and similar (anything
> that the new selectors module interfaces to).

I agree with your proposal, but that's another issue :-) (this thread
is about EBADF in selectors, not EINTR).
msg202150 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-11-04 17:42
Eww, sorry. That's the second time I mistook this thread for the other. I re-read the original description and I now think that we should follow your original advice.

There are two separate cases:

(1) Registering an invalid FD; this succeeds for select/poll, fails for epoll/kqueue.

(2) Registering a valid FD, then closing it, then calling <selector>.select(); this raises for select, returns a special event for poll, and is silently ignored for epoll/kqueue.

I agree on making all selectors do the same thing for (2), and I agree that the best thing is to be silent and unregister the bad FD if we can (IIUC epoll/kqueue don't even tell us this happened?).  We then need to make unregister() be silent when the FD isn't registered.  Maybe make it return True when it actually unregistered something and False when there's nothing?  An alternative would be to put the bad FD into a separate "quarantine" set so that it won't be used by the select() method but will still be recognized by unregister(). (And register() should look there too.)

This still leaves case (1), where the FD is already bad when we register it.  I am actually fine with sometimes raising and sometimes not; I don't want to pay the extra overhead of doing an fstat() or some other syscall just to verify that it is valid.  (Although this would make the argument about wanting to choose a selector class that doesn't make extra syscalls less compelling. :-)  And neither do I want to ignore the error in register() and pretend success.

I guess we should in any case make it consistent so that if you successfully register() an FD you can always unregister() it without raising.

I really wish there was an event to tell us that an FD has become invalid, but if epoll/kqueue really don't support that, i don't think it's worth it to introduce that only for select/poll.  IOW let's do what those other systems you mention do.
msg202161 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-04 20:02
To find an invalid FD when select() fails with EBAD, we can use something like:
http://ufwi.org/projects/nufw/repository/revisions/b4f66edc5d4dc837f75857f8bffe9015454fdebc/entry/src/nuauth/tls_nufw.c#L408
msg205784 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-12-10 10:47
What is the status of this issue?
msg205785 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-12-10 10:48
The issue #19876 has been closed.
msg235388 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-04 16:41
> To find an invalid FD when select() fails with EBAD, we can use something like:
> http://ufwi.org/projects/nufw/repository/revisions/b4f66edc5d4dc837f75857f8bffe9015454fdebc/entry/src/nuauth/tls_nufw.c#L408

Oh, the link is dead. Copy/paste of the code:
---
/* errno == EBADF */
int i;
/* A client disconnects between FD_SET and select.
 * Will try to find it */
for (i=0; i<context->mx; ++i){
    struct stat s;
    if (FD_ISSET(i, &context->tls_rx_set)){
        if (fstat(i, &s)<0) {
            log_message(CRITICAL, DEBUG_AREA_USER,
                    "Warning: %d is a bad file descriptor.", i);
            FD_CLR(i, &context->tls_rx_set);
        }
    }
}
continue;   /* retry select */
---

In short: call fstat() if check if the FD is valid or not. O(n) complexity.
msg235389 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-02-04 16:48
Guido wrote:
"This still leaves case (1), where the FD is already bad when we register it.  I am actually fine with sometimes raising and sometimes not; I don't want to pay the extra overhead of doing an fstat() or some other syscall just to verify that it is valid.  (Although this would make the argument about wanting to choose a selector class that doesn't make extra syscalls less compelling. :-)  And neither do I want to ignore the error in register() and pretend success."

The asyncio has a debug mode which can execute expensive checks. These checks would not be acceptable in release mode since they are an impact on performances. In debug mode, asyncio can explicitly call fstat() before calling selector.register() or selector.modify().

This solution may be suggested in the documentation if someone wants a portable behaviour for all selector.
History
Date User Action Args
2015-02-04 16:48:35vstinnersetmessages: + msg235389
2015-02-04 16:41:18vstinnersetmessages: + msg235388
2015-02-03 15:01:42vstinnersetnosy: + yselivanov
components: + asyncio
2013-12-10 10:48:15vstinnersetmessages: + msg205785
2013-12-10 10:47:24vstinnersetmessages: + msg205784
2013-11-04 20:02:44vstinnersetmessages: + msg202161
2013-11-04 17:42:58gvanrossumsetmessages: + msg202150
2013-11-04 16:16:09neologixsetmessages: + msg202147
2013-11-04 15:48:01gvanrossumsetmessages: + msg202146
2013-11-04 14:06:56neologixsetmessages: + msg202139
2013-11-04 12:54:01vstinnersetmessages: + msg202122
2013-09-30 03:48:21gvanrossumsetmessages: + msg198674
2013-09-16 19:04:54gvanrossumsetmessages: + msg197932
2013-09-16 18:41:52neologixsetmessages: + msg197929
2013-09-16 16:33:26gvanrossumsetmessages: + msg197917
versions: + Python 3.4
2013-09-14 16:27:29vstinnersetmessages: + msg197718
2013-09-14 11:21:41pitrousetmessages: + msg197704
2013-09-14 11:20:39neologixsetmessages: + msg197703
2013-09-14 11:11:00pitrousetnosy: + glyph
messages: + msg197702
2013-09-14 11:01:58neologixcreate