This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: add a Selector to the select module
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: christian.heimes, felipecruz, giampaolo.rodola, gvanrossum, jcea, meador.inge, neologix, pitrou, python-dev, rosslagerwall, sbt, vstinner
Priority: normal Keywords: needs review, patch

Created on 2013-01-03 17:38 by neologix, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
selector_multiprocessing.diff neologix, 2013-01-03 17:38 review
selector_telnetlib.diff neologix, 2013-01-03 17:38 review
selector_bench.py neologix, 2013-01-04 13:18
tulip-selectors-1.diff gvanrossum, 2013-01-07 02:46 Tulip patches
selector-8.diff neologix, 2013-01-07 22:12 review
tulip-selectors-2.diff neologix, 2013-01-07 22:12
tulip-selectors-3.diff gvanrossum, 2013-01-08 22:22 Fixed tulip patch
tulip-selectors-3.diff neologix, 2013-01-08 22:31
return_key_fix.diff gvanrossum, 2013-01-08 22:47
selector-9.diff neologix, 2013-08-02 15:16 review
selector-11.diff neologix, 2013-08-02 17:59 review
tulip_selector_stdlib.diff neologix, 2013-08-02 17:59
selector-12.diff neologix, 2013-08-02 22:47 review
selectors-14.diff neologix, 2013-08-28 18:33 review
selectors-15.diff neologix, 2013-08-31 17:06 review
Repositories containing patches
https://tulip.googlecode.com/hg/
Messages (106)
msg178970 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-03 17:38
Recently, the multiprocessing and telnetlib modules have been patched to use poll() instead of select() when available (mainly for the FD_SETSIZE limitation):
http://bugs.python.org/issue10527
http://bugs.python.org/issue14635

This leads to code duplication (just have a look at the commits to convince yourself), and also it highlights the fact that the select module is too low-level: right now, there's no easy way to do portable and efficient I/O multiplexing. What's more, /dev/poll and epoll() support have been added recently, which can make one puzzled by the myriad of available options.

Therefore, to reduce code duplication, and make writing portable and efficient code easier, I suggest the addition of a new select.Selector() class, which offers a - hopefully - simple and consistent API.
It has the following advantages:
- portable
- consistent (select() accepts and returns file objects, whereas pol() only returns FD, the timeouts don't have the same units - seconds/milliseconds, etc).

You'll find a tentative implementation attached, with tests (but no docs).

I'm also attaching two patches, one for multiprocessing.connection and the other for telnetlib, to show the before/after.

I'd like to have your feedback, especially on those points:
- right now, Selector.select() returns a set of (fd, event mask). What do you think, would a bare list be OK?
- the current version doesn't retry on EINTR, and I'd like to catch it by default: what do you think
- since I use an abstract base class, I don't know how to have docstrings show up in the help
msg178973 - (view) Author: Felipe Cruz (felipecruz) * Date: 2013-01-03 18:44
I think you have a point. Did you know about the tulip project?

http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#76

It has a PollsterBase class and a SelectPollster(PollsterBase) so the idea is to have a Poller(and you call poll()) but select can be used underneath. Since tulip will be merged to stdlib, maybe you can work on tulip itself.

Current tulip Pollers don't return (fd, event) but I have a fork that does for Poll, EPoll, KQueue and WindowsPollster: https://bitbucket.org/felipecruz/tulip/

The selector class doesn't have this change (return fd, mask tuple) right now.
msg178974 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-01-03 19:18
tulip is much bigger than simply a select / poll wrapper. It would probably make more sense for tulip to reuse the abstraction which is proposed here.
msg178979 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-03 19:53
> I think you have a point. Did you know about the tulip project?
>
> http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#76
>
> It has a PollsterBase class and a SelectPollster(PollsterBase) so the idea is to have a Poller(and you call poll()) but select can be used underneath. Since tulip will be merged to stdlib, maybe you can work on tulip itself.

I've had a look, and indeed it has its own Pollster wrapper.
Since it can be useful outside of Tulip, it would probably make sense
to make it part of the expose API.

> Current tulip Pollers don't return (fd, event) but I have a fork that does for Poll, EPoll, KQueue and WindowsPollster

Indeed, I think it would be useful.

I'm not sure what's best, accept events masks or have distinct methods
to register read and write events.
msg178980 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-03 19:57
Oh, I noticed that EpollPollster never closes the underlying epoll FD:
that's why Pollster objects should have a close() method (and probably
support a context manager).
msg178981 - (view) Author: Felipe Cruz (felipecruz) * Date: 2013-01-03 19:57
Hi Antonie,

What you said also makes sense to me.

There is one problem(?) that _map_events() is called for every event(for Poll and EPoll) and this can make things slower (I didn't tested it). Also, does it needs to be thread-safe?
msg178992 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-03 22:26
Here's a new version closer to Tulip's one.
msg179001 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-04 01:37
I think this is a great idea.  But let's wait until Tulip is a bit further along; the design of its Pollster might still change as PEP 3156 gets more review and feedback.  According to PEP 429 the first beta of 3.4 won't go out until November this year.
msg179018 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-04 13:18
Here's a new version adressing Guido's comments (except for kqueue, for which I'll add support later when I can test it).

I'm also attaching a benchmark to compare the implementations: as noted by Guido, the complexity of select/poll/epoll are the theoretical ones: in real life, the syscall's cost is probably dwarfed by objects creation, etc.

Here's a run, for two ready FDs among 1018:
"""
$ ./python ~/selector_bench.py -r 2 -m 1018 -t socket 
Trying with 2 ready FDs out of 1018, type socket
<class 'select.EpollSelector'>
0.056010190999586484
<class 'select.PollSelector'>
0.2639519829990604
<class 'select.SelectSelector'>
1.1859817369986558
"""

So this can be interesting when a large number of FDs are monitored.

For sake of cmpleteness, for a sparse allocation (i.e. just a couple FDS allocated near FD_SETSIZE), there's not much gain:
"""
$ ./python ~/selector_bench.py -r 2 -m 1018 -t socket  -s
Trying with 2 FDs starting at 1018, type socket
<class 'select.EpollSelector'>
0.06651040699944133
<class 'select.PollSelector'>
0.06033727799876942
<class 'select.SelectSelector'>
0.0948788189998595
"""

Two points I'm not sure about:
- should EINTR be handled (i.e. retry, with an updated timeout). I'm tempted to say yes, because EINTR is just a pain the user should never be exposed with.
- what should be done with POLLNVAL and POLLERR? Raise an exception (that's what Java does, but since you can get quite quite easily it would be a pain to use), return a generic SELECT_ERR event? FWIW, twisted returns POLLERR|POLLNVAL as a CONNECTION_LOST event.
msg179042 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-04 16:48
Would it make sense to explore this in the Tulip project first?  It could be a new module, tulip/selector.py.  (Heck, I'll even give you commit privileges in Tulip.)

Also, I've heard (but don't know from personal experience) that Jython supports select but it takes only lists of socket objects, not file descriptors (which don't exist in Java).

Finally, what about Windows?
msg179046 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-01-04 17:06
I've posted a review on Rietveld. Not sure the notification e-mail was sent since I got a weird response from the server.

> should EINTR be handled (i.e. retry, with an updated timeout). I'm tempted to say
> yes, because EINTR is just a pain the user should never be exposed with.

You could add an optional argument to the select() method?

> what should be done with POLLNVAL and POLLERR? Raise an exception (that's what Java 
> does, but since you can get quite quite easily it would be a pain to use), return a 
> generic SELECT_ERR event?

I would say return SELECT_ERR.

> So this can be interesting when a large number of FDs are monitored.

The difference is impressive indeed.
msg179066 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2013-01-04 18:44
Is this currently being written from scratch? It shouldn't be IMO.
Instead I recommend using the existent Tornado or pyftpdlib IO loop as an example:

https://github.com/facebook/tornado/tree/master/tornado/platform
https://github.com/facebook/tornado/blob/master/tornado/ioloop.py#L544

http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/ioloop.py?spec=svn1144&r=1135#266

...and modify the central polling method so that it returns a list of (fd, events) instead of calling handler methods.
Everything (except Windows support) is already in place and they have been tested extensively.

Exploring this into Tulip would be important IMO, as it would help figuring out whether the API is good enough to support what this component aims to achieve: being a generic poller upon which to build async IO loops.
msg179070 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-04 19:11
Giampaolo, there are many details in the Tornado and pyftpdlib I/O loops that may or may not be intentional.  I wrote the Tulip pollster from scratch but am looking to merge in relevant knowledge and features from Tornado, Twisted and pyftpdlib.  AFAIK the code here is mostly adapted from Tulip.  Please suggest specific features from Tornado or pyftpdlib that are currently missing.  However I don't think we should just copy over the existing 3rd party code (if only because we're trying to write pure Python 3.3 code here, without historical baggage).
msg179074 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2013-01-04 19:53
Interesting benchmark. There is no gain for epoll with a large number of
ready fds (as expected) but at least it is no worse than poll. Poll offers
a large improvement on select, in this case.

$ ./python selector_bench.py -r 2 -m 1000 -t pipe
Trying with 2 ready FDs out of 1000, type pipe
<class 'select.EpollSelector'>
0.004317363000154728
<class 'select.PollSelector'>
0.08897221900042496
<class 'select.SelectSelector'>
0.1558117479999055

$ ./python selector_bench.py -r 100 -m 1000 -t pipe
Trying with 100 ready FDs out of 1000, type pipe
<class 'select.EpollSelector'>
0.11067074099992169
<class 'select.PollSelector'>
0.18777027299984184
<class 'select.SelectSelector'>
0.30157805000089866

$ ./python selector_bench.py -r 1000 -m 1000 -t pipe
Trying with 1000 ready FDs out of 1000, type pipe
<class 'select.EpollSelector'>
1.08963231099915
<class 'select.PollSelector'>
1.111169247998987
<class 'select.SelectSelector'>
8.283167362000313
msg179075 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-04 19:57
Indeed, tornado and pyftpdlib implementations are much more complex,
have somewhat different APIs (i.e. only support FDs), have special
cases for older python versions, lack some newer features (e.g.
context manager).
The current code is much close to Tulip's.

> Exploring this into Tulip would be important IMO, as it would help figuring out whether the API is good enough to support what this component aims to achieve: being a generic poller upon which to build async IO loops.

That's indeed a good idea, there's no need to rush this in.
msg179078 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-04 20:34
Ross, the select() result for a large number of ready FDs was completely skewed because of a bug spotted by Antoine (complexity was much worse than it ought to be).
Here are the results with the fix:
"""
$ ./python ~/selector_bench.py -r 10 -m 1000 -t socket
Trying with 10 ready FDs out of 1000, type socket
<class 'select.EpollSelector'>
0.05156186099338811
<class 'select.PollSelector'>
0.23772043800272513
<class 'select.SelectSelector'>
0.5181516080047004
$ ./python ~/selector_bench.py -r 100 -m 1000 -t socket
Trying with 100 ready FDs out of 1000, type socket
<class 'select.EpollSelector'>
0.47576940699946135
<class 'select.PollSelector'>
0.6458770600002026
<class 'select.SelectSelector'>
0.828609222000523
$ ./python ~/selector_bench.py -r 1000 -m 1000 -t socket
Trying with 1000 ready FDs out of 1000, type socket
<class 'select.EpollSelector'>
4.970445963997918
<class 'select.PollSelector'>
5.7709292660001665
<class 'select.SelectSelector'>
4.030775418999838
"""

With a large number of FDs, many of which are ready, select can be faster.
Here's the output of "strace -c -e select,poll,epoll_wait":
"""

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 83.86    0.000421           0      1024           epoll_wait
 16.14    0.000081           0      1024           poll
  0.00    0.000000           0      1024           select
------ ----------- ----------- --------- --------- ----------------
100.00    0.000502                  3072           total
"""

The systime is greater in epoll, but the systime is negligible, as show by 'time':
"""

real    0m14.856s
user    0m14.289s
sys     0m0.340s
"""

The time is really spent in the interpreter.
I'll dig some more (even though I doubt having 1000/1000 ready FDs is a common use case).

Note that it doesn't change anything when a small number of FDs are ready:
"""

$ ./python ~/selector_bench.py -r 10 -m 1000 -t socket
Trying with 10 ready FDs out of 1000, type socket
<class 'select.EpollSelector'>
0.05238822099636309
<class 'select.PollSelector'>
0.25595822899776977
<class 'select.SelectSelector'>
0.5156362060006359
"""
msg179082 - (view) Author: Ross Lagerwall (rosslagerwall) (Python committer) Date: 2013-01-04 21:35
> Ross, the select() result for a large number of ready FDs was
> completely skewed because of a bug spotted by Antoine (complexity
> was much worse than it ought to be).

Ah, that makes a little more sense now.
msg179120 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-05 12:40
Here's a new version, with the following changes:
- there's no SELECT_ERR anymore: error conditions (POLLNVAL|POLLERR and also POLLHUP) are reported as SELECT_IN|SELECT_OUT, depending on the input event mask: I don't think that a separate error flag is needed, because the error will be reported when the FD will be read/written (and in the context where the error can be handled appropriately, which wouldn't be the case if an exception was thrown by the Selector). That's what Java and libevent do, for example. libevent reports IN|OUT inconditionally in case of POLLER|POLLHUP, but I think that the output events should be a subset ot input events.
- the exceptions set isn't passed to select() anymore: exception events (like OOB) are already reported in the read or write sets (checked with libevent and in the Linux kernel source)
- most importantly, there was something missing/annoying in the API: now, it only accepts only file descriptors, and also an optional opaque object: the object - called "attached data" - can be used for example to pass a callback (that can be invoked by a higher-level reactor), or for example context-specific data: it could be a kind of a thread state to make it easy to do a context-switch, or simply a high-level object like a file, a socket or a telnet instance. This is actually what Java does, but they take it further into encapsulating the FD, interested events, attached data and ready events into a SelectionKey, which is passed and returned to/from the Selector: http://docs.oracle.com/javase/7/docs/api/java/nio/channels/SelectionKey.html
- I've added some tests
msg179122 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2013-01-05 14:16
@Guido: agreed.

> there's no SELECT_ERR anymore [...] the error will be reported when the FD will be read/written

I don't think this is a good idea, particularly during this early stage.
This assumption might be valid for select() but not for poll() and epoll() where (POLLERR | POLLHUP | POLLNVAL) is an alias for "connection lost", and that's an event which both Twisted and Tornado take into account and treat differently than a pure "read" event.
The same assumption should also apply for kqueue().

> the exceptions set isn't passed to select() anymore: exception events (like OOB) are already reported in the read or write sets

That's true only if SO_OOBINLINE has been pre-emptively set against the socket.
It might be ok to ignore select()'s exceptional events anyway though.
This is probably another thing you might want to decide later.

> there was something missing/annoying in the API [...]  now, it only accepts only file descriptors, and also an optional opaque 'data' object

I like this change as it avoids maintaining a separate fd->handler map.

Will add line-by-line comments by using the review tool.
msg179126 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2013-01-05 15:22
Inline comments:
http://bugs.python.org/review/16853/
msg179127 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-05 15:30
>> there's no SELECT_ERR anymore [...] the error will be reported when the FD will be read/written
>
> I don't think this is a good idea, particularly during this early stage.
> This assumption might be valid for select() but not for poll() and epoll() where (POLLERR | POLLHUP | POLLNVAL) is an alias for "connection lost", and that's an event which both Twisted and Tornado take into account and treat differently than a pure "read" event.

POLLERR and POLLNVAL semantics can be really different from
"connection lost", so this mapping sounds wrong. The actual error
(which may be ECONNRESET, but also EBADF, ENOBUFS...) will be raised
upon the next call to read()/write(), or can be retrieved through
getsockopt(SO_ERROR).

Also, that's exactly what the Linux kernel does for select():
http://lxr.free-electrons.com/source/fs/select.c#L381
POLLHUP is mapped to the read-set, POLLERR both to the read and write set.

This means that "connection lost" could never be returned on platforms
where the select() syscall is used, which could lead to subtle
portability problems.

But I'm definitely open on this (I've based the behavior on libevent
and Java, FWIW).

>> the exceptions set isn't passed to select() anymore: exception events (like OOB) are already reported in the read or write sets
>
> That's true only if SO_OOBINLINE has been pre-emptively set against the socket.
> It might be ok to ignore select()'s exceptional events anyway though.

Thanks, I didn't know: since the POSIX man pages was rather evasive on
this, I checked the Linux kernel source code, but apparently it's
non-standard. If we do ignore OOB in select(), then we probably also
want to suppress POLLPRI|EPOLLPRI, no?
msg179129 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2013-01-05 16:28
> The actual error (which may be ECONNRESET, but also EBADF, ENOBUFS...) will be raised
> upon the next call to read()/write(), or can be retrieved through
getsockopt(SO_ERROR).

Mmmm... I would be tempted to think the same but the fact that both Tornado and Twisted distinguish between READ and ERROR events is suspicious.

> then we probably also want to suppress POLLPRI|EPOLLPRI, no?

Yes.
msg179135 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-05 19:20
I think that this needs extensive tests that verify the behavior of many end cases, including under duress (e.g. when there are too many connections for the kernel to handle).  That would seem the only way to make sure that the code is reliable across platforms.  It is likely that you could borrow some ideas for test scenarios from Twisted.
msg179142 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-05 20:23
> I think that this needs extensive tests that verify the behavior of many end cases, including under duress (e.g. when there are too many connections for the kernel to handle).  That would seem the only way to make sure that the code is reliable across platforms.  It is likely that you could borrow some ideas for test scenarios from Twisted.

Will do.

I'm adding a new version taking into account some of Giampaolo's remarks.

Also, the API now also allows passing a file descriptor or any object
with a `fileno()` method, since it will likely be useful.

To sum up, the API is:

    def register(self, fileobj, events, data=None):
        """Register a file object.

        Parameters:
        fileobj -- file object
        events  -- events to monitor (bitwise mask of SELECT_IN|SELECT_OUT)
        data    -- attached data
        """

    def unregister(self, fileobj):
        """Unregister a file object.

        Parameters:
        fileobj -- file object
        """

    def modify(self, fileobj, events, data=None):
        """Change a registered file object monitored events or attached data.

        Parameters:
        fileobj -- file object
        events  -- events to monitor (bitwise mask of SELECT_IN|SELECT_OUT)
        data    -- attached data
        """

    def select(self, timeout=None):
        """Perform the actual selection, until some monitored file objects are
        ready or a timeout expires.

        Parameters:
        timeout -- if timeout > 0, this specifies the maximum wait time, in
                   seconds
                   if timeout == 0, the select() call won't block, and will
                   report the currently ready file objects
                   if timeout is None, select() will block until a monitored
                   file object becomes ready

        Returns:
        list of (fileobj, events, attached data) for ready file objects
        `events` is a bitwise mask of SELECT_IN|SELECT_OUT

Selector.select() output looks a lot like poll()/epoll() except for
two details: the output is the file object, and not the file
descriptor (poll()/epoll() are unfortunately inconsistent in this
regard), and there's a third field, the attached data (will be None if
not provided in register()/modify()). I think that this optional field
is really useful to pass e.g. a callback or some context information.
msg179144 - (view) Author: Meador Inge (meador.inge) * (Python committer) Date: 2013-01-05 21:05
The EpollSelector and PollSelector implementations are basically identical.  You might want to refactor these to share more code.
msg179156 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-06 00:28
I've noticed a bug present in Tulip, pyftpdlib and tornado (and this
implementation, too): none of them passes the maxevents argument to
epoll.poll(), and by default the poll() wrapper in the socket module
uses FD_SETSIZE-1, so you'll never get more than FD_SETSIZE-1 events.

Note that the default value used by epoll() is probably too low (we
could maybe use an heuristic to use RLIMIT_NOFILE hard limit, with
capping because it can be quite large on FreeBSD or the Hurd ;-), or
at least the documentation doesn't warn users enough about this. The
signature reads:
"""
.. method:: epoll.poll(timeout=-1, maxevents=-1)

   Wait for events. timeout in seconds (float)
"""

Which confuses users into thinking that the number of returned events
is unlimited by default.

> The EpollSelector and PollSelector implementations are basically identical.  You might want to refactor these to share more code.

Yes, but I find the current code much easier to read: epoll() and
poll() have different constants, different units for timeout, epoll()
must accept a maxevents argument, have a close() method, etc.
msg179161 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-06 00:59
I am trying to use this module in Tulip instead of its pollster implementation, and am running into a problem.  Tulip defines separate add_reader()/add_writer() methods, which call to the pollster's register_reader()/register_writer(), respectively.  The translation of this to the selector's API is somewhat complex; I would have to keep track of whether I already have a reader or writer registered, and then decide whether to call register() or modify().  If you don't want to change the API back to separate register_*() methods for readers and writers, perhaps you can add a method that tells me, for a given fileobj, whether it is registered, and with which poll flags (SELECT_IN/OUT/both) and the user data?

Also, I need a method that gives me the number of registered FDs.
msg179231 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-06 21:36
> If you don't want to change the API back to separate register_*() methods for readers and writers, perhaps you can add a method that tells me, for a given fileobj, whether it is registered, and with which poll flags (SELECT_IN/OUT/both) and the user data.
>
> Also, I need a method that gives me the number of registered FDs.

Does that look OK?

    def get_info(self, fileobj):
        """Return information about a registered file object.

        Returns:
        (events, data) associated to this file object

        Raises KeyError if the file object is not registered.
        """

    def registered_count(self):
        """Return the number of registered file objects.

        Returns:
        number of currently registered file objects
        """
msg179240 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-07 02:46
This seems okay.  I am attaching the changes I had to make to Tulip to support this.  However, two Tulip tests are now failing:

- tulip.events_test.PollEventLoopTests.testCreateSslTransport fails with spurious file descriptors returned by poll() that aren't in the _fd_to_key dict (but the corresponding test with Select passes)

- test_sock_client_fail() hangs completely.

Can you see why?

The first failure has this traceback:

Traceback (most recent call last):
  File "/Users/guido/tulip/tulip/selectors.py", line 178, in _key_from_fd
    return self._fd_to_key[fd]
KeyError: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/guido/tulip/tulip/events_test.py", line 216, in testCreateSslTra\
nsport
    el.run()
  File "/Users/guido/tulip/tulip/unix_events.py", line 120, in run
    self._run_once()
  File "/Users/guido/tulip/tulip/unix_events.py", line 592, in _run_once
    event_list = self._selector.select(timeout)
  File "/Users/guido/tulip/tulip/selectors.py", line 255, in select
    key = self._key_from_fd(fd)
  File "/Users/guido/tulip/tulip/selectors.py", line 180, in _key_from_fd
    raise RuntimeError("No key found for fd {}".format(fd))
RuntimeError: No key found for fd 0

(But the fd value varies -- sometimes it is -2, sometimes a large number.)

The other test busy-loops (keeps polling) and I have no useful traceback.

Also notice the need for a third constant, SELECT_CONNECT.  For details see the class WindowsPollPollster in the Tulip code.
msg179241 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-07 02:59
Maybe I should explain the need for SELECT_CONNECT more.  This is necessary so that on Windows the PollSelector can use WSAPoll().  The reason is that for async connect() calls, WSAPoll() doesn't return the FD as writable, just as having an exception.  I didn't write the code (Richard Oudkerk did), and I've never run it (no access to Windows), but the workaround for the problem with WSAPoll() is apparently quite complex:
http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#236

The long and short of it is that portable code must use add_connector() and remove_connector() instead of add_writer() and remove_writer() for async connect() calls.  So these need to be mapped to something different in the selector world.  The right thing would seem to add a separate flag, SELECT_CONNECT, with a value distinct from SELECT_IN and SELECT_OUT.  But I didn't write the code for that -- I made a hackish change so that the selector code exports SELECT_CONNECT as an alias for SELECT_OUT.  I did write code for add_connector() and remove_connector() but these may have to be modified slightly again later once the selector code has SELECT_OUT != SELECT_CONNECT.

Please do try to understand this -- it would not be possible to support WSAPoll() at all without distinguishing between write and connect readyness.

Feel free to ping Richard Oudkerk (shibturn at gmail.com) for more information.

I don't *think* the problems are in my code, since both tests pass when the SelectSelector is used.

I can currently only test on OSX 10.7 (Mountain Lion); hopefully I have more platforms after tomorrow (my first day at the new job).
msg179242 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-07 03:00
(Adding Richard Oudkerk to the nosy list as I am pleading here for support of WSAPoll(), see last few messages.)
msg179252 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-07 09:31
> Also notice the need for a third constant, SELECT_CONNECT.  For details see the class WindowsPollPollster in the Tulip code.

I'll trust Richard on all Windows matter, so if you need a
SELECT_CONNECT constant, I'll expose one.

> However, two Tulip tests are now failing:

I can't reproduce those failures on Linux (I had to adapt the
hostnames because I'm at work and my test machine can't access the
internet, but otherwise everything is the same). All tests pass with
select, poll and epoll.

Note that I had to fix a typo in the patch: EPollSelector -> EpollSelector.

> - tulip.events_test.PollEventLoopTests.testCreateSslTransport fails with spurious file descriptors returned by poll() that aren't in the _fd_to_key dict (but the corresponding test with Select passes)
>
> The first failure has this traceback:
>
> Traceback (most recent call last):
>   File "/Users/guido/tulip/tulip/selectors.py", line 178, in _key_from_fd
>     return self._fd_to_key[fd]
> KeyError: 0
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "/Users/guido/tulip/tulip/events_test.py", line 216, in testCreateSslTra\
> nsport
>     el.run()
>   File "/Users/guido/tulip/tulip/unix_events.py", line 120, in run
>     self._run_once()
>   File "/Users/guido/tulip/tulip/unix_events.py", line 592, in _run_once
>     event_list = self._selector.select(timeout)
>   File "/Users/guido/tulip/tulip/selectors.py", line 255, in select
>     key = self._key_from_fd(fd)
>   File "/Users/guido/tulip/tulip/selectors.py", line 180, in _key_from_fd
>     raise RuntimeError("No key found for fd {}".format(fd))
> RuntimeError: No key found for fd 0
>
> (But the fd value varies -- sometimes it is -2, sometimes a large number.)

This failure is really weird, because the file descriptor is really
just the value returned by poll.poll(): I don't know how this could
possibly ever be negative, unless some sort of overflow in the
poll.poll() code itself?

I can however see why the previous version wouldn't fail:
"""
        for fd, flags in self._poll.poll(msecs):
            if flags & ~select.POLLOUT:
                if fd in self.readers:
                    events.append(self.readers[fd])
            if flags & ~select.POLLIN:
                if fd in self.writers:
                    events.append(self.writers[fd])
"""

If a spurious fd is reported, it's silently ignored.

The second test runs fine on Linux, and from a cursory look, I don't
see how it could fail (the socket should be reported as write ready
upon ECONNREFUSED).

I'll try running the exact test case with the same hostnames from
home, but I doubt it'll make a difference, so maybe it's really OS-X
specific (if there's strace/dtrace on OS-X, it'll help see what's
going on).
msg179255 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-07 10:50
> The second test runs fine on Linux, and from a cursory look, I don't
> see how it could fail (the socket should be reported as write ready
> upon ECONNREFUSED).

Hum, thinking about it, I wonder is OS-X doesn't report POLLPRI or
some other esoteric event in case of ECONNREFUSED...

Could you try the patch attached?
msg179275 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-07 19:06
It works fine on Linux.

I'm attaching the latest version that should hopefully fix the second
failure, as a standalone diff and a diff for Tulip.
It also adds a SELECT_CONNECT, and - untested - kqueue support.
msg179299 - (view) Author: Felipe Cruz (felipecruz) * Date: 2013-01-08 00:36
Hi.. 2 comments related to Kqueue/OSX(16.8)

1 - In tulip/selectors.py L311 and L314 - is key.fd not fd

2 - In EventLoopTestsMixin::test_writer_callback if the writer socket isn't non-blocking, the test hangs forever..

3 - Errors:

ERROR: testCreateSslTransport (tulip.events_test.PollEventLoopTests)
  File "/Users/felipecruz/Projects/tulip3/tulip/selectors.py", line 180, in _key_from_fd
    raise RuntimeError("No key found for fd {}".format(fd))
RuntimeError: No key found for fd -2


ERROR: test_sock_client_ops (tulip.events_test.KqueueEventLoopTests)
  File "/Users/felipecruz/Projects/tulip3/tulip/selectors.py", line 180, in _key_from_fd
    raise RuntimeError("No key found for fd {}".format(fd))
RuntimeError: No key found for fd 77
msg179300 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-08 00:45
There's a bug in the kqueue selector.  The following code in your patch has 'fd' instead of 'key.fd', twice:

            if events & SELECT_IN:
                kev = kevent(key.fd, KQ_FILTER_READ, KQ_EV_ADD)
                self._kqueue.control([kev], 0, 0)
            if events & SELECT_OUT:
                kev = kevent(key.fd, KQ_FILTER_WRITE, KQ_EV_ADD)
		self._kqueue.control([kev], 0, 0)

One kqueue test fails on OSX:

ERROR: test_sock_client_ops (tulip.events_test.KqueueEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/guido/tulip/tulip/selectors.py", line 178, in _key_from_fd
    return self._fd_to_key[fd]
KeyError: 11

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/guido/tulip/tulip/events_test.py", line 167, in test_sock_client_ops
    el.run_until_complete(el.sock_sendall(sock, b'GET / HTTP/1.0\r\n\r\n'))
  File "/Users/guido/tulip/tulip/unix_events.py", line 146, in run_until_complete
    self.run()
  File "/Users/guido/tulip/tulip/unix_events.py", line 110, in run
    self._run_once()
  File "/Users/guido/tulip/tulip/unix_events.py", line 582, in _run_once
    event_list = self._selector.select(timeout)
  File "/Users/guido/tulip/tulip/selectors.py", line 329, in select
    key = self._key_from_fd(fd)
  File "/Users/guido/tulip/tulip/selectors.py", line 180, in _key_from_fd
    raise RuntimeError("No key found for fd {}".format(fd))
RuntimeError: No key found for fd 11
msg179328 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-08 10:49
So I assume that the second failure is fixed, which means that OS-X
returns a funny event (and not POLLIN/POLLHUP/POLLERR) in case of
ECONNREFUSED :-(

> 2 - In EventLoopTestsMixin::test_writer_callback if the writer socket isn't non-blocking, the test hangs forever..

This test is buggy:

"""
    def test_writer_callback(self):
        el = events.get_event_loop()
        r, w = unix_events.socketpair()
        el.add_writer(w.fileno(), w.send, b'x'*100)
        el.call_later(0.1, el.remove_writer, w.fileno())
        el.run()
        w.close()
"""

Because the writer socket isn't non-blocking, if the output socket
buffer fills up in less than 0.1 seconds, the call to w.send(b'x'*100)
will block.
select()/poll()/kqueue() use a watermark to decide whether the FD is
readable/writable: for a Unix domain socket, I guess OS-X returns that
the socket is writable is there's at least one byte free in the output
socket buffer: since send() tries to write 100 bytes at once, it
blocks.

I can reproduce the hang on Linux with vanilla (unpatched) tulip by
increasing the number of bytes sent() in one syscall:
"""
--- tulip-bf4cb136c121/tulip/events_test.py     2092-08-05
00:00:00.000000000 +0200
+++ tulip/tulip/events_test.py  2013-01-08 11:35:27.400198000 +0100
@@ -149,7 +149,7 @@
     def test_writer_callback(self):
         el = events.get_event_loop()
         r, w = unix_events.socketpair()
-        el.add_writer(w.fileno(), w.send, b'x'*100)
+        el.add_writer(w.fileno(), w.send, b'x'*(1<<18))
         el.call_later(0.1, el.remove_writer, w.fileno())
         el.run()
         w.close()
"""

(I have to do that because Linux uses an adaptive watermark, see
http://lxr.free-electrons.com/source/net/unix/af_unix.c#L2156 and
http://lxr.free-electrons.com/source/net/unix/af_unix.c#L319 ).

That's why all FDs monitored with select()/poll()/epoll() must be
non-blocking (and there's also the risk of spurious wakeups...).

> 3 - Errors:
>
> ERROR: testCreateSslTransport (tulip.events_test.PollEventLoopTests)
>   File "/Users/felipecruz/Projects/tulip3/tulip/selectors.py", line 180, in _key_from_fd
>     raise RuntimeError("No key found for fd {}".format(fd))
> RuntimeError: No key found for fd -2
>
>
> ERROR: test_sock_client_ops (tulip.events_test.KqueueEventLoopTests)
>   File "/Users/felipecruz/Projects/tulip3/tulip/selectors.py", line 180, in _key_from_fd
>     raise RuntimeError("No key found for fd {}".format(fd))
> RuntimeError: No key found for fd 77

Yes, it's the same problem: poll()/kqueue() returning garbage.
I guess I'll have to patch Selector to ignore unknown FDs, but that
really looks like an OS-X bug.
Could you dump the content of the returned kevent when the fd is not found?
msg179365 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-08 18:56
Ok, I fixed test_writer_callback() in the tulip repo.

Indeed the error I reported first is gone now.

I logged the value of kev on failure, and got this output:

ERROR:root:kev = <select.kevent ident=83 filter=-2 flags=0x1 fflags=0x0 data=0x131750 udata=0x0>
Traceback (most recent call last):
  File "/Users/guido/tulip/tulip/selectors.py", line 178, in _key_from_fd
    return self._fd_to_key[fd]
KeyError: 83

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/guido/tulip/tulip/selectors.py", line 330, in select
    key = self._key_from_fd(fd)
  File "/Users/guido/tulip/tulip/selectors.py", line 180, in _key_from_fd
    raise RuntimeError("No key found for fd {}".format(fd))
RuntimeError: No key found for fd 83



The full traceback printed by the unittest framework was:

ERROR: test_sock_client_ops (tulip.events_test.KqueueEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/guido/tulip/tulip/selectors.py", line 178, in _key_from_fd
    return self._fd_to_key[fd]
KeyError: 83

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/guido/tulip/tulip/events_test.py", line 168, in test_sock_client_ops
    el.run_until_complete(el.sock_sendall(sock, b'GET / HTTP/1.0\r\n\r\n'))
  File "/Users/guido/tulip/tulip/unix_events.py", line 146, in run_until_complete
    self.run()
  File "/Users/guido/tulip/tulip/unix_events.py", line 110, in run
    self._run_once()
  File "/Users/guido/tulip/tulip/unix_events.py", line 582, in _run_once
    event_list = self._selector.select(timeout)
  File "/Users/guido/tulip/tulip/selectors.py", line 330, in select
    key = self._key_from_fd(fd)
  File "/Users/guido/tulip/tulip/selectors.py", line 180, in _key_from_fd
    raise RuntimeError("No key found for fd {}".format(fd))
RuntimeError: No key found for fd 83
msg179381 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-08 22:22
Figured it out. KqueueSelector was missing the unregister() implementation.  Here it is:

	def unregister(self, fileobj):
            key = super().unregister(fileobj)
            mask = 0
            if key.events & SELECT_IN:
                mask |= KQ_FILTER_READ
            if key.events & SELECT_OUT:
                mask |= KQ_FILTER_WRITE
            kev = kevent(key.fd, mask, KQ_EV_DELETE)
            self._kqueue.control([kev], 0, 0)

I also added a __repr__ to the _Key class:

    def __repr__(self):
        return '{}<fileobje={}, fd={}, events={:#x}, data={}>'.format(
            self.__class__.__name__,
            self.fileobj, self.fd, self.events, self.data)

Attached a new patch for tulip (but no new patch for select.py).  I think I may have left the debug logging in.  I will submit this to the Tulip repo, but we still need the changes for SELECT_CONNECT (I can't really test that, still no Windows access).
msg179382 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-08 22:31
This should fix some kqueue failures (the FDs were not unregistered,
which could result in spurious FDs being reported at a later time).

For the "negative FDs", all spurious events are now caught and logged
(I think it's better than silently ignoring them as it is now).
Hopefully we'll see an EV_ERROR with kev.data set to a sensible errno.

The implementation should now be more or less complete, apart from the
WSAPoll, which I'm unable to add/test.

Richard, in Tulip's WSAPoll code, it reads:

    class WindowsPollPollster(PollPollster):
        """Pollster implementation using WSAPoll.

        WSAPoll is only available on Windows Vista and later.  Python
        does not currently support WSAPoll, but there is a patch
        available at http://bugs.python.org/issue16507.
        """

Does this means that this code need the patch from issue #16507 to work?

Also, I've read something about IOCP: is this a replacement for
WSAPoll, are there plans to get it merged at some point to python (and
if yes, would the select module be a proper home for this)?
msg179383 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-08 22:44
Whoops, the unregister() function needs to return key.
msg179384 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-08 22:45
That ought to be added to the docstrings for register()/unregister() in _BaseSelector.
msg179385 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-08 22:47
A patch for making register()/unregister() return the key.
msg179387 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-08 22:52
As it is now, _Key is actually an implementation detail, that's why it's not part of the methods signature. Oonly the base register()/unregister() return the key because it's an easy way for the derived classes to get the FD without calling _fileobj_to_fd again (it's not really elegant, I'm open to suggestions).

We could make it part of the API though, and maybe return it also from get_info() instead of the (fileobj, data) tuple.
msg179389 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-08 23:08
Please consider my patches instead; it seems our patches crossed.  Merging is now difficult because I already submitted my version to Tulip.  Your version still has a bug: it uses select.kevent(...) twice, where it should just use kevent(...).  Also my version makes fewer syscalls when unregistering a FD that has both read and write events registered.

Regarding the _Key return value: I think it's asking for trouble if the signature of the base class differs from that of the subclass.  The return value may even be useful occasionally.

Given that no spurious FD events are now reported by the unittests, I'm not sure that it is useful to log and ignore them; it may be better to have the exception be raised, as it might expose an app bug, and in my experience it usually ends up in an infinite busy-wait loop once it happens.
msg179392 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-01-08 23:13
> Richard, in Tulip's WSAPoll code, it reads:
>
>     class WindowsPollPollster(PollPollster):
>         """Pollster implementation using WSAPoll.
> 
>         WSAPoll is only available on Windows Vista and later.  Python
>         does not currently support WSAPoll, but there is a patch
>         available at http://bugs.python.org/issue16507.
>         """
> 
> Does this means that this code need the patch from issue #16507 to work?

Yes.

> Also, I've read something about IOCP: is this a replacement for
> WSAPoll, are there plans to get it merged at some point to python (and
> if yes, would the select module be a proper home for this)?

IOCP is not a replacement for WSAPoll (or select).  Among other things it is not possible to use the current ssl module with IOCP (although Twisted manages to use IOCP with PyOpenSSL).

Tulip should have IOCP support, so presumable when tulip is merged some support for IOCP will also be available in python.  But I am not convinced that the select module is the proper home.
msg179423 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-09 08:02
> Please consider my patches instead; it seems our patches crossed.  Merging is now difficult because I already submitted my version to Tulip.

That's fine, I don't think there's a point into maintaining a
standalone patch for now: we can see how it goes with Tulip, flesh out
the API and merge it when it's ready (or when Tulip goes in)?

> Also my version makes fewer syscalls when unregistering a FD that has both read and write events registered.

It did it on purpose, because I wasn't sure whether kqueue ops can
accept more than one filter at a time.
If that's the case, then you can do the same thing for register().

> Regarding the _Key return value: I think it's asking for trouble if the signature of the base class differs from that of the subclass.  The return value may even be useful occasionally.

Agreed (but then it might be better to change _Key to Key if it's public).

> Given that no spurious FD events are now reported by the unittests, I'm not sure that it is useful to log and ignore them; it may be better to have the exception be raised, as it might expose an app bug, and in my experience it usually ends up in an infinite busy-wait loop once it happens.

Sounds good to me.
msg179481 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-09 18:04
- How shall we go forward?  I've made a variety of small changes to the Tulip version (selector.py) now.  Can you work those into a fresh unified patch for CPython 3.4?

- I tried making a single combined kqueue call in register(), and it caused the SSL test to hang, so I reverted that and changed my unregister code to make two syscalls as wel.  (This makes me think however that KqueueSelector should override modify() to avoid redundant syscalls when flipping only one of the two bits.)

- I'll change _Key to Key in the Tulip copy (though I wonder if maybe it should be a longer name -- 'Key' is rather generic).

- Are you going to implement the SELECT_CONNECT flag?

- Thanks for being so responsive!  It's great to be able to factor the selector functionality out of Tulip.

- Have you submitted a PSF contributor form yet?  (Probably yes, years ago, just making sure. :-)
msg179486 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-09 20:47
> - How shall we go forward?  I've made a variety of small changes to the
> Tulip version (selector.py) now.  Can you work those into a fresh unified
> patch for CPython 3.4?

Yes, but I think we could wait a little, to make sure the API is good
enough to fulfill all tulip's need?
There are a couple things I'm not completely happy with:
- if we add a CONNECT event, then I think SELECT_IN/SELECT_OUT aren't
good names anymore: I think that EVENT_READ/EVENT_WRITE/EVENT_CONNECT
would be more consistent (since it embeds the operation that should
now succeed on the socket). Maybe even shorter names like EVT_READ?
- right now, selectors don't handle EINTR, which means that upon
reception of a signal, Selector.select() can fail with EINTR (the
original tulip implementation also had this problem). I've been more
or less midly advocating to change all Python syscalls wrappers to
handle EINTR because it's really just a nuisance, but I think
Selector.select() should definitely handle EINTR and retry the
syscalls (with an updated timeout).
Note: the test testAddSignalHandler() added recently doesn't
demonstrate this problem because os.kill(os.getpid(), signal.SIGINT)
because the signal is delivered synchronously (when kill() syscall
returns to user-space), before select()/poll()/epoll() gets called.

> - I'll change _Key to Key in the Tulip copy (though I wonder if maybe it
> should be a longer name -- 'Key' is rather generic).

SelectorKey, or something along those lines?

> - Are you going to implement the SELECT_CONNECT flag?

Unfortunately, I don't know anything about Windows, so I won't be able to.
But I guess that porting Pollster.WSAPollster to selectors shouldn't
be too hard.

> - Have you submitted a PSF contributor form yet?  (Probably yes, years ago,
> just making sure. :-)

Actually, I already have commit rights ;-)
msg179489 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-09 21:28
On Wed, Jan 9, 2013 at 12:47 PM, Charles-François Natali
<report@bugs.python.org> wrote:
>> - How shall we go forward?  I've made a variety of small changes to the
>> Tulip version (selector.py) now.  Can you work those into a fresh unified
>> patch for CPython 3.4?
>
> Yes, but I think we could wait a little, to make sure the API is good
> enough to fulfill all tulip's need?

Sure. I think in terms of API, it is pretty good except for the
'connect' issue below (and the related issue of a WSAPoll() wrapper,
which Tulip had before the switch). In terms of implementation, I
expect there's plenty to do, but it will be a long time before we'll
know. It's probably best to do this incrementally.

There's also the question of IOCP support, but I expect that it is too
unwieldy to add to the select module -- it'll be a separate project.
Hopefully Richard can help with this.

> There are a couple things I'm not completely happy with:
> - if we add a CONNECT event, then I think SELECT_IN/SELECT_OUT aren't
> good names anymore: I think that EVENT_READ/EVENT_WRITE/EVENT_CONNECT
> would be more consistent (since it embeds the operation that should
> now succeed on the socket). Maybe even shorter names like EVT_READ?

I like the longer names, EVENT_READ; they aren't used enough to need
abbreviating, and it's always hard to remember exactly what
abbreviation is used.

> - right now, selectors don't handle EINTR, which means that upon
> reception of a signal, Selector.select() can fail with EINTR (the
> original tulip implementation also had this problem). I've been more
> or less midly advocating to change all Python syscalls wrappers to
> handle EINTR because it's really just a nuisance, but I think
> Selector.select() should definitely handle EINTR and retry the
> syscalls (with an updated timeout).

I'm not sure what the pros and cons would be of doing this for all
syscall wrappers, but I do know I want Tulip to work with Python 3.3
out of the box, so we need to add this to the select() calls. I
imagine it's a bit tricky to test... Maybe I could use alarm() or
setitimer() for testing?

> Note: the test testAddSignalHandler() added recently doesn't
> demonstrate this problem because os.kill(os.getpid(), signal.SIGINT)
> because the signal is delivered synchronously (when kill() syscall
> returns to user-space), before select()/poll()/epoll() gets called.

I know. :-(

>> - I'll change _Key to Key in the Tulip copy (though I wonder if maybe it
>> should be a longer name -- 'Key' is rather generic).
>
> SelectorKey, or something along those lines?

Sounds good.

>> - Are you going to implement the SELECT_CONNECT flag?
>
> Unfortunately, I don't know anything about Windows, so I won't be able to.
> But I guess that porting Pollster.WSAPollster to selectors shouldn't
> be too hard.

Just check out the last rev of Tulip before I switched:
http://code.google.com/p/tulip/source/browse/tulip/unix_events.py?spec=svn28b7efd465b8529cdaa34e6b710f291c835c657b&r=28b7efd465b8529cdaa34e6b710f291c835c657b

>> - Have you submitted a PSF contributor form yet?  (Probably yes, years ago,
>> just making sure. :-)
>
> Actually, I already have commit rights ;-)

Doesn't surprise me. :-)
msg179490 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-09 21:43
I've fixed the signal handling issue.  Please check Tulip revision 1c4db3d1a688: http://code.google.com/p/tulip/source/detail?r=1c4db3d1a68874dc22c84f9c1c376c5371037f09
msg179491 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-01-09 21:45
And the next Tulip rev renames Key to SelectorKey.
msg186286 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-08 10:07
If this goes inside the select module, it could probably help issue #17552 (add socket.sendfile()) a bit.
msg186617 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-04-12 11:13
> If this goes inside the select module, it could probably help issue #17552 (add socket.sendfile()) a bit.

Yes, that's exactly the idea. See also the attached patches for
telnetlib and multiprocessing (there would probably be other
candidates like subprocess, etc).

I guess we can merge this when tulip goes in.
msg194179 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-02 15:16
Here's a patch, based on the version in tulip's repo.
I've added doc update and tests.

I also made the following changes:
- BaseSelector is an abstract base class (so one could imagine user code implementing its own selector on top of it)
- SelectorKey is a named tuple: I think that's better than a class, because SelectorKeys can be returned to the user (register(), get_info(), etc), and we don't want him to mess with them. Also, we get a nice repr() for free
- since SelectorKey is immutable, I added a get_keys() method, which returns all the keys registered. Since it's a dict values, the user can't mess with it

This get_keys() method superseedes the registered_count() method, which can be simply replaced with len(selector.get_keys()). We could probably remove it? It also sort of superseedes the get_info() method: I don't remember, why was it needed?

Would it be possible to get it into 3.4?
msg194182 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-08-02 16:16
Thanks!  We should make sure this makes it into alpha 2.  Could you also prepare a patch for Tulip itself?  That way I can make sure Tulip works with either its own or the stdlib implementation.

One thought: would it be better if the selectors ended up in a separate module rather than extending 'select'?
msg194184 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-02 16:22
> One thought: would it be better if the selectors ended up in a
> separate module rather than extending 'select'?

IMHO they're fine in select. It's quite the natural place for them.
msg194190 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-02 17:59
> Thanks!  We should make sure this makes it into alpha 2.  Could you also
> prepare a patch for Tulip itself?  That way I can make sure Tulip works with
> either its own or the stdlib implementation.

Sure, here's a patch (tulip_selector_stdlib.diff).

It seems to work fine, but I'm getting random errors even with vanilla tulip:
"""
======================================================================
FAIL: test_run_until_complete_timeout (events_test.EPollEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/events_test.py", line 194, in test_run_until_complete_timeout
    self.assertTrue(0.009 <= t2-t1 <= 0.018, t2-t1)
AssertionError: False is not true : 0.0055361779959639534
"""

Looks like the timeouts check may be a bit too tight?

I'm also updating the patch a bit:
- when a timeout < 0 is passed to select(), it's treated as timeout ==
0, i.e. polling. It fixes a common kind of nasty bugs in user code
where when calculating timeouts, if you end up passing a negative
value, it can either be interpreted as an infinite wait, or can raise
EINVAL
- previously, upon EINTR (InterruptedError), select() would return an
empty list. That can be a problem if the user specified a timeout,
because you end up returning too early. Thanks to a decorator,
select() now retries upon EINTR, re-calculating the timeout. I also
added a test for this.

I'm pretty happy with the state now, so I think it's ready for review.
There are a couple points I'd like more specific feedback:
- is the registered_count() method still useful, now that get_keys()
can return all the keys?
- same thing for get_info(), it feels redundant
- the documentation ;-)
msg194218 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-02 22:47
Version addressing Antoine and Christian's comments.
msg194236 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-08-03 04:21
No time for a full review yet, but a comment on the timeout test failure you are observing in Tulip.  This particular test schedules a task that sleeps for 20 msecs, and first calls run_until_complete() with a timeout of 10 msecs, and then calls it again without a timeout.  It checks that the first call takes approximately 10 msec and the second call too.  But for some reason it is okay if the first call takes quite a bit longer (up to 18 msec) and then the second check is wrong.

Can you try again with the failing assert replaced with this?

	self.assertTrue(0.018 <= t2-t0 <= 0.028, t2-t0)

That should be a better way to check that code works.
msg194243 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-03 10:22
> Guido van Rossum added the comment:
> Can you try again with the failing assert replaced with this?
>
> 	self.assertTrue(0.018 <= t2-t0 <= 0.028, t2-t0)
>
> That should be a better way to check that code works.

I'm still getting - less frequent - failures:
"""
======================================================================
FAIL: test_run_until_complete_timeout (events_test.SelectEventLoopTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests/events_test.py", line 194, in test_run_until_complete_timeout
    self.assertTrue(0.018 <= t2-t0 <= 0.028, t2-t0)
AssertionError: False is not true : 0.029771103999337356

----------------------------------------------------------------------
"""

Looking at strace output:
11:00:47.383145 select(4, [3], [], [], {0, 9765}) = 0 (Timeout) <0.015713>

select() takes an extra 5ms (15ms instead of 10ms).

5ms is quite low for a GPOS (the typical quantum is around 4ms for
250Hz timer, and on my machine I have high-resolution timers
configured but probably a crappy hardware).

I'd suggest increasing the timeouts (especially when Tulip gets
merged, it'll likely fail on many buildbots).
msg194590 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-08-06 23:01
Yeah, the timeouts are a bad idea anyway.  I've filed http://code.google.com/p/tulip/issues/detail?id=49 to remind us to do something about this.

I'll review your tulip patch next.
msg194592 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-08-06 23:33
Actually there is a misunderstanding underlying you Tulip patch.  Tulip needs to support Python 3.3 as well as Python 3.4, so it needs to do a little dance trying to import BaseSelector (or some other key class) from select and if that fails fall back to its own selectors.py.

Since you aren't changing the BaseSelector API at all this is trivial and I can take care of it.
msg194593 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-08-06 23:55
Sorry, replying to some direct questions in the bug:

> I also made the following changes:
> - BaseSelector is an abstract base class (so one could imagine user code
> implementing its own selector on top of it)

Fine (though I had to work around this for some tulip tests).

> - SelectorKey is a named tuple: I think that's better than a class,
> because SelectorKeys can be returned to the user (register(), get_info(),
> etc), and we don't want him to mess with them. Also, we get a nice repr()
> for free

Fine.

> - since SelectorKey is immutable, I added a get_keys() method, which
> returns all the keys registered. Since it's a dict values, the user can't
> mess with it

Fine. (I missed this in the code review.)

> This get_keys() method superseedes the registered_count() method, which
> can be simply replaced with len(selector.get_keys()). We could probably
> remove it? It also sort of superseedes the get_info() method: I don't
> remember, why was it needed?

registered_count() is only used internally so you can kill it.

get_info() is important for Tulip, please keep it.

> Would it be possible to get it into 3.4?

I think so, if Antoine agrees. The next alpha would be at your service.
msg195722 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-08-20 22:21
FYI I created the issue #18794: "select.devpoll objects have no close() method". If this method is added, it should also be called by the close() method of the selector.
msg195724 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-08-20 22:32
I've lost track -- who is waiting for whom?  I reviewed the patch selector-12.diff and received zero response AFAICT.  (Maybe the email from Rietveld didn't make it?)

I'd like to get this into alpha2, even if the proactor won't make it (though I'd love to see that as well).

Maybe the main issue is that I would prefer if the selector classes were in a separate module (selectors.py) than the low-level select/poll/etc. calls.  I think it makes sense if the selectors module *only* defines the high level classes and not the messy low-level stuff, since an app shouldn't be using both at the same time.
msg195726 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-20 22:37
Well, I suppose it waits on Charles-François to update the patch.

I'd prefer not to add a separate module for this, personally. I think this functionality is fine in the select module. People who know how to use a selector certainly have heard about select() or poll() already :-)
(no strong feelings about it anyway)
msg195749 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-21 07:09
> I've lost track -- who is waiting for whom?  I reviewed the patch selector-12.diff and received zero response AFAICT.  (Maybe the email from Rietveld didn't make it?)

I was on vacation, and then had to cope with many emails :-)
I'll update the patch with your comments soon.

> Maybe the main issue is that I would prefer if the selector classes were in a separate module (selectors.py) than the low-level select/poll/etc. calls.  I think it makes sense if the selectors module *only* defines the high level classes and not the messy low-level stuff, since an app shouldn't be using both at the same time.

I agree with Antoine (i.e. I think select is OK), but if it really
bothers you, I can change it.
I just don't want to do it twice :-)
msg195760 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-08-21 09:37
I like the idea of renaming select to _select, and add Lib/select.py
for the high-level wrapper (Selector). In my opinion, adding a module
just for 5 classes is overkill. Selector classes are a very thin
abstraction over the low-level objects. A new module can be justified
for something more evolved, something like an event loop with
callbacks and a scheduler, something like the PEP 3156 (tulip) :-)

This issue is for code written in Python. Is it a similar issue for
code written in C? internal_select_ex() of Modules/socketmodule.c and
check_socket_and_wait_for_timeout() of Modules/_ssl.c uses poll() and
select().
msg195801 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-08-21 16:15
On Wed, Aug 21, 2013 at 2:37 AM, STINNER Victor <report@bugs.python.org>wrote:

>
> STINNER Victor added the comment:
>
> I like the idea of renaming select to _select, and add Lib/select.py
> for the high-level wrapper (Selector). In my opinion, adding a module
> just for 5 classes is overkill. Selector classes are a very thin
> abstraction over the low-level objects. A new module can be justified
> for something more evolved, something like an event loop with
> callbacks and a scheduler, something like the PEP 3156 (tulip) :-)
>

OK, I'm just going to impose my decision. I want a separate selectors.py.
The rename business feels unpleasant, *and* the interface abstraction level
is different.

> This issue is for code written in Python. Is it a similar issue for
> code written in C? internal_select_ex() of Modules/socketmodule.c and
> check_socket_and_wait_for_timeout() of Modules/_ssl.c uses poll() and
> select().
>

Let's leave those alone. AFAIK they are mostly to simulate timeouts in
*blocking* code.
msg196075 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-24 12:04
Alright, I've updated the patch to have a distinct selectors module,
and with Guido's comments.

Before I post it for final review, I have three more questions:
1) In the documentation, I don't know how to best refer to files
object registered: is "file descriptor" OK, or is it too low-level?
Otherwise I'd be tempted to use just "file", but then this doesn't
include sockets, pipes, etc. Or maybe "file object"/"file-like
object"?

2) currently, the select() method returns a list of
(<file>, <event>, <data>)

But the opaque "data" object is optional, which means that many user
might end up doing:

for file, event, data in selector.select():
    if event & READ_EVENT:
        file.recv(1024)
    # don't use data

i.e. have to unpack it for no reason.

Would it make sense to return (<key>, <event>) instead?
This way the user has all the interesting information, and can do:

for key, event in selector.select():
    if event & READ_EVENT:
        key.file.recv(1024)
        or
        os.read(key.fd, 1024)

3) Concerning get_info(): right now the signature is:
get_info(): fileobj -> (events, data)

Wouldn't it be better to just return the whole key instead, which
contains all the relevant information (events, data, fd and fileobj).
Then we should probably also rename the method to get_key()?
msg196092 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-24 19:27
> Alright, I've updated the patch to have a distinct selectors module,
> and with Guido's comments.

Didn't you forget to upload it?
msg196098 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-24 20:33
> Didn't you forget to upload it?

I wanted to have answer to my questions first :-):

> Before I post it for final review, I have three more questions:
msg196099 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-24 20:45
> > Didn't you forget to upload it?
> 
> I wanted to have answer to my questions first :-):

Ah, indeed, sorry!

> 1) In the documentation, I don't know how to best refer to files
> object registered: is "file descriptor" OK, or is it too low-level?
> Otherwise I'd be tempted to use just "file", but then this doesn't
> include sockets, pipes, etc. Or maybe "file object"/"file-like
> object"?

"file object" and "file-like object" are the convention used in the
docs:
http://docs.python.org/dev/glossary.html#term-file-object

That said, if the APIs also accept file descriptors, it must be
mentioned.

> i.e. have to unpack it for no reason.

That doesn't seem much of a concern, does it?
Perhaps there can be two separate methods, select() and
select_with_data(), but that sounds a bit overkill.

> 3) Concerning get_info(): right now the signature is:
> get_info(): fileobj -> (events, data)
> 
> Wouldn't it be better to just return the whole key instead [...]

No opinion :-)
msg196240 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-08-26 19:49
On Sat, Aug 24, 2013 at 5:04 AM, Charles-François Natali
<report@bugs.python.org> wrote:
> Before I post it for final review, I have three more questions:
> 1) In the documentation, I don't know how to best refer to files
> object registered: is "file descriptor" OK, or is it too low-level?
> Otherwise I'd be tempted to use just "file", but then this doesn't
> include sockets, pipes, etc. Or maybe "file object"/"file-like
> object"?

What Antoine said.

And, moreover, it should be mentioned that on Windows it only accepts
socket FDs; and on UNIX it should probably qualify this in some way to
exclude disk FDs (which are always considered ready by select and
friends). I don't really know what the correct UNIX-flavor-neutral
terminology is for this case -- perhaps we have to enumerate the
possibilities? (I know of FIFOs, character special devices, and
sockets.)

> 2) currently, the select() method returns a list of
> (<file>, <event>, <data>)
>
> But the opaque "data" object is optional, which means that many user
> might end up doing:
>
> for file, event, data in selector.select():
>     if event & READ_EVENT:
>         file.recv(1024)
>     # don't use data
>
> i.e. have to unpack it for no reason.
>
> Would it make sense to return (<key>, <event>) instead?
> This way the user has all the interesting information, and can do:
>
> for key, event in selector.select():
>     if event & READ_EVENT:
>         key.file.recv(1024)
>         or
>         os.read(key.fd, 1024)

I like this -- return the key plus the specific event. You must
document SelectorKey, but that seems fine.

(Looks like Antoine read too fast and didn't realize you proposed to
return the key instead of the file and the data.)

> 3) Concerning get_info(): right now the signature is:
> get_info(): fileobj -> (events, data)
>
> Wouldn't it be better to just return the whole key instead, which
> contains all the relevant information (events, data, fd and fileobj).
> Then we should probably also rename the method to get_key()?

Yes, and yes.
msg196403 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-28 18:33
Alright, here's the patch, with all the changes discussed (and of
cours a separate selectors.py).
msg196527 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2013-08-30 12:30
Follows a couple of minor concerns.

- What about Solaris' /dev/poll?

- I'm not sure why in case of EINTR you retry with a different timeout; can't you just return []? 

- this is probably because I'm paranoid about performances but given that select() method will be called repeatedly I would not use a decorator. Also, similarly to what has been done elsewhere in the stdlib, for "critical" parts I would recommend localizing variable access in order to minimize overhead as in:

    def select(self, timeout=None):
        ... 
        key_from_fd = self._key_from_fd
        ready_append = ready.append
        for fd in r | w:
            ...
            key = key_from_fd(fd)
            if key:
                ready_append((key, events & key.events))
msg196536 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-30 13:34
Hello,

> - What about Solaris' /dev/poll?

That should be pretty easy to add by someone who has access to a
Solaris box: I could use the buildbots, but it'd take a couple
iterations to get it right.

> - I'm not sure why in case of EINTR you retry with a different timeout; can't you just return []?

Because when I do:
selector.select(10)

I expect the selector to wait 10 seconds if no event occurs: the fact
that a signal comes in shouldn't break the contract made by the API,
i.e. that it will return when a FD is ready or the timeout expires.
Early return can lead to spurious errors: just imagine you send a
request to a busy server: it would be bad to raise a timeout error
just because the user put the client in the background with CTRL-Z
(which results in SIGSTOP).

> - this is probably because I'm paranoid about performances but given that select() method will be called repeatedly I would not use a decorator. Also, similarly to what has been done elsewhere in the stdlib, for "critical" parts I would recommend localizing variable access in order to minimize overhead as in:
>
>     def select(self, timeout=None):
>         ...
>         key_from_fd = self._key_from_fd
>         ready_append = ready.append
>         for fd in r | w:
>             ...
>             key = key_from_fd(fd)
>             if key:
>                 ready_append((key, events & key.events))

I find that localizing variables leads to unreadable code, and is tied
to the current CPython interpreter implementation: such optimizations
belong to the interpreter, not user code.

As for the decorator performance overhead, I don't think it weights
much compared to the cost of a syscall (+ GIL acquire/release):
with decorator:
$ ./python -m timeit -s "from selectors import DefaultSelector,
EVENT_WRITE; import os; s = DefaultSelector();
s.register(os.pipe()[1], EVENT_WRITE)" "s.select()"
100000 loops, best of 3: 3.69 usec per loop
without decorator:
$ ./python -m timeit -s "from selectors import DefaultSelector,
EVENT_WRITE; import os; s = DefaultSelector();
s.register(os.pipe()[1], EVENT_WRITE)" "s.select()"
100000 loops, best of 3: 3.52 usec per loop

That's a 4% overhead, with a single FD that's always ready (and I
suspect that most of the overhead is due to the call to time(), not
the decorator per se).

Also, I'll shortly propose a patch to handle EINTR within C code, so
those EINTR wrappers won't  be needed anymore.
msg196537 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2013-08-30 13:39
> That should be pretty easy to add by someone who has access to a
> Solaris box: I could use the buildbots, but it'd take a couple
> iterations to get it right.

I can probably help with that.
msg196575 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2013-08-30 21:04
I'm playing with devpoll on Open Solaris and it seems it works exactly as poll(). Basically a copy & paste of PollSelector class with a different self._poll attribute will suffice.
With dev/poll support you now have 3 classes (PollSelector, EpollSelector, DevpollSelector) which look basically the same except for very little details. It looks like there's space for some code reuse / refactoring.
msg196596 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-08-30 23:22
On Fri, Aug 30, 2013 at 6:34 AM, Charles-François Natali
<report@bugs.python.org> wrote:
[Giampaolo Rodola']
> > - I'm not sure why in case of EINTR you retry with a different timeout; can't you just return []?
>
> Because when I do:
> selector.select(10)
>
> I expect the selector to wait 10 seconds if no event occurs: the fact
> that a signal comes in shouldn't break the contract made by the API,
> i.e. that it will return when a FD is ready or the timeout expires.

I disagree. In this case it is almost certain that there is already
some kind of retry loop around the Selector.select() call, so there is
no need for the retry. (Certainly this is how it happens in Tulip.) In
the case of select() the contract is actually quite clear, and there
shouldn't be any problem with returning an empty list early. (This is
quite different from e.g. an interrupted read() or recv() call, where
there is no existing return value that you can return safely, since
their users don't expect None and will usually treat an empty string
as an EOF indicator.)

> Early return can lead to spurious errors: just imagine you send a
> request to a busy server: it would be bad to raise a timeout error
> just because the user put the client in the background with CTRL-Z
> (which results in SIGSTOP).

Yeah, and Tulip's retry handles this just fine.

I also noticed that you are hardcoding a dependency on
time.monotonic() if it exists. If it weren't for your desire to "fix"
this issue you wouldn't have needed a time function at all!

> > - this is probably because I'm paranoid about performances but given that select() method will be called repeatedly I would not use a decorator. Also, similarly to what has been done elsewhere in the stdlib, for "critical" parts I would recommend localizing variable access in order to minimize overhead as in:
> >
> >     def select(self, timeout=None):
> >         ...
> >         key_from_fd = self._key_from_fd
> >         ready_append = ready.append
> >         for fd in r | w:
> >             ...
> >             key = key_from_fd(fd)
> >             if key:
> >                 ready_append((key, events & key.events))
>
> I find that localizing variables leads to unreadable code, and is tied
> to the current CPython interpreter implementation: such optimizations
> belong to the interpreter, not user code.
>
> As for the decorator performance overhead, I don't think it weights
> much compared to the cost of a syscall (+ GIL acquire/release):
> with decorator:
> $ ./python -m timeit -s "from selectors import DefaultSelector,
> EVENT_WRITE; import os; s = DefaultSelector();
> s.register(os.pipe()[1], EVENT_WRITE)" "s.select()"
> 100000 loops, best of 3: 3.69 usec per loop
> without decorator:
> $ ./python -m timeit -s "from selectors import DefaultSelector,
> EVENT_WRITE; import os; s = DefaultSelector();
> s.register(os.pipe()[1], EVENT_WRITE)" "s.select()"
> 100000 loops, best of 3: 3.52 usec per loop
>
> That's a 4% overhead, with a single FD that's always ready (and I
> suspect that most of the overhead is due to the call to time(), not
> the decorator per se).

But still I agree with Giampaolo that the decorator doesn't feel
right. It's like a try/except around the entire body of your function,
which is usually a code smell (even though I realize the exception it
catches can only be raised by the syscall). Please go back to the
explicit try/except returning [].

> Also, I'll shortly propose a patch to handle EINTR within C code, so
> those EINTR wrappers won't  be needed anymore.

Hm, it may be a while before everyone is comfortable with that change.
Who knows what might break?
msg196620 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-31 07:32
2013/8/31 Guido van Rossum <report@bugs.python.org>:
> But still I agree with Giampaolo that the decorator doesn't feel
> right. It's like a try/except around the entire body of your function,
> which is usually a code smell (even though I realize the exception it
> catches can only be raised by the syscall). Please go back to the
> explicit try/except returning [].

Well, I'll change it, but:
We have to change the select() docstring to "Wait until some
registered file objects become ready, or for *at most* timeout
seconds"

All user code that wants to handle EINTR or correct timeout (which
should be "all user code" in a perfect world) will have to wrap the
call to select() inside a loop equivalent to this decorator, and I
personally like the a rule of "never let the user do what the library
can do for him".
This includes for example multiprocessing and telnetlib which
currently implement this EINTR handling loop with timeout computation,
see e.g.:
http://hg.python.org/cpython/file/acc7439b1406/Lib/telnetlib.py#l297

And other places in the stdlib either don't handle EINTR properly, or
don't re-compute timeout (see e.g. http://bugs.python.org/issue12338).

Also, I don't really see the problem with retrying upon EINTR, since:
- the signal handler will be called right away anyway
- in case where the select() is a part of an event loop, the event
loop registers a wakup file descriptor with the selector, which means
that upon interesting signal delivery, the select() call will return
to the schedulor

So in short, I don't see how that could be a nuisance, but I can
certainly see how this could trick user code into raising spurious
timeout-related errors, or be much more complicated than it ought to
be (the above telnetlib example is IMO a great example of why EINTR
and timeout computation should be handled at the selector level).
msg196625 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-31 07:54
Oh, and FWIW, that's also what Java does:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5100121
http://hg.openjdk.java.net/jdk6/jdk6-gate/jdk/file/6daa81bdfd18/src/solaris/native/sun/nio/ch/PollArrayWrapper.c

(I know some people don't like Java, but I've found that they have
some really competent people and an extensive real-life exposure and
feedback).
msg196632 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2013-08-31 12:09
I may be missing something here but isn't the whole point of EINTR to interrupt a potentially long running syscall?
Why protract function return time as a consequence?
All the event loops I'm familiar with (twisted, tornado and asyncore) just 'return' on EINTR, supposedly because that was the best compromise to support syscall interruption without forcing users to handle EINTR themselves.
msg196633 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-31 12:23
> I may be missing something here but isn't the whole point of EINTR to
> interrupt a potentially long running syscall?

Not exactly. The point is to signal (!) that a signal was received.
The received signal is not necessarily expected.

Also, from what I understand, it is more reliable to use a wakeup fd
(using e.g. signal.set_wakeup_fd) rather than expect EINTR to be
returned:
http://mail.python.org/pipermail/python-dev/2013-August/128204.html
msg196634 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-31 12:26
> All the event loops I'm familiar with (twisted, tornado and asyncore)
> just 'return' on EINTR, supposedly because that was the best
> compromise to support syscall interruption without forcing users to
> handle EINTR themselves.

What do you mean? Tornado simply retries on EINTR:
https://github.com/facebook/tornado/blob/master/tornado/ioloop.py#L652
msg196635 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-31 12:28
> I may be missing something here but isn't the whole point of EINTR to interrupt a potentially long running syscall?

No.
EINTR is an artifact of the early Unix design, because failing when a
signal was delivered was simpler than restarting the syscall.
Nowadays, most syscalls are restarted by default (that's was true on
BSD, and nowadays Linux also follows this trend).
See e.g. http://lkml.indiana.edu/hypermail/linux/kernel/0104.0/0743.html
and http://man7.org/linux/man-pages/man7/signal.7.html

One reason why EINTR can occur so often in CPython is because we set
up signal handlers without SA_RESTART (which is set by default by
signal(3)). The reason is likely that we want the syscall to return to
be able to call PyCheckSignals() (since we cannot run Python code from
a signal handler).
But since in this case, PyCheckSignals() is called anyway from the
main eval loop, signal handlers will still be run in a timely manner
(and as noted, the proper way to handle signals is to use a wakeup
file descriptor which can be used by select()).
msg196641 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-08-31 15:14
> Charles-François Natali added the comment:

> Well, I'll change it, but:
> We have to change the select() docstring to "Wait until some
> registered file objects become ready, or for *at most* timeout
> seconds"

Doesn't sound so bad to me. It's about the same guarantee as sleep().

> All user code that wants to handle EINTR or correct timeout (which
> should be "all user code" in a perfect world) will have to wrap the
> call to select() inside a loop equivalent to this decorator, and I
> personally like the a rule of "never let the user do what the library
> can do for him".

No, they'll use a framework with an event loop like Tulip that takes
care of all of this for them. Such frameworks all have logic in them
to handle early returns from the low-level select/poll operation.

I find it easier to reason about the correctness of the framework's
code without your decorator:

(1) I have to convince myself that the code wrapped by your decorator
doesn't change any global state, *or* that there is a guarantee that
the exception caught is only raised by the select()/poll()/etc.
syscall, not anywhere else in the wrapped method.

(2) I have to remember that if a signal handler is called that
modifies the event loop's deadline, the selector will return
immediately anyway (so the event loop can recalculate its deadline)
because of the self-pipe.

(3) I have to prove that your decorator uses the same clock as my framework.

(4) I have to prove that your code does the same thing if the process
is suspended for a really long time and the system clock changes in
the meantime.

> This includes for example multiprocessing and telnetlib which
> currently implement this EINTR handling loop with timeout computation,
> see e.g.:
> http://hg.python.org/cpython/file/acc7439b1406/Lib/telnetlib.py#l297
>
> And other places in the stdlib either don't handle EINTR properly, or
> don't re-compute timeout (see e.g. http://bugs.python.org/issue12338).
>
> Also, I don't really see the problem with retrying upon EINTR, since:
> - the signal handler will be called right away anyway
> - in case where the select() is a part of an event loop, the event
> loop registers a wakup file descriptor with the selector, which means
> that upon interesting signal delivery, the select() call will return
> to the schedulor
>
> So in short, I don't see how that could be a nuisance, but I can
> certainly see how this could trick user code into raising spurious
> timeout-related errors, or be much more complicated than it ought to
> be (the above telnetlib example is IMO a great example of why EINTR
> and timeout computation should be handled at the selector level).

The selector is a helper for higher-level frameworks. I think the
EINTR handling belongs in the framework.
msg196649 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-08-31 17:06
Here's a patch returning [] on EINTR.

I've tested it on Linux, Windows, FreeBSD, OpenSolaris and OS-X.
msg196651 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-31 17:15
> Here's a patch returning [] on EINTR.

Hmm... Shouldn't it simply let the InterruptedError bubble up?
I know there's the sleep() precedent, but I find it a bit embarassing actually: the call can return early, but there's no way for the caller to know that it returned early.
msg196654 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-08-31 17:26
Hm... I like the [] return just fine, it matches what Tulip's selector did.  The framework should be re-checking the clock anyway -- it's possible that select() returned late too...
msg196926 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-09-04 17:05
New changeset e4d45315c38c by Charles-François Natali in branch 'default':
Issue #16853: Add new selectors module.
http://hg.python.org/cpython/rev/e4d45315c38c
msg196930 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-09-04 18:27
> New changeset e4d45315c38c by Charles-François Natali in branch 'default':
> Issue #16853: Add new selectors module.
> http://hg.python.org/cpython/rev/e4d45315c38c

Great! Congrats Charles-François for the new module! I tried to
implement such module once but I gave up because it was more complex
than what I expected.
msg196931 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-09-04 18:30
New changeset 6b14ebe0f7ac by Victor Stinner in branch 'default':
Issue #16853: Mention the new selectors module in What's New in Python 3.4
http://hg.python.org/cpython/rev/6b14ebe0f7ac
msg196932 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-09-04 18:31
Agreed, this was a great feat of implementation and API design.

Is there anything left to do or can we close this issue?

(I looked at the Proactor code in Tulip again and I think it is not quite as easy to isolate it, so I'm not requesting that Proactors be added to the stdlib at this time.)
msg196933 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-09-04 18:35
> Is there anything left to do or can we close this issue?

It would be nice to mention the new selectors module in the documentation of the select module.
msg196934 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-09-04 18:37
> > Is there anything left to do or can we close this issue?
> 
> It would be nice to mention the new selectors module in the documentation of the select module.

Feel free to make documentation commits :) They don't necessarily have
to pass a review stage.
msg196935 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-09-04 18:40
New changeset 142ff216e4b4 by Victor Stinner in branch 'default':
Issue #16853: Mention the new selectors module in the select module
http://hg.python.org/cpython/rev/142ff216e4b4
msg196938 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-09-04 18:45
> Feel free to make documentation commits :)

Ok, done.

I also created the issue #18923: "Use the new selectors module in the subprocess module".
msg196939 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-09-04 18:47
I feel this can be closed then :) Thanks, Charles-François!
msg196940 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2013-09-04 18:48
> Is there anything left to do or can we close this issue?

Add support for select.devpoll on Solaris.
msg196941 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-09-04 18:49
> Is there anything left to do or can we close this issue?

I usually wait until the test is run on the buildbots to check I didn't break anything: apparently that's OK, so we can close.

> It would be nice to mention the new selectors module in the documentation of the select module.

Indeed (I mentioned the select module in selectors' doc, but the contrary is certainly useful to point users to this new module), thanks for your commit.

Where were you when I asked for documentation review ;-)
msg196957 - (view) Author: Giampaolo Rodola' (giampaolo.rodola) * (Python committer) Date: 2013-09-04 21:10
I have realized just now that at least epoll() and kqueue() selectors could take advantage and define their own modify() method (there was even a TODO I totally missed). 
Anyway, from now on I'm gonna file separate issues.
msg196984 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-09-05 08:27
> Giampaolo Rodola' added the comment:
>
> I have realized just now that at least epoll() and kqueue() selectors could take advantage and define their own modify() method (there was even a TODO I totally missed).

As a side note, in the general case, there's more than a performance
optimization: the problem with unregister() + register() vs a real
modify (e.g. EPOLL_CTL_MOD) is that it's subject to a race condition,
if an event fires during the short window where the FD isn't
registered anymore.

But since selectors use level-triggered notification, that's not an
issue: I'll probably add a comment in the code explaining why it's
safe, though.
msg196996 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-09-05 13:09
On 05/09/2013 9:28am, Charles-François Natali wrote:
> As a side note, in the general case, there's more than a performance
> optimization: the problem with unregister() + register() vs a real
> modify (e.g. EPOLL_CTL_MOD) is that it's subject to a race condition,
> if an event fires during the short window where the FD isn't
> registered anymore.

Even with edge-triggered notification, doesn't registering an fd check 
the current readiness:

 >>> import os, select
 >>> r, w = os.pipe()
 >>> p = select.epoll()
 >>> os.write(w, b"h")
 >>> p.register(r, select.EPOLLIN | select.EPOLLET)
 >>> p.poll(0)
[(3, 1)]
 >>> p.poll(0)
[]

Otherwise it would be very difficult to edge-triggered notification.
msg197010 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-09-05 16:42
> Richard Oudkerk added the comment:
>
> Even with edge-triggered notification, doesn't registering an fd check
> the current readiness:

Hum...
Looks like I forgot to turn my brain on this morning :-)
History
Date User Action Args
2022-04-11 14:57:40adminsetgithub: 61057
2014-03-17 19:47:21jceasetnosy: + jcea
2013-09-05 16:42:34neologixsetmessages: + msg197010
2013-09-05 13:09:02sbtsetmessages: + msg196996
2013-09-05 08:28:00neologixsetmessages: + msg196984
2013-09-04 21:10:27giampaolo.rodolasetmessages: + msg196957
2013-09-04 18:49:26neologixsetmessages: + msg196941
2013-09-04 18:48:10giampaolo.rodolasetmessages: + msg196940
2013-09-04 18:47:05pitrousetstatus: open -> closed
resolution: fixed
messages: + msg196939

stage: patch review -> resolved
2013-09-04 18:45:33vstinnersetmessages: + msg196938
2013-09-04 18:40:27python-devsetmessages: + msg196935
2013-09-04 18:37:14pitrousetmessages: + msg196934
2013-09-04 18:35:44vstinnersetmessages: + msg196933
2013-09-04 18:31:46gvanrossumsetmessages: + msg196932
2013-09-04 18:30:48python-devsetmessages: + msg196931
2013-09-04 18:27:17vstinnersetmessages: + msg196930
2013-09-04 17:05:45python-devsetnosy: + python-dev
messages: + msg196926
2013-08-31 17:26:47gvanrossumsetmessages: + msg196654
2013-08-31 17:15:46pitrousetmessages: + msg196651
2013-08-31 17:06:54neologixsetfiles: + selectors-15.diff

messages: + msg196649
2013-08-31 15:14:18gvanrossumsetmessages: + msg196641
2013-08-31 12:28:39neologixsetmessages: + msg196635
2013-08-31 12:26:00pitrousetmessages: + msg196634
2013-08-31 12:23:07pitrousetmessages: + msg196633
2013-08-31 12:09:35giampaolo.rodolasetmessages: + msg196632
2013-08-31 07:54:36neologixsetmessages: + msg196625
2013-08-31 07:32:07neologixsetmessages: + msg196620
2013-08-30 23:22:21gvanrossumsetmessages: + msg196596
2013-08-30 21:04:13giampaolo.rodolasetmessages: + msg196575
2013-08-30 13:39:46giampaolo.rodolasetmessages: + msg196537
2013-08-30 13:34:09neologixsetmessages: + msg196536
2013-08-30 12:30:49giampaolo.rodolasetmessages: + msg196527
2013-08-28 18:33:38neologixsetfiles: + selectors-14.diff

messages: + msg196403
2013-08-26 19:49:17gvanrossumsetmessages: + msg196240
2013-08-24 20:45:12pitrousetmessages: + msg196099
2013-08-24 20:33:24neologixsetmessages: + msg196098
2013-08-24 19:27:10pitrousetmessages: + msg196092
2013-08-24 12:04:15neologixsetmessages: + msg196075
2013-08-21 16:15:14gvanrossumsetmessages: + msg195801
2013-08-21 09:37:26vstinnersetmessages: + msg195760
2013-08-21 07:09:50neologixsetmessages: + msg195749
2013-08-20 22:37:44pitrousetmessages: + msg195726
2013-08-20 22:32:21gvanrossumsetmessages: + msg195724
2013-08-20 22:21:31vstinnersetnosy: + vstinner
messages: + msg195722
2013-08-06 23:55:39gvanrossumsetmessages: + msg194593
2013-08-06 23:33:16gvanrossumsetmessages: + msg194592
2013-08-06 23:01:41gvanrossumsetmessages: + msg194590
2013-08-03 10:22:39neologixsetmessages: + msg194243
2013-08-03 04:21:36gvanrossumsetmessages: + msg194236
2013-08-02 22:47:55neologixsetfiles: + selector-12.diff

messages: + msg194218
2013-08-02 17:59:46neologixsetfiles: + selector-11.diff, tulip_selector_stdlib.diff

messages: + msg194190
2013-08-02 16:22:38pitrousetmessages: + msg194184
2013-08-02 16:16:14gvanrossumsetmessages: + msg194182
2013-08-02 15:16:53neologixsetkeywords: + needs review
files: + selector-9.diff
messages: + msg194179
2013-04-12 11:13:11neologixsetmessages: + msg186617
2013-04-08 10:07:12pitrousetmessages: + msg186286
2013-01-09 21:45:04gvanrossumsetmessages: + msg179491
2013-01-09 21:43:32gvanrossumsetmessages: + msg179490
2013-01-09 21:28:28gvanrossumsetmessages: + msg179489
2013-01-09 20:47:31neologixsetmessages: + msg179486
2013-01-09 18:04:46gvanrossumsetmessages: + msg179481
2013-01-09 08:02:21neologixsetmessages: + msg179423
2013-01-08 23:13:55sbtsetmessages: + msg179392
2013-01-08 23:08:37gvanrossumsetmessages: + msg179389
2013-01-08 22:52:52neologixsetmessages: + msg179387
2013-01-08 22:47:49gvanrossumsetfiles: + return_key_fix.diff

messages: + msg179385
2013-01-08 22:45:27gvanrossumsetmessages: + msg179384
2013-01-08 22:44:20gvanrossumsetmessages: + msg179383
2013-01-08 22:31:38neologixsetfiles: + tulip-selectors-3.diff

messages: + msg179382
2013-01-08 22:22:25gvanrossumsetfiles: + tulip-selectors-3.diff

messages: + msg179381
2013-01-08 18:56:12gvanrossumsetmessages: + msg179365
2013-01-08 10:49:28neologixsetmessages: + msg179328
2013-01-08 00:45:47gvanrossumsetmessages: + msg179300
2013-01-08 00:36:58felipecruzsetmessages: + msg179299
2013-01-07 22:12:52neologixsetfiles: - tulip-selectors-2.diff
2013-01-07 22:12:42neologixsetfiles: - selector-8.diff
2013-01-07 22:12:32neologixsetfiles: + tulip-selectors-2.diff
2013-01-07 22:12:18neologixsetfiles: + selector-8.diff
2013-01-07 19:38:43neologixsetfiles: - selector-data.diff
2013-01-07 19:38:34neologixsetfiles: - selector-3.diff
2013-01-07 19:38:24neologixsetfiles: + tulip-selectors-2.diff
2013-01-07 19:37:44neologixsetfiles: - selector-5.diff
2013-01-07 19:37:19neologixsetfiles: - tulip-selectors-2.diff
2013-01-07 19:33:03neologixsetfiles: + tulip-selectors-2.diff
2013-01-07 19:32:47neologixsetfiles: + selector-8.diff
2013-01-07 19:32:10neologixsetfiles: - selector-6.diff
2013-01-07 19:32:01neologixsetfiles: - selector_poll_events.diff
2013-01-07 19:31:52neologixsetfiles: - tulip-selectors-2.diff
2013-01-07 19:31:42neologixsetfiles: - selector-8.diff
2013-01-07 19:06:37neologixsetfiles: + tulip-selectors-2.diff, selector-8.diff

messages: + msg179275
2013-01-07 10:50:13neologixsetfiles: + selector_poll_events.diff

messages: + msg179255
2013-01-07 09:31:26neologixsetmessages: + msg179252
2013-01-07 03:00:36gvanrossumsetnosy: + sbt
messages: + msg179242
2013-01-07 02:59:13gvanrossumsetmessages: + msg179241
2013-01-07 02:46:45gvanrossumsetfiles: + tulip-selectors-1.diff
hgrepos: + hgrepo167
messages: + msg179240
2013-01-06 21:36:13neologixsetfiles: + selector-6.diff

messages: + msg179231
2013-01-06 00:59:10gvanrossumsetmessages: + msg179161
2013-01-06 00:28:54neologixsetmessages: + msg179156
2013-01-05 21:05:59meador.ingesetnosy: + meador.inge
messages: + msg179144
2013-01-05 20:23:27neologixsetfiles: + selector-5.diff

messages: + msg179142
2013-01-05 19:20:23gvanrossumsetmessages: + msg179135
2013-01-05 16:28:53giampaolo.rodolasetmessages: + msg179129
2013-01-05 15:30:10neologixsetmessages: + msg179127
2013-01-05 15:22:39giampaolo.rodolasetmessages: + msg179126
2013-01-05 14:16:40giampaolo.rodolasetmessages: + msg179122
2013-01-05 12:40:12neologixsetfiles: + selector-data.diff

messages: + msg179120
2013-01-04 21:35:01rosslagerwallsetmessages: + msg179082
2013-01-04 20:34:29neologixsetfiles: - selector-2.diff
2013-01-04 20:34:18neologixsetfiles: + selector-3.diff

messages: + msg179078
2013-01-04 19:57:42neologixsetmessages: + msg179075
2013-01-04 19:53:07rosslagerwallsetmessages: + msg179074
2013-01-04 19:11:39gvanrossumsetmessages: + msg179070
2013-01-04 18:44:06giampaolo.rodolasetmessages: + msg179066
2013-01-04 17:06:20pitrousetmessages: + msg179046
2013-01-04 16:48:11gvanrossumsetmessages: + msg179042
2013-01-04 13:28:04christian.heimessetnosy: + christian.heimes
2013-01-04 13:18:57neologixsetfiles: - selector-1.diff
2013-01-04 13:18:47neologixsetfiles: + selector_bench.py
2013-01-04 13:18:11neologixsetfiles: + selector-2.diff

messages: + msg179018
2013-01-04 01:37:06gvanrossumsetmessages: + msg179001
2013-01-03 22:26:58neologixsetfiles: - selector.diff
2013-01-03 22:26:45neologixsetfiles: + selector-1.diff

messages: + msg178992
2013-01-03 19:57:23felipecruzsetmessages: + msg178981
2013-01-03 19:57:16neologixsetmessages: + msg178980
2013-01-03 19:53:04neologixsetmessages: + msg178979
2013-01-03 19:32:29rosslagerwallsetnosy: + rosslagerwall
2013-01-03 19:18:27pitrousetnosy: + gvanrossum

messages: + msg178974
stage: patch review
2013-01-03 18:44:45felipecruzsetnosy: + felipecruz
messages: + msg178973
2013-01-03 17:38:33neologixsetfiles: + selector_telnetlib.diff
2013-01-03 17:38:18neologixsetfiles: + selector_multiprocessing.diff
2013-01-03 17:38:06neologixcreate