Issue16853
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2013-01-03 17:38 by neologix, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
selector_multiprocessing.diff | neologix, 2013-01-03 17:38 | review | ||
selector_telnetlib.diff | neologix, 2013-01-03 17:38 | review | ||
selector_bench.py | neologix, 2013-01-04 13:18 | |||
tulip-selectors-1.diff | gvanrossum, 2013-01-07 02:46 | Tulip patches | ||
selector-8.diff | neologix, 2013-01-07 22:12 | review | ||
tulip-selectors-2.diff | neologix, 2013-01-07 22:12 | |||
tulip-selectors-3.diff | gvanrossum, 2013-01-08 22:22 | Fixed tulip patch | ||
tulip-selectors-3.diff | neologix, 2013-01-08 22:31 | |||
return_key_fix.diff | gvanrossum, 2013-01-08 22:47 | |||
selector-9.diff | neologix, 2013-08-02 15:16 | review | ||
selector-11.diff | neologix, 2013-08-02 17:59 | review | ||
tulip_selector_stdlib.diff | neologix, 2013-08-02 17:59 | |||
selector-12.diff | neologix, 2013-08-02 22:47 | review | ||
selectors-14.diff | neologix, 2013-08-28 18:33 | review | ||
selectors-15.diff | neologix, 2013-08-31 17:06 | review |
Repositories containing patches | |||
---|---|---|---|
https://tulip.googlecode.com/hg/ |
Messages (106) | |||
---|---|---|---|
msg178970 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-01-03 17:38 | |
Recently, the multiprocessing and telnetlib modules have been patched to use poll() instead of select() when available (mainly for the FD_SETSIZE limitation): http://bugs.python.org/issue10527 http://bugs.python.org/issue14635 This leads to code duplication (just have a look at the commits to convince yourself), and also it highlights the fact that the select module is too low-level: right now, there's no easy way to do portable and efficient I/O multiplexing. What's more, /dev/poll and epoll() support have been added recently, which can make one puzzled by the myriad of available options. Therefore, to reduce code duplication, and make writing portable and efficient code easier, I suggest the addition of a new select.Selector() class, which offers a - hopefully - simple and consistent API. It has the following advantages: - portable - consistent (select() accepts and returns file objects, whereas pol() only returns FD, the timeouts don't have the same units - seconds/milliseconds, etc). You'll find a tentative implementation attached, with tests (but no docs). I'm also attaching two patches, one for multiprocessing.connection and the other for telnetlib, to show the before/after. I'd like to have your feedback, especially on those points: - right now, Selector.select() returns a set of (fd, event mask). What do you think, would a bare list be OK? - the current version doesn't retry on EINTR, and I'd like to catch it by default: what do you think - since I use an abstract base class, I don't know how to have docstrings show up in the help |
|||
msg178973 - (view) | Author: Felipe Cruz (felipecruz) * | Date: 2013-01-03 18:44 | |
I think you have a point. Did you know about the tulip project? http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#76 It has a PollsterBase class and a SelectPollster(PollsterBase) so the idea is to have a Poller(and you call poll()) but select can be used underneath. Since tulip will be merged to stdlib, maybe you can work on tulip itself. Current tulip Pollers don't return (fd, event) but I have a fork that does for Poll, EPoll, KQueue and WindowsPollster: https://bitbucket.org/felipecruz/tulip/ The selector class doesn't have this change (return fd, mask tuple) right now. |
|||
msg178974 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-01-03 19:18 | |
tulip is much bigger than simply a select / poll wrapper. It would probably make more sense for tulip to reuse the abstraction which is proposed here. |
|||
msg178979 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-01-03 19:53 | |
> I think you have a point. Did you know about the tulip project? > > http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#76 > > It has a PollsterBase class and a SelectPollster(PollsterBase) so the idea is to have a Poller(and you call poll()) but select can be used underneath. Since tulip will be merged to stdlib, maybe you can work on tulip itself. I've had a look, and indeed it has its own Pollster wrapper. Since it can be useful outside of Tulip, it would probably make sense to make it part of the expose API. > Current tulip Pollers don't return (fd, event) but I have a fork that does for Poll, EPoll, KQueue and WindowsPollster Indeed, I think it would be useful. I'm not sure what's best, accept events masks or have distinct methods to register read and write events. |
|||
msg178980 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-01-03 19:57 | |
Oh, I noticed that EpollPollster never closes the underlying epoll FD: that's why Pollster objects should have a close() method (and probably support a context manager). |
|||
msg178981 - (view) | Author: Felipe Cruz (felipecruz) * | Date: 2013-01-03 19:57 | |
Hi Antonie, What you said also makes sense to me. There is one problem(?) that _map_events() is called for every event(for Poll and EPoll) and this can make things slower (I didn't tested it). Also, does it needs to be thread-safe? |
|||
msg178992 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-01-03 22:26 | |
Here's a new version closer to Tulip's one. |
|||
msg179001 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-01-04 01:37 | |
I think this is a great idea. But let's wait until Tulip is a bit further along; the design of its Pollster might still change as PEP 3156 gets more review and feedback. According to PEP 429 the first beta of 3.4 won't go out until November this year. |
|||
msg179018 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-01-04 13:18 | |
Here's a new version adressing Guido's comments (except for kqueue, for which I'll add support later when I can test it). I'm also attaching a benchmark to compare the implementations: as noted by Guido, the complexity of select/poll/epoll are the theoretical ones: in real life, the syscall's cost is probably dwarfed by objects creation, etc. Here's a run, for two ready FDs among 1018: """ $ ./python ~/selector_bench.py -r 2 -m 1018 -t socket Trying with 2 ready FDs out of 1018, type socket <class 'select.EpollSelector'> 0.056010190999586484 <class 'select.PollSelector'> 0.2639519829990604 <class 'select.SelectSelector'> 1.1859817369986558 """ So this can be interesting when a large number of FDs are monitored. For sake of cmpleteness, for a sparse allocation (i.e. just a couple FDS allocated near FD_SETSIZE), there's not much gain: """ $ ./python ~/selector_bench.py -r 2 -m 1018 -t socket -s Trying with 2 FDs starting at 1018, type socket <class 'select.EpollSelector'> 0.06651040699944133 <class 'select.PollSelector'> 0.06033727799876942 <class 'select.SelectSelector'> 0.0948788189998595 """ Two points I'm not sure about: - should EINTR be handled (i.e. retry, with an updated timeout). I'm tempted to say yes, because EINTR is just a pain the user should never be exposed with. - what should be done with POLLNVAL and POLLERR? Raise an exception (that's what Java does, but since you can get quite quite easily it would be a pain to use), return a generic SELECT_ERR event? FWIW, twisted returns POLLERR|POLLNVAL as a CONNECTION_LOST event. |
|||
msg179042 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-01-04 16:48 | |
Would it make sense to explore this in the Tulip project first? It could be a new module, tulip/selector.py. (Heck, I'll even give you commit privileges in Tulip.) Also, I've heard (but don't know from personal experience) that Jython supports select but it takes only lists of socket objects, not file descriptors (which don't exist in Java). Finally, what about Windows? |
|||
msg179046 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-01-04 17:06 | |
I've posted a review on Rietveld. Not sure the notification e-mail was sent since I got a weird response from the server. > should EINTR be handled (i.e. retry, with an updated timeout). I'm tempted to say > yes, because EINTR is just a pain the user should never be exposed with. You could add an optional argument to the select() method? > what should be done with POLLNVAL and POLLERR? Raise an exception (that's what Java > does, but since you can get quite quite easily it would be a pain to use), return a > generic SELECT_ERR event? I would say return SELECT_ERR. > So this can be interesting when a large number of FDs are monitored. The difference is impressive indeed. |
|||
msg179066 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * | Date: 2013-01-04 18:44 | |
Is this currently being written from scratch? It shouldn't be IMO. Instead I recommend using the existent Tornado or pyftpdlib IO loop as an example: https://github.com/facebook/tornado/tree/master/tornado/platform https://github.com/facebook/tornado/blob/master/tornado/ioloop.py#L544 http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/ioloop.py?spec=svn1144&r=1135#266 ...and modify the central polling method so that it returns a list of (fd, events) instead of calling handler methods. Everything (except Windows support) is already in place and they have been tested extensively. Exploring this into Tulip would be important IMO, as it would help figuring out whether the API is good enough to support what this component aims to achieve: being a generic poller upon which to build async IO loops. |
|||
msg179070 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-01-04 19:11 | |
Giampaolo, there are many details in the Tornado and pyftpdlib I/O loops that may or may not be intentional. I wrote the Tulip pollster from scratch but am looking to merge in relevant knowledge and features from Tornado, Twisted and pyftpdlib. AFAIK the code here is mostly adapted from Tulip. Please suggest specific features from Tornado or pyftpdlib that are currently missing. However I don't think we should just copy over the existing 3rd party code (if only because we're trying to write pure Python 3.3 code here, without historical baggage). |
|||
msg179074 - (view) | Author: Ross Lagerwall (rosslagerwall) | Date: 2013-01-04 19:53 | |
Interesting benchmark. There is no gain for epoll with a large number of ready fds (as expected) but at least it is no worse than poll. Poll offers a large improvement on select, in this case. $ ./python selector_bench.py -r 2 -m 1000 -t pipe Trying with 2 ready FDs out of 1000, type pipe <class 'select.EpollSelector'> 0.004317363000154728 <class 'select.PollSelector'> 0.08897221900042496 <class 'select.SelectSelector'> 0.1558117479999055 $ ./python selector_bench.py -r 100 -m 1000 -t pipe Trying with 100 ready FDs out of 1000, type pipe <class 'select.EpollSelector'> 0.11067074099992169 <class 'select.PollSelector'> 0.18777027299984184 <class 'select.SelectSelector'> 0.30157805000089866 $ ./python selector_bench.py -r 1000 -m 1000 -t pipe Trying with 1000 ready FDs out of 1000, type pipe <class 'select.EpollSelector'> 1.08963231099915 <class 'select.PollSelector'> 1.111169247998987 <class 'select.SelectSelector'> 8.283167362000313 |
|||
msg179075 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-01-04 19:57 | |
Indeed, tornado and pyftpdlib implementations are much more complex, have somewhat different APIs (i.e. only support FDs), have special cases for older python versions, lack some newer features (e.g. context manager). The current code is much close to Tulip's. > Exploring this into Tulip would be important IMO, as it would help figuring out whether the API is good enough to support what this component aims to achieve: being a generic poller upon which to build async IO loops. That's indeed a good idea, there's no need to rush this in. |
|||
msg179078 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-01-04 20:34 | |
Ross, the select() result for a large number of ready FDs was completely skewed because of a bug spotted by Antoine (complexity was much worse than it ought to be). Here are the results with the fix: """ $ ./python ~/selector_bench.py -r 10 -m 1000 -t socket Trying with 10 ready FDs out of 1000, type socket <class 'select.EpollSelector'> 0.05156186099338811 <class 'select.PollSelector'> 0.23772043800272513 <class 'select.SelectSelector'> 0.5181516080047004 $ ./python ~/selector_bench.py -r 100 -m 1000 -t socket Trying with 100 ready FDs out of 1000, type socket <class 'select.EpollSelector'> 0.47576940699946135 <class 'select.PollSelector'> 0.6458770600002026 <class 'select.SelectSelector'> 0.828609222000523 $ ./python ~/selector_bench.py -r 1000 -m 1000 -t socket Trying with 1000 ready FDs out of 1000, type socket <class 'select.EpollSelector'> 4.970445963997918 <class 'select.PollSelector'> 5.7709292660001665 <class 'select.SelectSelector'> 4.030775418999838 """ With a large number of FDs, many of which are ready, select can be faster. Here's the output of "strace -c -e select,poll,epoll_wait": """ % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 83.86 0.000421 0 1024 epoll_wait 16.14 0.000081 0 1024 poll 0.00 0.000000 0 1024 select ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000502 3072 total """ The systime is greater in epoll, but the systime is negligible, as show by 'time': """ real 0m14.856s user 0m14.289s sys 0m0.340s """ The time is really spent in the interpreter. I'll dig some more (even though I doubt having 1000/1000 ready FDs is a common use case). Note that it doesn't change anything when a small number of FDs are ready: """ $ ./python ~/selector_bench.py -r 10 -m 1000 -t socket Trying with 10 ready FDs out of 1000, type socket <class 'select.EpollSelector'> 0.05238822099636309 <class 'select.PollSelector'> 0.25595822899776977 <class 'select.SelectSelector'> 0.5156362060006359 """ |
|||
msg179082 - (view) | Author: Ross Lagerwall (rosslagerwall) | Date: 2013-01-04 21:35 | |
> Ross, the select() result for a large number of ready FDs was > completely skewed because of a bug spotted by Antoine (complexity > was much worse than it ought to be). Ah, that makes a little more sense now. |
|||
msg179120 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-01-05 12:40 | |
Here's a new version, with the following changes: - there's no SELECT_ERR anymore: error conditions (POLLNVAL|POLLERR and also POLLHUP) are reported as SELECT_IN|SELECT_OUT, depending on the input event mask: I don't think that a separate error flag is needed, because the error will be reported when the FD will be read/written (and in the context where the error can be handled appropriately, which wouldn't be the case if an exception was thrown by the Selector). That's what Java and libevent do, for example. libevent reports IN|OUT inconditionally in case of POLLER|POLLHUP, but I think that the output events should be a subset ot input events. - the exceptions set isn't passed to select() anymore: exception events (like OOB) are already reported in the read or write sets (checked with libevent and in the Linux kernel source) - most importantly, there was something missing/annoying in the API: now, it only accepts only file descriptors, and also an optional opaque object: the object - called "attached data" - can be used for example to pass a callback (that can be invoked by a higher-level reactor), or for example context-specific data: it could be a kind of a thread state to make it easy to do a context-switch, or simply a high-level object like a file, a socket or a telnet instance. This is actually what Java does, but they take it further into encapsulating the FD, interested events, attached data and ready events into a SelectionKey, which is passed and returned to/from the Selector: http://docs.oracle.com/javase/7/docs/api/java/nio/channels/SelectionKey.html - I've added some tests |
|||
msg179122 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * | Date: 2013-01-05 14:16 | |
@Guido: agreed. > there's no SELECT_ERR anymore [...] the error will be reported when the FD will be read/written I don't think this is a good idea, particularly during this early stage. This assumption might be valid for select() but not for poll() and epoll() where (POLLERR | POLLHUP | POLLNVAL) is an alias for "connection lost", and that's an event which both Twisted and Tornado take into account and treat differently than a pure "read" event. The same assumption should also apply for kqueue(). > the exceptions set isn't passed to select() anymore: exception events (like OOB) are already reported in the read or write sets That's true only if SO_OOBINLINE has been pre-emptively set against the socket. It might be ok to ignore select()'s exceptional events anyway though. This is probably another thing you might want to decide later. > there was something missing/annoying in the API [...] now, it only accepts only file descriptors, and also an optional opaque 'data' object I like this change as it avoids maintaining a separate fd->handler map. Will add line-by-line comments by using the review tool. |
|||
msg179126 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * | Date: 2013-01-05 15:22 | |
Inline comments: http://bugs.python.org/review/16853/ |
|||
msg179127 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-01-05 15:30 | |
>> there's no SELECT_ERR anymore [...] the error will be reported when the FD will be read/written > > I don't think this is a good idea, particularly during this early stage. > This assumption might be valid for select() but not for poll() and epoll() where (POLLERR | POLLHUP | POLLNVAL) is an alias for "connection lost", and that's an event which both Twisted and Tornado take into account and treat differently than a pure "read" event. POLLERR and POLLNVAL semantics can be really different from "connection lost", so this mapping sounds wrong. The actual error (which may be ECONNRESET, but also EBADF, ENOBUFS...) will be raised upon the next call to read()/write(), or can be retrieved through getsockopt(SO_ERROR). Also, that's exactly what the Linux kernel does for select(): http://lxr.free-electrons.com/source/fs/select.c#L381 POLLHUP is mapped to the read-set, POLLERR both to the read and write set. This means that "connection lost" could never be returned on platforms where the select() syscall is used, which could lead to subtle portability problems. But I'm definitely open on this (I've based the behavior on libevent and Java, FWIW). >> the exceptions set isn't passed to select() anymore: exception events (like OOB) are already reported in the read or write sets > > That's true only if SO_OOBINLINE has been pre-emptively set against the socket. > It might be ok to ignore select()'s exceptional events anyway though. Thanks, I didn't know: since the POSIX man pages was rather evasive on this, I checked the Linux kernel source code, but apparently it's non-standard. If we do ignore OOB in select(), then we probably also want to suppress POLLPRI|EPOLLPRI, no? |
|||
msg179129 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * | Date: 2013-01-05 16:28 | |
> The actual error (which may be ECONNRESET, but also EBADF, ENOBUFS...) will be raised > upon the next call to read()/write(), or can be retrieved through getsockopt(SO_ERROR). Mmmm... I would be tempted to think the same but the fact that both Tornado and Twisted distinguish between READ and ERROR events is suspicious. > then we probably also want to suppress POLLPRI|EPOLLPRI, no? Yes. |
|||
msg179135 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-01-05 19:20 | |
I think that this needs extensive tests that verify the behavior of many end cases, including under duress (e.g. when there are too many connections for the kernel to handle). That would seem the only way to make sure that the code is reliable across platforms. It is likely that you could borrow some ideas for test scenarios from Twisted. |
|||
msg179142 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-01-05 20:23 | |
> I think that this needs extensive tests that verify the behavior of many end cases, including under duress (e.g. when there are too many connections for the kernel to handle). That would seem the only way to make sure that the code is reliable across platforms. It is likely that you could borrow some ideas for test scenarios from Twisted. Will do. I'm adding a new version taking into account some of Giampaolo's remarks. Also, the API now also allows passing a file descriptor or any object with a `fileno()` method, since it will likely be useful. To sum up, the API is: def register(self, fileobj, events, data=None): """Register a file object. Parameters: fileobj -- file object events -- events to monitor (bitwise mask of SELECT_IN|SELECT_OUT) data -- attached data """ def unregister(self, fileobj): """Unregister a file object. Parameters: fileobj -- file object """ def modify(self, fileobj, events, data=None): """Change a registered file object monitored events or attached data. Parameters: fileobj -- file object events -- events to monitor (bitwise mask of SELECT_IN|SELECT_OUT) data -- attached data """ def select(self, timeout=None): """Perform the actual selection, until some monitored file objects are ready or a timeout expires. Parameters: timeout -- if timeout > 0, this specifies the maximum wait time, in seconds if timeout == 0, the select() call won't block, and will report the currently ready file objects if timeout is None, select() will block until a monitored file object becomes ready Returns: list of (fileobj, events, attached data) for ready file objects `events` is a bitwise mask of SELECT_IN|SELECT_OUT Selector.select() output looks a lot like poll()/epoll() except for two details: the output is the file object, and not the file descriptor (poll()/epoll() are unfortunately inconsistent in this regard), and there's a third field, the attached data (will be None if not provided in register()/modify()). I think that this optional field is really useful to pass e.g. a callback or some context information. |
|||
msg179144 - (view) | Author: Meador Inge (meador.inge) * | Date: 2013-01-05 21:05 | |
The EpollSelector and PollSelector implementations are basically identical. You might want to refactor these to share more code. |
|||
msg179156 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-01-06 00:28 | |
I've noticed a bug present in Tulip, pyftpdlib and tornado (and this implementation, too): none of them passes the maxevents argument to epoll.poll(), and by default the poll() wrapper in the socket module uses FD_SETSIZE-1, so you'll never get more than FD_SETSIZE-1 events. Note that the default value used by epoll() is probably too low (we could maybe use an heuristic to use RLIMIT_NOFILE hard limit, with capping because it can be quite large on FreeBSD or the Hurd ;-), or at least the documentation doesn't warn users enough about this. The signature reads: """ .. method:: epoll.poll(timeout=-1, maxevents=-1) Wait for events. timeout in seconds (float) """ Which confuses users into thinking that the number of returned events is unlimited by default. > The EpollSelector and PollSelector implementations are basically identical. You might want to refactor these to share more code. Yes, but I find the current code much easier to read: epoll() and poll() have different constants, different units for timeout, epoll() must accept a maxevents argument, have a close() method, etc. |
|||
msg179161 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-01-06 00:59 | |
I am trying to use this module in Tulip instead of its pollster implementation, and am running into a problem. Tulip defines separate add_reader()/add_writer() methods, which call to the pollster's register_reader()/register_writer(), respectively. The translation of this to the selector's API is somewhat complex; I would have to keep track of whether I already have a reader or writer registered, and then decide whether to call register() or modify(). If you don't want to change the API back to separate register_*() methods for readers and writers, perhaps you can add a method that tells me, for a given fileobj, whether it is registered, and with which poll flags (SELECT_IN/OUT/both) and the user data? Also, I need a method that gives me the number of registered FDs. |
|||
msg179231 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-01-06 21:36 | |
> If you don't want to change the API back to separate register_*() methods for readers and writers, perhaps you can add a method that tells me, for a given fileobj, whether it is registered, and with which poll flags (SELECT_IN/OUT/both) and the user data. > > Also, I need a method that gives me the number of registered FDs. Does that look OK? def get_info(self, fileobj): """Return information about a registered file object. Returns: (events, data) associated to this file object Raises KeyError if the file object is not registered. """ def registered_count(self): """Return the number of registered file objects. Returns: number of currently registered file objects """ |
|||
msg179240 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-01-07 02:46 | |
This seems okay. I am attaching the changes I had to make to Tulip to support this. However, two Tulip tests are now failing: - tulip.events_test.PollEventLoopTests.testCreateSslTransport fails with spurious file descriptors returned by poll() that aren't in the _fd_to_key dict (but the corresponding test with Select passes) - test_sock_client_fail() hangs completely. Can you see why? The first failure has this traceback: Traceback (most recent call last): File "/Users/guido/tulip/tulip/selectors.py", line 178, in _key_from_fd return self._fd_to_key[fd] KeyError: 0 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/guido/tulip/tulip/events_test.py", line 216, in testCreateSslTra\ nsport el.run() File "/Users/guido/tulip/tulip/unix_events.py", line 120, in run self._run_once() File "/Users/guido/tulip/tulip/unix_events.py", line 592, in _run_once event_list = self._selector.select(timeout) File "/Users/guido/tulip/tulip/selectors.py", line 255, in select key = self._key_from_fd(fd) File "/Users/guido/tulip/tulip/selectors.py", line 180, in _key_from_fd raise RuntimeError("No key found for fd {}".format(fd)) RuntimeError: No key found for fd 0 (But the fd value varies -- sometimes it is -2, sometimes a large number.) The other test busy-loops (keeps polling) and I have no useful traceback. Also notice the need for a third constant, SELECT_CONNECT. For details see the class WindowsPollPollster in the Tulip code. |
|||
msg179241 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-01-07 02:59 | |
Maybe I should explain the need for SELECT_CONNECT more. This is necessary so that on Windows the PollSelector can use WSAPoll(). The reason is that for async connect() calls, WSAPoll() doesn't return the FD as writable, just as having an exception. I didn't write the code (Richard Oudkerk did), and I've never run it (no access to Windows), but the workaround for the problem with WSAPoll() is apparently quite complex: http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#236 The long and short of it is that portable code must use add_connector() and remove_connector() instead of add_writer() and remove_writer() for async connect() calls. So these need to be mapped to something different in the selector world. The right thing would seem to add a separate flag, SELECT_CONNECT, with a value distinct from SELECT_IN and SELECT_OUT. But I didn't write the code for that -- I made a hackish change so that the selector code exports SELECT_CONNECT as an alias for SELECT_OUT. I did write code for add_connector() and remove_connector() but these may have to be modified slightly again later once the selector code has SELECT_OUT != SELECT_CONNECT. Please do try to understand this -- it would not be possible to support WSAPoll() at all without distinguishing between write and connect readyness. Feel free to ping Richard Oudkerk (shibturn at gmail.com) for more information. I don't *think* the problems are in my code, since both tests pass when the SelectSelector is used. I can currently only test on OSX 10.7 (Mountain Lion); hopefully I have more platforms after tomorrow (my first day at the new job). |
|||
msg179242 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-01-07 03:00 | |
(Adding Richard Oudkerk to the nosy list as I am pleading here for support of WSAPoll(), see last few messages.) |
|||
msg179252 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-01-07 09:31 | |
> Also notice the need for a third constant, SELECT_CONNECT. For details see the class WindowsPollPollster in the Tulip code. I'll trust Richard on all Windows matter, so if you need a SELECT_CONNECT constant, I'll expose one. > However, two Tulip tests are now failing: I can't reproduce those failures on Linux (I had to adapt the hostnames because I'm at work and my test machine can't access the internet, but otherwise everything is the same). All tests pass with select, poll and epoll. Note that I had to fix a typo in the patch: EPollSelector -> EpollSelector. > - tulip.events_test.PollEventLoopTests.testCreateSslTransport fails with spurious file descriptors returned by poll() that aren't in the _fd_to_key dict (but the corresponding test with Select passes) > > The first failure has this traceback: > > Traceback (most recent call last): > File "/Users/guido/tulip/tulip/selectors.py", line 178, in _key_from_fd > return self._fd_to_key[fd] > KeyError: 0 > > During handling of the above exception, another exception occurred: > > Traceback (most recent call last): > File "/Users/guido/tulip/tulip/events_test.py", line 216, in testCreateSslTra\ > nsport > el.run() > File "/Users/guido/tulip/tulip/unix_events.py", line 120, in run > self._run_once() > File "/Users/guido/tulip/tulip/unix_events.py", line 592, in _run_once > event_list = self._selector.select(timeout) > File "/Users/guido/tulip/tulip/selectors.py", line 255, in select > key = self._key_from_fd(fd) > File "/Users/guido/tulip/tulip/selectors.py", line 180, in _key_from_fd > raise RuntimeError("No key found for fd {}".format(fd)) > RuntimeError: No key found for fd 0 > > (But the fd value varies -- sometimes it is -2, sometimes a large number.) This failure is really weird, because the file descriptor is really just the value returned by poll.poll(): I don't know how this could possibly ever be negative, unless some sort of overflow in the poll.poll() code itself? I can however see why the previous version wouldn't fail: """ for fd, flags in self._poll.poll(msecs): if flags & ~select.POLLOUT: if fd in self.readers: events.append(self.readers[fd]) if flags & ~select.POLLIN: if fd in self.writers: events.append(self.writers[fd]) """ If a spurious fd is reported, it's silently ignored. The second test runs fine on Linux, and from a cursory look, I don't see how it could fail (the socket should be reported as write ready upon ECONNREFUSED). I'll try running the exact test case with the same hostnames from home, but I doubt it'll make a difference, so maybe it's really OS-X specific (if there's strace/dtrace on OS-X, it'll help see what's going on). |
|||
msg179255 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-01-07 10:50 | |
> The second test runs fine on Linux, and from a cursory look, I don't > see how it could fail (the socket should be reported as write ready > upon ECONNREFUSED). Hum, thinking about it, I wonder is OS-X doesn't report POLLPRI or some other esoteric event in case of ECONNREFUSED... Could you try the patch attached? |
|||
msg179275 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-01-07 19:06 | |
It works fine on Linux. I'm attaching the latest version that should hopefully fix the second failure, as a standalone diff and a diff for Tulip. It also adds a SELECT_CONNECT, and - untested - kqueue support. |
|||
msg179299 - (view) | Author: Felipe Cruz (felipecruz) * | Date: 2013-01-08 00:36 | |
Hi.. 2 comments related to Kqueue/OSX(16.8) 1 - In tulip/selectors.py L311 and L314 - is key.fd not fd 2 - In EventLoopTestsMixin::test_writer_callback if the writer socket isn't non-blocking, the test hangs forever.. 3 - Errors: ERROR: testCreateSslTransport (tulip.events_test.PollEventLoopTests) File "/Users/felipecruz/Projects/tulip3/tulip/selectors.py", line 180, in _key_from_fd raise RuntimeError("No key found for fd {}".format(fd)) RuntimeError: No key found for fd -2 ERROR: test_sock_client_ops (tulip.events_test.KqueueEventLoopTests) File "/Users/felipecruz/Projects/tulip3/tulip/selectors.py", line 180, in _key_from_fd raise RuntimeError("No key found for fd {}".format(fd)) RuntimeError: No key found for fd 77 |
|||
msg179300 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-01-08 00:45 | |
There's a bug in the kqueue selector. The following code in your patch has 'fd' instead of 'key.fd', twice: if events & SELECT_IN: kev = kevent(key.fd, KQ_FILTER_READ, KQ_EV_ADD) self._kqueue.control([kev], 0, 0) if events & SELECT_OUT: kev = kevent(key.fd, KQ_FILTER_WRITE, KQ_EV_ADD) self._kqueue.control([kev], 0, 0) One kqueue test fails on OSX: ERROR: test_sock_client_ops (tulip.events_test.KqueueEventLoopTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/guido/tulip/tulip/selectors.py", line 178, in _key_from_fd return self._fd_to_key[fd] KeyError: 11 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/guido/tulip/tulip/events_test.py", line 167, in test_sock_client_ops el.run_until_complete(el.sock_sendall(sock, b'GET / HTTP/1.0\r\n\r\n')) File "/Users/guido/tulip/tulip/unix_events.py", line 146, in run_until_complete self.run() File "/Users/guido/tulip/tulip/unix_events.py", line 110, in run self._run_once() File "/Users/guido/tulip/tulip/unix_events.py", line 582, in _run_once event_list = self._selector.select(timeout) File "/Users/guido/tulip/tulip/selectors.py", line 329, in select key = self._key_from_fd(fd) File "/Users/guido/tulip/tulip/selectors.py", line 180, in _key_from_fd raise RuntimeError("No key found for fd {}".format(fd)) RuntimeError: No key found for fd 11 |
|||
msg179328 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-01-08 10:49 | |
So I assume that the second failure is fixed, which means that OS-X returns a funny event (and not POLLIN/POLLHUP/POLLERR) in case of ECONNREFUSED :-( > 2 - In EventLoopTestsMixin::test_writer_callback if the writer socket isn't non-blocking, the test hangs forever.. This test is buggy: """ def test_writer_callback(self): el = events.get_event_loop() r, w = unix_events.socketpair() el.add_writer(w.fileno(), w.send, b'x'*100) el.call_later(0.1, el.remove_writer, w.fileno()) el.run() w.close() """ Because the writer socket isn't non-blocking, if the output socket buffer fills up in less than 0.1 seconds, the call to w.send(b'x'*100) will block. select()/poll()/kqueue() use a watermark to decide whether the FD is readable/writable: for a Unix domain socket, I guess OS-X returns that the socket is writable is there's at least one byte free in the output socket buffer: since send() tries to write 100 bytes at once, it blocks. I can reproduce the hang on Linux with vanilla (unpatched) tulip by increasing the number of bytes sent() in one syscall: """ --- tulip-bf4cb136c121/tulip/events_test.py 2092-08-05 00:00:00.000000000 +0200 +++ tulip/tulip/events_test.py 2013-01-08 11:35:27.400198000 +0100 @@ -149,7 +149,7 @@ def test_writer_callback(self): el = events.get_event_loop() r, w = unix_events.socketpair() - el.add_writer(w.fileno(), w.send, b'x'*100) + el.add_writer(w.fileno(), w.send, b'x'*(1<<18)) el.call_later(0.1, el.remove_writer, w.fileno()) el.run() w.close() """ (I have to do that because Linux uses an adaptive watermark, see http://lxr.free-electrons.com/source/net/unix/af_unix.c#L2156 and http://lxr.free-electrons.com/source/net/unix/af_unix.c#L319 ). That's why all FDs monitored with select()/poll()/epoll() must be non-blocking (and there's also the risk of spurious wakeups...). > 3 - Errors: > > ERROR: testCreateSslTransport (tulip.events_test.PollEventLoopTests) > File "/Users/felipecruz/Projects/tulip3/tulip/selectors.py", line 180, in _key_from_fd > raise RuntimeError("No key found for fd {}".format(fd)) > RuntimeError: No key found for fd -2 > > > ERROR: test_sock_client_ops (tulip.events_test.KqueueEventLoopTests) > File "/Users/felipecruz/Projects/tulip3/tulip/selectors.py", line 180, in _key_from_fd > raise RuntimeError("No key found for fd {}".format(fd)) > RuntimeError: No key found for fd 77 Yes, it's the same problem: poll()/kqueue() returning garbage. I guess I'll have to patch Selector to ignore unknown FDs, but that really looks like an OS-X bug. Could you dump the content of the returned kevent when the fd is not found? |
|||
msg179365 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-01-08 18:56 | |
Ok, I fixed test_writer_callback() in the tulip repo. Indeed the error I reported first is gone now. I logged the value of kev on failure, and got this output: ERROR:root:kev = <select.kevent ident=83 filter=-2 flags=0x1 fflags=0x0 data=0x131750 udata=0x0> Traceback (most recent call last): File "/Users/guido/tulip/tulip/selectors.py", line 178, in _key_from_fd return self._fd_to_key[fd] KeyError: 83 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/guido/tulip/tulip/selectors.py", line 330, in select key = self._key_from_fd(fd) File "/Users/guido/tulip/tulip/selectors.py", line 180, in _key_from_fd raise RuntimeError("No key found for fd {}".format(fd)) RuntimeError: No key found for fd 83 The full traceback printed by the unittest framework was: ERROR: test_sock_client_ops (tulip.events_test.KqueueEventLoopTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/guido/tulip/tulip/selectors.py", line 178, in _key_from_fd return self._fd_to_key[fd] KeyError: 83 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/guido/tulip/tulip/events_test.py", line 168, in test_sock_client_ops el.run_until_complete(el.sock_sendall(sock, b'GET / HTTP/1.0\r\n\r\n')) File "/Users/guido/tulip/tulip/unix_events.py", line 146, in run_until_complete self.run() File "/Users/guido/tulip/tulip/unix_events.py", line 110, in run self._run_once() File "/Users/guido/tulip/tulip/unix_events.py", line 582, in _run_once event_list = self._selector.select(timeout) File "/Users/guido/tulip/tulip/selectors.py", line 330, in select key = self._key_from_fd(fd) File "/Users/guido/tulip/tulip/selectors.py", line 180, in _key_from_fd raise RuntimeError("No key found for fd {}".format(fd)) RuntimeError: No key found for fd 83 |
|||
msg179381 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-01-08 22:22 | |
Figured it out. KqueueSelector was missing the unregister() implementation. Here it is: def unregister(self, fileobj): key = super().unregister(fileobj) mask = 0 if key.events & SELECT_IN: mask |= KQ_FILTER_READ if key.events & SELECT_OUT: mask |= KQ_FILTER_WRITE kev = kevent(key.fd, mask, KQ_EV_DELETE) self._kqueue.control([kev], 0, 0) I also added a __repr__ to the _Key class: def __repr__(self): return '{}<fileobje={}, fd={}, events={:#x}, data={}>'.format( self.__class__.__name__, self.fileobj, self.fd, self.events, self.data) Attached a new patch for tulip (but no new patch for select.py). I think I may have left the debug logging in. I will submit this to the Tulip repo, but we still need the changes for SELECT_CONNECT (I can't really test that, still no Windows access). |
|||
msg179382 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-01-08 22:31 | |
This should fix some kqueue failures (the FDs were not unregistered, which could result in spurious FDs being reported at a later time). For the "negative FDs", all spurious events are now caught and logged (I think it's better than silently ignoring them as it is now). Hopefully we'll see an EV_ERROR with kev.data set to a sensible errno. The implementation should now be more or less complete, apart from the WSAPoll, which I'm unable to add/test. Richard, in Tulip's WSAPoll code, it reads: class WindowsPollPollster(PollPollster): """Pollster implementation using WSAPoll. WSAPoll is only available on Windows Vista and later. Python does not currently support WSAPoll, but there is a patch available at http://bugs.python.org/issue16507. """ Does this means that this code need the patch from issue #16507 to work? Also, I've read something about IOCP: is this a replacement for WSAPoll, are there plans to get it merged at some point to python (and if yes, would the select module be a proper home for this)? |
|||
msg179383 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-01-08 22:44 | |
Whoops, the unregister() function needs to return key. |
|||
msg179384 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-01-08 22:45 | |
That ought to be added to the docstrings for register()/unregister() in _BaseSelector. |
|||
msg179385 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-01-08 22:47 | |
A patch for making register()/unregister() return the key. |
|||
msg179387 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-01-08 22:52 | |
As it is now, _Key is actually an implementation detail, that's why it's not part of the methods signature. Oonly the base register()/unregister() return the key because it's an easy way for the derived classes to get the FD without calling _fileobj_to_fd again (it's not really elegant, I'm open to suggestions). We could make it part of the API though, and maybe return it also from get_info() instead of the (fileobj, data) tuple. |
|||
msg179389 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-01-08 23:08 | |
Please consider my patches instead; it seems our patches crossed. Merging is now difficult because I already submitted my version to Tulip. Your version still has a bug: it uses select.kevent(...) twice, where it should just use kevent(...). Also my version makes fewer syscalls when unregistering a FD that has both read and write events registered. Regarding the _Key return value: I think it's asking for trouble if the signature of the base class differs from that of the subclass. The return value may even be useful occasionally. Given that no spurious FD events are now reported by the unittests, I'm not sure that it is useful to log and ignore them; it may be better to have the exception be raised, as it might expose an app bug, and in my experience it usually ends up in an infinite busy-wait loop once it happens. |
|||
msg179392 - (view) | Author: Richard Oudkerk (sbt) * | Date: 2013-01-08 23:13 | |
> Richard, in Tulip's WSAPoll code, it reads: > > class WindowsPollPollster(PollPollster): > """Pollster implementation using WSAPoll. > > WSAPoll is only available on Windows Vista and later. Python > does not currently support WSAPoll, but there is a patch > available at http://bugs.python.org/issue16507. > """ > > Does this means that this code need the patch from issue #16507 to work? Yes. > Also, I've read something about IOCP: is this a replacement for > WSAPoll, are there plans to get it merged at some point to python (and > if yes, would the select module be a proper home for this)? IOCP is not a replacement for WSAPoll (or select). Among other things it is not possible to use the current ssl module with IOCP (although Twisted manages to use IOCP with PyOpenSSL). Tulip should have IOCP support, so presumable when tulip is merged some support for IOCP will also be available in python. But I am not convinced that the select module is the proper home. |
|||
msg179423 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-01-09 08:02 | |
> Please consider my patches instead; it seems our patches crossed. Merging is now difficult because I already submitted my version to Tulip. That's fine, I don't think there's a point into maintaining a standalone patch for now: we can see how it goes with Tulip, flesh out the API and merge it when it's ready (or when Tulip goes in)? > Also my version makes fewer syscalls when unregistering a FD that has both read and write events registered. It did it on purpose, because I wasn't sure whether kqueue ops can accept more than one filter at a time. If that's the case, then you can do the same thing for register(). > Regarding the _Key return value: I think it's asking for trouble if the signature of the base class differs from that of the subclass. The return value may even be useful occasionally. Agreed (but then it might be better to change _Key to Key if it's public). > Given that no spurious FD events are now reported by the unittests, I'm not sure that it is useful to log and ignore them; it may be better to have the exception be raised, as it might expose an app bug, and in my experience it usually ends up in an infinite busy-wait loop once it happens. Sounds good to me. |
|||
msg179481 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-01-09 18:04 | |
- How shall we go forward? I've made a variety of small changes to the Tulip version (selector.py) now. Can you work those into a fresh unified patch for CPython 3.4? - I tried making a single combined kqueue call in register(), and it caused the SSL test to hang, so I reverted that and changed my unregister code to make two syscalls as wel. (This makes me think however that KqueueSelector should override modify() to avoid redundant syscalls when flipping only one of the two bits.) - I'll change _Key to Key in the Tulip copy (though I wonder if maybe it should be a longer name -- 'Key' is rather generic). - Are you going to implement the SELECT_CONNECT flag? - Thanks for being so responsive! It's great to be able to factor the selector functionality out of Tulip. - Have you submitted a PSF contributor form yet? (Probably yes, years ago, just making sure. :-) |
|||
msg179486 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-01-09 20:47 | |
> - How shall we go forward? I've made a variety of small changes to the > Tulip version (selector.py) now. Can you work those into a fresh unified > patch for CPython 3.4? Yes, but I think we could wait a little, to make sure the API is good enough to fulfill all tulip's need? There are a couple things I'm not completely happy with: - if we add a CONNECT event, then I think SELECT_IN/SELECT_OUT aren't good names anymore: I think that EVENT_READ/EVENT_WRITE/EVENT_CONNECT would be more consistent (since it embeds the operation that should now succeed on the socket). Maybe even shorter names like EVT_READ? - right now, selectors don't handle EINTR, which means that upon reception of a signal, Selector.select() can fail with EINTR (the original tulip implementation also had this problem). I've been more or less midly advocating to change all Python syscalls wrappers to handle EINTR because it's really just a nuisance, but I think Selector.select() should definitely handle EINTR and retry the syscalls (with an updated timeout). Note: the test testAddSignalHandler() added recently doesn't demonstrate this problem because os.kill(os.getpid(), signal.SIGINT) because the signal is delivered synchronously (when kill() syscall returns to user-space), before select()/poll()/epoll() gets called. > - I'll change _Key to Key in the Tulip copy (though I wonder if maybe it > should be a longer name -- 'Key' is rather generic). SelectorKey, or something along those lines? > - Are you going to implement the SELECT_CONNECT flag? Unfortunately, I don't know anything about Windows, so I won't be able to. But I guess that porting Pollster.WSAPollster to selectors shouldn't be too hard. > - Have you submitted a PSF contributor form yet? (Probably yes, years ago, > just making sure. :-) Actually, I already have commit rights ;-) |
|||
msg179489 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-01-09 21:28 | |
On Wed, Jan 9, 2013 at 12:47 PM, Charles-François Natali <report@bugs.python.org> wrote: >> - How shall we go forward? I've made a variety of small changes to the >> Tulip version (selector.py) now. Can you work those into a fresh unified >> patch for CPython 3.4? > > Yes, but I think we could wait a little, to make sure the API is good > enough to fulfill all tulip's need? Sure. I think in terms of API, it is pretty good except for the 'connect' issue below (and the related issue of a WSAPoll() wrapper, which Tulip had before the switch). In terms of implementation, I expect there's plenty to do, but it will be a long time before we'll know. It's probably best to do this incrementally. There's also the question of IOCP support, but I expect that it is too unwieldy to add to the select module -- it'll be a separate project. Hopefully Richard can help with this. > There are a couple things I'm not completely happy with: > - if we add a CONNECT event, then I think SELECT_IN/SELECT_OUT aren't > good names anymore: I think that EVENT_READ/EVENT_WRITE/EVENT_CONNECT > would be more consistent (since it embeds the operation that should > now succeed on the socket). Maybe even shorter names like EVT_READ? I like the longer names, EVENT_READ; they aren't used enough to need abbreviating, and it's always hard to remember exactly what abbreviation is used. > - right now, selectors don't handle EINTR, which means that upon > reception of a signal, Selector.select() can fail with EINTR (the > original tulip implementation also had this problem). I've been more > or less midly advocating to change all Python syscalls wrappers to > handle EINTR because it's really just a nuisance, but I think > Selector.select() should definitely handle EINTR and retry the > syscalls (with an updated timeout). I'm not sure what the pros and cons would be of doing this for all syscall wrappers, but I do know I want Tulip to work with Python 3.3 out of the box, so we need to add this to the select() calls. I imagine it's a bit tricky to test... Maybe I could use alarm() or setitimer() for testing? > Note: the test testAddSignalHandler() added recently doesn't > demonstrate this problem because os.kill(os.getpid(), signal.SIGINT) > because the signal is delivered synchronously (when kill() syscall > returns to user-space), before select()/poll()/epoll() gets called. I know. :-( >> - I'll change _Key to Key in the Tulip copy (though I wonder if maybe it >> should be a longer name -- 'Key' is rather generic). > > SelectorKey, or something along those lines? Sounds good. >> - Are you going to implement the SELECT_CONNECT flag? > > Unfortunately, I don't know anything about Windows, so I won't be able to. > But I guess that porting Pollster.WSAPollster to selectors shouldn't > be too hard. Just check out the last rev of Tulip before I switched: http://code.google.com/p/tulip/source/browse/tulip/unix_events.py?spec=svn28b7efd465b8529cdaa34e6b710f291c835c657b&r=28b7efd465b8529cdaa34e6b710f291c835c657b >> - Have you submitted a PSF contributor form yet? (Probably yes, years ago, >> just making sure. :-) > > Actually, I already have commit rights ;-) Doesn't surprise me. :-) |
|||
msg179490 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-01-09 21:43 | |
I've fixed the signal handling issue. Please check Tulip revision 1c4db3d1a688: http://code.google.com/p/tulip/source/detail?r=1c4db3d1a68874dc22c84f9c1c376c5371037f09 |
|||
msg179491 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-01-09 21:45 | |
And the next Tulip rev renames Key to SelectorKey. |
|||
msg186286 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-04-08 10:07 | |
If this goes inside the select module, it could probably help issue #17552 (add socket.sendfile()) a bit. |
|||
msg186617 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-04-12 11:13 | |
> If this goes inside the select module, it could probably help issue #17552 (add socket.sendfile()) a bit. Yes, that's exactly the idea. See also the attached patches for telnetlib and multiprocessing (there would probably be other candidates like subprocess, etc). I guess we can merge this when tulip goes in. |
|||
msg194179 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-08-02 15:16 | |
Here's a patch, based on the version in tulip's repo. I've added doc update and tests. I also made the following changes: - BaseSelector is an abstract base class (so one could imagine user code implementing its own selector on top of it) - SelectorKey is a named tuple: I think that's better than a class, because SelectorKeys can be returned to the user (register(), get_info(), etc), and we don't want him to mess with them. Also, we get a nice repr() for free - since SelectorKey is immutable, I added a get_keys() method, which returns all the keys registered. Since it's a dict values, the user can't mess with it This get_keys() method superseedes the registered_count() method, which can be simply replaced with len(selector.get_keys()). We could probably remove it? It also sort of superseedes the get_info() method: I don't remember, why was it needed? Would it be possible to get it into 3.4? |
|||
msg194182 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-08-02 16:16 | |
Thanks! We should make sure this makes it into alpha 2. Could you also prepare a patch for Tulip itself? That way I can make sure Tulip works with either its own or the stdlib implementation. One thought: would it be better if the selectors ended up in a separate module rather than extending 'select'? |
|||
msg194184 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-08-02 16:22 | |
> One thought: would it be better if the selectors ended up in a > separate module rather than extending 'select'? IMHO they're fine in select. It's quite the natural place for them. |
|||
msg194190 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-08-02 17:59 | |
> Thanks! We should make sure this makes it into alpha 2. Could you also > prepare a patch for Tulip itself? That way I can make sure Tulip works with > either its own or the stdlib implementation. Sure, here's a patch (tulip_selector_stdlib.diff). It seems to work fine, but I'm getting random errors even with vanilla tulip: """ ====================================================================== FAIL: test_run_until_complete_timeout (events_test.EPollEventLoopTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "tests/events_test.py", line 194, in test_run_until_complete_timeout self.assertTrue(0.009 <= t2-t1 <= 0.018, t2-t1) AssertionError: False is not true : 0.0055361779959639534 """ Looks like the timeouts check may be a bit too tight? I'm also updating the patch a bit: - when a timeout < 0 is passed to select(), it's treated as timeout == 0, i.e. polling. It fixes a common kind of nasty bugs in user code where when calculating timeouts, if you end up passing a negative value, it can either be interpreted as an infinite wait, or can raise EINVAL - previously, upon EINTR (InterruptedError), select() would return an empty list. That can be a problem if the user specified a timeout, because you end up returning too early. Thanks to a decorator, select() now retries upon EINTR, re-calculating the timeout. I also added a test for this. I'm pretty happy with the state now, so I think it's ready for review. There are a couple points I'd like more specific feedback: - is the registered_count() method still useful, now that get_keys() can return all the keys? - same thing for get_info(), it feels redundant - the documentation ;-) |
|||
msg194218 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-08-02 22:47 | |
Version addressing Antoine and Christian's comments. |
|||
msg194236 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-08-03 04:21 | |
No time for a full review yet, but a comment on the timeout test failure you are observing in Tulip. This particular test schedules a task that sleeps for 20 msecs, and first calls run_until_complete() with a timeout of 10 msecs, and then calls it again without a timeout. It checks that the first call takes approximately 10 msec and the second call too. But for some reason it is okay if the first call takes quite a bit longer (up to 18 msec) and then the second check is wrong. Can you try again with the failing assert replaced with this? self.assertTrue(0.018 <= t2-t0 <= 0.028, t2-t0) That should be a better way to check that code works. |
|||
msg194243 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-08-03 10:22 | |
> Guido van Rossum added the comment: > Can you try again with the failing assert replaced with this? > > self.assertTrue(0.018 <= t2-t0 <= 0.028, t2-t0) > > That should be a better way to check that code works. I'm still getting - less frequent - failures: """ ====================================================================== FAIL: test_run_until_complete_timeout (events_test.SelectEventLoopTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "tests/events_test.py", line 194, in test_run_until_complete_timeout self.assertTrue(0.018 <= t2-t0 <= 0.028, t2-t0) AssertionError: False is not true : 0.029771103999337356 ---------------------------------------------------------------------- """ Looking at strace output: 11:00:47.383145 select(4, [3], [], [], {0, 9765}) = 0 (Timeout) <0.015713> select() takes an extra 5ms (15ms instead of 10ms). 5ms is quite low for a GPOS (the typical quantum is around 4ms for 250Hz timer, and on my machine I have high-resolution timers configured but probably a crappy hardware). I'd suggest increasing the timeouts (especially when Tulip gets merged, it'll likely fail on many buildbots). |
|||
msg194590 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-08-06 23:01 | |
Yeah, the timeouts are a bad idea anyway. I've filed http://code.google.com/p/tulip/issues/detail?id=49 to remind us to do something about this. I'll review your tulip patch next. |
|||
msg194592 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-08-06 23:33 | |
Actually there is a misunderstanding underlying you Tulip patch. Tulip needs to support Python 3.3 as well as Python 3.4, so it needs to do a little dance trying to import BaseSelector (or some other key class) from select and if that fails fall back to its own selectors.py. Since you aren't changing the BaseSelector API at all this is trivial and I can take care of it. |
|||
msg194593 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-08-06 23:55 | |
Sorry, replying to some direct questions in the bug: > I also made the following changes: > - BaseSelector is an abstract base class (so one could imagine user code > implementing its own selector on top of it) Fine (though I had to work around this for some tulip tests). > - SelectorKey is a named tuple: I think that's better than a class, > because SelectorKeys can be returned to the user (register(), get_info(), > etc), and we don't want him to mess with them. Also, we get a nice repr() > for free Fine. > - since SelectorKey is immutable, I added a get_keys() method, which > returns all the keys registered. Since it's a dict values, the user can't > mess with it Fine. (I missed this in the code review.) > This get_keys() method superseedes the registered_count() method, which > can be simply replaced with len(selector.get_keys()). We could probably > remove it? It also sort of superseedes the get_info() method: I don't > remember, why was it needed? registered_count() is only used internally so you can kill it. get_info() is important for Tulip, please keep it. > Would it be possible to get it into 3.4? I think so, if Antoine agrees. The next alpha would be at your service. |
|||
msg195722 - (view) | Author: STINNER Victor (vstinner) * | Date: 2013-08-20 22:21 | |
FYI I created the issue #18794: "select.devpoll objects have no close() method". If this method is added, it should also be called by the close() method of the selector. |
|||
msg195724 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-08-20 22:32 | |
I've lost track -- who is waiting for whom? I reviewed the patch selector-12.diff and received zero response AFAICT. (Maybe the email from Rietveld didn't make it?) I'd like to get this into alpha2, even if the proactor won't make it (though I'd love to see that as well). Maybe the main issue is that I would prefer if the selector classes were in a separate module (selectors.py) than the low-level select/poll/etc. calls. I think it makes sense if the selectors module *only* defines the high level classes and not the messy low-level stuff, since an app shouldn't be using both at the same time. |
|||
msg195726 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-08-20 22:37 | |
Well, I suppose it waits on Charles-François to update the patch. I'd prefer not to add a separate module for this, personally. I think this functionality is fine in the select module. People who know how to use a selector certainly have heard about select() or poll() already :-) (no strong feelings about it anyway) |
|||
msg195749 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-08-21 07:09 | |
> I've lost track -- who is waiting for whom? I reviewed the patch selector-12.diff and received zero response AFAICT. (Maybe the email from Rietveld didn't make it?) I was on vacation, and then had to cope with many emails :-) I'll update the patch with your comments soon. > Maybe the main issue is that I would prefer if the selector classes were in a separate module (selectors.py) than the low-level select/poll/etc. calls. I think it makes sense if the selectors module *only* defines the high level classes and not the messy low-level stuff, since an app shouldn't be using both at the same time. I agree with Antoine (i.e. I think select is OK), but if it really bothers you, I can change it. I just don't want to do it twice :-) |
|||
msg195760 - (view) | Author: STINNER Victor (vstinner) * | Date: 2013-08-21 09:37 | |
I like the idea of renaming select to _select, and add Lib/select.py for the high-level wrapper (Selector). In my opinion, adding a module just for 5 classes is overkill. Selector classes are a very thin abstraction over the low-level objects. A new module can be justified for something more evolved, something like an event loop with callbacks and a scheduler, something like the PEP 3156 (tulip) :-) This issue is for code written in Python. Is it a similar issue for code written in C? internal_select_ex() of Modules/socketmodule.c and check_socket_and_wait_for_timeout() of Modules/_ssl.c uses poll() and select(). |
|||
msg195801 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-08-21 16:15 | |
On Wed, Aug 21, 2013 at 2:37 AM, STINNER Victor <report@bugs.python.org>wrote: > > STINNER Victor added the comment: > > I like the idea of renaming select to _select, and add Lib/select.py > for the high-level wrapper (Selector). In my opinion, adding a module > just for 5 classes is overkill. Selector classes are a very thin > abstraction over the low-level objects. A new module can be justified > for something more evolved, something like an event loop with > callbacks and a scheduler, something like the PEP 3156 (tulip) :-) > OK, I'm just going to impose my decision. I want a separate selectors.py. The rename business feels unpleasant, *and* the interface abstraction level is different. > This issue is for code written in Python. Is it a similar issue for > code written in C? internal_select_ex() of Modules/socketmodule.c and > check_socket_and_wait_for_timeout() of Modules/_ssl.c uses poll() and > select(). > Let's leave those alone. AFAIK they are mostly to simulate timeouts in *blocking* code. |
|||
msg196075 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-08-24 12:04 | |
Alright, I've updated the patch to have a distinct selectors module, and with Guido's comments. Before I post it for final review, I have three more questions: 1) In the documentation, I don't know how to best refer to files object registered: is "file descriptor" OK, or is it too low-level? Otherwise I'd be tempted to use just "file", but then this doesn't include sockets, pipes, etc. Or maybe "file object"/"file-like object"? 2) currently, the select() method returns a list of (<file>, <event>, <data>) But the opaque "data" object is optional, which means that many user might end up doing: for file, event, data in selector.select(): if event & READ_EVENT: file.recv(1024) # don't use data i.e. have to unpack it for no reason. Would it make sense to return (<key>, <event>) instead? This way the user has all the interesting information, and can do: for key, event in selector.select(): if event & READ_EVENT: key.file.recv(1024) or os.read(key.fd, 1024) 3) Concerning get_info(): right now the signature is: get_info(): fileobj -> (events, data) Wouldn't it be better to just return the whole key instead, which contains all the relevant information (events, data, fd and fileobj). Then we should probably also rename the method to get_key()? |
|||
msg196092 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-08-24 19:27 | |
> Alright, I've updated the patch to have a distinct selectors module, > and with Guido's comments. Didn't you forget to upload it? |
|||
msg196098 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-08-24 20:33 | |
> Didn't you forget to upload it? I wanted to have answer to my questions first :-): > Before I post it for final review, I have three more questions: |
|||
msg196099 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-08-24 20:45 | |
> > Didn't you forget to upload it? > > I wanted to have answer to my questions first :-): Ah, indeed, sorry! > 1) In the documentation, I don't know how to best refer to files > object registered: is "file descriptor" OK, or is it too low-level? > Otherwise I'd be tempted to use just "file", but then this doesn't > include sockets, pipes, etc. Or maybe "file object"/"file-like > object"? "file object" and "file-like object" are the convention used in the docs: http://docs.python.org/dev/glossary.html#term-file-object That said, if the APIs also accept file descriptors, it must be mentioned. > i.e. have to unpack it for no reason. That doesn't seem much of a concern, does it? Perhaps there can be two separate methods, select() and select_with_data(), but that sounds a bit overkill. > 3) Concerning get_info(): right now the signature is: > get_info(): fileobj -> (events, data) > > Wouldn't it be better to just return the whole key instead [...] No opinion :-) |
|||
msg196240 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-08-26 19:49 | |
On Sat, Aug 24, 2013 at 5:04 AM, Charles-François Natali <report@bugs.python.org> wrote: > Before I post it for final review, I have three more questions: > 1) In the documentation, I don't know how to best refer to files > object registered: is "file descriptor" OK, or is it too low-level? > Otherwise I'd be tempted to use just "file", but then this doesn't > include sockets, pipes, etc. Or maybe "file object"/"file-like > object"? What Antoine said. And, moreover, it should be mentioned that on Windows it only accepts socket FDs; and on UNIX it should probably qualify this in some way to exclude disk FDs (which are always considered ready by select and friends). I don't really know what the correct UNIX-flavor-neutral terminology is for this case -- perhaps we have to enumerate the possibilities? (I know of FIFOs, character special devices, and sockets.) > 2) currently, the select() method returns a list of > (<file>, <event>, <data>) > > But the opaque "data" object is optional, which means that many user > might end up doing: > > for file, event, data in selector.select(): > if event & READ_EVENT: > file.recv(1024) > # don't use data > > i.e. have to unpack it for no reason. > > Would it make sense to return (<key>, <event>) instead? > This way the user has all the interesting information, and can do: > > for key, event in selector.select(): > if event & READ_EVENT: > key.file.recv(1024) > or > os.read(key.fd, 1024) I like this -- return the key plus the specific event. You must document SelectorKey, but that seems fine. (Looks like Antoine read too fast and didn't realize you proposed to return the key instead of the file and the data.) > 3) Concerning get_info(): right now the signature is: > get_info(): fileobj -> (events, data) > > Wouldn't it be better to just return the whole key instead, which > contains all the relevant information (events, data, fd and fileobj). > Then we should probably also rename the method to get_key()? Yes, and yes. |
|||
msg196403 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-08-28 18:33 | |
Alright, here's the patch, with all the changes discussed (and of cours a separate selectors.py). |
|||
msg196527 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * | Date: 2013-08-30 12:30 | |
Follows a couple of minor concerns. - What about Solaris' /dev/poll? - I'm not sure why in case of EINTR you retry with a different timeout; can't you just return []? - this is probably because I'm paranoid about performances but given that select() method will be called repeatedly I would not use a decorator. Also, similarly to what has been done elsewhere in the stdlib, for "critical" parts I would recommend localizing variable access in order to minimize overhead as in: def select(self, timeout=None): ... key_from_fd = self._key_from_fd ready_append = ready.append for fd in r | w: ... key = key_from_fd(fd) if key: ready_append((key, events & key.events)) |
|||
msg196536 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-08-30 13:34 | |
Hello, > - What about Solaris' /dev/poll? That should be pretty easy to add by someone who has access to a Solaris box: I could use the buildbots, but it'd take a couple iterations to get it right. > - I'm not sure why in case of EINTR you retry with a different timeout; can't you just return []? Because when I do: selector.select(10) I expect the selector to wait 10 seconds if no event occurs: the fact that a signal comes in shouldn't break the contract made by the API, i.e. that it will return when a FD is ready or the timeout expires. Early return can lead to spurious errors: just imagine you send a request to a busy server: it would be bad to raise a timeout error just because the user put the client in the background with CTRL-Z (which results in SIGSTOP). > - this is probably because I'm paranoid about performances but given that select() method will be called repeatedly I would not use a decorator. Also, similarly to what has been done elsewhere in the stdlib, for "critical" parts I would recommend localizing variable access in order to minimize overhead as in: > > def select(self, timeout=None): > ... > key_from_fd = self._key_from_fd > ready_append = ready.append > for fd in r | w: > ... > key = key_from_fd(fd) > if key: > ready_append((key, events & key.events)) I find that localizing variables leads to unreadable code, and is tied to the current CPython interpreter implementation: such optimizations belong to the interpreter, not user code. As for the decorator performance overhead, I don't think it weights much compared to the cost of a syscall (+ GIL acquire/release): with decorator: $ ./python -m timeit -s "from selectors import DefaultSelector, EVENT_WRITE; import os; s = DefaultSelector(); s.register(os.pipe()[1], EVENT_WRITE)" "s.select()" 100000 loops, best of 3: 3.69 usec per loop without decorator: $ ./python -m timeit -s "from selectors import DefaultSelector, EVENT_WRITE; import os; s = DefaultSelector(); s.register(os.pipe()[1], EVENT_WRITE)" "s.select()" 100000 loops, best of 3: 3.52 usec per loop That's a 4% overhead, with a single FD that's always ready (and I suspect that most of the overhead is due to the call to time(), not the decorator per se). Also, I'll shortly propose a patch to handle EINTR within C code, so those EINTR wrappers won't be needed anymore. |
|||
msg196537 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * | Date: 2013-08-30 13:39 | |
> That should be pretty easy to add by someone who has access to a > Solaris box: I could use the buildbots, but it'd take a couple > iterations to get it right. I can probably help with that. |
|||
msg196575 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * | Date: 2013-08-30 21:04 | |
I'm playing with devpoll on Open Solaris and it seems it works exactly as poll(). Basically a copy & paste of PollSelector class with a different self._poll attribute will suffice. With dev/poll support you now have 3 classes (PollSelector, EpollSelector, DevpollSelector) which look basically the same except for very little details. It looks like there's space for some code reuse / refactoring. |
|||
msg196596 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-08-30 23:22 | |
On Fri, Aug 30, 2013 at 6:34 AM, Charles-François Natali <report@bugs.python.org> wrote: [Giampaolo Rodola'] > > - I'm not sure why in case of EINTR you retry with a different timeout; can't you just return []? > > Because when I do: > selector.select(10) > > I expect the selector to wait 10 seconds if no event occurs: the fact > that a signal comes in shouldn't break the contract made by the API, > i.e. that it will return when a FD is ready or the timeout expires. I disagree. In this case it is almost certain that there is already some kind of retry loop around the Selector.select() call, so there is no need for the retry. (Certainly this is how it happens in Tulip.) In the case of select() the contract is actually quite clear, and there shouldn't be any problem with returning an empty list early. (This is quite different from e.g. an interrupted read() or recv() call, where there is no existing return value that you can return safely, since their users don't expect None and will usually treat an empty string as an EOF indicator.) > Early return can lead to spurious errors: just imagine you send a > request to a busy server: it would be bad to raise a timeout error > just because the user put the client in the background with CTRL-Z > (which results in SIGSTOP). Yeah, and Tulip's retry handles this just fine. I also noticed that you are hardcoding a dependency on time.monotonic() if it exists. If it weren't for your desire to "fix" this issue you wouldn't have needed a time function at all! > > - this is probably because I'm paranoid about performances but given that select() method will be called repeatedly I would not use a decorator. Also, similarly to what has been done elsewhere in the stdlib, for "critical" parts I would recommend localizing variable access in order to minimize overhead as in: > > > > def select(self, timeout=None): > > ... > > key_from_fd = self._key_from_fd > > ready_append = ready.append > > for fd in r | w: > > ... > > key = key_from_fd(fd) > > if key: > > ready_append((key, events & key.events)) > > I find that localizing variables leads to unreadable code, and is tied > to the current CPython interpreter implementation: such optimizations > belong to the interpreter, not user code. > > As for the decorator performance overhead, I don't think it weights > much compared to the cost of a syscall (+ GIL acquire/release): > with decorator: > $ ./python -m timeit -s "from selectors import DefaultSelector, > EVENT_WRITE; import os; s = DefaultSelector(); > s.register(os.pipe()[1], EVENT_WRITE)" "s.select()" > 100000 loops, best of 3: 3.69 usec per loop > without decorator: > $ ./python -m timeit -s "from selectors import DefaultSelector, > EVENT_WRITE; import os; s = DefaultSelector(); > s.register(os.pipe()[1], EVENT_WRITE)" "s.select()" > 100000 loops, best of 3: 3.52 usec per loop > > That's a 4% overhead, with a single FD that's always ready (and I > suspect that most of the overhead is due to the call to time(), not > the decorator per se). But still I agree with Giampaolo that the decorator doesn't feel right. It's like a try/except around the entire body of your function, which is usually a code smell (even though I realize the exception it catches can only be raised by the syscall). Please go back to the explicit try/except returning []. > Also, I'll shortly propose a patch to handle EINTR within C code, so > those EINTR wrappers won't be needed anymore. Hm, it may be a while before everyone is comfortable with that change. Who knows what might break? |
|||
msg196620 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-08-31 07:32 | |
2013/8/31 Guido van Rossum <report@bugs.python.org>: > But still I agree with Giampaolo that the decorator doesn't feel > right. It's like a try/except around the entire body of your function, > which is usually a code smell (even though I realize the exception it > catches can only be raised by the syscall). Please go back to the > explicit try/except returning []. Well, I'll change it, but: We have to change the select() docstring to "Wait until some registered file objects become ready, or for *at most* timeout seconds" All user code that wants to handle EINTR or correct timeout (which should be "all user code" in a perfect world) will have to wrap the call to select() inside a loop equivalent to this decorator, and I personally like the a rule of "never let the user do what the library can do for him". This includes for example multiprocessing and telnetlib which currently implement this EINTR handling loop with timeout computation, see e.g.: http://hg.python.org/cpython/file/acc7439b1406/Lib/telnetlib.py#l297 And other places in the stdlib either don't handle EINTR properly, or don't re-compute timeout (see e.g. http://bugs.python.org/issue12338). Also, I don't really see the problem with retrying upon EINTR, since: - the signal handler will be called right away anyway - in case where the select() is a part of an event loop, the event loop registers a wakup file descriptor with the selector, which means that upon interesting signal delivery, the select() call will return to the schedulor So in short, I don't see how that could be a nuisance, but I can certainly see how this could trick user code into raising spurious timeout-related errors, or be much more complicated than it ought to be (the above telnetlib example is IMO a great example of why EINTR and timeout computation should be handled at the selector level). |
|||
msg196625 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-08-31 07:54 | |
Oh, and FWIW, that's also what Java does: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5100121 http://hg.openjdk.java.net/jdk6/jdk6-gate/jdk/file/6daa81bdfd18/src/solaris/native/sun/nio/ch/PollArrayWrapper.c (I know some people don't like Java, but I've found that they have some really competent people and an extensive real-life exposure and feedback). |
|||
msg196632 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * | Date: 2013-08-31 12:09 | |
I may be missing something here but isn't the whole point of EINTR to interrupt a potentially long running syscall? Why protract function return time as a consequence? All the event loops I'm familiar with (twisted, tornado and asyncore) just 'return' on EINTR, supposedly because that was the best compromise to support syscall interruption without forcing users to handle EINTR themselves. |
|||
msg196633 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-08-31 12:23 | |
> I may be missing something here but isn't the whole point of EINTR to > interrupt a potentially long running syscall? Not exactly. The point is to signal (!) that a signal was received. The received signal is not necessarily expected. Also, from what I understand, it is more reliable to use a wakeup fd (using e.g. signal.set_wakeup_fd) rather than expect EINTR to be returned: http://mail.python.org/pipermail/python-dev/2013-August/128204.html |
|||
msg196634 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-08-31 12:26 | |
> All the event loops I'm familiar with (twisted, tornado and asyncore) > just 'return' on EINTR, supposedly because that was the best > compromise to support syscall interruption without forcing users to > handle EINTR themselves. What do you mean? Tornado simply retries on EINTR: https://github.com/facebook/tornado/blob/master/tornado/ioloop.py#L652 |
|||
msg196635 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-08-31 12:28 | |
> I may be missing something here but isn't the whole point of EINTR to interrupt a potentially long running syscall? No. EINTR is an artifact of the early Unix design, because failing when a signal was delivered was simpler than restarting the syscall. Nowadays, most syscalls are restarted by default (that's was true on BSD, and nowadays Linux also follows this trend). See e.g. http://lkml.indiana.edu/hypermail/linux/kernel/0104.0/0743.html and http://man7.org/linux/man-pages/man7/signal.7.html One reason why EINTR can occur so often in CPython is because we set up signal handlers without SA_RESTART (which is set by default by signal(3)). The reason is likely that we want the syscall to return to be able to call PyCheckSignals() (since we cannot run Python code from a signal handler). But since in this case, PyCheckSignals() is called anyway from the main eval loop, signal handlers will still be run in a timely manner (and as noted, the proper way to handle signals is to use a wakeup file descriptor which can be used by select()). |
|||
msg196641 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-08-31 15:14 | |
> Charles-François Natali added the comment: > Well, I'll change it, but: > We have to change the select() docstring to "Wait until some > registered file objects become ready, or for *at most* timeout > seconds" Doesn't sound so bad to me. It's about the same guarantee as sleep(). > All user code that wants to handle EINTR or correct timeout (which > should be "all user code" in a perfect world) will have to wrap the > call to select() inside a loop equivalent to this decorator, and I > personally like the a rule of "never let the user do what the library > can do for him". No, they'll use a framework with an event loop like Tulip that takes care of all of this for them. Such frameworks all have logic in them to handle early returns from the low-level select/poll operation. I find it easier to reason about the correctness of the framework's code without your decorator: (1) I have to convince myself that the code wrapped by your decorator doesn't change any global state, *or* that there is a guarantee that the exception caught is only raised by the select()/poll()/etc. syscall, not anywhere else in the wrapped method. (2) I have to remember that if a signal handler is called that modifies the event loop's deadline, the selector will return immediately anyway (so the event loop can recalculate its deadline) because of the self-pipe. (3) I have to prove that your decorator uses the same clock as my framework. (4) I have to prove that your code does the same thing if the process is suspended for a really long time and the system clock changes in the meantime. > This includes for example multiprocessing and telnetlib which > currently implement this EINTR handling loop with timeout computation, > see e.g.: > http://hg.python.org/cpython/file/acc7439b1406/Lib/telnetlib.py#l297 > > And other places in the stdlib either don't handle EINTR properly, or > don't re-compute timeout (see e.g. http://bugs.python.org/issue12338). > > Also, I don't really see the problem with retrying upon EINTR, since: > - the signal handler will be called right away anyway > - in case where the select() is a part of an event loop, the event > loop registers a wakup file descriptor with the selector, which means > that upon interesting signal delivery, the select() call will return > to the schedulor > > So in short, I don't see how that could be a nuisance, but I can > certainly see how this could trick user code into raising spurious > timeout-related errors, or be much more complicated than it ought to > be (the above telnetlib example is IMO a great example of why EINTR > and timeout computation should be handled at the selector level). The selector is a helper for higher-level frameworks. I think the EINTR handling belongs in the framework. |
|||
msg196649 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-08-31 17:06 | |
Here's a patch returning [] on EINTR. I've tested it on Linux, Windows, FreeBSD, OpenSolaris and OS-X. |
|||
msg196651 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-08-31 17:15 | |
> Here's a patch returning [] on EINTR. Hmm... Shouldn't it simply let the InterruptedError bubble up? I know there's the sleep() precedent, but I find it a bit embarassing actually: the call can return early, but there's no way for the caller to know that it returned early. |
|||
msg196654 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-08-31 17:26 | |
Hm... I like the [] return just fine, it matches what Tulip's selector did. The framework should be re-checking the clock anyway -- it's possible that select() returned late too... |
|||
msg196926 - (view) | Author: Roundup Robot (python-dev) | Date: 2013-09-04 17:05 | |
New changeset e4d45315c38c by Charles-François Natali in branch 'default': Issue #16853: Add new selectors module. http://hg.python.org/cpython/rev/e4d45315c38c |
|||
msg196930 - (view) | Author: STINNER Victor (vstinner) * | Date: 2013-09-04 18:27 | |
> New changeset e4d45315c38c by Charles-François Natali in branch 'default': > Issue #16853: Add new selectors module. > http://hg.python.org/cpython/rev/e4d45315c38c Great! Congrats Charles-François for the new module! I tried to implement such module once but I gave up because it was more complex than what I expected. |
|||
msg196931 - (view) | Author: Roundup Robot (python-dev) | Date: 2013-09-04 18:30 | |
New changeset 6b14ebe0f7ac by Victor Stinner in branch 'default': Issue #16853: Mention the new selectors module in What's New in Python 3.4 http://hg.python.org/cpython/rev/6b14ebe0f7ac |
|||
msg196932 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2013-09-04 18:31 | |
Agreed, this was a great feat of implementation and API design. Is there anything left to do or can we close this issue? (I looked at the Proactor code in Tulip again and I think it is not quite as easy to isolate it, so I'm not requesting that Proactors be added to the stdlib at this time.) |
|||
msg196933 - (view) | Author: STINNER Victor (vstinner) * | Date: 2013-09-04 18:35 | |
> Is there anything left to do or can we close this issue? It would be nice to mention the new selectors module in the documentation of the select module. |
|||
msg196934 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-09-04 18:37 | |
> > Is there anything left to do or can we close this issue? > > It would be nice to mention the new selectors module in the documentation of the select module. Feel free to make documentation commits :) They don't necessarily have to pass a review stage. |
|||
msg196935 - (view) | Author: Roundup Robot (python-dev) | Date: 2013-09-04 18:40 | |
New changeset 142ff216e4b4 by Victor Stinner in branch 'default': Issue #16853: Mention the new selectors module in the select module http://hg.python.org/cpython/rev/142ff216e4b4 |
|||
msg196938 - (view) | Author: STINNER Victor (vstinner) * | Date: 2013-09-04 18:45 | |
> Feel free to make documentation commits :) Ok, done. I also created the issue #18923: "Use the new selectors module in the subprocess module". |
|||
msg196939 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-09-04 18:47 | |
I feel this can be closed then :) Thanks, Charles-François! |
|||
msg196940 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * | Date: 2013-09-04 18:48 | |
> Is there anything left to do or can we close this issue? Add support for select.devpoll on Solaris. |
|||
msg196941 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-09-04 18:49 | |
> Is there anything left to do or can we close this issue? I usually wait until the test is run on the buildbots to check I didn't break anything: apparently that's OK, so we can close. > It would be nice to mention the new selectors module in the documentation of the select module. Indeed (I mentioned the select module in selectors' doc, but the contrary is certainly useful to point users to this new module), thanks for your commit. Where were you when I asked for documentation review ;-) |
|||
msg196957 - (view) | Author: Giampaolo Rodola' (giampaolo.rodola) * | Date: 2013-09-04 21:10 | |
I have realized just now that at least epoll() and kqueue() selectors could take advantage and define their own modify() method (there was even a TODO I totally missed). Anyway, from now on I'm gonna file separate issues. |
|||
msg196984 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-09-05 08:27 | |
> Giampaolo Rodola' added the comment: > > I have realized just now that at least epoll() and kqueue() selectors could take advantage and define their own modify() method (there was even a TODO I totally missed). As a side note, in the general case, there's more than a performance optimization: the problem with unregister() + register() vs a real modify (e.g. EPOLL_CTL_MOD) is that it's subject to a race condition, if an event fires during the short window where the FD isn't registered anymore. But since selectors use level-triggered notification, that's not an issue: I'll probably add a comment in the code explaining why it's safe, though. |
|||
msg196996 - (view) | Author: Richard Oudkerk (sbt) * | Date: 2013-09-05 13:09 | |
On 05/09/2013 9:28am, Charles-François Natali wrote: > As a side note, in the general case, there's more than a performance > optimization: the problem with unregister() + register() vs a real > modify (e.g. EPOLL_CTL_MOD) is that it's subject to a race condition, > if an event fires during the short window where the FD isn't > registered anymore. Even with edge-triggered notification, doesn't registering an fd check the current readiness: >>> import os, select >>> r, w = os.pipe() >>> p = select.epoll() >>> os.write(w, b"h") >>> p.register(r, select.EPOLLIN | select.EPOLLET) >>> p.poll(0) [(3, 1)] >>> p.poll(0) [] Otherwise it would be very difficult to edge-triggered notification. |
|||
msg197010 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-09-05 16:42 | |
> Richard Oudkerk added the comment: > > Even with edge-triggered notification, doesn't registering an fd check > the current readiness: Hum... Looks like I forgot to turn my brain on this morning :-) |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:40 | admin | set | github: 61057 |
2014-03-17 19:47:21 | jcea | set | nosy:
+ jcea |
2013-09-05 16:42:34 | neologix | set | messages: + msg197010 |
2013-09-05 13:09:02 | sbt | set | messages: + msg196996 |
2013-09-05 08:28:00 | neologix | set | messages: + msg196984 |
2013-09-04 21:10:27 | giampaolo.rodola | set | messages: + msg196957 |
2013-09-04 18:49:26 | neologix | set | messages: + msg196941 |
2013-09-04 18:48:10 | giampaolo.rodola | set | messages: + msg196940 |
2013-09-04 18:47:05 | pitrou | set | status: open -> closed resolution: fixed messages: + msg196939 stage: patch review -> resolved |
2013-09-04 18:45:33 | vstinner | set | messages: + msg196938 |
2013-09-04 18:40:27 | python-dev | set | messages: + msg196935 |
2013-09-04 18:37:14 | pitrou | set | messages: + msg196934 |
2013-09-04 18:35:44 | vstinner | set | messages: + msg196933 |
2013-09-04 18:31:46 | gvanrossum | set | messages: + msg196932 |
2013-09-04 18:30:48 | python-dev | set | messages: + msg196931 |
2013-09-04 18:27:17 | vstinner | set | messages: + msg196930 |
2013-09-04 17:05:45 | python-dev | set | nosy:
+ python-dev messages: + msg196926 |
2013-08-31 17:26:47 | gvanrossum | set | messages: + msg196654 |
2013-08-31 17:15:46 | pitrou | set | messages: + msg196651 |
2013-08-31 17:06:54 | neologix | set | files:
+ selectors-15.diff messages: + msg196649 |
2013-08-31 15:14:18 | gvanrossum | set | messages: + msg196641 |
2013-08-31 12:28:39 | neologix | set | messages: + msg196635 |
2013-08-31 12:26:00 | pitrou | set | messages: + msg196634 |
2013-08-31 12:23:07 | pitrou | set | messages: + msg196633 |
2013-08-31 12:09:35 | giampaolo.rodola | set | messages: + msg196632 |
2013-08-31 07:54:36 | neologix | set | messages: + msg196625 |
2013-08-31 07:32:07 | neologix | set | messages: + msg196620 |
2013-08-30 23:22:21 | gvanrossum | set | messages: + msg196596 |
2013-08-30 21:04:13 | giampaolo.rodola | set | messages: + msg196575 |
2013-08-30 13:39:46 | giampaolo.rodola | set | messages: + msg196537 |
2013-08-30 13:34:09 | neologix | set | messages: + msg196536 |
2013-08-30 12:30:49 | giampaolo.rodola | set | messages: + msg196527 |
2013-08-28 18:33:38 | neologix | set | files:
+ selectors-14.diff messages: + msg196403 |
2013-08-26 19:49:17 | gvanrossum | set | messages: + msg196240 |
2013-08-24 20:45:12 | pitrou | set | messages: + msg196099 |
2013-08-24 20:33:24 | neologix | set | messages: + msg196098 |
2013-08-24 19:27:10 | pitrou | set | messages: + msg196092 |
2013-08-24 12:04:15 | neologix | set | messages: + msg196075 |
2013-08-21 16:15:14 | gvanrossum | set | messages: + msg195801 |
2013-08-21 09:37:26 | vstinner | set | messages: + msg195760 |
2013-08-21 07:09:50 | neologix | set | messages: + msg195749 |
2013-08-20 22:37:44 | pitrou | set | messages: + msg195726 |
2013-08-20 22:32:21 | gvanrossum | set | messages: + msg195724 |
2013-08-20 22:21:31 | vstinner | set | nosy:
+ vstinner messages: + msg195722 |
2013-08-06 23:55:39 | gvanrossum | set | messages: + msg194593 |
2013-08-06 23:33:16 | gvanrossum | set | messages: + msg194592 |
2013-08-06 23:01:41 | gvanrossum | set | messages: + msg194590 |
2013-08-03 10:22:39 | neologix | set | messages: + msg194243 |
2013-08-03 04:21:36 | gvanrossum | set | messages: + msg194236 |
2013-08-02 22:47:55 | neologix | set | files:
+ selector-12.diff messages: + msg194218 |
2013-08-02 17:59:46 | neologix | set | files:
+ selector-11.diff, tulip_selector_stdlib.diff messages: + msg194190 |
2013-08-02 16:22:38 | pitrou | set | messages: + msg194184 |
2013-08-02 16:16:14 | gvanrossum | set | messages: + msg194182 |
2013-08-02 15:16:53 | neologix | set | keywords:
+ needs review files: + selector-9.diff messages: + msg194179 |
2013-04-12 11:13:11 | neologix | set | messages: + msg186617 |
2013-04-08 10:07:12 | pitrou | set | messages: + msg186286 |
2013-01-09 21:45:04 | gvanrossum | set | messages: + msg179491 |
2013-01-09 21:43:32 | gvanrossum | set | messages: + msg179490 |
2013-01-09 21:28:28 | gvanrossum | set | messages: + msg179489 |
2013-01-09 20:47:31 | neologix | set | messages: + msg179486 |
2013-01-09 18:04:46 | gvanrossum | set | messages: + msg179481 |
2013-01-09 08:02:21 | neologix | set | messages: + msg179423 |
2013-01-08 23:13:55 | sbt | set | messages: + msg179392 |
2013-01-08 23:08:37 | gvanrossum | set | messages: + msg179389 |
2013-01-08 22:52:52 | neologix | set | messages: + msg179387 |
2013-01-08 22:47:49 | gvanrossum | set | files:
+ return_key_fix.diff messages: + msg179385 |
2013-01-08 22:45:27 | gvanrossum | set | messages: + msg179384 |
2013-01-08 22:44:20 | gvanrossum | set | messages: + msg179383 |
2013-01-08 22:31:38 | neologix | set | files:
+ tulip-selectors-3.diff messages: + msg179382 |
2013-01-08 22:22:25 | gvanrossum | set | files:
+ tulip-selectors-3.diff messages: + msg179381 |
2013-01-08 18:56:12 | gvanrossum | set | messages: + msg179365 |
2013-01-08 10:49:28 | neologix | set | messages: + msg179328 |
2013-01-08 00:45:47 | gvanrossum | set | messages: + msg179300 |
2013-01-08 00:36:58 | felipecruz | set | messages: + msg179299 |
2013-01-07 22:12:52 | neologix | set | files: - tulip-selectors-2.diff |
2013-01-07 22:12:42 | neologix | set | files: - selector-8.diff |
2013-01-07 22:12:32 | neologix | set | files: + tulip-selectors-2.diff |
2013-01-07 22:12:18 | neologix | set | files: + selector-8.diff |
2013-01-07 19:38:43 | neologix | set | files: - selector-data.diff |
2013-01-07 19:38:34 | neologix | set | files: - selector-3.diff |
2013-01-07 19:38:24 | neologix | set | files: + tulip-selectors-2.diff |
2013-01-07 19:37:44 | neologix | set | files: - selector-5.diff |
2013-01-07 19:37:19 | neologix | set | files: - tulip-selectors-2.diff |
2013-01-07 19:33:03 | neologix | set | files: + tulip-selectors-2.diff |
2013-01-07 19:32:47 | neologix | set | files: + selector-8.diff |
2013-01-07 19:32:10 | neologix | set | files: - selector-6.diff |
2013-01-07 19:32:01 | neologix | set | files: - selector_poll_events.diff |
2013-01-07 19:31:52 | neologix | set | files: - tulip-selectors-2.diff |
2013-01-07 19:31:42 | neologix | set | files: - selector-8.diff |
2013-01-07 19:06:37 | neologix | set | files:
+ tulip-selectors-2.diff, selector-8.diff messages: + msg179275 |
2013-01-07 10:50:13 | neologix | set | files:
+ selector_poll_events.diff messages: + msg179255 |
2013-01-07 09:31:26 | neologix | set | messages: + msg179252 |
2013-01-07 03:00:36 | gvanrossum | set | nosy:
+ sbt messages: + msg179242 |
2013-01-07 02:59:13 | gvanrossum | set | messages: + msg179241 |
2013-01-07 02:46:45 | gvanrossum | set | files:
+ tulip-selectors-1.diff hgrepos: + hgrepo167 messages: + msg179240 |
2013-01-06 21:36:13 | neologix | set | files:
+ selector-6.diff messages: + msg179231 |
2013-01-06 00:59:10 | gvanrossum | set | messages: + msg179161 |
2013-01-06 00:28:54 | neologix | set | messages: + msg179156 |
2013-01-05 21:05:59 | meador.inge | set | nosy:
+ meador.inge messages: + msg179144 |
2013-01-05 20:23:27 | neologix | set | files:
+ selector-5.diff messages: + msg179142 |
2013-01-05 19:20:23 | gvanrossum | set | messages: + msg179135 |
2013-01-05 16:28:53 | giampaolo.rodola | set | messages: + msg179129 |
2013-01-05 15:30:10 | neologix | set | messages: + msg179127 |
2013-01-05 15:22:39 | giampaolo.rodola | set | messages: + msg179126 |
2013-01-05 14:16:40 | giampaolo.rodola | set | messages: + msg179122 |
2013-01-05 12:40:12 | neologix | set | files:
+ selector-data.diff messages: + msg179120 |
2013-01-04 21:35:01 | rosslagerwall | set | messages: + msg179082 |
2013-01-04 20:34:29 | neologix | set | files: - selector-2.diff |
2013-01-04 20:34:18 | neologix | set | files:
+ selector-3.diff messages: + msg179078 |
2013-01-04 19:57:42 | neologix | set | messages: + msg179075 |
2013-01-04 19:53:07 | rosslagerwall | set | messages: + msg179074 |
2013-01-04 19:11:39 | gvanrossum | set | messages: + msg179070 |
2013-01-04 18:44:06 | giampaolo.rodola | set | messages: + msg179066 |
2013-01-04 17:06:20 | pitrou | set | messages: + msg179046 |
2013-01-04 16:48:11 | gvanrossum | set | messages: + msg179042 |
2013-01-04 13:28:04 | christian.heimes | set | nosy:
+ christian.heimes |
2013-01-04 13:18:57 | neologix | set | files: - selector-1.diff |
2013-01-04 13:18:47 | neologix | set | files: + selector_bench.py |
2013-01-04 13:18:11 | neologix | set | files:
+ selector-2.diff messages: + msg179018 |
2013-01-04 01:37:06 | gvanrossum | set | messages: + msg179001 |
2013-01-03 22:26:58 | neologix | set | files: - selector.diff |
2013-01-03 22:26:45 | neologix | set | files:
+ selector-1.diff messages: + msg178992 |
2013-01-03 19:57:23 | felipecruz | set | messages: + msg178981 |
2013-01-03 19:57:16 | neologix | set | messages: + msg178980 |
2013-01-03 19:53:04 | neologix | set | messages: + msg178979 |
2013-01-03 19:32:29 | rosslagerwall | set | nosy:
+ rosslagerwall |
2013-01-03 19:18:27 | pitrou | set | nosy:
+ gvanrossum messages: + msg178974 stage: patch review |
2013-01-03 18:44:45 | felipecruz | set | nosy:
+ felipecruz messages: + msg178973 |
2013-01-03 17:38:33 | neologix | set | files: + selector_telnetlib.diff |
2013-01-03 17:38:18 | neologix | set | files: + selector_multiprocessing.diff |
2013-01-03 17:38:06 | neologix | create |