classification
Title: selectors: provide a helper to choose a selector using constraints
Type: Stage:
Components: Documentation Versions: Python 3.4
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, gvanrossum, neologix, pitrou, vstinner
Priority: normal Keywords:

Created on 2013-10-31 22:36 by vstinner, last changed 2014-02-11 18:36 by gvanrossum. This issue is now closed.

Messages (14)
msg201855 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-10-31 22:36
multiprocessing, telnetlib (and subprocess in the near future, see #18923) use the following code to select the best selector:

import selectors

# poll/select have the advantage of not requiring any extra file descriptor,
# unlike epoll/kqueue (also, they require a single syscall).
if hasattr(selectors, 'PollSelector'):
    _TelnetSelector = selectors.PollSelector
else:
    _TelnetSelector = selectors.SelectSelector

I don't like the idea of a "default selector"; in my opinion, selectors.DefaultSelector should be removed.

I would prefer a function that returns the best selector according to constraints. Example:

import selectors

def get_selector(use_fd=True) -> selectors.BaseSelector:
    ...

By default, it would return the same selector as the current DefaultSelector. But if you set use_fd=False, the choice would be restricted to select() or poll().
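A minimal sketch of how the proposed helper could behave. `get_selector` is hypothetical (the selectors module has no such function), and this sketch assumes it returns a selector instance: with use_fd=True it defers to DefaultSelector, and with use_fd=False it falls back to the same poll-then-select logic as the telnetlib snippet.

```python
import selectors


def get_selector(use_fd=True) -> selectors.BaseSelector:
    """Hypothetical helper (not part of the selectors module).

    use_fd=True:  same choice as selectors.DefaultSelector, which may be
                  backed by epoll or kqueue and then holds an extra FD.
    use_fd=False: restrict the choice to poll() or select(), which need
                  no extra file descriptor.
    """
    if use_fd:
        return selectors.DefaultSelector()
    # Same preference order as the telnetlib code: poll, else select.
    if hasattr(selectors, 'PollSelector'):
        return selectors.PollSelector()
    return selectors.SelectSelector()
```

The returned object is an ordinary selector, so it can be used as a context manager (`with get_selector(use_fd=False) as sel: ...`) to close it promptly.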

I don't want to duplicate code like telnetlib's in each module; it's harder to maintain. The selectors module may gain new selectors in the future; see for example #18931.

Apart from use_fd, I have no other ideas for constraints. I read somewhere that different selectors may have different limits on the number of file descriptors. I don't know whether such a constraint would be useful.
msg201856 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-10-31 22:45
What's the use case for not wanting to use an extra FD?

Nevertheless I'm fine with using a function to pick the default selector (but it requires some changes to asyncio too, which currently uses DefaultSelector).

Something I would find useful would be a way to override the selector choice on the command line.  I currently have to build this into the app's arg parser and main(), e.g. http://code.google.com/p/tulip/source/browse/examples/sink.py#64
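One way to build such an override into an application's argument parser, as a sketch only: the `--selector` option, `SELECTOR_CLASSES`, `parse_args` and `make_selector` are hypothetical names, and this is not necessarily how the linked sink.py example does it.

```python
import argparse
import selectors

# Selector classes available on this platform; hasattr() is needed because,
# e.g., EpollSelector only exists on Linux and KqueueSelector only on BSDs/macOS.
SELECTOR_CLASSES = {
    name: getattr(selectors, name)
    for name in ('SelectSelector', 'PollSelector', 'EpollSelector',
                 'DevpollSelector', 'KqueueSelector')
    if hasattr(selectors, name)
}


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical option name; pick whatever fits the application.
    parser.add_argument('--selector', choices=sorted(SELECTOR_CLASSES),
                        help='selector class to use (default: DefaultSelector)')
    return parser.parse_args(argv)


def make_selector(args):
    """Instantiate the selector named on the command line, else the default."""
    cls = SELECTOR_CLASSES.get(args.selector, selectors.DefaultSelector)
    return cls()
```

For asyncio, the resulting selector can be passed to `asyncio.SelectorEventLoop(selector)`.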
msg201857 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-10-31 22:59
> What's the use case for not wanting to use an extra FD?

A selector may be used for a few milliseconds, just to check whether a socket is ready, and then destroyed. For such a use case, select() may be enough (1 syscall). epoll requires more system calls: create the epoll FD, register the socket, poll, and destroy the epoll FD (4 syscalls).
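A sketch of the short-lived pattern described above, assuming a hypothetical helper `is_readable_now`: a throwaway PollSelector (or SelectSelector where poll() is unavailable) is created, used once and closed. Neither class allocates an extra file descriptor.

```python
import selectors


def is_readable_now(sock, timeout=0):
    """Hypothetical one-shot readiness check using a throwaway selector."""
    # poll() if available, else select(); neither needs an extra FD.
    sel_cls = getattr(selectors, 'PollSelector', selectors.SelectSelector)
    with sel_cls() as sel:  # the selector is closed when the block exits
        sel.register(sock, selectors.EVENT_READ)
        return bool(sel.select(timeout))
```

With EpollSelector, the same code would additionally create, register with, and close an epoll file descriptor on every call.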
msg201858 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-10-31 23:36
Hm... I'm trying to understand how you're using the selector in
telnetlib.py (currently the only example outside asyncio). It seems you're
always using it with a single file/object, which is always 'self' (which
wraps a socket), except one place where you're also selecting on stdin.
Sometimes you're using select(0) to check whether I/O is possible right
now and then throwing away the selector; other times you've got an actual
loop.

I wonder if you could just create the selector when the Telnet class is
instantiated (or the first time you need the selector) and keep the socket
permanently registered; IIUC selectors are level-triggered, and no
resources are consumed when you're not calling its select() method. (I
think this means that if the socket was ready at some point in the past,
but you already read those bytes, and now you're calling select(), it won't
be considered ready even though it was registered the whole time.)
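A sketch of the suggestion above: one long-lived selector per instance, created lazily, with the socket registered once. `Connection` is a hypothetical stand-in for the Telnet class. Because selectors are level-triggered, the socket is reported ready only while unread data is actually pending.

```python
import selectors


class Connection:
    """Hypothetical wrapper: lazily created selector, socket registered once."""

    def __init__(self, sock):
        self.sock = sock
        self._selector = None

    def _get_selector(self):
        # Create the selector on first use and keep the socket registered.
        if self._selector is None:
            self._selector = selectors.DefaultSelector()
            self._selector.register(self.sock, selectors.EVENT_READ)
        return self._selector

    def readable(self, timeout=0):
        """True if the socket has data to read right now (level-triggered)."""
        return bool(self._get_selector().select(timeout))

    def close(self):
        if self._selector is not None:
            self._selector.close()
            self._selector = None
        self.sock.close()
```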

It still seems to me that this is pretty atypical use of selectors; the
extra FD used doesn't bother me much, since it doesn't really scale anyway
(that would require hooking multiple Telnet instances into the same
selector, probably using an asyncio EventLoop).

If you insist on having a function that prefers poll and select over kqueue
or epoll, perhaps we can come up with a slightly higher abstraction for the
preference order? Maybe faster startup time vs. better scalability? (And I
wouldn't be surprised if on Windows you'd still be better off using
IocpProactor instead of SelectSelector -- but that of course has a
different API altogether.)
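One possible shape for such a higher-level preference, as a sketch only: `PREFERENCE_ORDERS`, its keys and `choose_selector_class` are hypothetical names. The "scalability" order roughly mirrors DefaultSelector's, although DefaultSelector may also check that the underlying syscall actually works, which a plain hasattr() test does not.

```python
import selectors

# Hypothetical preference orders (illustrative names, not an existing API).
PREFERENCE_ORDERS = {
    # Short-lived selectors: fewest syscalls, no extra file descriptor.
    'startup': ('PollSelector', 'SelectSelector'),
    # Long-lived selectors watching many FDs: roughly DefaultSelector's order.
    'scalability': ('KqueueSelector', 'EpollSelector', 'DevpollSelector',
                    'PollSelector', 'SelectSelector'),
}


def choose_selector_class(prefer='scalability'):
    """Return the first selector class in the chosen order that exists here."""
    for name in PREFERENCE_ORDERS[prefer]:
        cls = getattr(selectors, name, None)
        if cls is not None:
            return cls
    return selectors.SelectSelector  # always available
```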
msg201859 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-10-31 23:37
msg201863 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-01 00:18
> It still seems to me that this is pretty atypical use of selectors

I already implemented something similar to subprocess.Popen.communicate() when I was working on old Python versions without the timeout parameter of communicate().
http://ufwi.org/projects/edw-svn/repository/revisions/master/entry/trunk/src/nucentral/nucentral/common/process.py#L222

IMO, calling select with a few file descriptors (between 1 and 3) and quickly destroying the "selector" is not a rare use case.

If I were to port my code to selectors, I wouldn't want to rewrite it to keep the selector alive longer just because the selectors module forces me to use the super-powerful, fast epoll/kqueue selector.

(To be honest, I will probably not notice any performance impact. But I like reducing the number of syscalls, not the opposite :-))
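A sketch of a communicate()-with-timeout loop of the kind described above, built on a short-lived selector over at most a couple of pipes. `communicate_with_timeout` is a hypothetical name, the code is not the linked implementation, and it assumes POSIX (select() and poll() do not work on pipes on Windows).

```python
import os
import selectors
import subprocess
import time


def communicate_with_timeout(cmd, timeout):
    """Hypothetical sketch: collect stdout/stderr of cmd within timeout seconds.

    Uses a selector that needs no extra FD (poll, else select). POSIX only.
    """
    sel_cls = getattr(selectors, 'PollSelector', selectors.SelectSelector)
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    chunks = {proc.stdout: [], proc.stderr: []}
    deadline = time.monotonic() + timeout
    with sel_cls() as sel:
        for pipe in chunks:
            sel.register(pipe, selectors.EVENT_READ)
        while sel.get_map():  # loop until both pipes reached EOF
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                proc.kill()
                proc.wait()
                for pipe in chunks:
                    pipe.close()
                raise subprocess.TimeoutExpired(cmd, timeout)
            for key, _events in sel.select(remaining):
                data = os.read(key.fd, 4096)
                if data:
                    chunks[key.fileobj].append(data)
                else:  # EOF: stop watching this pipe
                    sel.unregister(key.fileobj)
                    key.fileobj.close()
    proc.wait()
    return b''.join(chunks[proc.stdout]), b''.join(chunks[proc.stderr])
```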
msg201865 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-11-01 00:58
OK. Let's have a function to select a default selector. Can you think of a
better name for the parameter? Or maybe there should be two functions?
msg201866 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-01 01:03
> OK. Let's have a function to select a default selector.
> Can you think of a better name for the parameter? Or
> maybe there should be two functions?

I prefer to leave the question to the author of the module, Charles-François :-)
msg201889 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-11-01 11:07
There are actually two reasons for choosing poll over epoll/kqueue
(i.e. no extra FD):
- it's a bit faster (1 syscall vs 3)
- but more importantly - and that's the main reason I did it in
telnetlib/multiprocessing/subprocess - sometimes, you really don't
want to use an extra FD: for example, if you're creating 300
telnet/subprocess instances, one more FD per instance can make you
reach RLIMIT_NOFILE, which makes some syscalls fail with EMFILE (at
work we have up to 100 machines, and we spawn 1 subprocess per
machine when distributing files with bittorrent).
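A sketch for measuring the per-instance FD cost described above and the remaining headroom under RLIMIT_NOFILE. The helper names are hypothetical. It assumes Linux, since it counts open descriptors through /proc/self/fd, and the resource module is POSIX-only.

```python
import os
import resource
import selectors


def open_fd_count():
    """Number of file descriptors currently open in this process (Linux-only)."""
    return len(os.listdir('/proc/self/fd'))


def fds_held(sel_cls, n=10):
    """Create n selectors of sel_cls and return how many extra FDs they hold."""
    before = open_fd_count()
    sels = [sel_cls() for _ in range(n)]
    try:
        return open_fd_count() - before
    finally:
        for sel in sels:
            sel.close()


def fd_headroom():
    """How many more FDs can be opened before hitting the soft RLIMIT_NOFILE."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None  # no soft limit set
    return soft - open_fd_count()
```

On Linux, `fds_held(selectors.EpollSelector)` is expected to equal n and `fds_held(selectors.PollSelector)` to be 0. That difference is the extra FD per instance that counts against RLIMIT_NOFILE.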

So I agree it would be nice to have a better way to get a selector not
requiring any extra FD.

The reason I didn't add such a method in the first place is that I
don't want to end up like many Java APIs:
Foo.getBarFactory().getInstance().initialize().provide() :-)

> I read somewhere that different selectors may have different limits on the number of file descriptors.

Apart from select() (limited by FD_SETSIZE), none of the other selectors has an upper limit.

As for the performance profiles, depending on the application usage,
select() can be faster than poll(), poll() can be faster than epoll(),
etc. But since it's really highly usage-specific (and of course
OS-specific), I think the current choice heuristic is fine: people with
specific needs can just use PollSelector/EpollSelector themselves.
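A micro-benchmark sketch for people who want to check the one-shot cost of each selector class on their own platform. `one_shot` and `bench` are hypothetical helpers built on `timeit`. Relative timings will depend on the OS and on the usage pattern, as noted above.

```python
import selectors
import socket
import timeit


def one_shot(sel_cls, sock):
    """Create a selector, register one socket, poll once without blocking, close."""
    with sel_cls() as sel:
        sel.register(sock, selectors.EVENT_READ)
        sel.select(0)


def bench(number=2000):
    """Seconds spent on `number` one-shot checks, per available selector class."""
    names = ('SelectSelector', 'PollSelector', 'EpollSelector',
             'DevpollSelector', 'KqueueSelector')
    a, b = socket.socketpair()
    try:
        results = {}
        for name in names:
            cls = getattr(selectors, name, None)
            if cls is not None:
                # The lambda runs inside this iteration, so it sees this cls.
                results[name] = timeit.timeit(lambda: one_shot(cls, a),
                                              number=number)
        return results
    finally:
        a.close()
        b.close()
```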

To sum up, get_selector(use_fd=True) looks fine to me.
msg201906 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-11-01 14:41
Hm. If you really are going to create 300 instances, you should probably
use asyncio. Otherwise, how are you going to multiplex them? Create 300
threads each doing select() on 1 FD? That sounds like a poor architecture
and I don't want to bend over backwards to support or encourage that.
msg201910 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-11-01 15:13
Of course, when I have 300 connections to remote nodes, I use poll()
to multiplex between them.

But there are times when you can have a large number of threads
running concurrently, and if many of them call e.g.
subprocess.check_output() at the same time (which calls
Popen.communicate() behind the scenes, and thus calls
select/poll), then one extra FD per instance could be an issue.
For example, in http://bugs.python.org/issue18756, os.urandom() would
start failing when multiple threads called it at the same time.
msg204108 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-11-23 21:26
I think this is more of a documentation issue. People who don't want a new fd can hardcode PollSelector (poll has been part of POSIX for a long time).
msg204119 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-11-23 22:36
> Antoine Pitrou added the comment:
>
> I think this is more of a documentation issue. People who don't want a new fd can hardcode PollSelector (poll has been POSIX for a long time).

That's also what I now think.
I don't think that the use case is common enough to warrant a
"factory"; a default selector is fine.
msg210994 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-02-11 17:53
It looks like you rejected my idea, so I'm in favor of just closing the issue. Do you agree?
History
Date                 User        Action  Args
2014-02-11 18:36:59  gvanrossum  set     status: open -> closed
                                         resolution: wont fix
2014-02-11 17:53:34  vstinner    set     messages: + msg210994
2013-11-23 22:36:57  pitrou      set     assignee: docs@python
                                         components: + Documentation
                                         nosy: + docs@python
2013-11-23 22:36:14  neologix    set     messages: + msg204119
2013-11-23 21:26:43  pitrou      set     nosy: + pitrou
                                         messages: + msg204108
2013-11-01 15:13:03  neologix    set     messages: + msg201910
2013-11-01 14:41:05  gvanrossum  set     messages: + msg201906
2013-11-01 11:07:03  neologix    set     messages: + msg201889
2013-11-01 01:03:46  vstinner    set     messages: + msg201866
2013-11-01 00:58:39  gvanrossum  set     messages: + msg201865
2013-11-01 00:18:07  vstinner    set     messages: + msg201863
2013-10-31 23:37:13  gvanrossum  set     messages: + msg201859
2013-10-31 23:36:33  gvanrossum  set     messages: + msg201858
2013-10-31 22:59:13  vstinner    set     messages: + msg201857
2013-10-31 22:45:54  gvanrossum  set     messages: + msg201856
2013-10-31 22:36:21  vstinner    create