This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Classification
Title:        increase epoll.poll() maxevents default value, and improve documentation
Type:         enhancement        Stage:       resolved
Components:   Library (Lib)      Versions:    Python 3.4

Process
Status:       closed             Resolution:  not a bug
Dependencies:                    Superseder:
Assigned To:                     Nosy List:   neologix, pitrou, sbt
Priority:     normal             Keywords:

Created on 2013-01-06 00:33 by neologix, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name        Uploaded
test_epoll.py    neologix, 2013-01-06 14:37
test_epoll_2.py  sbt, 2013-01-06 19:49
Messages (12)
msg179157 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-06 00:33
In issue #16853, it was noted that several projects don't set epoll.poll()'s maxevents argument, which effectively limits the number of events returned to FD_SETSIZE-1 (set in selectmodule.c).

Also, the method documentation can confuse users into thinking that, by default, the number of events is unlimited:
"""

.. method:: epoll.poll(timeout=-1, maxevents=-1)

   Wait for events. timeout in seconds (float)
"""

It would probably make sense to use a larger default value for epoll maxevents (the only downside is increased memory consumption for the events array), maybe based on the RLIMIT_NOFILE hard limit.
And the documentation should probably be improved.
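
A minimal sketch of that idea, assuming the default were derived from the
RLIMIT_NOFILE hard limit (the fallback value and the names below are
hypothetical, not actual selectmodule.c behaviour):

    import resource
    import select

    # Hypothetical: derive a maxevents default from the RLIMIT_NOFILE hard limit.
    _soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    DEFAULT_MAXEVENTS = _hard if _hard != resource.RLIM_INFINITY else 4096  # arbitrary fallback

    ep = select.epoll()
    # ... register FDs, then:
    # events = ep.poll(-1, DEFAULT_MAXEVENTS)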
msg179180 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-01-06 13:12
Is this actually a problem?

If events are arranged in a queue and epoll_wait() just removes the oldest events (up to maxevents) from that queue then there would be no problem with using a small value for maxevents.

I don't *know* if that is the case, but I would consider epoll to be broken if it does not do something similar.
msg179183 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-06 14:37
The implementation can't simply drain a queue, because it can be
level-triggered (which is the case by default), so you want to keep
events around. The kernel actually uses a red-black tree, but I didn't
really check how it's done (there are probably very good reasons for
that).

Anyway, it can turn out to be a problem, for two reasons:
- performance: with a maxevents value that is too low, several calls
to epoll_wait() must be made, instead of being able to process all
events at once
- the main problem that comes to mind is really starvation: let's
say you have 2*FD_SETSIZE client sockets registered in your poll
object. The first call to epoll_wait() returns sockets 0 to
FD_SETSIZE-1: you process them, so they're not ready anymore. The
next call returns the clients from FD_SETSIZE to 2*FD_SETSIZE-1, same
thing. But by the time you call epoll_wait() for the third time, if
the first FD_SETSIZE clients are ready again, they will be returned
again, etc. So the last clients may very well never be reported
ready: that's starvation.

I actually wrote a script to reproduce this issue:
"""
$ ./python /home/cf/test_epoll.py
Working with 4080 FDs, -1 maxevents
Number of missing FDs:4080
Number of ready FDs: 1023
Number of missing FDs:3057
Number of ready FDs: 0
Number of missing FDs:3057
Number of ready FDs: 1023
Number of missing FDs:2034
Number of ready FDs: 0
Number of missing FDs:2034
Number of ready FDs: 1023
Number of missing FDs:2034
Number of ready FDs: 0
Number of missing FDs:2034
Number of ready FDs: 1023
Number of missing FDs:2034
Number of ready FDs: 0
[...]
"""

If you specify a large enough maxevents:
"""
$ ./python /home/cf/test_epoll.py 64000
Working with 4080 FDs, 64000 maxevents
Number of missing FDs:4080
Number of ready FDs: 4080
"""

Note that it's really a corner-case issue, but I stumbled upon this
problem while writing a test in issue #16853, and several projects
(Tulip, Tornado, pyftpdlib) fall into this trap.

I see several options:
1) just keep it that way (i.e. with maxevents set to FD_SETSIZE), and
add a note in the documentation. I think it's always better to handle
problems in the library than let the users get bitten.
2) increase `maxevents` default value. The problem is that I don't
like magic values, and a too-large value could incur increased memory
consumption (and, with the current implementation, reduced performance,
because the epoll events buffer is allocated at each poll(); see issue
#16876).
3) use a simple heuristic: start with a reasonable value for
`maxevents` (FD_SETSIZE seems like a good candidate), and if
epoll_wait() ever returns `maxevents` events, double the value (that's
what libevent does, capping it at 32000); a sketch follows.
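
A minimal sketch of that heuristic as a hypothetical wrapper (the class
and its names are illustrative, not a proposed selectmodule API):

    import select

    class GrowingEpoll:
        """epoll wrapper that doubles maxevents whenever poll() comes back full."""

        CAP = 32000  # libevent-style upper bound

        def __init__(self, initial_maxevents=1024):  # roughly FD_SETSIZE
            self._ep = select.epoll()
            self._maxevents = initial_maxevents

        def register(self, fd, eventmask):
            self._ep.register(fd, eventmask)

        def poll(self, timeout=-1):
            events = self._ep.poll(timeout, self._maxevents)
            if len(events) == self._maxevents and self._maxevents < self.CAP:
                # a full buffer suggests more FDs may have been ready
                self._maxevents = min(self._maxevents * 2, self.CAP)
            return events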
msg179185 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-01-06 14:53
> 3) use a simple heuristic: start with a reasonable value for
> `maxevents` (FD_SETSIZE seems like a good candidate), and if
> epoll_wait() ever returns `maxevents` events, double the value (that's
> what libevent does, capping it at 32000).

What if some events are edge-triggered?
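
For context, edge-triggered registration looks like the following minimal
sketch (a pipe as a stand-in FD); unlike the default level-triggered mode,
the kernel reports a readiness transition only once, so a still-ready FD
is not re-reported on later epoll_wait() calls:

    import os
    import select

    r, w = os.pipe()
    ep = select.epoll()
    # EPOLLET makes this registration edge-triggered: once an event has been
    # retrieved, the fd is not reported again until a new readiness transition.
    ep.register(r, select.EPOLLIN | select.EPOLLET)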
msg179189 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-01-06 15:45
> I actually wrote a script to reproduce this issue:

The program does *not* demonstrate starvation because you are servicing the resource represented by the "starved" duplicate fds before calling poll() again.

You are creating thousands of duplicate handles for the same resource and then complaining that they do not behave independently!

I tried modifying your program by running poll() in a loop, exiting when no more unseen fds are reported as ready.  This makes the program exit immediately.

So

        ready_writers = set(fd for fd, evt in 
                            ep.poll(-1, MAXEVENTS) if fd != r)
        seen_writers |= ready_writers

becomes

        while True:
            ready_writers = set(fd for fd, evt in
                                ep.poll(-1, MAXEVENTS) if fd != r)
            if ready_writers.issubset(seen_writers):
                break
            seen_writers |= ready_writers

I still cannot see a problem with epoll().
msg179201 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-06 18:14
> The program does *not* demonstrate starvation because you are servicing the resource represented by the "starved" duplicate fds before calling poll() again.

No.
What the program does is the following:

while all the write FDs have not been returned by epoll.wait() at least once:
    1) make all writer FDs ready by draining the pipe
    2) fetch the list of events reported by epoll(), and add them to the list of seen FDs
    3) make all writer FDs not ready by filling the pipe
    4) fetch the list of events reported by epoll(), and add them to the list of seen FDs

With the default 'maxevents' parameter, it never completes.

This shows that the FDs returned at step 2 are actually a strict
subset of all the ready FDs, and this remains true forever: some FDs
which are actually ready (they are all ready at step 2) are *never*
reported. This is starvation.

By increasing epoll.wait() maxevents, it completes immediately.

> You are creating thousands of duplicate handles for the same resource and then complaining that they do not behave independently!

The fact that the FDs are duped shouldn't change the events
reported: it works as long as the number of FDs is less than
FD_SETSIZE (the epoll_wait() maxevents argument).

See also epoll() documentation:
"""

       Q0  What is the key used to distinguish the file descriptors
           registered in an epoll set?

       A0  The key is the combination of the file descriptor number and
           the open file description (also known as an "open file
           handle", the kernel's internal representation of an open
           file).

       Q1  What happens if you register the same file descriptor on an
           epoll instance twice?

       A1  You will probably get EEXIST.  However, it is possible to
           add a duplicate (dup(2), dup2(2), fcntl(2) F_DUPFD)
           descriptor to the same epoll instance.  This can be a useful
           technique for filtering events, if the duplicate file
           descriptors are registered with different events masks.

I just used dup() to make it easier to test, but you'd probably get
the same thing if your FDs were sockets connected to different
endpoints.

> I tried modifing your program by running poll() in a loop, exiting when no more unseen fds are reported as ready.  This makes the program exit immediately.
>
> So
>
>         ready_writers = set(fd for fd, evt in
>                             ep.poll(-1, MAXEVENTS) if fd != r)
>         seen_writers |= ready_writers
>
> becomes
>
>         while True:
>             ready_writers = set(fd for fd, evt in
>                                 ep.poll(-1, MAXEVENTS) if fd != r)
>             if ready_writers.issubset(seen_writers):
>                 break
>             seen_writers |= ready_writers

Of course it does, since the returned FDs are a subset of all the
ready file descriptors.

The point is precisely that, when there are more FDs ready than
maxevents, some FDs will never be reported.
msg179207 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-01-06 19:20
> The fact that the FDs are duped shouldn't change the events
> reported: it works as long as the number of FDs is less than
> FD_SETSIZE (the epoll_wait() maxevents argument).

That assumes that epoll_wait() is supposed to return *all* ready fds.  But that is not possible because maxevents is finite.  If you want all events then obviously you may need to call epoll_wait() multiple times.

> I just used dup() to make it easier to test, but you'd probably get
> the same thing if your FDs were sockets connected to different
> endpoints.

This is the part I disagree with -- I think it makes all the difference.  Please try making such a modification.

>>         while True:
>>             ready_writers = set(fd for fd, evt in
>>                                 ep.poll(-1, MAXEVENTS) if fd != r)
>>             if ready_writers.issubset(seen_writers):
>>                 break
>>             seen_writers |= ready_writers
>
>Of course it does, since the returned FDs are a subset of all the
>ready file descriptors.
>
>The point is precisely that, when there are more FDs ready than
>maxevents, some FDs will never be reported.

The program can only terminate when the outer

    while all_writers - seen_writers:
        ...

loop terminates.  So seen_writers == all_writers, and every fd has been reported.
msg179210 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-06 19:42
> That assumes that epoll_wait() is supposed to return *all* ready fds.  But that is not possible because maxevents is finite.  If you want all events then obviously you may need to call epoll_wait() multiple times.

Yes, but the problem is that between two epoll_wait() calls, the
readiness of the FDs may have changed: and if that happens, you'll get
the same list over and over.

> The program can only terminate when the outer
>
>     while all_writers - seen_writers:
>         ...
>
> loop terminates.  So seen_writers == all_writers, and every fd has been reported.

Yes, but no events are generated between the calls to epoll_wait() in
your inner loop.
Typical usage of select()/poll()/epoll() looks like:

while True:
    events = ep.poll()
    for fd, evt in events:
        do_something(fd)

and between two calls to poll(), you can get new events (new
connections, space in the socket buffer, etc.). The pipe
filling/draining is used to generate new events. Your modification
doesn't take that into account.

> This is the part I disagree with -- I think it makes all the difference.  Please try making such a modification.

Will do.
msg179211 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-01-06 19:49
Here is a version which uses epoll to service a number of pipes which is larger than maxevents.  (If NUM_WRITERS is too large then I get "OSError: [Errno 24] Too many open files".)

All pipes get serviced and the output is:

Working with 20 FDs, 5 maxevents
[5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43]
[15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43]
[25, 27, 29, 31, 33, 35, 37, 39, 41, 43]
[35, 37, 39, 41, 43]

The lists show the (sorted) unseen writers at each loop iteration.
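
For reference, a minimal sketch of what such a script plausibly looks
like (the attached test_epoll_2.py is authoritative; the constants and
the fill-until-EAGAIN servicing below are assumptions):

    import fcntl
    import os
    import select

    NUM_WRITERS = 20
    MAXEVENTS = 5

    def set_nonblocking(fd):
        flags = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

    # independent pipes this time, not duplicated FDs
    ep = select.epoll()
    writers = []
    for _ in range(NUM_WRITERS):
        r, w = os.pipe()
        set_nonblocking(w)
        ep.register(w, select.EPOLLOUT)
        writers.append(w)

    print("Working with %d FDs, %d maxevents" % (NUM_WRITERS, MAXEVENTS))
    unseen = set(writers)
    while unseen:
        print(sorted(unseen))
        for fd, evt in ep.poll(-1, MAXEVENTS):
            unseen.discard(fd)
            # "service" the writer: fill its pipe until EAGAIN so it stops
            # being ready, letting the remaining writers be reported later
            try:
                while True:
                    os.write(fd, b'x' * 65536)
            except BlockingIOError:
                pass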
msg179212 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-06 19:54
Of course it does: the write ends all get filled, so the number of ready writers drops to zero... That's not at all the problem I'm talking about (I may not be clear, though; a script will make it clearer).
msg179213 - Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-01-06 19:55
> Yes, but the problem is that between two epoll_wait() calls, the
> readiness of the FDs may have changed: and if that happens, you'll get
> the same list over and over.

If an fd *was* ready but isn't anymore then why would you want to know about it?  Trying to use the fd will fail with EAGAIN.
msg179222 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-01-06 20:37
You're right, I just saw my mistake.
Sorry for the noise (the extra performance argument is not a good enough motivation to tune this dynamically).

Closing.
History
Date                 User      Action  Args
2022-04-11 14:57:40  admin     set     github: 61077
2013-01-06 20:37:39  neologix  set     status: open -> closed
                                       resolution: not a bug
                                       messages: + msg179222
                                       stage: needs patch -> resolved
2013-01-06 19:55:34  sbt       set     messages: + msg179213
2013-01-06 19:54:37  neologix  set     messages: + msg179212
2013-01-06 19:49:15  sbt       set     files: + test_epoll_2.py
                                       messages: + msg179211
2013-01-06 19:42:05  neologix  set     messages: + msg179210
2013-01-06 19:20:15  sbt       set     messages: + msg179207
2013-01-06 18:14:48  neologix  set     messages: + msg179201
2013-01-06 15:45:36  sbt       set     messages: + msg179189
2013-01-06 14:53:57  pitrou    set     nosy: + pitrou
                                       messages: + msg179185
2013-01-06 14:37:57  neologix  set     files: + test_epoll.py
                                       messages: + msg179183
2013-01-06 13:12:45  sbt       set     nosy: + sbt
                                       messages: + msg179180
2013-01-06 00:33:24  neologix  create