msg89989
Author: Jesús Cea Avión (jcea)
Date: 2009-07-01 16:49
In Python 2.6 we added support for Linux "epoll" and *BSD "kqueue" in
the select module. I think we should add support for the Solaris "poll"
interface too.
What do you think?
I volunteer to do the work, if you agree this is a feature we want to
have. I think so.
|
msg89991
Author: Jean-Paul Calderone (exarkun)
Date: 2009-07-01 17:23
Solaris 10 introduced "The Event Completion Framework". I am not
particularly familiar with Solaris, so I couldn't say whether it would
be better to target this or the older /dev/poll. Some documentation
suggests that "The Event Completion Framework" is somewhat preferred:
http://developers.sun.com/solaris/articles/event_completion.html
It suggests that /dev/poll is not as performant, but I'm not sure I
believe it. One feature it does seem to have that puts it ahead
of /dev/poll is support for various non-fd event sources (e.g., timers).
|
msg146041
Author: Jesús Cea Avión (jcea)
Date: 2011-10-20 18:13
I want to move this forward.
Apparently, "/dev/poll" could actually be used transparently in the Python "select.poll()" implementation. The semantics seem to be the same, so we could choose between the "poll" syscall and "/dev/poll" statically at compile time, or dynamically at poll object creation time (try to open /dev/poll and fall back to the "poll" syscall if that fails).
Some details:
http://developers.sun.com/solaris/articles/using_devpoll.html
http://developers.sun.com/solaris/articles/polling_efficient.html
I agree that the Solaris 10 event completion framework would be nice to support too, but that should be a separate feature request.
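A minimal Python sketch of the "decide at poll object creation time" idea, assuming the separate select.devpoll object that eventually shipped in Python 3.3; the helper name make_poller is invented for this illustration:
"""
import select

def make_poller():
    # Prefer the /dev/poll-backed object where the platform provides it,
    # and fall back to the classic poll() syscall otherwise.
    if hasattr(select, "devpoll"):
        try:
            return select.devpoll()
        except OSError:
            # /dev/poll exists on this build but could not be opened.
            pass
    return select.poll()
"""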
|
msg146445
Author: Jesús Cea Avión (jcea)
Date: 2011-10-26 17:20
First version of the patch. Please review 0ee4386d8f51.diff: http://bugs.python.org/file23526/0ee4386d8f51.diff
Details:
1. The current code aliases "devpoll" into "select.poll" on platforms with "/dev/poll" (Solaris and derivatives). Considering all the other points, I think that providing a separate "select.devpoll" object would be a better idea. Opinions?
2. I am not providing the exception contract documented for "select.poll". For instance, "poll.unregister" of an fd that was not previously registered doesn't raise an exception; the same goes for "poll.modify", etc. I could add it, but it seems like pointless extra code.
3. I have added a boolean "select.isdevpoll" to indicate whether "select.poll" uses "poll()" or "/dev/poll". Ugly.
4. I release the GIL when waiting for the fds, but not when registering/unregistering fds, etc. I guess the syscall is very fast, but I haven't actually measured it.
5. The internal REMOVE is there because if you "register" the same fd several times, the events are ORed together. To avoid that, I do an internal REMOVE first. I should instead scan the pollfds internally and update the events in place, or provide a separate "devpoll" and document this behaviour...
6. If the number of active fds is bigger than SIZE_DEVPOLL, only SIZE_DEVPOLL fds are returned. If you "poll" several times, you get the SAME fds if they are still active, so other active fds can suffer starvation. Possible solutions: self-tuning (if the module provides 20 slots and the OS reports 20 active fds, call again with 40 free slots, and so on), or provide a separate "devpoll" and document this fact, possibly adding a "maxsize" parameter. "select.epoll" uses FD_SETSIZE directly.
With all this, I am starting to think that providing a separate "select.devpoll" is actually a better idea.
Opinions?
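For reference, a minimal usage sketch of a separate "select.devpoll" object, based on the interface that eventually shipped in Python 3.3 (it mirrors select.poll: register/unregister/poll); the socket setup is only illustrative and the code needs a platform with /dev/poll:
"""
import select
import socket

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(5)

dp = select.devpoll()                        # opens /dev/poll internally
dp.register(server.fileno(), select.POLLIN)

# poll() returns (fd, events) pairs, like select.poll(); timeout in milliseconds.
for fd, events in dp.poll(1000):
    if fd == server.fileno() and events & select.POLLIN:
        conn, addr = server.accept()
        dp.register(conn.fileno(), select.POLLIN | select.POLLOUT)

server.close()
"""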
|
msg146457
Author: Jesús Cea Avión (jcea)
Date: 2011-10-26 21:22
I have decided to segregate "select.devpoll" into a separate object, like "select.epoll".
|
msg146459
Author: Jesús Cea Avión (jcea)
Date: 2011-10-26 21:26
Points 1, 3 and 4 are solved.
Point 2 will be solved in the documentation.
Points 5 and 6 are still pending.
|
msg146461
Author: Jesús Cea Avión (jcea)
Date: 2011-10-26 23:20
Documentation added. That solves 2 and 5.
I still have to solve 6.
|
msg147226
Author: Jesús Cea Avión (jcea)
Date: 2011-11-07 14:56
Please review.
With the current code, each devpoll object has capacity for managing 256 fds by default. This is about 2048 bytes. The cost seems reasonable, since a normal program will have only a few devpoll objects around. I considered an optional parameter to tune this, but the interaction with rlimit is messy. Even if we manage 65536 fds, the memory cost is about 512 KB per devpoll, and you can surely afford it if you are actually managing 65536 descriptors...
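The arithmetic behind those figures can be checked with a short snippet; it assumes the usual 8-byte struct pollfd layout (a 4-byte int plus two 2-byte shorts), which is an assumption about the platform ABI:
"""
import struct

# struct pollfd: int fd; short events; short revents;
POLLFD_SIZE = struct.calcsize("ihh")   # typically 8 bytes

print(256 * POLLFD_SIZE)      # default capacity: 2048 bytes (~2 KB)
print(65536 * POLLFD_SIZE)    # worst case discussed: 524288 bytes (512 KB)
"""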
The code is not thread-safe. It doesn't crash, but concurrent use of a devpoll object has undefined results.
Please review for integration.
|
msg147316
Author: Jesús Cea Avión (jcea)
Date: 2011-11-08 20:58
Please check the new changeset, which incorporates all your feedback. Thanks!
|
msg147350
Author: Jesús Cea Avión (jcea)
Date: 2011-11-09 12:12
Another changeset. Hopefully the final one :-).
Please review.
|
msg147354
Author: Ross Lagerwall (rosslagerwall)
Date: 2011-11-09 13:39
+ increases this value, c:func:`devpoll` will return a possible
+ incomplete list of active file descriptors.
I think this should change to:
+ increases this value, c:func:`devpoll` will return a possibly
+ incomplete list of active file descriptors.
or even better:
+ increases this value, c:func:`devpoll` may return an
+ incomplete list of active file descriptors.
Cheers
|
msg147355
Author: Jesús Cea Avión (jcea)
Date: 2011-11-09 14:44
Thanks, Ross. Your suggestion has been committed to my branch.
Waiting for more feedback.
|
msg147358
Author: Ross Lagerwall (rosslagerwall)
Date: 2011-11-09 16:52
Is write()ing a devpoll fd a blocking operation in the kernel?
Does it need to have Py_BEGIN_ALLOW_THREADS around it?
The same question applies to open()ing it.
Obviously, the ioctl() call *is* blocking :-)
|
msg147359
Author: Ross Lagerwall (rosslagerwall)
Date: 2011-11-09 17:38
Also, you can use Py_RETURN_NONE instead of:
+ Py_INCREF(Py_None);
+ return Py_None;
|
msg147365
Author: Jesús Cea Avión (jcea)
Date: 2011-11-09 19:14
These operations are potentially slow; there are no realtime guarantees.
My machine is quite capable (16 CPUs), but let's see what a bit of DTrace scripting tells us.
First, let's time the open:
"""
syscall::open*:entry
/copyinstr(arg0)=="/dev/poll"/
{
self->ts = timestamp;
}
syscall::open*:return
/self->ts/
{
@stats = quantize(timestamp-self->ts);
self->ts = 0;
}
"""
The result, times in nanoseconds:
"""
value ------------- Distribution ------------- count
2048 | 0
4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4497743
8192 | 17358
16384 | 1592
32768 | 2046
65536 | 812
131072 | 374
262144 | 0
"""
Most "open"s finish in 4 microseconds, but we have a handful of "slow" openings of hundred of microseconds.
Anyway, argueing with the GIL here is a nonissue, since the "open" is only done at object creation, and a sane program will only create a handful of devpoll objets.
Let's see now the write:
"""
syscall::open*:entry
/copyinstr(arg0)=="/dev/poll"/
{
self->ts = timestamp;
}
syscall::open*:return
/self->ts/
{
@stats = quantize(timestamp-self->ts);
self->ts = 0;
}
"""
The results for a single descriptor registered:
"""
value ------------- Distribution ------------- count
256 | 0
512 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4102701
1024 | 28514
2048 | 11269
4096 | 537
8192 | 245
16384 | 377
32768 | 193
65536 | 134
131072 | 71
262144 | 0
"""
Most writes are really fast, around half a microsecond, but we see sporadic latencies of hundreds of microseconds.
Re-registering 200 sockets per loop, I get:
"""
value ------------- Distribution ------------- count
512 | 0
1024 | 50
2048 | 94
4096 | 157
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1112314
16384 |@ 22977
32768 | 1584
65536 | 842
131072 | 168
262144 | 0
"""
Most writes take around 8 microseconds.
So now the problem is to estimate how much time I need to release and reacquire the GIL. For that, and since we don't have DTrace integration in Python yet (my next project, I hope), I changed the source code to add a manual probe using the Solaris high-resolution timers. I get a median time of around 550 nanoseconds (half a microsecond), with a few spikes, when the GIL is not contended.
So, releasing the GIL adds around half a microsecond to the syscalls. This is fast, but comparable to the actual syscall duration. Releasing the GIL roughly doubles the time needed to register or unregister a socket (assuming we handle a single socket per iteration, which is realistic in real code).
Realistically, any work we do with the descriptors (like actually reading from or writing to them) is going to make this 0.5 microsecond saving pointless. By releasing the GIL, we are protected from any kernel contention, and we are playing by the rules :-).
Anyway, I am open to feedback.
PS: I also checked with GIL contention, and the results are comparable. That is surprising; maybe it is related to the GIL improvements in 3.3. I haven't investigated further.
|
msg147366
Author: Jesús Cea Avión (jcea)
Date: 2011-11-09 19:45
New changeset, after Ross's feedback. Thanks!
|
msg147367
Author: Ross Lagerwall (rosslagerwall)
Date: 2011-11-09 19:46
That was thorough :-) Seems OK though.
+ if (n < size) {
+ PyErr_SetString(PyExc_IOError, "failed to write all pollfds. "
+ "Please, report in http://bugs.python.org/");
If n < size, it's not a Python error is it? I would say it's the OS's fault.
Otherwise, it looks good... although I don't have currently access to a Solaris box to test it.
|
msg147368
Author: Jesús Cea Avión (jcea)
Date: 2011-11-09 19:51
The timing I am providing for the GIL is for releasing and reacquiring it, that is, all the work. In fact, the measurement also has the real syscall happening inside the release/acquire window, to account for cache effects.
That is, between the release and the acquire there is a real syscall; of course, I am not timing the syscall itself.
So, the timing I am providing is accurate.
|
msg147369
Author: Antoine Pitrou (pitrou)
Date: 2011-11-09 20:01
> That was thorough :-) Seems OK though.
>
> + if (n < size) {
> + PyErr_SetString(PyExc_IOError, "failed to write all pollfds. "
> + "Please, report in http://bugs.python.org/");
>
> If n < size, it's not a Python error is it? I would say it's the OS's fault.
No, but it's a Python error if Python wrongly assumes that write() won't
return a partial result. Ideally write() should be retried in a loop
until everything is written out.
|
msg147377
Author: Jesús Cea Avión (jcea)
Date: 2011-11-09 21:50
The problem with partial writes is that the data is not an unstructured stream of bytes, but structured binary records. You cannot simply retry blindly.
My bet is that "/dev/poll" never does partial writes.
If I am mistaken, I will bug the Illumos people to help me solve it. So far, this seems to be a theoretical problem.
Just in case it is not, we raise an exception and urge the programmer to report the issue to us.
The other option would be to try to be clever and do things like retrying when the partial write is a multiple of the struct size, but what if it is not?
Instead of guessing and playing games, I would rather know whether this is actually a problem in the wild. In my tests with 65530 sockets, I never saw this "impossible" exception.
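For illustration only, a minimal Python sketch of the retry idea being debated (retry only when the partial count is a whole number of pollfd records), writing raw pollfd structs to an already-open /dev/poll descriptor. This is not what the committed C code does (it raises IOError instead), and the "ihh" struct layout and the helper name write_pollfds are assumptions made for this sketch:
"""
import os
import struct

POLLFD = struct.Struct("ihh")   # int fd; short events; short revents (native sizes)

def write_pollfds(devpoll_fd, entries):
    # entries is a list of (fd, events) pairs to register with /dev/poll.
    data = b"".join(POLLFD.pack(fd, events, 0) for fd, events in entries)
    while data:
        n = os.write(devpoll_fd, data)
        if n == len(data):
            return
        if n % POLLFD.size != 0:
            # The write split a struct in the middle: nothing sane to retry.
            raise IOError("partial pollfd write: %d of %d bytes" % (n, len(data)))
        data = data[n:]          # retry the remaining whole structs
"""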
|
msg147589
Author: Jesús Cea Avión (jcea)
Date: 2011-11-14 12:46
New changeset. The only change is:
"""
diff --git a/Modules/selectmodule.c b/Modules/selectmodule.c
--- a/Modules/selectmodule.c
+++ b/Modules/selectmodule.c
@@ -685,8 +685,16 @@
return -1;
}
if (n < size) {
- PyErr_SetString(PyExc_IOError, "failed to write all pollfds. "
- "Please, report in http://bugs.python.org/");
+ /*
+ ** Data writed to /dev/poll is a binary data structure. It is not
+ ** clear what to do if a partial write occurred. For now, raise
+ ** an exception and see if we actually found this problem in
+ ** the wild.
+ */
+ PyErr_Format(PyExc_IOError, "failed to write all pollfds. "
+ "Please, report at http://bugs.python.org/. "
+ "Data to report: Size tried: %d, actual size written: %d",
+ size, n);
return -1;
}
return 0;
"""
If there are no disagreements, I will commit to default (3.3) soon (in a few hours/a day).
Thanks!
|
msg147626
Author: Roundup Robot (python-dev)
Date: 2011-11-14 18:18
New changeset 8f7ab4bf7ad9 by Jesus Cea in branch 'default':
Issue #6397: Support '/dev/poll' polling objects in select module, under Solaris & derivatives.
http://hg.python.org/cpython/rev/8f7ab4bf7ad9
|
Date | User | Action | Args
2022-04-11 14:56:50 | admin | set | github: 50646
2011-11-14 18:18:54 | jcea | set | status: open -> closed; resolution: fixed
2011-11-14 18:18:04 | python-dev | set | nosy: + python-dev; messages: + msg147626
2011-11-14 12:46:41 | jcea | set | messages: + msg147589
2011-11-14 12:41:32 | jcea | set | files: + 11f08326afd0.diff
2011-11-09 21:50:34 | jcea | set | messages: + msg147377
2011-11-09 20:01:17 | pitrou | set | messages: + msg147369
2011-11-09 19:51:26 | jcea | set | messages: + msg147368
2011-11-09 19:46:59 | rosslagerwall | set | messages: + msg147367
2011-11-09 19:45:15 | jcea | set | messages: + msg147366
2011-11-09 19:39:33 | jcea | set | files: + 9d687fdd924d.diff
2011-11-09 19:14:20 | jcea | set | messages: + msg147365
2011-11-09 17:40:39 | pitrou | set | nosy: + pitrou
2011-11-09 17:38:30 | rosslagerwall | set | messages: + msg147359
2011-11-09 16:52:11 | rosslagerwall | set | messages: + msg147358
2011-11-09 14:44:12 | jcea | set | messages: + msg147355
2011-11-09 13:39:11 | rosslagerwall | set | messages: + msg147354
2011-11-09 12:12:02 | jcea | set | messages: + msg147350
2011-11-09 12:11:25 | jcea | set | files: + 2506e49b9f71.diff
2011-11-08 20:58:01 | jcea | set | messages: + msg147316
2011-11-08 20:57:30 | jcea | set | files: - c47a846bed1a.diff
2011-11-08 20:56:15 | jcea | set | files: + 518b32ce893e.diff
2011-11-08 17:58:46 | jcea | set | files: + c47a846bed1a.diff
2011-11-08 17:18:54 | giampaolo.rodola | set | nosy: + giampaolo.rodola
2011-11-07 19:09:25 | rosslagerwall | set | nosy: + rosslagerwall
2011-11-07 15:05:24 | pitrou | set | nosy: + neologix
2011-11-07 14:56:48 | jcea | set | messages: + msg147226
2011-11-07 14:49:03 | jcea | set | files: + 528fdd816160.diff
2011-11-07 14:48:07 | jcea | set | files: - 0b701eb5e9e3.diff
2011-10-26 23:24:47 | jcea | set | title: Implementing Solaris "poll" in the "select" module -> Implementing Solaris "/dev/poll" in the "select" module
2011-10-26 23:20:34 | jcea | set | messages: + msg146461
2011-10-26 23:18:50 | jcea | set | files: + 0b701eb5e9e3.diff
2011-10-26 23:18:27 | jcea | set | files: - 6becc4e3eece.diff
2011-10-26 21:28:04 | jcea | set | files: + 6becc4e3eece.diff
2011-10-26 21:27:30 | jcea | set | files: - 0ee4386d8f51.diff
2011-10-26 21:26:37 | jcea | set | messages: + msg146459
2011-10-26 21:22:06 | jcea | set | messages: + msg146457
2011-10-26 17:21:07 | jcea | set | keywords: + needs review; stage: needs patch -> patch review
2011-10-26 17:20:30 | jcea | set | messages: + msg146445
2011-10-26 17:05:24 | jcea | set | files: + 0ee4386d8f51.diff
2011-10-26 17:01:56 | jcea | set | files: - 6ea157b9d110.diff
2011-10-26 16:43:58 | jcea | set | files: + 6ea157b9d110.diff
2011-10-26 15:35:23 | jcea | set | files: - d014fd90a487.diff
2011-10-26 15:33:16 | jcea | set | files: + d014fd90a487.diff; keywords: + patch
2011-10-26 15:32:11 | jcea | set | hgrepos: + hgrepo86
2011-10-20 18:13:26 | jcea | set | versions: + Python 3.3, - Python 2.7, Python 3.2
2011-10-20 18:13:17 | jcea | set | messages: + msg146041
2009-07-01 17:23:59 | exarkun | set | nosy: + exarkun; messages: + msg89991
2009-07-01 16:49:13 | jcea | create |