Message147365
They are operations potentially slow, there are not realtime specifications.
My machine is quite capable (16 CPUs), but let's see what a bit of DTRACE scripting tells us:
First, let's time the open:
"""
syscall::open*:entry
/copyinstr(arg0)=="/dev/poll"/
{
self->ts = timestamp;
}
syscall::open*:return
/self->ts/
{
@stats = quantize(timestamp-self->ts);
self->ts = 0;
}
"""
The result, times in nanoseconds:
"""
value ------------- Distribution ------------- count
2048 | 0
4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4497743
8192 | 17358
16384 | 1592
32768 | 2046
65536 | 812
131072 | 374
262144 | 0
"""
Most "open"s finish in 4 microseconds, but we have a handful of "slow" openings of hundred of microseconds.
Anyway, argueing with the GIL here is a nonissue, since the "open" is only done at object creation, and a sane program will only create a handful of devpoll objets.
Let's see now the write:
"""
syscall::open*:entry
/copyinstr(arg0)=="/dev/poll"/
{
self->ts = timestamp;
}
syscall::open*:return
/self->ts/
{
@stats = quantize(timestamp-self->ts);
self->ts = 0;
}
"""
The results for a single descriptor registered:
"""
value ------------- Distribution ------------- count
256 | 0
512 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4102701
1024 | 28514
2048 | 11269
4096 | 537
8192 | 245
16384 | 377
32768 | 193
65536 | 134
131072 | 71
262144 | 0
"""
Most writes are really fast, half a microsecond, but we have sporadic latencies of a hundred of microseconds.
Re-registering 200 sockets per loop, I have:
"""
value ------------- Distribution ------------- count
512 | 0
1024 | 50
2048 | 94
4096 | 157
8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1112314
16384 |@ 22977
32768 | 1584
65536 | 842
131072 | 168
262144 | 0
"""
Most writes takes around 8 microseconds.
So now the problem is to estimate how much time I need to release & readquire the GIL. For that, and while we don't have DTRACE integration in python (my next project, I hope), I change sourcecode to add a manual probe using the Solaris High Resolution timers, I get a median time of around 550 nanoseconds (half a microsecond), with a few spikes, when the GIL is not contested.
So, unlocking the GIL is adding around half microsecond to the syscalls. This is fast, but comparable with the syscall actual duration. Liberating the GIL is doubling the time to register&unregister a socket (supposely we are doing a single socket per iteration, something realistic in real code).
Being realistic, any work we do with the descriptors (like actually reading or writing it) is going to make this 0.5 microsecond gain pointless. Freeing the GIL, we are protected of any kernel contention, and playing for the rules :-).
Anyway I am open to feedback.
PS: I also checked with GIL contention, and results are comparable. That is surprising, maybe related to the GIL improvements in 3.3. I haven't investigated the issue further. |
|
Date |
User |
Action |
Args |
2011-11-09 19:14:21 | jcea | set | recipients:
+ jcea, exarkun, pitrou, giampaolo.rodola, neologix, rosslagerwall |
2011-11-09 19:14:21 | jcea | set | messageid: <1320866061.15.0.282892962692.issue6397@psf.upfronthosting.co.za> |
2011-11-09 19:14:20 | jcea | link | issue6397 messages |
2011-11-09 19:14:19 | jcea | create | |
|