Author jcea
Recipients exarkun, giampaolo.rodola, jcea, neologix, pitrou, rosslagerwall
Date 2011-11-09.19:14:19
SpamBayes Score 3.87301e-13
Marked as misclassified No
Message-id <1320866061.15.0.282892962692.issue6397@psf.upfronthosting.co.za>
In-reply-to
Content
They are operations potentially slow, there are not realtime specifications.

My machine is quite capable (16 CPUs), but let's see what a bit of DTRACE scripting tells us:

First, let's time the open:

"""
syscall::open*:entry
/copyinstr(arg0)=="/dev/poll"/
{
    self->ts = timestamp;
}

syscall::open*:return
/self->ts/
{
    @stats = quantize(timestamp-self->ts);
    self->ts = 0;
}
"""

The result, times in nanoseconds:

"""
           value  ------------- Distribution ------------- count    
            2048 |                                         0        
            4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4497743  
            8192 |                                         17358    
           16384 |                                         1592     
           32768 |                                         2046     
           65536 |                                         812      
          131072 |                                         374      
          262144 |                                         0 
"""

Most "open"s finish in 4 microseconds, but we have a handful of "slow" openings of hundred of microseconds.

Anyway, argueing with the GIL here is a nonissue, since the "open" is only done at object creation, and a sane program will only create a handful of devpoll objets.

Let's see now the write:

"""
syscall::open*:entry
/copyinstr(arg0)=="/dev/poll"/
{
    self->ts = timestamp;
}

syscall::open*:return
/self->ts/
{
    @stats = quantize(timestamp-self->ts);
    self->ts = 0;
}

"""

The results for a single descriptor registered:

"""
           value  ------------- Distribution ------------- count    
             256 |                                         0        
             512 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4102701  
            1024 |                                         28514    
            2048 |                                         11269    
            4096 |                                         537      
            8192 |                                         245      
           16384 |                                         377      
           32768 |                                         193      
           65536 |                                         134      
          131072 |                                         71       
          262144 |                                         0  
"""

Most writes are really fast, half a microsecond, but we have sporadic latencies of a hundred of microseconds.

Re-registering 200 sockets per loop, I have:

"""
           value  ------------- Distribution ------------- count    
             512 |                                         0        
            1024 |                                         50       
            2048 |                                         94       
            4096 |                                         157      
            8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  1112314  
           16384 |@                                        22977    
           32768 |                                         1584     
           65536 |                                         842      
          131072 |                                         168      
          262144 |                                         0 
"""

Most writes takes around 8 microseconds.

So now the problem is to estimate how much time I need to release & readquire the GIL. For that, and while we don't have DTRACE integration in python (my next project, I hope), I change sourcecode to add a manual probe using the Solaris High Resolution timers, I get a median time of around 550 nanoseconds (half a microsecond), with a few spikes, when the GIL is not contested. 

So, unlocking the GIL is adding around half microsecond to the syscalls. This is fast, but comparable with the syscall actual duration. Liberating the GIL is doubling the time to register&unregister a socket (supposely we are doing a single socket per iteration, something realistic in real code).

Being realistic, any work we do with the descriptors (like actually reading or writing it) is going to make this 0.5 microsecond gain pointless. Freeing the GIL, we are protected of any kernel contention, and playing for the rules :-).

Anyway I am open to feedback.

PS: I also checked with GIL contention, and results are comparable. That is surprising, maybe related to the GIL improvements in 3.3. I haven't investigated the issue further.
History
Date User Action Args
2011-11-09 19:14:21jceasetrecipients: + jcea, exarkun, pitrou, giampaolo.rodola, neologix, rosslagerwall
2011-11-09 19:14:21jceasetmessageid: <1320866061.15.0.282892962692.issue6397@psf.upfronthosting.co.za>
2011-11-09 19:14:20jcealinkissue6397 messages
2011-11-09 19:14:19jceacreate