
Author kristjan.jonsson
Recipients beazley, dabeaz, flox, kristjan.jonsson, loewis, pitrou, r.david.murray, techtonik, torsten
Date 2010-04-21.23:22:03
Message-id <1271892126.6.0.336904776906.issue8299@psf.upfronthosting.co.za>
In-reply-to
Content
David, trying to get some more realistic IO benchmarks, I did some more tests.  The idea is to have a threaded socket server serving requests that take different amounts of time to process, and to see how IO response times measure up for two classes of requests being serviced simultaneously.
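Roughly, the setup looks like the following simplified sketch (this is not the actual contents of evalsrv.rar; the host/port, the busy-loop workload and the request counts are just placeholders for illustration):

import multiprocessing
import socket
import statistics
import threading
import time

HOST, PORT = "127.0.0.1", 9900      # placeholder address

def handle(conn):
    # Each request carries a number of "work units"; the handler burns CPU
    # for that long while holding the GIL, then replies.
    with conn:
        units = int(conn.recv(64))
        x = 0
        for _ in range(units * 10000):
            x += 1
        conn.sendall(b"done")

def serve():
    # Threaded server: one handler thread per connection.
    with socket.socket() as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((HOST, PORT))
        srv.listen()
        while True:
            conn, _ = srv.accept()
            threading.Thread(target=handle, args=(conn,), daemon=True).start()

def run_class(count, units, out):
    # Client side: issue `count` requests of `units` work each, timing every
    # round trip, and report (label, (total time, avg time, std.dev)).
    times = []
    for _ in range(count):
        t0 = time.time()
        with socket.create_connection((HOST, PORT)) as c:
            c.sendall(str(units).encode())
            c.recv(64)
        times.append(time.time() - t0)
    out.put(((count, units),
             (sum(times), statistics.mean(times), statistics.pstdev(times))))

if __name__ == "__main__":
    threading.Thread(target=serve, daemon=True).start()
    time.sleep(0.5)                                  # let the server come up

    out = multiprocessing.Queue()
    classes = [(30, 500), (300, 10)]                 # slow and fast classes
    procs = [multiprocessing.Process(target=run_class, args=(c, u, out))
             for c, u in classes]
    start = time.time()
    for p in procs:                                  # start both at once for the
        p.start()                                    # "simultaneous" case; start
    for p in procs:                                  # and join one class at a
        p.join()                                     # time for the "serial" case
    for _ in classes:
        print(out.get())
    print("end-to-end: %.2fs" % (time.time() - start))

Running each request class in its own client process keeps the client side free of the GIL contention being measured in the server.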

Please see evalsrv.rar for the client and server scripts.  The client uses multiprocessing to distance itself from the GIL issue.  The results on my dual-core Windows box are as follows (LEGACY_GIL is the mac, unfair GIL; ROUNDROBIN_GIL is the same with the fairness fix; "with affinity" means that the server process is restricted to running on one core):

label          time       avg time    std.dev

serial
(30, 500)      2.7145     0.09048     0.00419
(300, 10)      0.4654     0.00155     0.00023
3.36s (3.18s)  (total time, sum of individual classes)

simultaneous
(30, 500)      2.8820     0.09606     0.00443
(300, 10)      3.2083     0.01069     0.01442
3.21s (6.09s)

(For each test you get the individual timing for each request class, followed by the total end-to-end time and the sum of the individual class times.)
Please don't read too much into small differences; this is a rough, one-off test and likely contains noise.
A few things become apparent:
1) With LEGACY_GIL, affinity appears not to matter.  The 300 fast requests take longer to complete than the 30 slow requests when run in parallel, even though their serial execution time is roughly 1/5th of the slow requests'.
2) With ROUNDROBIN_GIL, serial performance appears not to be affected, but simultaneous performance is much better:  end-to-end time is the same, but the sum of the individual classes is lower.  That means the clients had to wait less for their IO results.
3) With ROUNDROBIN_GIL, if we turn affinity on, we get the same kind of performance as with LEGACY_GIL.


The most important points here are the last two, I think.  The fact that the sum of the individual request waits goes down is significant, and it drops by no small amount.  But equally perplexing is the fact that forcing the server onto one CPU removes the "fairness" again.  It would appear that the behaviour of the synchronization object (a Windows Semaphore in this case) changes depending on the number of cores, just as you had previously mentioned.  This is, however, a Windows-only effect, I think.  I must try to find out what is going on.
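For reference, one way to restrict the server process to a single core programmatically on Windows (just one possible way, not necessarily how it was done for the numbers above) is via SetProcessAffinityMask through ctypes:

import ctypes

def pin_to_first_core():
    # Restrict the current process to CPU 0 only (affinity mask 0x1).
    kernel32 = ctypes.windll.kernel32
    if not kernel32.SetProcessAffinityMask(kernel32.GetCurrentProcess(), 0x1):
        raise ctypes.WinError()

With both cores allowed (mask 0x3 on this dual-core box) the scheduler is free to run two ready threads at once, which is presumably where the semaphore's behaviour starts to differ.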