Author kristjan.jonsson
Recipients beazley, dabeaz, flox, kristjan.jonsson, loewis, pitrou, techtonik, torsten
Date 2010-04-11.11:09:01
Message-id <1270984146.83.0.532372545466.issue8299@psf.upfronthosting.co.za>
In-reply-to
Content
I looked at ccbench.  It's a great tool.  I've added two features to it (see the attached patch).
The first is a -y option that turns off "do_yield" in the throughput test, so that thread scheduling is measured without assistance.  The second is that the throughput test now also computes "balance", which is the standard deviation of the per-thread throughput, normalized by the average.
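
To make the "balance" figure concrete, here is a minimal sketch of how such a number could be computed from per-thread iteration counts. The function name `balance` and the use of the sample (n-1) standard deviation are assumptions, not taken from the patch. The n-1 form is at least consistent with values such as 1.6867 for three threads, which a population standard deviation could not reach for non-negative counts.

```python
import math

def balance(per_thread_counts):
    """Sample standard deviation of the per-thread counts, divided by their mean.

    Hypothetical helper: 0.0 means every thread did the same amount of work;
    larger values mean the work was spread less evenly.
    """
    n = len(per_thread_counts)
    if n < 2:
        return 0.0  # a single thread is trivially balanced
    mean = sum(per_thread_counts) / float(n)
    if mean == 0:
        return 0.0  # no work done at all; avoid dividing by zero
    # Assumption: sample variance (divide by n - 1), see the note above.
    variance = sum((c - mean) ** 2 for c in per_thread_counts) / (n - 1)
    return math.sqrt(variance) / mean

print("%.4f" % balance([100, 100]))  # even split between two threads
print("%.4f" % balance([10, 30]))    # one thread does three times the work
```

An even split yields 0.0, which is how the 0.0000 entries in the results can be read; a thread that is starved completely pushes the value above 1.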

I give you three results for throughput, to demonstrate the ROUNDROBIN_GIL implementation:
1) LEGACY_GIL, no forced switching
C:\pydev\python\trunk\PCbuild>python.exe ..\Tools\ccbench\ccbench.py -y -t
== CPython 2.7a4+.0 (trunk) ==
== AMD64 Windows on 'Intel64 Family 6 Model 23 Stepping 6, GenuineIntel' ==

--- Throughput ---

Pi calculation (Python)

threads= 1:   672 iterations/s. balance
threads= 2:   597 ( 88%)        0.4243
threads= 3:   603 ( 89%)        0.2475
threads= 4:   596 ( 88%)        0.4776

regular expression (C)

threads= 1:   571 iterations/s. balance
threads= 2:   565 ( 98%)        0.6203
threads= 3:   567 ( 99%)        1.6867
threads= 4:   570 ( 99%)        1.1670

SHA1 hashing (C)

threads= 1:  1269 iterations/s. balance
threads= 2:  1268 ( 99%)        1.1470
threads= 3:  1270 (100%)        0.6024
threads= 4:  1263 ( 99%)        0.7419

2) LEGACY_GIL, with forced switching
C:\pydev\python\trunk\PCbuild>python.exe ..\Tools\ccbench\ccbench.py -t
== CPython 2.7a4+.0 (trunk) ==
== AMD64 Windows on 'Intel64 Family 6 Model 23 Stepping 6, GenuineIntel' ==

--- Throughput ---

Pi calculation (Python)

threads= 1:   663 iterations/s. balance
threads= 2:   605 ( 91%)        0.0232
threads= 3:   599 ( 90%)        0.1988
threads= 4:   601 ( 90%)        0.4648

regular expression (C)

threads= 1:   568 iterations/s. balance
threads= 2:   562 ( 99%)        0.1737
threads= 3:   571 (100%)        0.3950
threads= 4:   566 ( 99%)        0.3158

SHA1 hashing (C)

threads= 1:  1275 iterations/s. balance
threads= 2:  1267 ( 99%)        0.7238
threads= 3:  1271 ( 99%)        0.2405
threads= 4:  1270 ( 99%)        0.1508

Using the forced "do_yield" helps the balance, but not by much.  SHA1 hashing with two threads still shows a balance of 0.72.

Now, for ROUNDROBIN_GIL, and no forced switching:
C:\pydev\python\trunk\PCbuild>python.exe ..\Tools\ccbench\ccbench.py -t -y
== CPython 2.7a4+.0 (trunk) ==
== AMD64 Windows on 'Intel64 Family 6 Model 23 Stepping 6, GenuineIntel' ==

--- Throughput ---

Pi calculation (Python)

threads= 1:   672 iterations/s. balance
threads= 2:   485 ( 72%)        0.0289
threads= 3:   448 ( 66%)        0.0737
threads= 4:   476 ( 70%)        0.0408

regular expression (C)

threads= 1:   569 iterations/s. balance
threads= 2:   551 ( 96%)        0.0505
threads= 3:   551 ( 96%)        0.1637
threads= 4:   551 ( 96%)        0.2020

SHA1 hashing (C)

threads= 1:  1271 iterations/s. balance
threads= 2:  1262 ( 99%)        0.0111
threads= 3:  1207 ( 94%)        0.0143
threads= 4:  1202 ( 94%)        0.0317

Notice the much better balance values, and this is without the forced sleep.
Also note the lower throughput when computing pi with threads.  This is because yielding every 100 opcodes now actually works, and the aforementioned instruction cache problem kicks in.  Increasing the checkinterval to 1000 solves this:
C:\pydev\python\trunk\PCbuild>python.exe ..\Tools\ccbench\ccbench.py -t -y -i1000
== CPython 2.7a4+.0 (trunk) ==
== AMD64 Windows on 'Intel64 Family 6 Model 23 Stepping 6, GenuineIntel' ==

--- Throughput ---

Pi calculation (Python)

threads= 1:   673 iterations/s. balance
threads= 2:   628 ( 93%)        0.0000
threads= 3:   603 ( 89%)        0.0284
threads= 4:   606 ( 90%)        0.0328

regular expression (C)

threads= 1:   570 iterations/s. balance
threads= 2:   569 ( 99%)        0.2729
threads= 3:   562 ( 98%)        0.6595
threads= 4:   560 ( 98%)        1.2440

SHA1 hashing (C)

threads= 1:  1265 iterations/s. balance
threads= 2:  1256 ( 99%)        0.0000
threads= 3:  1264 ( 99%)        0.0759
threads= 4:  1255 ( 99%)        0.1309
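
The check interval used in the last run can also be changed from inside a script. The sketch below assumes that the -i option ends up calling `sys.setcheckinterval()`, which is the interpreter setting for how many bytecode instructions run between thread-switch checks. The wrapper name `set_check_interval` is hypothetical, and the `getattr` guard is there because later interpreter versions no longer provide this function.

```python
import sys

def set_check_interval(n):
    """Apply a new check interval if this interpreter supports it.

    Hypothetical wrapper: returns True when the setting was applied and
    False when sys.setcheckinterval is not available.
    """
    setter = getattr(sys, "setcheckinterval", None)
    if setter is None:
        return False
    setter(n)
    return True

# 1000 matches the value used in the last run above.
applied = set_check_interval(1000)
print("check interval applied: %s" % applied)
```

A larger interval means fewer switches, so each thread runs longer between hand-offs.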

If no one objects, I'd like to submit this changed ccbench.py to the trunk.