Message 102823 - Python tracker


Author	kristjan.jonsson
Recipients	beazley, dabeaz, flox, kristjan.jonsson, loewis, pitrou, techtonik, torsten
Date	2010-04-11.11:09:01
Message-id	<1270984146.83.0.532372545466.issue8299@psf.upfronthosting.co.za>

Content

I looked at ccbench.  It's a great tool.  I've added two features to it (see the attached patch):
a -y option that turns off the "do_yield" option in the throughput test, so that thread scheduling is measured without assistance; and a "balance" figure in the throughput output, which is the standard deviation of the per-thread throughput normalized by the average.
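
For readers without the patch at hand, here is a minimal sketch of how such a balance figure could be computed. The function name and the sample inputs are hypothetical. Several balance values reported in this message exceed 1.0 with two threads, which a population standard deviation divided by the mean cannot do, so the sketch uses the sample standard deviation; that is an inference from the numbers, not something confirmed by the patch.

```python
import statistics

def balance(per_thread_throughput):
    """Relative spread of per-thread throughput: stdev / mean (0.0 means perfectly even)."""
    values = list(per_thread_throughput)
    if len(values) < 2:
        return 0.0  # a single thread has nothing to be unbalanced against
    mean = statistics.mean(values)
    if mean == 0:
        return 0.0  # no work was done at all; avoid dividing by zero
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return statistics.stdev(values) / mean

print("%.4f" % balance([300, 300]))  # 0.0000: both threads did equal work
print("%.4f" % balance([50, 150]))   # 0.7071: one thread did three times the work
```

A value near zero means every thread got a similar share of the interpreter; larger values mean some threads were starved.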

Here are three throughput results to demonstrate the ROUNDROBIN_GIL implementation:
1) LEGACY_GIL, no forced switching
C:\pydev\python\trunk\PCbuild>python.exe ..\Tools\ccbench\ccbench.py -y -t
== CPython 2.7a4+.0 (trunk) ==
== AMD64 Windows on 'Intel64 Family 6 Model 23 Stepping 6, GenuineIntel' ==

--- Throughput ---

Pi calculation (Python)

threads= 1:   672 iterations/s. balance
threads= 2:   597 ( 88%)        0.4243
threads= 3:   603 ( 89%)        0.2475
threads= 4:   596 ( 88%)        0.4776

regular expression (C)

threads= 1:   571 iterations/s. balance
threads= 2:   565 ( 98%)        0.6203
threads= 3:   567 ( 99%)        1.6867
threads= 4:   570 ( 99%)        1.1670

SHA1 hashing (C)

threads= 1:  1269 iterations/s. balance
threads= 2:  1268 ( 99%)        1.1470
threads= 3:  1270 (100%)        0.6024
threads= 4:  1263 ( 99%)        0.7419

2) LEGACY_GIL, with forced switching
C:\pydev\python\trunk\PCbuild>python.exe ..\Tools\ccbench\ccbench.py -t
== CPython 2.7a4+.0 (trunk) ==
== AMD64 Windows on 'Intel64 Family 6 Model 23 Stepping 6, GenuineIntel' ==

--- Throughput ---

Pi calculation (Python)

threads= 1:   663 iterations/s. balance
threads= 2:   605 ( 91%)        0.0232
threads= 3:   599 ( 90%)        0.1988
threads= 4:   601 ( 90%)        0.4648

regular expression (C)

threads= 1:   568 iterations/s. balance
threads= 2:   562 ( 99%)        0.1737
threads= 3:   571 (100%)        0.3950
threads= 4:   566 ( 99%)        0.3158

SHA1 hashing (C)

threads= 1:  1275 iterations/s. balance
threads= 2:  1267 ( 99%)        0.7238
threads= 3:  1271 ( 99%)        0.2405
threads= 4:  1270 ( 99%)        0.1508

Using the forced "do_yield" helps balance things, but not much.  We still have a balance of 0.72 in SHA1 hashing for two threads.
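
To make the "forced switching" idea concrete, here is a rough sketch of a per-thread throughput measurement with an optional yield after each iteration. It is not ccbench's code: the workload, the function name, the sleep length and the durations are all placeholders chosen for illustration.

```python
import threading
import time

def run_throughput(nthreads, duration=0.3, do_yield=True):
    """Return the number of task iterations each thread completed."""
    counts = [0] * nthreads
    deadline = time.time() + duration

    def task():
        # Stand-in CPU-bound workload (hypothetical, not one of ccbench's tasks).
        sum(i * i for i in range(1000))

    def worker(index):
        while time.time() < deadline:
            task()
            counts[index] += 1
            if do_yield:
                # A very short sleep releases the GIL so other threads can run.
                time.sleep(0.0001)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts

print("with yield:   ", run_throughput(2, do_yield=True))
print("without yield:", run_throughput(2, do_yield=False))
```

Without the yield, how evenly the counts come out depends entirely on the interpreter's own thread switching, which is what the -y runs are meant to expose.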

3) ROUNDROBIN_GIL, no forced switching:
C:\pydev\python\trunk\PCbuild>python.exe ..\Tools\ccbench\ccbench.py -t -y
== CPython 2.7a4+.0 (trunk) ==
== AMD64 Windows on 'Intel64 Family 6 Model 23 Stepping 6, GenuineIntel' ==

--- Throughput ---

Pi calculation (Python)

threads= 1:   672 iterations/s. balance
threads= 2:   485 ( 72%)        0.0289
threads= 3:   448 ( 66%)        0.0737
threads= 4:   476 ( 70%)        0.0408

regular expression (C)

threads= 1:   569 iterations/s. balance
threads= 2:   551 ( 96%)        0.0505
threads= 3:   551 ( 96%)        0.1637
threads= 4:   551 ( 96%)        0.2020

SHA1 hashing (C)

threads= 1:  1271 iterations/s. balance
threads= 2:  1262 ( 99%)        0.0111
threads= 3:  1207 ( 94%)        0.0143
threads= 4:  1202 ( 94%)        0.0317

Notice the much better balance values, and this is without the forced sleep.
Also note the lower throughput when computing pi with threads.  This is because yielding every 100 opcodes now actually works, and the aforementioned instruction cache problem kicks in.  Increasing the checkinterval to 1000 solves this:
C:\pydev\python\trunk\PCbuild>python.exe ..\Tools\ccbench\ccbench.py -t -y -i1000
== CPython 2.7a4+.0 (trunk) ==
== AMD64 Windows on 'Intel64 Family 6 Model 23 Stepping 6, GenuineIntel' ==

--- Throughput ---

Pi calculation (Python)

threads= 1:   673 iterations/s. balance
threads= 2:   628 ( 93%)        0.0000
threads= 3:   603 ( 89%)        0.0284
threads= 4:   606 ( 90%)        0.0328

regular expression (C)

threads= 1:   570 iterations/s. balance
threads= 2:   569 ( 99%)        0.2729
threads= 3:   562 ( 98%)        0.6595
threads= 4:   560 ( 98%)        1.2440

SHA1 hashing (C)

threads= 1:  1265 iterations/s. balance
threads= 2:  1256 ( 99%)        0.0000
threads= 3:  1264 ( 99%)        0.0759
threads= 4:  1255 ( 99%)        0.1309
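
Outside of ccbench, the same knob can be set from code. In Python 2.7, which these runs use, that is sys.setcheckinterval. Interpreters from 3.2 onwards switch threads on a time basis instead, via sys.setswitchinterval, so the sketch below falls back to that. The helper name and the 0.05 second value are arbitrary placeholders and are not equivalent to a check interval of 1000.

```python
import sys

def switch_less_often(check_interval=1000, switch_interval=0.05):
    """Ask the interpreter to switch threads less often; report what was set."""
    if sys.version_info < (3, 2):
        # Opcode-count based switching (Python 2.x and 3.0/3.1).
        sys.setcheckinterval(check_interval)
        return "checkinterval=%d" % sys.getcheckinterval()
    # Time based switching (Python 3.2 and later); the value is in seconds.
    sys.setswitchinterval(switch_interval)
    return "switchinterval=%g" % sys.getswitchinterval()

print(switch_less_often())
```

Either way the trade-off is the one visible in the numbers above: fewer switches cost less throughput, but threads wait longer for their turn.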

If no one objects, I'd like to submit this changed ccbench.py to the trunk.
