Author nirai
Recipients DazWorrall, aconrad, alex, andrix, brian.curtin, carljm, coderanger, cool-RR, dabeaz, djc, donaldjeo, durin42, eric.araujo, eric.smith, flox, gregory.p.smith, jcea, jhylton, karld, kevinwatters, konryd, larry, loewis, mahmoudimus, movement, neologix, nirai, pitrou, rcohen, rh0dium, tarek, thouis, ysj.ray
Date 2010-05-14.07:19:20
SpamBayes Score 0.00932192
Marked as misclassified No
Message-id <>
Duck, here comes another update to bfs.patch.

This one with some cleanups which simplify the code and improve behavior (on Windows XP), shutdown code, comments, and "experimental" use of TSC for timestamps, which eliminates timestamp reading overhead.

TSC ( is a fast way to get high precision timing read. On some systems this is what gettimeofday() uses under the hood while on other systems it will use HPET or another source which is slower, typically ~1usec, but can be higher (e.g. my core 2 duo laptop occasionally goes into a few hours of charging 3usec per HPET gettimeofday() call - god knows why)

This overhead is incurred twice for every GIL release/acquire pair and can be eliminated with:
1) Hack the scheduler not to call gettimeofday() when no other threads are waiting to run, or
2) Use TSC on platforms it is available (the Linux BFS scheduler uses TSC).

I took cycle.h pointed by the Wikipedia article on TSC for a spin and it works well on my boxes. It is BSD, (un)maintained? and includes implementation for a gazillion of platforms (I did not yet modify as it recommends). 

If it breaks on your system please ping with details.

Some benchmarks running (Florent's) on Core 2 Quad q9400 Ubuntu 64bit:

bfs.patch - 35K context switches per second, threads balanced, runtime is 3 times that of running IO thread alone:

~/dev/python$ ~/build/python/bfs/python
t1 1.60293507576 1
t2 1.78533816338 1
t1 2.88939499855 2
t2 3.19518113136 2
t1 4.38062310219 3
t2 4.70725703239 3
t1 6.26874804497 4
t2 6.4078810215 4
t1 7.83273100853 5
t2 7.92976212502 5
t1 9.4341750145 6
t2 9.57891893387 6
t1 11.077393055 7
t2 11.164755106 7
t2 12.8495900631 8
t1 12.8979620934 8
t1 14.577999115 9
t2 14.5791089535 9
t1 15.9246580601 10
t2 16.1618289948 10
t1 17.365830183 11
t2 17.7345991135 11
t1 18.9782481194 12
t2 19.2790091038 12
t1 20.4994370937 13
t2 20.5710251331 13

dabeaz_gil.patch - sometimes runs well but sometimes goes into high level of context switches (250K/s) and produces output such as this:

~/dev/python$ ~/build/python/dabeaz/python 
t1 0.742760896683 1
t1 7.50052189827 2
t2 8.63794493675 1
t1 10.1924870014 3

gilinter2.patch - 300K context switches per second, bg threads starved:

~/dev/python$ ~/build/python/gilinter/python 
t2 6.1153190136 1
t2 11.7834780216 2
Date User Action Args
2010-05-14 07:19:43niraisetrecipients: + nirai, loewis, jhylton, gregory.p.smith, jcea, pitrou, movement, larry, eric.smith, kevinwatters, tarek, djc, karld, carljm, coderanger, durin42, eric.araujo, alex, andrix, konryd, brian.curtin, flox, DazWorrall, cool-RR, rh0dium, rcohen, dabeaz, mahmoudimus, aconrad, ysj.ray, neologix, thouis, donaldjeo
2010-05-14 07:19:40niraisetmessageid: <>
2010-05-14 07:19:39nirailinkissue7946 messages
2010-05-14 07:19:37niraicreate