Author nirai
Recipients DazWorrall, alex, brian.curtin, carljm, coderanger, dabeaz, eric.smith, flox, jcea, jhylton, karld, kevinwatters, loewis, mahmoudimus, nirai, pitrou, rcohen, rh0dium, tarek
Date 2010-03-25.13:51:54
Message-id <1269525127.79.0.860393046214.issue7946@psf.upfronthosting.co.za>
In-reply-to
Content
I have uploaded an updated bfs.patch. Apply it to an up-to-date py32 checkout with the commands below, and ignore the error that patch reports:

$ patch -fp1 < bfs.patch
$ ./configure


> Please give understandable benchmark numbers, including an explicit comparison with baseline 3.2, and patched 3.2 (e.g. gilinter.patch)

Below.

> Please also measure single-thread performance, because it looks like you are adding significant work inside the core eval loop

Most of it has been removed now. The last bit will be removed soon.

> Do you need a hi-res clock? gettimeofday() already gives you microseconds. It looks like a bit of imprecision shouldn't be detrimental.

I use clock_gettime() to read each thread's running time, which is what slice depletion is calculated from. A wall clock cannot help with that, because it keeps advancing while the thread is not running.
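To illustrate the difference between the two kinds of clock, here is a minimal Python sketch. It is not code from the patch (which is C); it assumes Python 3.7+ for time.thread_time(), a per-thread CPU clock, and the helper names busy() and measure() are made up for the example. A sleeping thread accumulates wall time but almost no running time, which is why a wall clock cannot tell how much of a slice a thread has actually used.

```python
import time

def busy(seconds):
    """Spin on the CPU for roughly `seconds` of wall-clock time."""
    end = time.perf_counter() + seconds
    while time.perf_counter() < end:
        pass

def measure(fn, *args):
    """Return (wall_elapsed, thread_cpu_elapsed) for one call of fn."""
    wall0, cpu0 = time.perf_counter(), time.thread_time()
    fn(*args)
    return time.perf_counter() - wall0, time.thread_time() - cpu0

sleep_wall, sleep_cpu = measure(time.sleep, 0.2)  # blocked, not running
busy_wall, busy_cpu = measure(busy, 0.2)          # running the whole time

print(f"sleep: wall={sleep_wall:.3f}s  thread cpu={sleep_cpu:.3f}s")
print(f"busy:  wall={busy_wall:.3f}s  thread cpu={busy_cpu:.3f}s")
```

Both calls take about the same wall time, but only the busy loop should show a comparable amount of thread running time.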

> The magic number DEADLINE_FACTOR looks gratuitous (why 1.1^20 ?) 

As I understand it, it controls the CPU load (~6; 1.1^20 is about 6.7) beyond which threads tend to expire. Expired threads are handled in FIFO order, so IO threads cannot preempt them (IO threads are chronically expired). Beyond that load, IO threads therefore become less responsive.

> By the way, I would put COND_SIGNAL inside the LOCK_MUTEX / UNLOCK_MUTEX pair in bfs_yield().

Done.
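For readers less familiar with the pattern, this is roughly what signalling inside the lock/unlock pair looks like, sketched with Python's threading.Condition rather than the C macros. The names waiter(), yielder(), yielded and observed are hypothetical and do not come from the patch; the comments only mark where LOCK_MUTEX, COND_SIGNAL and UNLOCK_MUTEX would sit.

```python
import threading

mutex = threading.Lock()
cond = threading.Condition(mutex)
yielded = False   # predicate protected by `mutex`
observed = []     # what the waiter saw after waking up

def waiter():
    with cond:                  # LOCK_MUTEX
        while not yielded:      # re-check the predicate after every wakeup
            cond.wait()
        observed.append(yielded)
    # leaving the block: UNLOCK_MUTEX

def yielder():
    global yielded
    with cond:                  # LOCK_MUTEX
        yielded = True
        cond.notify()           # COND_SIGNAL while the mutex is still held
    # leaving the block: UNLOCK_MUTEX

t = threading.Thread(target=waiter)
t.start()
yielder()
t.join(timeout=5)
print(observed)
```

Because the predicate is changed and the signal is sent under the same lock, the waiter cannot check the predicate and go to sleep in between the two steps.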

Here are benchmark results of the UDP test as timed with ipython, where client.work() is a single run of the client:

System: Core 2 Duo (locked at 2.4 GHz) running 64-bit Ubuntu Karmic.

Vanilla Python 3.2: 

* Note: on my system the original problem discussed in this issue does not manifest, since condition variables wake up threads according to the OS scheduling policy.

In [28]: %timeit -n3 client.work()
1.290 seconds (8127084.435 bytes/sec)
1.488 seconds (7045285.926 bytes/sec)
2.449 seconds (4281485.217 bytes/sec)
1.874 seconds (5594303.222 bytes/sec)
1.853 seconds (5659626.496 bytes/sec)
0.872 seconds (12023425.779 bytes/sec)
4.951 seconds (2117942.079 bytes/sec)
0.728 seconds (14409157.126 bytes/sec)
1.743 seconds (6016999.707 bytes/sec)
3 loops, best of 3: 1.53 s per loop

gilinter.patch:

In [31]: %timeit -n3 client.work()
5.192 seconds (2019676.396 bytes/sec)
1.613 seconds (6500071.475 bytes/sec)
3.057 seconds (3429689.199 bytes/sec)
3.486 seconds (3007596.468 bytes/sec)
4.324 seconds (2424791.868 bytes/sec)
0.964 seconds (10872708.606 bytes/sec)
3.510 seconds (2987722.960 bytes/sec)
1.362 seconds (7698999.458 bytes/sec)
1.013 seconds (10353913.920 bytes/sec)
3 loops, best of 3: 1.96 s per loop

PyCON patch:

In [32]: %timeit -n3 client.work()
2.483 seconds (4223256.889 bytes/sec)
1.330 seconds (7882880.263 bytes/sec)
1.737 seconds (6036251.315 bytes/sec)
1.348 seconds (7778296.679 bytes/sec)
0.983 seconds (10670811.638 bytes/sec)
1.419 seconds (7387226.333 bytes/sec)
1.057 seconds (9919412.977 bytes/sec)
2.483 seconds (4223205.791 bytes/sec)
2.121 seconds (4944231.292 bytes/sec)
3 loops, best of 3: 1.25 s per loop

bfs.patch:

In [33]: %timeit -n3 client.work()
0.289 seconds (36341875.356 bytes/sec)
0.271 seconds (38677439.991 bytes/sec)
0.476 seconds (22033958.947 bytes/sec)
0.329 seconds (31872974.070 bytes/sec)
0.478 seconds (21925125.894 bytes/sec)
0.242 seconds (43386204.271 bytes/sec)
0.213 seconds (49195701.418 bytes/sec)
0.309 seconds (33967467.196 bytes/sec)
0.256 seconds (41008076.688 bytes/sec)
3 loops, best of 3: 259 ms per loop
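
The four result sets above can be summarised with a short script. This is only a sketch: it assumes the nine printed lines of each run are three repeats of three loops in order, and that the reported figure is the best per-repeat mean. The median column is an extra added here and is not something %timeit reports.

```python
from statistics import median

# Per-run times in seconds, copied from the output above.
runs = {
    "vanilla 3.2": [1.290, 1.488, 2.449, 1.874, 1.853, 0.872, 4.951, 0.728, 1.743],
    "gilinter.patch": [5.192, 1.613, 3.057, 3.486, 4.324, 0.964, 3.510, 1.362, 1.013],
    "PyCON patch": [2.483, 1.330, 1.737, 1.348, 0.983, 1.419, 1.057, 2.483, 2.121],
    "bfs.patch": [0.289, 0.271, 0.476, 0.329, 0.478, 0.242, 0.213, 0.309, 0.256],
}

def best_of(times, loops=3):
    """Best mean over consecutive groups of `loops` runs."""
    groups = [times[i:i + loops] for i in range(0, len(times), loops)]
    return min(sum(g) / len(g) for g in groups)

summary = {name: (best_of(t), median(t)) for name, t in runs.items()}
for name, (best, med) in summary.items():
    print(f"{name:15s} best-of-3 mean: {best:.3f} s   median run: {med:.3f} s")
```

Under that reading, the best per-repeat means agree with the "best of 3" lines printed by %timeit for all four builds.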


Output of cpued.py test:

Vanilla Python 3.2, gilinter.patch and PyCON patch all starve the pure Python threads and output the following:

$ ~/build/python/python32/python cpued.py 
t0 0 False
t1 0 False
t2-interactive 0 True
t2-interactive 1 True
t2-interactive 2 True
t2-interactive 3 True
t2-interactive 4 True
t2-interactive 5 True
t2-interactive 6 True
t2-interactive 7 True
.
.
.


Output from bfs.patch run:

$ ~/build/python/bfs/python cpued.py 
t0 0 False
t1 0 False
t2-interactive 0 True
t0 1 False
t1 1 False
t2-interactive 1 True
t0 2 False
t1 2 False
t2-interactive 2 True
t0 3 False
t1 3 False
t2-interactive 3 True
.
.
.

Note: I have not tested on other POSIX systems, and I expect some complications on Windows, since its thread timers are low resolution (10 ms or more) and there are issues with its high-precision wall clock. I will soon know better.