Message 104194 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	dabeaz
Recipients	DazWorrall, aconrad, alex, andrix, brian.curtin, carljm, coderanger, cool-RR, dabeaz, djc, durin42, eric.araujo, eric.smith, flox, jcea, jhylton, karld, kevinwatters, konryd, larry, loewis, mahmoudimus, movement, neologix, nirai, pitrou, rcohen, rh0dium, salgado, tarek, thouis, ysj.ray
Date	2010-04-26.04:13:28
SpamBayes Score	7.2164497e-16
Marked as misclassified	No
Message-id	<1272255210.74.0.106498514844.issue7946@psf.upfronthosting.co.za>
In-reply-to

Content
The attached patch makes two simple refinements to the new GIL implemented in Python 3.2. Each is briefly described below. 1. Changed mechanism for thread time expiration In the current implementation, threads perform a timed-wait on a condition variable. If time expires and no thread switches have occurred, the currently running thread is forced to drop the GIL. In the patch, timeouts are now performed by a special "GIL monitor" thread. This thread runs independently of Python and simply handles time expiration. Basically, it records the number of thread switches, sleeps for a specified interval (5ms), and then looks at the number of thread switches again. If no switches occurred, it forces the currently running thread to drop the GIL. With this monitor thread, it is no longer necessary to perform any timed condition variable waits. This approach has a few subtle benefits. First, threads no longer sit in a wait/timeout cycle when trying to get the GIL (so, there is less overhead). Second, you get FIFO scheduling of threads. When time expires, the thread that has been waiting the longest on the condition variable runs next. Generally, you want this. 2. A very simple two-level priority mechanism A new attribute 'cpu_bound' is added to the PyThreadState structure. If a thread is ever forced to drop the GIL, this attribute is simply set True (1). If a thread gives up the GIL voluntarily, it is set back to False (0). This attribute is used to set up simple scheduling (described next). There are now two separate condition variables (gil_cpu_cond) and (gil_io_cond) that separate waiting threads according to their cpu_bound attribute setting. CPU-bound threads wait on gil_cpu_cond whereas I/O-bound threads wait on gil_io_cond. Using the two condition variables, the following scheduling rules are enforced: - If there are any waiting I/O bound threads, they are always signaled first, before any CPU-bound threads. - If an I/O bound thread wants the GIL, but a CPU-bound thread is running, the CPU-bound thread is immediately forced to drop the GIL. - If a CPU-bound thread wants the GIL, but another CPU-bound thread is running, the running thread is immediately forced to drop the GIL if its time period has already expired. Results ------- This patch gives excellent results for both the ccbench test and all of my previous I/O bound tests. Here is the output: == CPython 3.2a0.0 (py3k:80470:80497M) == == i386 Darwin on 'i386' == --- Throughput --- Pi calculation (Python) threads=1: 871 iterations/s. threads=2: 844 ( 96 %) threads=3: 838 ( 96 %) threads=4: 826 ( 94 %) regular expression (C) threads=1: 367 iterations/s. threads=2: 345 ( 94 %) threads=3: 339 ( 92 %) threads=4: 327 ( 89 %) bz2 compression (C) threads=1: 384 iterations/s. threads=2: 728 ( 189 %) threads=3: 695 ( 180 %) threads=4: 707 ( 184 %) --- Latency --- Background CPU task: Pi calculation (Python) CPU threads=0: 0 ms. (std dev: 0 ms.) CPU threads=1: 0 ms. (std dev: 0 ms.) CPU threads=2: 1 ms. (std dev: 2 ms.) CPU threads=3: 0 ms. (std dev: 1 ms.) CPU threads=4: 0 ms. (std dev: 1 ms.) Background CPU task: regular expression (C) CPU threads=0: 0 ms. (std dev: 0 ms.) CPU threads=1: 2 ms. (std dev: 1 ms.) CPU threads=2: 1 ms. (std dev: 1 ms.) CPU threads=3: 1 ms. (std dev: 1 ms.) CPU threads=4: 2 ms. (std dev: 1 ms.) Background CPU task: bz2 compression (C) CPU threads=0: 0 ms. (std dev: 0 ms.) CPU threads=1: 0 ms. (std dev: 2 ms.) CPU threads=2: 2 ms. (std dev: 3 ms.) CPU threads=3: 0 ms. (std dev: 1 ms.) CPU threads=4: 0 ms. (std dev: 1 ms.) --- I/O bandwidth --- Background CPU task: Pi calculation (Python) CPU threads=0: 5850.9 packets/s. CPU threads=1: 5246.8 ( 89 %) CPU threads=2: 4228.9 ( 72 %) CPU threads=3: 4222.8 ( 72 %) CPU threads=4: 2959.5 ( 50 %) Particular attention should be given to tests involving I/O performance. In particular, here are the results of the I/O bandwidth test using the unmodified GIL: --- I/O bandwidth --- Background CPU task: Pi calculation (Python) CPU threads=0: 6007.1 packets/s. CPU threads=1: 189.0 ( 3 %) CPU threads=2: 19.7 ( 0 %) CPU threads=3: 19.7 ( 0 %) CPU threads=4: 5.1 ( 0 %) Other Benefits -------------- This patch does not involve any complicated libraries, platform specific functionality, low-level lock twiddling, or mathematically complex priority scheduling algorithms. Emphasize: The code is simple. Negative Aspects ---------------- This modification might introduce a starvation effect where CPU-bound threads never get to run if there is an extremely heavy load of I/O-bound threads competing for the GIL. Comparison to BFS ----------------- Still need to test. Would be curious.

The attached patch makes two simple refinements to the new GIL implemented in Python 3.2.   Each is briefly described below.

1. Changed mechanism for thread time expiration

In the current implementation, threads perform a timed-wait on a condition variable.  If time expires and no thread switches have occurred, the currently running thread is forced to drop the GIL.

In the patch, timeouts are now performed by a special "GIL monitor" thread.  This thread runs independently of Python and simply handles time expiration.  Basically, it records the number of thread switches, sleeps for a specified interval (5ms), and then looks at the number of thread switches again.  If no switches occurred, it forces the currently running thread to drop the GIL.

With this monitor thread, it is no longer necessary to perform any timed condition variable waits.  This approach has a few subtle benefits.  First, threads no longer sit in a wait/timeout cycle when trying to get the GIL (so, there is less overhead).   Second, you get FIFO scheduling of threads.  When time expires, the thread that has been waiting the longest on the condition variable runs next.  Generally, you want this.

2. A very simple two-level priority mechanism

A new attribute 'cpu_bound' is added to the PyThreadState structure.  If a thread is ever forced to drop the GIL, this attribute is simply set True (1).  If a thread gives up the GIL voluntarily, it is set back to False (0).  This attribute is used to set up simple scheduling (described next).

There are now two separate condition variables (gil_cpu_cond) and (gil_io_cond) that separate waiting threads according to their cpu_bound attribute setting.  CPU-bound threads wait on gil_cpu_cond whereas I/O-bound threads wait on gil_io_cond. 

Using the two condition variables, the following scheduling rules are enforced:

   - If there are any waiting I/O bound threads, they are always signaled first, before any CPU-bound threads.
   - If an I/O bound thread wants the GIL, but a CPU-bound thread is running, the CPU-bound thread is immediately forced to drop the GIL.
   - If a CPU-bound thread wants the GIL, but another CPU-bound thread is running, the running thread is immediately forced to drop the GIL if its time period has already expired.

Results
-------
This patch gives excellent results for both the ccbench test and all of my previous I/O bound tests.  Here is the output:

== CPython 3.2a0.0 (py3k:80470:80497M) ==
== i386 Darwin on 'i386' ==

--- Throughput ---

Pi calculation (Python)

threads=1: 871 iterations/s.
threads=2: 844 ( 96 %)
threads=3: 838 ( 96 %)
threads=4: 826 ( 94 %)

regular expression (C)

threads=1: 367 iterations/s.
threads=2: 345 ( 94 %)
threads=3: 339 ( 92 %)
threads=4: 327 ( 89 %)

bz2 compression (C)

threads=1: 384 iterations/s.
threads=2: 728 ( 189 %)
threads=3: 695 ( 180 %)
threads=4: 707 ( 184 %)

--- Latency ---

Background CPU task: Pi calculation (Python)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 0 ms.)
CPU threads=2: 1 ms. (std dev: 2 ms.)
CPU threads=3: 0 ms. (std dev: 1 ms.)
CPU threads=4: 0 ms. (std dev: 1 ms.)

Background CPU task: regular expression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 2 ms. (std dev: 1 ms.)
CPU threads=2: 1 ms. (std dev: 1 ms.)
CPU threads=3: 1 ms. (std dev: 1 ms.)
CPU threads=4: 2 ms. (std dev: 1 ms.)

Background CPU task: bz2 compression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 2 ms.)
CPU threads=2: 2 ms. (std dev: 3 ms.)
CPU threads=3: 0 ms. (std dev: 1 ms.)
CPU threads=4: 0 ms. (std dev: 1 ms.)

--- I/O bandwidth ---

Background CPU task: Pi calculation (Python)

CPU threads=0: 5850.9 packets/s.
CPU threads=1: 5246.8 ( 89 %)
CPU threads=2: 4228.9 ( 72 %)
CPU threads=3: 4222.8 ( 72 %)
CPU threads=4: 2959.5 ( 50 %)

Particular attention should be given to tests involving I/O performance.  In particular, here are the results of the I/O bandwidth test using the unmodified GIL:

--- I/O bandwidth ---

Background CPU task: Pi calculation (Python)

CPU threads=0: 6007.1 packets/s.
CPU threads=1: 189.0 ( 3 %)
CPU threads=2: 19.7 ( 0 %)
CPU threads=3: 19.7 ( 0 %)
CPU threads=4: 5.1 ( 0 %)

Other Benefits
--------------
This patch does not involve any complicated libraries, platform specific functionality, low-level lock twiddling, or mathematically complex priority scheduling algorithms.  Emphasize: The code is simple.

Negative Aspects
----------------
This modification might introduce a starvation effect where CPU-bound threads never get to run if there is an extremely heavy load of I/O-bound threads competing for the GIL.

Comparison to BFS
-----------------
Still need to test. Would be curious.

History
Date	User	Action	Args
2010-04-26 04:13:32	dabeaz	set	recipients: + dabeaz, loewis, jhylton, jcea, pitrou, movement, larry, eric.smith, kevinwatters, tarek, djc, karld, carljm, coderanger, durin42, eric.araujo, nirai, alex, andrix, konryd, brian.curtin, flox, DazWorrall, salgado, cool-RR, rh0dium, rcohen, mahmoudimus, aconrad, ysj.ray, neologix, thouis
2010-04-26 04:13:30	dabeaz	set	messageid: <1272255210.74.0.106498514844.issue7946@psf.upfronthosting.co.za>
2010-04-26 04:13:29	dabeaz	link	issue7946 messages
2010-04-26 04:13:29	dabeaz	create