classification
Title: Implement the GIL with critical sections in Windows
Type: performance Stage:
Components: Interpreter Core, Windows Versions: Python 3.2, Python 2.7
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: collinwinter, davidfraser, ishimoto, kevinwatters, loewis, mhammond, pitrou, sitbon
Priority: normal Keywords: patch

Created on 2009-05-27 22:09 by sitbon, last changed 2013-08-15 20:32 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
LockingTest.py sitbon, 2009-05-27 22:11 Very un-scientific GIL testing script
thread_nt.h.patch sitbon, 2009-05-29 19:43 Patch for making the GIL a critical section (breaks thread locks!) review
Messages (7)
msg88452 - (view) Author: Phillip Sitbon (sitbon) Date: 2009-05-27 22:09
At the suggestion of others on the Python-Dev list, I'm going to outline
what I've been doing with the GIL and where I see possiblity for
improvement.

Discussion:
http://mail.python.org/pipermail/python-dev/2009-May/089746.html


Detail:

Currently, Python's global interpreter lock is implemented with the same
mechanism that provides locks for the threading module. Moving the GIL's
low-level handling to a separate mechanism would free core developers
from the feature requirements imposed upon thread locks, specifically:

1. Blocking recursion

A thread can acquire a lock when it has already been acquired, and this
action will block.

2. Release-anywhere

Any thread not owning a lock can release it (provided it is currently
owned), thus preventing permanent deadlock in #1.

These semantics are that of a semaphore with an initial value of 1,
where acquisitions decrease its value and releases increment its value.
This behavior cannot change without changing the fundamental behavior of
the threading.Lock object that many have probably come to expect.

I have been looking into ways to minimize GIL overhead in high
concurrency situations. At the moment, I'm only looking at Windows,
partly because it was an easy target and I have experience with its
synchronization objects. While I have come up with a faster alternative
for GIL handling, it breaks the expectations of thread lock objects. I
then got to wondering:

- Does anyone actually use the GIL-locking/unlocking functions in a
manner that requires the features listed above?

- If so, could it be accomplished a different way if the GIL were
expected to behave more like a recursive mutex?

I'm not going to make any assumptions about what extensions do with the
GIL, but I can hope at least they do use the API functions. If so, the
question is if we can safely apply different locking semantics to the GIL:

- Non-blocking recursion

A thread that already owns the lock can re-acquire it without blocking,
increasing a recursion count.

- Owner-only releasing

Only the thread owning the lock can release it, and only as many times
as it has acquired it.

In Windows, this is how critical sections behave. I'm not sure about the
Linux futex, but I know pthread mutexes and others like it can be made
recursive. I'm mainly interested in separating the GIL from thread locks
so we can say "do x and y with it but not z" without changing the
threading module's lock mechanism.

I decided to make a case of this when I was testing my ISAPI extension
under high load. The current mechanism is as fast as it can possibly be
for a single thread, but does not scale well at all. This has a lot to
do with its use of event synchronization objects. Critical sections, on
the other hand, scale well even when contention is unavoidable. My first
experiment involved modifying the thread lock functions in thread_nt.h
to use critical sections. Yes, it broke the behavior of threading.Lock,
but it still worked for the GIL. The performance improvement can vary,
but it's always noticeable in the presence of concurrency. I saw the
performance of my server extension nearly double when serving up Django
pages from 10-12 threads concurrently.

Obviously we can't just slap on a critical section and call it done,
unless nobody cares about changing what threading.Lock does. That's what
led me believe that a separate GIL mechanism is very important to have.

For your hacking pleasure, I've included a patch for thread_nt.h that
will facilitate GIL performance testing. It includes the critical
section implementation and a semaphore implementation (which is about as
fast as the current behavior). I've also included a _very_ basic script
for some ballpark GIL contention measurements. The numbers will be a bit
exaggerated, especially with a _Py_CheckInterval of 1, but you will see
the difference between the current method and the critical section
method right away. Especially in terms of scalability.

If you want a baseline cost for the iteration loops, change ThreadCount
to 1 and make CheckInterval something big (>1 million). There are other
costs included with the reported numbers, but going between the two lock
implementations should make it clear that any big difference you see is
due to locking/unlocking/contention. Please feel free to correct me if I
did it all wrong - I'm sure there are other/better ways to test GIL
performance.

My testing was done with Python 2.6.2 x86 on Vista x64. I can provide
measurements if it's not convenient for anyone to try out the patch.

So where to go from here?

It'd be good to know if changing the restrictions on GIL usage will
break anything. Also, whether the same performance increase can be had
for Linux/Mac/others.

I'll continue to look into giving the GIL its own locking API, and
hopefully my efforts are not for naught.
msg88459 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-05-28 00:53
I'm not competent to review Windows-specific stuff, but some style notes:
- your indentation is inconsistent with the original file (you should
use tabs)
- please don't use any C++-style comments
msg88480 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-05-28 21:44
> Obviously we can't just slap on a critical section and call it done,
> unless nobody cares about changing what threading.Lock does.

This isn't obvious to me. I do care about what threading.Lock does,
but still fail to see why we can't just slap on a criticial section
and call it done.
msg88488 - (view) Author: Phillip Sitbon (sitbon) Date: 2009-05-28 23:26
> I'm not competent to review Windows-specific stuff, but 
> some style notes:
> - your indentation is inconsistent with the original file
> (you should use tabs)
> - please don't use any C++-style comments

Thanks- I was only supplying the code for testing purposes, but I will
clean it up anyway... and read the style guide closer of course :)

> This isn't obvious to me. I do care about what threading.Lock does,
> but still fail to see why we can't just slap on a criticial section
> and call it done.

If it were up to me, I'd say move forward with breaking the current
functionality. I guess my statement of "obviously" was that breaking it
outright would cause people to complain. After I got to thinking about
it that way, making GIL locking free from any other API seemed like it
would give more leeway for performance improvement.

If anyone is using the semaphore-like features of threading.Lock, they'd
likely be better off using threading.Semaphore anyway :) There are some
places where the core code would have to change in order to reflect the
differences, but otherwise it would be a significant improvement to the
GIL under high concurrency and not much of a change for a single thread.

On the positive side, a global change means that other core code using
locks will also see a speedup with critical sections as the default
thread lock. So I guess my main point is that somehow, whether separate
from or within the thread lock API, the GIL should be a critical section.
msg88526 - (view) Author: Phillip Sitbon (sitbon) Date: 2009-05-29 19:43
Tabified new code and removed one C++-style comment.
msg95180 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-11-13 15:56
Phillip, the GIL implementation has changed completely in the current
py3k (3.2) branch. If you have any specific benchmark to test it, it
would be nice to give it a try.

In any case, using a critical section in thread_nt.h could still be
beneficial for things other than the GIL, assuming it doesn't change the
semantics.
msg195286 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-15 20:32
I'm closing this as out-of-date (due to the new GIL in 3.2 and upwards).
History
Date User Action Args
2013-08-15 20:32:17pitrousetstatus: open -> closed
resolution: out of date
messages: + msg195286
2012-07-30 15:18:24ishimotosetnosy: + ishimoto
2009-11-13 15:56:14pitrousetmessages: + msg95180
versions: + Python 3.2, - Python 3.1
2009-11-11 15:29:01davidfrasersetnosy: + davidfraser
2009-06-01 22:40:32mhammondsetnosy: + mhammond
2009-05-29 19:43:13sitbonsetfiles: + thread_nt.h.patch

messages: + msg88526
2009-05-29 19:41:39sitbonsetfiles: - thread_nt.h.patch
2009-05-28 23:26:45sitbonsetmessages: + msg88488
2009-05-28 23:18:24kevinwatterssetnosy: + kevinwatters
2009-05-28 21:44:20loewissetnosy: + loewis
messages: + msg88480
2009-05-28 00:53:04pitrousetnosy: + pitrou
messages: + msg88459
2009-05-28 00:03:06collinwintersetnosy: + collinwinter
2009-05-27 22:11:13sitbonsetfiles: + LockingTest.py
2009-05-27 22:09:54sitboncreate