Issue6132
Created on 2009-05-27 22:09 by sitbon, last changed 2009-11-13 15:56 by pitrou.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| LockingTest.py | sitbon, 2009-05-27 22:11 | Very un-scientific GIL testing script | ||
| thread_nt.h.patch | sitbon, 2009-05-29 19:43 | Patch for making the GIL a critical section (breaks thread locks!) | review | |
| Messages (6) | |||
|---|---|---|---|
| msg88452 - (view) | Author: Phillip Sitbon (sitbon) | Date: 2009-05-27 22:09 | |
At the suggestion of others on the Python-Dev list, I'm going to outline what I've been doing with the GIL and where I see possiblity for improvement. Discussion: http://mail.python.org/pipermail/python-dev/2009-May/089746.html Detail: Currently, Python's global interpreter lock is implemented with the same mechanism that provides locks for the threading module. Moving the GIL's low-level handling to a separate mechanism would free core developers from the feature requirements imposed upon thread locks, specifically: 1. Blocking recursion A thread can acquire a lock when it has already been acquired, and this action will block. 2. Release-anywhere Any thread not owning a lock can release it (provided it is currently owned), thus preventing permanent deadlock in #1. These semantics are that of a semaphore with an initial value of 1, where acquisitions decrease its value and releases increment its value. This behavior cannot change without changing the fundamental behavior of the threading.Lock object that many have probably come to expect. I have been looking into ways to minimize GIL overhead in high concurrency situations. At the moment, I'm only looking at Windows, partly because it was an easy target and I have experience with its synchronization objects. While I have come up with a faster alternative for GIL handling, it breaks the expectations of thread lock objects. I then got to wondering: - Does anyone actually use the GIL-locking/unlocking functions in a manner that requires the features listed above? - If so, could it be accomplished a different way if the GIL were expected to behave more like a recursive mutex? I'm not going to make any assumptions about what extensions do with the GIL, but I can hope at least they do use the API functions. If so, the question is if we can safely apply different locking semantics to the GIL: - Non-blocking recursion A thread that already owns the lock can re-acquire it without blocking, increasing a recursion count. - Owner-only releasing Only the thread owning the lock can release it, and only as many times as it has acquired it. In Windows, this is how critical sections behave. I'm not sure about the Linux futex, but I know pthread mutexes and others like it can be made recursive. I'm mainly interested in separating the GIL from thread locks so we can say "do x and y with it but not z" without changing the threading module's lock mechanism. I decided to make a case of this when I was testing my ISAPI extension under high load. The current mechanism is as fast as it can possibly be for a single thread, but does not scale well at all. This has a lot to do with its use of event synchronization objects. Critical sections, on the other hand, scale well even when contention is unavoidable. My first experiment involved modifying the thread lock functions in thread_nt.h to use critical sections. Yes, it broke the behavior of threading.Lock, but it still worked for the GIL. The performance improvement can vary, but it's always noticeable in the presence of concurrency. I saw the performance of my server extension nearly double when serving up Django pages from 10-12 threads concurrently. Obviously we can't just slap on a critical section and call it done, unless nobody cares about changing what threading.Lock does. That's what led me believe that a separate GIL mechanism is very important to have. For your hacking pleasure, I've included a patch for thread_nt.h that will facilitate GIL performance testing. It includes the critical section implementation and a semaphore implementation (which is about as fast as the current behavior). I've also included a _very_ basic script for some ballpark GIL contention measurements. The numbers will be a bit exaggerated, especially with a _Py_CheckInterval of 1, but you will see the difference between the current method and the critical section method right away. Especially in terms of scalability. If you want a baseline cost for the iteration loops, change ThreadCount to 1 and make CheckInterval something big (>1 million). There are other costs included with the reported numbers, but going between the two lock implementations should make it clear that any big difference you see is due to locking/unlocking/contention. Please feel free to correct me if I did it all wrong - I'm sure there are other/better ways to test GIL performance. My testing was done with Python 2.6.2 x86 on Vista x64. I can provide measurements if it's not convenient for anyone to try out the patch. So where to go from here? It'd be good to know if changing the restrictions on GIL usage will break anything. Also, whether the same performance increase can be had for Linux/Mac/others. I'll continue to look into giving the GIL its own locking API, and hopefully my efforts are not for naught. |
|||
| msg88459 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2009-05-28 00:53 | |
I'm not competent to review Windows-specific stuff, but some style notes: - your indentation is inconsistent with the original file (you should use tabs) - please don't use any C++-style comments |
|||
| msg88480 - (view) | Author: Martin v. Löwis (loewis) * ![]() |
Date: 2009-05-28 21:44 | |
> Obviously we can't just slap on a critical section and call it done, > unless nobody cares about changing what threading.Lock does. This isn't obvious to me. I do care about what threading.Lock does, but still fail to see why we can't just slap on a criticial section and call it done. |
|||
| msg88488 - (view) | Author: Phillip Sitbon (sitbon) | Date: 2009-05-28 23:26 | |
> I'm not competent to review Windows-specific stuff, but > some style notes: > - your indentation is inconsistent with the original file > (you should use tabs) > - please don't use any C++-style comments Thanks- I was only supplying the code for testing purposes, but I will clean it up anyway... and read the style guide closer of course :) > This isn't obvious to me. I do care about what threading.Lock does, > but still fail to see why we can't just slap on a criticial section > and call it done. If it were up to me, I'd say move forward with breaking the current functionality. I guess my statement of "obviously" was that breaking it outright would cause people to complain. After I got to thinking about it that way, making GIL locking free from any other API seemed like it would give more leeway for performance improvement. If anyone is using the semaphore-like features of threading.Lock, they'd likely be better off using threading.Semaphore anyway :) There are some places where the core code would have to change in order to reflect the differences, but otherwise it would be a significant improvement to the GIL under high concurrency and not much of a change for a single thread. On the positive side, a global change means that other core code using locks will also see a speedup with critical sections as the default thread lock. So I guess my main point is that somehow, whether separate from or within the thread lock API, the GIL should be a critical section. |
|||
| msg88526 - (view) | Author: Phillip Sitbon (sitbon) | Date: 2009-05-29 19:43 | |
Tabified new code and removed one C++-style comment. |
|||
| msg95180 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2009-11-13 15:56 | |
Phillip, the GIL implementation has changed completely in the current py3k (3.2) branch. If you have any specific benchmark to test it, it would be nice to give it a try. In any case, using a critical section in thread_nt.h could still be beneficial for things other than the GIL, assuming it doesn't change the semantics. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2009-11-13 15:56:14 | pitrou | set | messages:
+ msg95180 versions: + Python 3.2, - Python 3.1 |
| 2009-11-11 15:29:01 | davidfraser | set | nosy:
+ davidfraser |
| 2009-06-01 22:40:32 | mhammond | set | nosy:
+ mhammond |
| 2009-05-29 19:43:13 | sitbon | set | files:
+ thread_nt.h.patch messages: + msg88526 |
| 2009-05-29 19:41:39 | sitbon | set | files: - thread_nt.h.patch |
| 2009-05-28 23:26:45 | sitbon | set | messages: + msg88488 |
| 2009-05-28 23:18:24 | kevinwatters | set | nosy:
+ kevinwatters |
| 2009-05-28 21:44:20 | loewis | set | nosy:
+ loewis messages: + msg88480 |
| 2009-05-28 00:53:04 | pitrou | set | nosy:
+ pitrou messages: + msg88459 |
| 2009-05-28 00:03:06 | collinwinter | set | nosy:
+ collinwinter |
| 2009-05-27 22:11:13 | sitbon | set | files: + LockingTest.py |
| 2009-05-27 22:09:54 | sitbon | create | |
