Issue 6132: Implement the GIL with critical sections in Windows

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/50382

classification

Title:	Implement the GIL with critical sections in Windows
Type:	performance	Stage:
Components:	Interpreter Core, Windows	Versions:	Python 3.2, Python 2.7

process

Status:	closed	Resolution:	out of date
Dependencies:		Superseder:
Assigned To:		Nosy List:	collinwinter, davidfraser, ishimoto, kevinwatters, loewis, mhammond, pitrou, sitbon
Priority:	normal	Keywords:	patch

Created on 2009-05-27 22:09 by sitbon, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
LockingTest.py	sitbon, 2009-05-27 22:11	Very un-scientific GIL testing script
thread_nt.h.patch	sitbon, 2009-05-29 19:43	Patch for making the GIL a critical section (breaks thread locks!)	review

Messages (7)
msg88452 - (view)	Author: Phillip Sitbon (sitbon)	Date: 2009-05-27 22:09
At the suggestion of others on the Python-Dev list, I'm going to outline what I've been doing with the GIL and where I see possiblity for improvement. Discussion: http://mail.python.org/pipermail/python-dev/2009-May/089746.html Detail: Currently, Python's global interpreter lock is implemented with the same mechanism that provides locks for the threading module. Moving the GIL's low-level handling to a separate mechanism would free core developers from the feature requirements imposed upon thread locks, specifically: 1. Blocking recursion A thread can acquire a lock when it has already been acquired, and this action will block. 2. Release-anywhere Any thread not owning a lock can release it (provided it is currently owned), thus preventing permanent deadlock in #1. These semantics are that of a semaphore with an initial value of 1, where acquisitions decrease its value and releases increment its value. This behavior cannot change without changing the fundamental behavior of the threading.Lock object that many have probably come to expect. I have been looking into ways to minimize GIL overhead in high concurrency situations. At the moment, I'm only looking at Windows, partly because it was an easy target and I have experience with its synchronization objects. While I have come up with a faster alternative for GIL handling, it breaks the expectations of thread lock objects. I then got to wondering: - Does anyone actually use the GIL-locking/unlocking functions in a manner that requires the features listed above? - If so, could it be accomplished a different way if the GIL were expected to behave more like a recursive mutex? I'm not going to make any assumptions about what extensions do with the GIL, but I can hope at least they do use the API functions. If so, the question is if we can safely apply different locking semantics to the GIL: - Non-blocking recursion A thread that already owns the lock can re-acquire it without blocking, increasing a recursion count. - Owner-only releasing Only the thread owning the lock can release it, and only as many times as it has acquired it. In Windows, this is how critical sections behave. I'm not sure about the Linux futex, but I know pthread mutexes and others like it can be made recursive. I'm mainly interested in separating the GIL from thread locks so we can say "do x and y with it but not z" without changing the threading module's lock mechanism. I decided to make a case of this when I was testing my ISAPI extension under high load. The current mechanism is as fast as it can possibly be for a single thread, but does not scale well at all. This has a lot to do with its use of event synchronization objects. Critical sections, on the other hand, scale well even when contention is unavoidable. My first experiment involved modifying the thread lock functions in thread_nt.h to use critical sections. Yes, it broke the behavior of threading.Lock, but it still worked for the GIL. The performance improvement can vary, but it's always noticeable in the presence of concurrency. I saw the performance of my server extension nearly double when serving up Django pages from 10-12 threads concurrently. Obviously we can't just slap on a critical section and call it done, unless nobody cares about changing what threading.Lock does. That's what led me believe that a separate GIL mechanism is very important to have. For your hacking pleasure, I've included a patch for thread_nt.h that will facilitate GIL performance testing. It includes the critical section implementation and a semaphore implementation (which is about as fast as the current behavior). I've also included a _very_ basic script for some ballpark GIL contention measurements. The numbers will be a bit exaggerated, especially with a _Py_CheckInterval of 1, but you will see the difference between the current method and the critical section method right away. Especially in terms of scalability. If you want a baseline cost for the iteration loops, change ThreadCount to 1 and make CheckInterval something big (>1 million). There are other costs included with the reported numbers, but going between the two lock implementations should make it clear that any big difference you see is due to locking/unlocking/contention. Please feel free to correct me if I did it all wrong - I'm sure there are other/better ways to test GIL performance. My testing was done with Python 2.6.2 x86 on Vista x64. I can provide measurements if it's not convenient for anyone to try out the patch. So where to go from here? It'd be good to know if changing the restrictions on GIL usage will break anything. Also, whether the same performance increase can be had for Linux/Mac/others. I'll continue to look into giving the GIL its own locking API, and hopefully my efforts are not for naught.
msg88459 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-05-28 00:53
I'm not competent to review Windows-specific stuff, but some style notes: - your indentation is inconsistent with the original file (you should use tabs) - please don't use any C++-style comments
msg88480 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2009-05-28 21:44
> Obviously we can't just slap on a critical section and call it done, > unless nobody cares about changing what threading.Lock does. This isn't obvious to me. I do care about what threading.Lock does, but still fail to see why we can't just slap on a criticial section and call it done.
msg88488 - (view)	Author: Phillip Sitbon (sitbon)	Date: 2009-05-28 23:26
> I'm not competent to review Windows-specific stuff, but > some style notes: > - your indentation is inconsistent with the original file > (you should use tabs) > - please don't use any C++-style comments Thanks- I was only supplying the code for testing purposes, but I will clean it up anyway... and read the style guide closer of course :) > This isn't obvious to me. I do care about what threading.Lock does, > but still fail to see why we can't just slap on a criticial section > and call it done. If it were up to me, I'd say move forward with breaking the current functionality. I guess my statement of "obviously" was that breaking it outright would cause people to complain. After I got to thinking about it that way, making GIL locking free from any other API seemed like it would give more leeway for performance improvement. If anyone is using the semaphore-like features of threading.Lock, they'd likely be better off using threading.Semaphore anyway :) There are some places where the core code would have to change in order to reflect the differences, but otherwise it would be a significant improvement to the GIL under high concurrency and not much of a change for a single thread. On the positive side, a global change means that other core code using locks will also see a speedup with critical sections as the default thread lock. So I guess my main point is that somehow, whether separate from or within the thread lock API, the GIL should be a critical section.
msg88526 - (view)	Author: Phillip Sitbon (sitbon)	Date: 2009-05-29 19:43
Tabified new code and removed one C++-style comment.
msg95180 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-11-13 15:56
Phillip, the GIL implementation has changed completely in the current py3k (3.2) branch. If you have any specific benchmark to test it, it would be nice to give it a try. In any case, using a critical section in thread_nt.h could still be beneficial for things other than the GIL, assuming it doesn't change the semantics.
msg195286 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2013-08-15 20:32
I'm closing this as out-of-date (due to the new GIL in 3.2 and upwards).

History
Date	User	Action	Args
2022-04-11 14:56:49	admin	set	github: 50382
2013-08-15 20:32:17	pitrou	set	status: open -> closed resolution: out of date messages: + msg195286
2012-07-30 15:18:24	ishimoto	set	nosy: + ishimoto
2009-11-13 15:56:14	pitrou	set	messages: + msg95180 versions: + Python 3.2, - Python 3.1
2009-11-11 15:29:01	davidfraser	set	nosy: + davidfraser
2009-06-01 22:40:32	mhammond	set	nosy: + mhammond
2009-05-29 19:43:13	sitbon	set	files: + thread_nt.h.patch messages: + msg88526
2009-05-29 19:41:39	sitbon	set	files: - thread_nt.h.patch
2009-05-28 23:26:45	sitbon	set	messages: + msg88488
2009-05-28 23:18:24	kevinwatters	set	nosy: + kevinwatters
2009-05-28 21:44:20	loewis	set	nosy: + loewis messages: + msg88480
2009-05-28 00:53:04	pitrou	set	nosy: + pitrou messages: + msg88459
2009-05-28 00:03:06	collinwinter	set	nosy: + collinwinter
2009-05-27 22:11:13	sitbon	set	files: + LockingTest.py
2009-05-27 22:09:54	sitbon	create