classification
Title: New GIL: improve condition variable emulation on NT
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: pitrou Nosy List: brian.curtin, jyasskin, kristjan.jonsson, pitrou, terry.reedy, tim.golden
Priority: normal Keywords: needs review, patch

Created on 2010-04-15 20:35 by kristjan.jonsson, last changed 2010-08-13 15:13 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
nt_cond.patch kristjan.jonsson, 2010-04-15 20:35
nt_cond2.patch kristjan.jonsson, 2010-08-08 20:43 Updated version of the patch
Messages (12)
msg103251 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-04-15 20:35
As per Antoine's suggestion, here is a patch to improve condition variable emulation on windows.

By using the windows Semaphore (which hasn't always been available) all of the problems in emulating condition variables using Events disappear.  in particular, the "lost wakeup" bug can be elimintated, and the same object easily supports bot the "signal" and "broadcast" mode of waking up threads.  We can also use the CRICITAL_SECTION objects as the associated mutex which is much lighter weight than using a kernel object.

I do think I´ve caught all the corner cases.  It is also simpler to prove because this construct is simpler than the previous one.  If, however, i have missed a spot, I'd be interested to know about it.  Not completely impossible since race conditions are devious beasts.  Ive tested this extensively on Windows.

See also issue 8299 where this implementation was presented.
msg113029 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-08-05 19:29
This patches ceval_gil.h. Can these changes be unit tested?
Or is this intended to be an internal performance, invisible-to-the-user, code improvement?
Antoine, are you going to look at this?
msg113089 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-08-06 09:50
The latter.  But I'd really like Antoine to look this over.
msg113301 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-08 19:41
Questions:

- why does _cond_timed_wait() decrement n_waiting *twice*?

- why does _cond_timed_wait() use InterlockedDecrement()? it doesn't protect against regular accesses to the same memory location, since it isn't guarded by the mutex

- why don't you simply pass NULL as the third parameter to ReleaseSemaphore() in _cond_signal()?

- I don't understand what you call "possible race condition" in _cond_signal(). _cond_signal() is currently always called with the corresponding mutex held, by the way. We could add a comment to make it mandatory.
msg113303 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-08 19:45
Oh, and I would call CreateSemaphore() with a max count of 1, precisely to guard us against bugs.
msg113314 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-08-08 20:27
Thanks for your comments.
The problem with _cond_timed_wait indicates that the timed wait code hadn't been thoroughly tested at the time of submission.

1) The InterlockedDecrement was left in by mistake.  It must be removed
2) The LONG was argument was a vestige from trying to prevent the benign race condition with the _cond_timed_wait.

As for the race condition, it is as follows:
a) Thread A is waiting in _cond_timed_wait(), n_waiting == 1
b) The wait times out and it tries to acquire the lock to decrement n_waiting.  This is a critical lock, because no one is waiting for the semaphore, yet n_waiting == 1
c) Thread B, that has the lock, calls _cond_signal().  It still sees n_waiting == 1 and does a ReleaseSemaphore.  It then also decrements n_waiting, and n_waiting is now 0.  The Semaphore count is now 1
d) Thread A finally gets the lock and decrements n_waiting.  n_waiting is now -1.

n_waiting has thus been decremented twice.  But this is ok, since there was also an extra ReleaseSemaphore call and the semaphore count is now 1.  The upshot of this is that this race condition is benign, and what happens is that the next time someone does _cond_wait(), the semaphore is reduced to 0, n_waiting goes to 0, and the thread passes right through.  Thus, should this event happen, the only upshot of it is that one _cond_wait call will be ineffective.

I'll upload a new patch with better explanation.
msg113315 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-08-08 20:28
"this is a critical lock" should read "this is a critical moment" :)
msg113318 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-08-08 20:43
Updated patch file
msg113400 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-08-09 08:40
You wan't to have the Semaphore with a max count of infinite because
a) of the "race condition" mentioned (actually, possible discrepancy between n_waiting and actual semaphore value), which can cause the semaphore count to temporarily go positive, and
b) If you implement _cond_broadcast(), which would have code like:
  int waiting = cond->n_waiting;
  if (waiting > 0) {
    cond->n_waiting = 0
    ReleaseSemaphore(cond->sem, waiting, 0);
}

The semaphore value going above 0 is not a "bug" but an implementation detail stemming from the fact that n_waiting cannot reliably reflect the semaphore state at each particular time.  Due to locking, it may lag behind the state a little bit.
msg113540 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-10 13:50
Thanks Kristján, I've committed the patch with modified comments in r83932.
Unfortunately, the Windows buildbots are in a wreck, so this won't be checked immediately (I only have a single-core XP VM under which the patch ran fine).
msg113559 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-08-10 21:24
Great.  I hope you agree that it is simpler.  I'm afraid my explanations can be somewhat long-winded so I hope you found a better way to document my "pseudo race condition".
msg113783 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-08-13 15:13
Closing as fixed, since nothing strange seems to have appeared so far.
History
Date User Action Args
2010-08-13 15:13:16pitrousetstatus: open -> closed
keywords: patch, patch, needs review
resolution: accepted -> fixed
messages: + msg113783
2010-08-10 21:24:42kristjan.jonssonsetkeywords: patch, patch, needs review
status: pending -> open
messages: + msg113559
2010-08-10 15:17:34ocean-citysetstatus: open -> pending
nosy: - ocean-city
keywords: patch, patch, needs review
2010-08-10 15:16:57ocean-citysetstatus: pending -> open

messages: - msg113544
2010-08-10 14:54:21ocean-citysetstatus: open -> pending
keywords: patch, patch, needs review
2010-08-10 14:53:28ocean-citysetstatus: pending -> open

nosy: + ocean-city
messages: + msg113544

keywords: patch, patch, needs review
2010-08-10 13:50:24pitrousetstatus: open -> pending

assignee: pitrou
title: Improve condition variable emulation on NT -> New GIL: improve condition variable emulation on NT
keywords: patch, patch, needs review
resolution: accepted
messages: + msg113540
stage: patch review -> resolved
2010-08-09 08:40:17kristjan.jonssonsetkeywords: patch, patch, needs review

messages: + msg113400
2010-08-08 20:43:09kristjan.jonssonsetkeywords: patch, patch, needs review
files: + nt_cond2.patch
messages: + msg113318
2010-08-08 20:28:24kristjan.jonssonsetkeywords: patch, patch, needs review

messages: + msg113315
2010-08-08 20:27:29kristjan.jonssonsetkeywords: patch, patch, needs review

messages: + msg113314
2010-08-08 19:45:59pitrousetkeywords: patch, patch, needs review

messages: + msg113303
2010-08-08 19:41:06pitrousetkeywords: patch, patch, needs review
nosy: + jyasskin
messages: + msg113301

2010-08-08 16:05:07pitrousetkeywords: patch, patch, needs review
nosy: + tim.golden, brian.curtin
2010-08-07 07:37:08ocean-cityunlinkissue9116 dependencies
2010-08-06 10:50:04ocean-citylinkissue9116 dependencies
2010-08-06 09:50:04kristjan.jonssonsetkeywords: patch, patch, needs review

messages: + msg113089
2010-08-05 19:29:31terry.reedysetnosy: + terry.reedy
messages: + msg113029

keywords: patch, patch, needs review
type: performance
stage: patch review
2010-04-15 20:35:42kristjan.jonssoncreate