Author kristjan.jonsson
Recipients beazley, dabeaz, flox, kristjan.jonsson, loewis, pitrou, techtonik, torsten
Date 2010-04-11.14:46:24
SpamBayes Score 1.73195e-14
Marked as misclassified No
Message-id <1270997187.54.0.0415144685384.issue8299@psf.upfronthosting.co.za>
In-reply-to
Content
David, I don't necessarily think it is reasonable to yield every 100 opcodes, but that is the _intent_ of the current code base: checkinterval is set to 100.  If you don't want that, set it higher.  Your statement is like saying: "Why would you want your windows to fit tightly?  It sounds like a horrible thing for the indoor air quality" (I actually got this when living in Germany).  The answer is, of course, that a snugly fitting window can still be opened if you want, but more importantly, you _can_ close it properly.

And because the condition variable isn't strictly FIFO, it doesn't actually switch every time (an observation: the scheduler may decide to do its own thing inside the condition variable / semaphore).  What the ROUNDROBIN_GIL ensures, however, is that the condition variable is _entered_ every checkinterval.

What I'm trying to demonstrate to you is the brokenness of the legacy GIL (as observed by Antoine long ago) and how it is not broken on Windows.  It is broken because the currently running thread is biased to reacquire the GIL immediately, in an unpredictable fashion that is not managed by the (OS) thread scheduler.  Because it doesn't enter the condition variable wait when others are competing for it, the scheduler has no means of providing "fairness" to the application.

So, to summarise this:  I'm not proposing that we context switch every 100 opcodes, but I am proposing that we context switch consistently according to whatever checkinterval is put in place.

Antoine, in case you misunderstood:  I'm saying that the ROUNDROBIN_GIL and the Windows GIL are the same.  If you don't believe me, take a look at the NonRecursiveLock implementation for Windows.  I'm also starting to think that you didn't actually bother to look at the patch.  Please compare PyLock_gil_acquire() for LEGACY_GIL and ROUNDROBIN_GIL and see if you can spot the difference.  Really, it's just two lines of code.

Maybe it needs restating.  The bug is this (Python pseudocode):
with gil.cond:
  while gil.locked: # this line is the bug: if the GIL happens to be free, we grab it without ever waiting
    gil.cond.wait()
  gil.locked = True

vs.

with gil.cond:
  if gil.n_waiting or gil.locked:
    gil.n_waiting += 1
    while True:
      gil.cond.wait() #always wait at least once
      if not gil.locked:
        break
    gil.n_waiting -= 1
  gil.locked = True
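For anyone who wants to play with the difference, here is the fair variant above as runnable Python (an illustrative sketch with made-up names -- the patch itself is C code, not this class):

```python
import threading

class FairLock:
    """Runnable sketch of the ROUNDROBIN_GIL acquire logic above
    (illustrative only; the real patch is C code in the interpreter)."""

    def __init__(self):
        self.cond = threading.Condition()
        self.locked = False
        self.n_waiting = 0

    def acquire(self):
        with self.cond:
            # Key point: if anyone is already waiting (or the lock is
            # held), join the queue and wait at least once, giving the
            # OS scheduler the chance to pick the next owner.
            if self.n_waiting or self.locked:
                self.n_waiting += 1
                while True:
                    self.cond.wait()  # always wait at least once
                    if not self.locked:
                        break
                self.n_waiting -= 1
            self.locked = True

    def release(self):
        with self.cond:
            self.locked = False
            self.cond.notify()
```

The crucial part is that a new arrival queues behind existing waiters even when the lock happens to be free at that instant, so the releasing thread cannot barge back in ahead of them.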

The cond.wait() is where fairness comes from: it is where the OS can decide to serve threads roughly on a first-come, first-served basis.  If you are biased towards not entering it at all (when yielding the GIL), then you have taken away the OS's chance of scheduling.

Antoine (2):  The need to have do_yield is a symptom of the brokenness of the GIL.  You have a checkinterval of 100, which elapses some 1000 times per second, and yet you have to put in place special fudge code to ensure that we do get switches every few seconds?  The whole point of the checkinterval is for you _not_ to have to dot the code with sleep() calls.  Surely you don't expect the average application developer to do that if he wants his two CPU-bound threads to compete fairly for the GIL?  This is why I added the -y switch: to emulate normal application code.
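To make that concrete: on a modern CPython the old checkinterval knob has become sys.setswitchinterval, and a rough fairness probe needs no sleep() calls at all (probe_fairness is a hypothetical helper for illustration, not part of any patch):

```python
import sys
import threading
import time

def probe_fairness(duration=0.25):
    """Run two pure CPU-bound threads side by side for `duration`
    seconds, with no sleep() calls, and report how many loop
    iterations each got.  On a fair GIL both counts should be in
    the same ballpark."""
    counts = [0, 0]
    deadline = time.monotonic() + duration

    def spin(i):
        # Pure CPU work: no voluntary yields anywhere.
        while time.monotonic() < deadline:
            counts[i] += 1

    # Modern CPython's analogue of the old checkinterval knob.
    sys.setswitchinterval(0.005)
    threads = [threading.Thread(target=spin, args=(i,)) for i in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts
```

Comparing the two counts directly shows whether the interpreter is actually rotating ownership, without any of the sleep() fudging discussed above.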

Also, the 0.7 imbalance observed in the SHA1 test disappears on Windows (and using ROUNDROBIN_GIL).  It is not due to the Windows scheduler; it is due to the broken LEGACY_GIL.


This last slew of comments has been about the ROUNDROBIN_GIL only.  I haven't dazzled you yet with PRIORITY_GIL, but that solves both problems: it is _fair_, and it allows us to increase the checkinterval to 10000, eliminating the rapid-switching overhead while still giving fast response to IO.
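For the curious, here is my reading of the PRIORITY_GIL idea as hypothetical Python (made-up names; the patch itself is C and may differ -- and for brevity this uses a plain wait loop rather than the fair acquire from the pseudocode earlier in this message): a thread coming back from IO raises a flag, and the holder polls that flag cheaply, so it can give way immediately instead of running out a much larger checkinterval.

```python
import threading

class PriorityFlagLock:
    """Hypothetical sketch of a priority-flag lock: a high-priority
    waiter (e.g. a thread returning from IO) asks the current holder
    to yield early, instead of waiting for the checkinterval."""

    def __init__(self):
        self.cond = threading.Condition()
        self.locked = False
        self.priority_pending = False

    def acquire(self, priority=False):
        with self.cond:
            if priority:
                # Ask the current holder to give way at its next check.
                self.priority_pending = True
            while self.locked:
                self.cond.wait()
            self.locked = True
            if priority:
                self.priority_pending = False

    def release(self):
        with self.cond:
            self.locked = False
            self.cond.notify_all()

    def yield_requested(self):
        # The holder polls this cheaply in its inner loop; only when it
        # returns True does it need to release and re-acquire the lock.
        return self.priority_pending
```

The point of the flag is that the common case (no IO thread waiting) costs only a read per check, which is what makes a checkinterval of 10000 tolerable.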
History
Date User Action Args
2010-04-11 14:46:27kristjan.jonssonsetrecipients: + kristjan.jonsson, loewis, beazley, pitrou, techtonik, flox, dabeaz, torsten
2010-04-11 14:46:27kristjan.jonssonsetmessageid: <1270997187.54.0.0415144685384.issue8299@psf.upfronthosting.co.za>
2010-04-11 14:46:26kristjan.jonssonlinkissue8299 messages
2010-04-11 14:46:24kristjan.jonssoncreate