Author gdb
Recipients Albert.Strasheim, asksol, gdb, jnoller, vlasovskikh
Date 2010-08-27.17:59:45
Message-id <1282931987.14.0.321106902103.issue9205@psf.upfronthosting.co.za>
Hmm, a few notes. I have a bunch of nitpicks, but those can wait for a later iteration. (Just one style nit: I noticed a few unneeded whitespace changes; please try to avoid those, as they make the patch harder to read.)

- Am I correct that you handle a crashed worker by aborting all running jobs?  If so:
  - Is this acceptable for your use case? I'm fine with it, but I had been under the impression that we would rather avoid it.
  - If you're going to the effort of ACKing, why not record the mapping of tasks to workers so you can be more selective in your termination?  Otherwise, what does the ACKing do towards fixing this particular issue?
- I think the final version will need some inter-thread locking; otherwise you're going to hit weird race conditions. I haven't thought too hard about whether you can get away with just catching unexpected exceptions, but it's probably better to do the locking.
- I'm seeing hangs, but infrequently enough to make debugging annoying, and I don't have time to track down the bug right now. Why don't you strip out any changes that aren't needed (AFAICT, e.g., the ACK logic), make sure there aren't weird race conditions, and then, if we start converging on a patch that looks right at a high level, we can try to make it work in all the corner cases?
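For illustration, here is a minimal sketch of the selective approach raised in the ACK question, combined with the locking point. It assumes a hypothetical `TaskTracker` helper (not part of `multiprocessing`) and assumes each worker ACKs a job by reporting its own pid. The idea: record job-to-worker ownership on ACK, drop it when the result arrives, and when a worker dies fail only the jobs it still held. One lock guards the mapping because the result-handling thread and the worker-monitoring thread would both touch it.

```python
import threading


class TaskTracker:
    """Hypothetical bookkeeping for a pool (not a multiprocessing API).

    Maps each acknowledged job to the pid of the worker running it, so a
    crashed worker only takes down its own jobs instead of all running jobs.
    """

    def __init__(self):
        # Shared between the result-handler and worker-monitor threads.
        self._lock = threading.Lock()
        self._running = {}  # job_id -> worker pid

    def ack(self, job_id, pid):
        # A worker has acknowledged that it started job_id.
        with self._lock:
            self._running[job_id] = pid

    def done(self, job_id):
        # The result for job_id arrived; it no longer needs tracking.
        with self._lock:
            self._running.pop(job_id, None)

    def worker_died(self, pid):
        # Remove and return only the jobs owned by the dead worker;
        # the caller would then fail just these jobs.
        with self._lock:
            lost = [j for j, p in self._running.items() if p == pid]
            for j in lost:
                del self._running[j]
        return lost


tracker = TaskTracker()
tracker.ack(1, pid=100)
tracker.ack(2, pid=200)
tracker.ack(3, pid=100)
tracker.done(3)
print(tracker.worker_died(100))  # [1]; job 2 on pid 200 is untouched
```

Without the lock, `worker_died` could iterate `_running` while another thread inserts into it, which in CPython can raise `RuntimeError: dictionary changed size during iteration`.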