Author gdb
Recipients asksol, gdb, jnoller
Date 2010-07-27.09:05:34
Message-id <1280221538.41.0.58357192877.issue9205@psf.upfronthosting.co.za>
Content
> You can't have a sensible default timeout, because the worker may be
> processing something important...
In my case, the jobs are either functional or idempotent anyway, so aborting halfway through isn't a problem. In general, though, I'm not sure what kinds of use cases would tolerate silently dropped jobs. And if, for example, an OOM kill has just occurred, then you're already in a state where a job was unexpectedly terminated; you wouldn't be violating any more contracts by aborting.

In general, I can't help but feel that the approach of "ignore errors and keep going" leads to rather unexpected bugs (in this case, infinite hangs). But even in languages where errors are ignored by default (e.g. sh), there are mechanisms for turning on abort-on-error behavior (e.g. set -e).
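For comparison, this set -e style of behavior is what concurrent.futures.ProcessPoolExecutor does in Python 3.3 and later: when a worker is killed abruptly, its pending futures fail with BrokenProcessPool instead of blocking forever. The following is a minimal sketch, assuming a POSIX system (it sends itself SIGKILL to simulate an OOM kill, and uses the "fork" start method so the workers need no __main__ guard); die_abruptly and run_job_or_abort are illustrative names, not part of any API.

```python
import os
import signal
import multiprocessing
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def die_abruptly(_):
    # Simulate an OOM kill: the worker is terminated with no chance to clean up.
    os.kill(os.getpid(), signal.SIGKILL)


def run_job_or_abort():
    """Return the exception raised when a worker dies mid-job, or None."""
    # "fork" keeps the sketch self-contained on POSIX (assumption: not Windows).
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        future = pool.submit(die_abruptly, 1)
        try:
            # The timeout is only a safety net so the sketch itself cannot hang.
            future.result(timeout=30)
        except BrokenProcessPool as exc:
            # The executor aborts the job instead of waiting forever.
            return exc
    return None


if __name__ == "__main__":
    print(type(run_job_or_abort()).__name__)  # BrokenProcessPool
```

The caller gets an exception it can handle (retry, log, or abort), which is the "if something goes wrong, then abort" policy rather than an indefinite hang.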

So my response is yes, you're right that there's no great default here.  However, I think it'd be worth (at least) letting the user specify "if something goes wrong, then abort".  Keep in mind that this will only happen in very exceptional circumstances anyway.
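With multiprocessing.Pool, one way for the user to opt into aborting is to put an explicit deadline on the wait with AsyncResult.get(timeout=...), which raises multiprocessing.TimeoutError when the result does not arrive in time. This is a user-side workaround, not a Pool option. In the sketch below, a long sleep stands in for a job whose result never arrives (as happens when its worker is killed); slow_job, run_with_deadline, and the deadline parameter are hypothetical names, and the "fork" start method assumes a POSIX system.

```python
import time
import multiprocessing


def slow_job(seconds):
    # Stands in for a job whose result never comes back (e.g. its worker died).
    time.sleep(seconds)
    return seconds


def run_with_deadline(job_seconds, deadline):
    """Wait at most `deadline` seconds for the job; give up otherwise."""
    ctx = multiprocessing.get_context("fork")
    # Leaving the with-block calls Pool.terminate(), so a stuck worker is killed.
    with ctx.Pool(processes=1) as pool:
        async_result = pool.apply_async(slow_job, (job_seconds,))
        try:
            return async_result.get(timeout=deadline)
        except multiprocessing.TimeoutError:
            return "aborted"


if __name__ == "__main__":
    print(run_with_deadline(0.1, deadline=5))   # 0.1
    print(run_with_deadline(60, deadline=0.5))  # aborted
```

The deadline is chosen per call by the caller, which sidesteps the objection that no single default timeout suits every job.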

> Not everything can be simple.
Sure, but given the choice between a simple solution and a complex one, all else being equal, the simple one is preferable. And in this case, the more complicated mechanism seems to introduce subtle race conditions and failure modes.

Anyway, Jesse, it's been a while since we've heard anything from you... do you have thoughts on these issues?  It would probably be useful to get a fresh opinion :).