Author gdb
Recipients asksol, gdb, jnoller
Date 2010-07-14.15:01:18
Message-id <1279119680.82.0.0572801240655.issue9205@psf.upfronthosting.co.za>
In-reply-to
Content
Before I forget, it looks like we also need to deal with a worker's result that can't be unpickled on the parent side:
"""
#!/usr/bin/env python
import multiprocessing
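def foo(x):
  # bar becomes a module-level global, but only in the worker that runs foo
  global bar
  def bar(x):
    pass
  return bar
p = multiprocessing.Pool(1)
# The parent never defined bar, so the returned function can't be unpickled there
p.apply(foo, [1])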
"""

This shouldn't require much more work, but I'll hold off on submitting a patch until we have a better idea of where we're going in this arena.
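
For what it's worth, here is a single-process sketch of why I think that result fails on the parent side. Functions are pickled by reference (module plus name), so the worker can serialize bar once foo has made it a global there, while the parent has no such name to look up. The module name fake_worker_main and the helper simulate_roundtrip are made up for illustration; this only mimics the two processes and is not what the pool does internally.

```python
import pickle
import sys
import types


def simulate_roundtrip():
    """Mimic worker-side pickling and parent-side unpickling in one process."""
    # Hypothetical stand-in for the worker's __main__ module.
    mod = types.ModuleType("fake_worker_main")
    sys.modules["fake_worker_main"] = mod

    def bar(x):
        pass

    # Make bar look like a module-level global of the stand-in module,
    # which is what "global bar" achieves inside the worker.
    bar.__module__ = "fake_worker_main"
    bar.__qualname__ = "bar"
    mod.bar = bar

    # "Worker" side: pickling by reference succeeds because the name resolves.
    payload = pickle.dumps(bar)

    # "Parent" side: the name was never defined there, so the lookup fails.
    del mod.bar
    try:
        pickle.loads(payload)
    except AttributeError as exc:
        return payload, exc
    finally:
        del sys.modules["fake_worker_main"]
    return payload, None
```

Calling simulate_roundtrip() should hand back the pickled bytes together with the AttributeError raised by the load, which is the error the pool would have to catch and report instead of letting it escape.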

> Instead of restarting crashed worker processes it will simply bring down
> the pool, right?
Yep. Again, as things stand, once you've lost a worker, you've lost a task, and you can't really do much about it. I guess that depends on your application though... is your use case such that you can lose a task without it mattering? If tasks are idempotent, one could have the task handler resubmit them, etc. But really, thinking about the failure modes I've seen (OOM kills, user-initiated interrupts), I'm not sure under what circumstances I'd like the pool to try to recover.
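
To make the resubmission idea concrete, here is a rough sketch of what I mean. Nothing in it is existing multiprocessing API: TaskLost is a hypothetical exception standing in for "the worker holding this task died", and submit is whatever callable hands a task to the pool and returns its result.

```python
class TaskLost(Exception):
    """Hypothetical signal that a worker died while holding a task."""


def run_with_resubmit(submit, task, max_attempts=3):
    """Call submit(task), resubmitting when the task is reported lost.

    Only safe when the task is idempotent, since a lost task may have
    partially run before its worker died.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return submit(task)
        except TaskLost:
            # Give up and propagate once the attempts are used up.
            if attempt == max_attempts:
                raise
```

The cap on attempts matters: a task that reliably kills its worker (say, by running out of memory) would otherwise be resubmitted forever.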

The idea of recording the mapping of tasks -> workers seems interesting. Getting all of the corner cases right could be hard (e.g. making removing a task from the queue and recording which worker did the removing atomic, detecting if the worker crashed while still holding the queue lock), and doing this would require extra mechanism. This feature does seem to be useful for pools running many different jobs, because that way a crashed worker need only terminate one job.
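
As a sketch of the bookkeeping involved, assuming everything lives in one process: the class TaskLedger and its method names are hypothetical. It uses a threading.Lock to make "take a task" and "record who took it" a single step, which deliberately sidesteps the hard parts above (doing the same across processes, and coping with a worker that dies while holding the lock).

```python
import threading


class TaskLedger:
    """Hypothetical bookkeeping of which worker holds which task."""

    def __init__(self, tasks):
        self._lock = threading.Lock()
        self._pending = list(tasks)
        self._held = {}  # worker id -> task currently held

    def claim(self, worker_id):
        """Take the next task and record its holder in one locked step."""
        with self._lock:
            if not self._pending:
                return None
            task = self._pending.pop(0)
            self._held[worker_id] = task
            return task

    def complete(self, worker_id):
        """Forget the task once the worker has finished it."""
        with self._lock:
            self._held.pop(worker_id, None)

    def task_lost_with(self, worker_id):
        """Return the task a dead worker was holding, if any."""
        with self._lock:
            return self._held.pop(worker_id, None)
```

With something like this, a crashed worker identifies exactly one task, so only the job that task belongs to needs to be failed.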

Anyway, I'd be curious to know more about the kinds of crashes you've encountered from which you'd like to be able to recover. Is it just unpickleable exceptions, or are there others?