Message 110136 - Python tracker


Author	gdb
Recipients	asksol, gdb, jnoller
Date	2010-07-12.20:47:55
SpamBayes Score	0.031711493
Marked as misclassified	No
Message-id	<1278967687.18.0.513692146617.issue9205@psf.upfronthosting.co.za>
In-reply-to

Content

Thanks much for taking a look at this!

> why are you terminating the second pass after finding a failed 
> process?
Unfortunately, if you've lost a worker, you are no longer guaranteed that the cache will eventually be empty.  In particular, you may have lost a task, which could leave an ApplyResult waiting forever for a _set call that never comes.
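
To make the failure mode concrete, here is a toy model of it.  This is not the Pool code: PendingResult, set_value and the plain dict used as the cache are hypothetical stand-ins for ApplyResult, _set and the pool's cache, and a threading.Event replaces the real condition variable.

```python
import threading


class PendingResult:
    """Toy stand-in for an ApplyResult (hypothetical, not the real class)."""

    def __init__(self, cache, job):
        self._event = threading.Event()
        self._cache = cache
        self._job = job
        self._value = None
        cache[job] = self  # registered until a result arrives

    def set_value(self, value):
        """Plays the role of _set: store the value and leave the cache."""
        self._value = value
        self._event.set()
        del self._cache[self._job]

    def get(self, timeout=None):
        # With timeout=None this blocks forever if set_value is never called.
        if not self._event.wait(timeout):
            raise TimeoutError("result for job %r never arrived" % self._job)
        return self._value


cache = {}
done = PendingResult(cache, job=0)
lost = PendingResult(cache, job=1)

done.set_value("ok")  # the worker for job 0 reported back
# The worker for job 1 died, so nothing ever calls lost.set_value().
```

After this runs, job 1 is still in the cache and lost.get() only returns because a timeout is passed; without one it would block indefinitely, which is why a loop that waits for the cache to drain cannot be trusted once a worker has been lost.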

More generally, my chief assumption that went into this is that the unexpected death of a worker process is unrecoverable.  It would be nice to have a better workaround than just aborting everything, but I couldn't see a way to do that.

> Unpickleable errors and other errors occurring in the worker body are
> not exceptional cases, at least not now that the pool is supervised
> by _handle_workers.
I could be wrong, but that's not what my experiments were indicating.  In particular, if an unpickleable error occurs, then a task has been lost, which means that the relevant map, apply, etc. will wait forever for completion of the lost task.

> I think the result should be set also in this case, so the user can
> inspect the exception after the fact.
That does sound useful.  Although, how can you determine the job (and the value of i) if it's an unpickleable error?  It would be nice to be able to retrieve job/i without having to unpickle the rest.
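
One way this could work, sketched under my own assumptions rather than taken from the patch: pickle the payload separately from a small (job, i) header, so the header can always be read even when the payload cannot.  The names encode_result and decode_header and the four-field message layout are made up for illustration.

```python
import pickle


def encode_result(job, i, value):
    """Worker side: pickle the payload separately from the (job, i) header."""
    try:
        payload = pickle.dumps(value)
        ok = True
    except Exception as exc:
        # The value cannot be pickled; ship a plain-string description instead.
        payload = pickle.dumps(repr(exc))
        ok = False
    # The outer tuple holds only ints, a bool and bytes, so it always pickles.
    return pickle.dumps((job, i, ok, payload))


def decode_header(message):
    """Parent side: recover job and i without touching the payload."""
    job, i, ok, payload = pickle.loads(message)
    return job, i, ok, payload


# A lambda cannot be pickled, so it stands in for a problematic result.
message = encode_result(7, 3, lambda: None)
job, i, ok, payload = decode_header(message)
```

Here job and i come back intact and ok is False, so the parent could set an error on the right result slot.  The payload bytes are only unpickled in a second step, which means a failure there can also be attributed to the correct job.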

> For shutdown.patch, I thought this only happened in the worker 
> handler, but you've enabled this for the result handler too? I don't 
> care about the worker handler, but with the result handler I'm 
> worried that I don't know what ignoring these exceptions actually 
> means.
You have a good point.  I didn't think about the patch very hard.  I've only seen these exceptions from the worker handler, but AFAICT there's no guarantee that bad luck with the scheduler wouldn't result in the same problem in the result handler.  One option would be to narrow the breadth of the exceptions caught by _make_shutdown_safe (do we need to catch anything but TypeErrors?).  Another option would be to enable only for the worker handler.  I don't have a particularly great sense of what the Right Thing to do here is.
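
For the first option, a narrowed wrapper might look roughly like this.  It is a sketch and not the code in shutdown.patch: make_shutdown_safe and its ignored parameter are hypothetical, and it assumes the shutdown-time failures really do surface as TypeError.

```python
import functools


def make_shutdown_safe(func, ignored=(TypeError,)):
    """Return a wrapper that swallows only the listed exception types."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except ignored:
            # Assumed to be interpreter-shutdown noise; anything else propagates.
            return None

    return wrapper


def shutdown_noise():
    raise TypeError("'NoneType' object is not callable")


def real_bug():
    raise ValueError("something is actually wrong")


quiet = make_shutdown_safe(shutdown_noise)
loud = make_shutdown_safe(real_bug)
```

Calling quiet() returns None, while loud() still raises ValueError, so a genuine failure in the result handler would not be hidden.  The cost is that a TypeError from a real bug would be hidden too, which is part of why I'm unsure about applying this to the result handler.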

History
Date	User	Action	Args
2010-07-12 20:48:07	gdb	set	recipients: + gdb, jnoller, asksol
2010-07-12 20:48:07	gdb	set	messageid: <1278967687.18.0.513692146617.issue9205@psf.upfronthosting.co.za>
2010-07-12 20:47:56	gdb	link	issue9205 messages
2010-07-12 20:47:56	gdb	create