Message 135896 - Python tracker


Author	pitrou
Recipients	Albert.Strasheim, aljungberg, asksol, bquinlan, brian.curtin, gdb, gkcn, hongqn, jnoller, neologix, pitrou, vlasovskikh, vstinner
Date	2011-05-13.10:57:53
Message-id	<1305284268.3561.7.camel@localhost.localdomain>
In-reply-to	<1305283204.11.0.0887437824261.issue9205@psf.upfronthosting.co.za>

Content

> Antoine, I've got a couple questions concerning your patch:
> - IIUC, the principle is to create a pipe for each worker process, so
> that when the child exits the read-end - sentinel - becomes readable
> (EOF) from the parent, so you know that a child exited. Then, before
> reading from the result queue, you perform a select on the list of
> sentinels to check that all workers are alive. Am I correct?

Not exactly. The select is done on the queue's pipe and on the workers'
fds *at the same time*. Thus there's no race condition.
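
As a rough illustration of that idea (not the patch's actual code), a
single select() call can watch the result pipe and every worker
sentinel together. The function name, the return values and the choice
to report a pending result before a dead worker are assumptions made for
this sketch, and plain os.pipe() descriptors stand in for real worker
processes. It assumes a POSIX system, where select() works on pipes.

```python
import os
import select


def wait_for_result_or_death(result_fd, sentinel_fds, timeout=None):
    """Wait on the result pipe and the worker sentinels in one select() call.

    Returns a (kind, fds) pair where kind is "result", "dead" or "timeout".
    Hypothetical helper: a pending result is reported before a dead worker.
    """
    watched = [result_fd] + list(sentinel_fds)
    readable, _, _ = select.select(watched, [], [], timeout)
    if not readable:
        return ("timeout", [])
    if result_fd in readable:
        return ("result", [result_fd])
    # Only sentinels are readable: their write ends were closed (EOF).
    return ("dead", [fd for fd in readable if fd in sentinel_fds])


def demo():
    """Simulate one worker using two pipes and record what the parent sees."""
    result_r, result_w = os.pipe()
    sentinel_r, sentinel_w = os.pipe()
    events = []
    try:
        # Nothing written, worker "alive": the poll times out.
        events.append(wait_for_result_or_death(result_r, [sentinel_r], 0)[0])
        # A result arrives on the queue's pipe.
        os.write(result_w, b"x")
        events.append(wait_for_result_or_death(result_r, [sentinel_r], 0)[0])
        os.read(result_r, 1)
        # The worker "dies": closing the write end makes the sentinel readable.
        os.close(sentinel_w)
        sentinel_w = None
        events.append(wait_for_result_or_death(result_r, [sentinel_r], 0)[0])
    finally:
        for fd in (result_r, result_w, sentinel_r, sentinel_w):
            if fd is not None:
                os.close(fd)
    return events
```

Because both kinds of descriptor are in the same select() call, a worker
dying while the parent is blocked waiting for a result wakes the parent
up immediately, which is why no separate liveness check is needed.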

> - have you done some benchmarking to measure the performance impact of
> calling select at every get (I'm not saying it will necessarily be
> noticeable, I'm just curious)?

No, but the implementation is not meant to be blazingly fast anyway
(after all, it has just been rewritten in Python from C).

> - is there a distinction between a normal exit and an abnormal one?

Not at that level. In concurrent.futures, a process exiting normally
first sends its pid on the result queue. The parent then dequeues the
pid and knows the process has ended cleanly.
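
A minimal sketch of that bookkeeping, under stated assumptions: the
helper names are hypothetical, a queue.Queue stands in for the real
result queue, and a bare int on the queue is treated as a "clean exit"
notice while anything else is an ordinary result. The actual item types
used by concurrent.futures may differ.

```python
import queue


def drain_exit_notices(result_queue, clean_exits):
    """Pull everything currently queued; record pids, return other items.

    Assumption for this sketch: an int on the queue is a worker's pid,
    sent just before that worker exits normally.
    """
    results = []
    while True:
        try:
            item = result_queue.get_nowait()
        except queue.Empty:
            break
        if isinstance(item, int):
            clean_exits.add(item)
        else:
            results.append(item)
    return results


def classify_exit(pid, clean_exits):
    """Call when a worker's sentinel fires: was its exit announced?"""
    return "clean" if pid in clean_exits else "abnormal"
```

With this, a sentinel becoming readable for a pid that was never
dequeued is what marks the exit as abnormal.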

This approach could work for multiprocessing.Pool as well. However, the
patch only caters to concurrent.futures.

> Finally, I might be missing something completely obvious, but I have
> the feeling that POSIX already provides something that could help
> solve this issue: process groups.
> We could create a new process group for a process pool, and checking
> whether children are still alive would be as simple as waitpid(-group,
> os.WNOHANG)

waitpid() doesn't allow for a timeout, and it doesn't allow checking a
pipe concurrently, does it?

History
Date	User	Action	Args
2011-05-13 10:57:55	pitrou	set	recipients: + pitrou, bquinlan, vstinner, jnoller, hongqn, brian.curtin, asksol, vlasovskikh, neologix, gdb, Albert.Strasheim, aljungberg, gkcn
2011-05-13 10:57:53	pitrou	link	issue9205 messages
2011-05-13 10:57:53	pitrou	create