classification
Title: Option to kill "stuck" workers in a multiprocessing pool
Type: enhancement
Components: Library (Lib)
Versions: Python 3.3, Python 3.4

process
Status: open
Priority: normal
Nosy List: bquinlan, neologix, paul.moore, pitrou

Created on 2012-02-28 11:42 by paul.moore, last changed 2022-04-11 14:57 by admin.

Messages (3)
msg154549 - Author: Paul Moore (paul.moore) (Python committer) Date: 2012-02-28 11:42
I have an application which fires off a number of database connections via a multiprocessing pool. Unfortunately, the database software occasionally gets "stuck" and a connection request hangs indefinitely. This locks up the worker process making the connection, and the hang cannot be interrupted except by killing that process.

It would be useful to have a facility to restart "stuck" workers in this case.

As an interface, I would suggest an additional argument to the AsyncResult.get method, kill_on_timeout. If this argument is true, and the get times out, the worker servicing the result will be killed and restarted.

Alternatively, provide a method on an AsyncResult to access the worker process that is servicing the request. I could then wait on the result and kill the worker manually if it does not respond in time.
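A minimal sketch of how the first proposal might look in use; the kill_on_timeout argument is hypothetical and does not exist in multiprocessing today, and connect is a stand-in for the hanging database call:

    from multiprocessing import Pool, TimeoutError

    def connect(dsn):
        ...  # stand-in for a database call that may hang indefinitely

    if __name__ == "__main__":
        pool = Pool(processes=4)
        result = pool.apply_async(connect, ("db://example",))
        try:
            # Hypothetical flag: if the get times out, kill and restart
            # the worker servicing this result, then raise TimeoutError
            # as usual.
            result.get(timeout=30, kill_on_timeout=True)
        except TimeoutError:
            pass  # the pool keeps its full complement of workers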

Without a facility like this, the pool can be starved of workers if multiple connections hang.
msg154573 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2012-02-28 21:42
The problem is that queues and other synchronization objects can end up in an inconsistent state when a worker crashes, hangs or gets killed.
That's why, in concurrent.futures, a crashed worker makes the ProcessPoolExecutor become "broken". A similar thing should be done for multiprocessing.Pool but it's a more complex object.
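For illustration, a small sketch of the concurrent.futures behaviour described above: when a worker dies abruptly, pending futures fail with BrokenProcessPool and the executor refuses further work.

    import os
    from concurrent.futures import ProcessPoolExecutor
    from concurrent.futures.process import BrokenProcessPool

    def crash():
        os._exit(1)  # simulate a worker dying abruptly

    if __name__ == "__main__":
        with ProcessPoolExecutor(max_workers=1) as pool:
            future = pool.submit(crash)
            try:
                future.result()
            except BrokenProcessPool:
                # The executor is now unusable; later submits fail too.
                print("pool is broken")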
msg154575 - Author: Paul Moore (paul.moore) (Python committer) Date: 2012-02-28 22:24
As an alternative, maybe leave the "stuck" worker, but allow the pool to recognise when a worker has not processed new messages for a long period and spawn an extra worker to replace it. That would avoid the starvation issue, and the stuck workers would die when the pool is terminated.
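Until something along these lines exists, the coarse workaround available today is to time out the get and discard the whole pool. A minimal sketch, where connect again stands in for the hanging database call:

    import multiprocessing
    import time

    def connect(dsn):
        time.sleep(3600)  # stand-in for a database call that hangs

    if __name__ == "__main__":
        pool = multiprocessing.Pool(processes=4)
        result = pool.apply_async(connect, ("db://example",))
        try:
            result.get(timeout=5)
        except multiprocessing.TimeoutError:
            # terminate() kills every worker, stuck ones included, at
            # the cost of losing the pool and any in-flight tasks.
            pool.terminate()
        else:
            pool.close()
        pool.join()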
History
Date                 | User       | Action | Args
2022-04-11 14:57:27  | admin      | set    | github: 58356
2012-03-07 20:12:38  | bquinlan   | set    | nosy: + bquinlan
2012-02-28 22:24:47  | paul.moore | set    | messages: + msg154575
2012-02-28 21:42:13  | pitrou     | set    | nosy: + neologix, pitrou; messages: + msg154573
2012-02-28 11:42:59  | paul.moore | create |