This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Add a timeout to multiprocessing's Pool.join
Type: enhancement
Stage:
Components: Library (Lib)
Versions: Python 3.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Will Starms, davin, pitrou
Priority: normal
Keywords: patch

Created on 2017-10-13 19:23 by Will Starms, last changed 2022-04-11 14:58 by admin.

Files
File name | Uploaded | Description
cpython_timeout.patch | Will Starms, 2017-10-13 19:23 | CPython 3.6 Pool timeout patch
cpython_raise_timeout.patch | Will Starms, 2017-10-13 19:39 | Pool join timeout that raises TimeoutError
Messages (3)
msg304350 - Author: Will Starms (Will Starms) * Date: 2017-10-13 19:23
Pool.join() currently (3.6.3) has no timeout parameter, so the managing thread can block indefinitely if a pool worker hangs or starts misbehaving. A timeout would let the owning thread attempt a join and, if the pool has not finished when the timeout expires, return to other tasks, such as monitoring worker health.
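threading.Thread.join already behaves the way this request describes: join(timeout) returns None when the timeout expires whether or not the thread has finished, so the caller checks is_alive() afterwards and can do other work in between attempts. The sketch below shows that loop with a plain thread, since Pool.join() accepts no timeout; join_while_monitoring and its check callback are hypothetical names used only for illustration.

```python
import threading
import time


def join_while_monitoring(thread, check, interval=0.1):
    """Join `thread`, waking up every `interval` seconds to run `check()`.

    Thread.join(timeout) returns None whether or not the thread finished,
    so is_alive() tells the two cases apart. `check` is a hypothetical
    health-check callable; returning False stops waiting early.
    Returns True if the thread finished, False if check() gave up.
    """
    while True:
        thread.join(timeout=interval)
        if not thread.is_alive():
            return True   # the thread finished during this wait
        if not check():
            return False  # caller chose to stop waiting; thread still runs
```

The same loop is what a Pool.join(timeout) with Thread.join-style semantics would enable, with the health check deciding whether to keep waiting or call terminate().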

In my specific situation, I have a Pool running a task on a large set of files. If any single task fails, the whole operation is ruined and the pool should be terminated. A failing task can notify the main thread through error_callback, but once the main thread has called join(), it cannot act on that notification until join() returns, which only happens after the Pool has finished all processing.
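Until Pool.join() takes a timeout, one possible workaround with the existing API is to avoid blocking in join() while tasks are outstanding: submit each item with apply_async(..., error_callback=...), have the callback set a threading.Event, and wait on that event with a timeout. This is a sketch under those assumptions; run_all_or_abort and poll_interval are hypothetical names, not part of multiprocessing. Per-item apply_async is used so that each failure is reported as soon as that task finishes.

```python
import threading


def run_all_or_abort(pool, func, items, poll_interval=0.5):
    """Run func on every item; terminate the pool on the first failure.

    `pool` may be a multiprocessing.pool.Pool or ThreadPool. Returns the
    results in input order, or re-raises the first exception reported
    through error_callback. (Hypothetical helper, not a stdlib API.)
    """
    failed = threading.Event()
    errors = []

    def on_error(exc):
        # Called in the pool's result-handler thread in the parent process.
        errors.append(exc)
        failed.set()

    # One apply_async per item, so each failure is reported as soon as
    # that task completes instead of when the whole batch is done.
    pending = [pool.apply_async(func, (item,), error_callback=on_error)
               for item in items]
    pool.close()

    # Do not block in pool.join() while tasks are outstanding; wake up every
    # poll_interval seconds instead (other monitoring could go here too).
    while not all(r.ready() for r in pending):
        if failed.wait(timeout=poll_interval):
            break

    if failed.is_set():
        pool.terminate()  # stop workers without completing queued tasks
        pool.join()
        raise errors[0]

    pool.join()  # all tasks have finished, so this returns promptly
    return [r.get() for r in pending]
```

With a process pool, func must be a picklable module-level function and the call belongs under an `if __name__ == "__main__":` guard, for example run_all_or_abort(Pool(4), process_file, paths) where process_file and paths are placeholders.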

Attached is a very simple patch against the current (3.6) CPython implementation that emulates the behavior of threading.Thread.join(timeout), which returns when the timeout expires even if the join has not completed.
msg304352 - Author: Will Starms (Will Starms) * Date: 2017-10-13 19:39
An alternative timeout patch that raises TimeoutError when the timeout expires.
msg304515 - Author: Will Starms (Will Starms) * Date: 2017-10-17 19:31
I've realized that my patch may not be ideal for general-purpose use, but it's a good start for a discussion on the proper way to implement a timeout.

My patch (which is based on a more involved modification to Pool) assumes that the joins after the first will complete in a timely fashion, which is not necessarily true. While this avoids leaving the pool in a half-joined state, the call can still hang while joining the other components, or at least take significantly longer than the requested timeout.

Assuming that joining an already-joined object is safe (or that each join can be guarded by a check so finished components are not rejoined), I think the best solution is to reduce the remaining timeout as each join completes and, when it reaches zero, either raise (much easier) or return (like threading.Thread.join, but this makes an is_alive-style check harder to provide).
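The shrinking-timeout idea can be sketched generically, assuming only that each component offers join(timeout) and is_alive(), as threading.Thread and multiprocessing.Process both do. In CPython's Pool the components are private handler threads (_worker_handler, _task_handler, _result_handler) followed by the worker processes in _pool; those are implementation details that may change between versions, so the sketch uses plain threads. join_all is a hypothetical helper, and raising multiprocessing.TimeoutError (the exception AsyncResult.get already uses for timeouts) is an assumed design choice, following the "raise" option above.

```python
import multiprocessing
import threading
import time


def join_all(joinables, timeout=None):
    """Join each object in turn under one shared deadline.

    Items need join(timeout) and is_alive(). Raises
    multiprocessing.TimeoutError once the budget is used up.
    (Hypothetical helper, not part of the stdlib.)
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    for item in joinables:
        # Time spent on earlier joins is subtracted from what is left,
        # so N slow components cannot take N * timeout in total.
        remaining = None if deadline is None else max(0.0, deadline - time.monotonic())
        item.join(remaining)
        if item.is_alive():
            raise multiprocessing.TimeoutError(
                f"{item!r} still running after {timeout} s")
```

Joining a thread or process that has already finished returns immediately, so a caller that catches the TimeoutError can simply call join_all again later; components joined on the first attempt cost nothing the second time.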
History
Date | User | Action | Args
2022-04-11 14:58:53 | admin | set | github: 75963
2017-10-20 17:05:24 | terry.reedy | set | nosy: + pitrou, davin; versions: + Python 3.7, - Python 3.6
2017-10-17 19:31:30 | Will Starms | set | messages: + msg304515
2017-10-13 19:39:14 | Will Starms | set | files: + cpython_raise_timeout.patch; messages: + msg304352
2017-10-13 19:23:42 | Will Starms | create |