classification
Title: ProcessPoolExecutor deadlock on KeyboardInterrupt
Type: behavior Stage:
Components: Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: open Resolution:
Dependencies: 22393 Superseder:
Assigned To: Nosy List: davin, jacksontj
Priority: normal Keywords: patch

Created on 2015-12-18 19:41 by jacksontj, last changed 2015-12-27 17:11 by davin.

Files
File name Uploaded Description
worker_ignore_interrupt.patch jacksontj, 2015-12-18 19:41
Messages (4)
msg256703 - Author: Thomas Jackson (jacksontj) * Date: 2015-12-18 19:41
If a KeyboardInterrupt is received while a worker process is taking an item off the queue, that worker process dies with an uncaught exception. The ProcessPoolExecutor has then lost a process, and it currently has no mechanism for recovering from dead processes. This is especially noticeable when the CallItem is relatively large, because call_queue.get() includes all of the unpickling time.

A simple fix is to have the worker process ignore the KeyboardInterrupt, since it would have no idea what to do with it anyway. This cannot be implemented with a regular try/except, because by the time the exception is caught the item has already been partially pulled off the queue and lost. My proposed fix is to disable the SIGINT handler in the worker process while it is getting items off the queue.
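A minimal sketch of the SIGINT idea, assuming the worker fetches its next item with a plain `call_queue.get(block=True)`. This is not the attached worker_ignore_interrupt.patch; `sigint_ignored` and `get_call_item` are hypothetical helper names. It relies on `signal.signal`, which must be called from the main thread (the worker loop runs there in the child process).

```python
import contextlib
import signal


@contextlib.contextmanager
def sigint_ignored():
    """Ignore SIGINT inside the block, then restore whatever handler was there."""
    previous = signal.signal(signal.SIGINT, signal.SIG_IGN)
    # signal.signal() returns None when the old handler was not installed from
    # Python; fall back to the default action so the restore below stays valid.
    if previous is None:
        previous = signal.SIG_DFL
    try:
        yield
    finally:
        signal.signal(signal.SIGINT, previous)


def get_call_item(call_queue):
    """Hypothetical worker-side fetch: a Ctrl-C that lands while the item is
    being read and unpickled is ignored, so the item cannot be half-consumed."""
    with sigint_ignored():
        return call_queue.get(block=True)
```

The trade-off is that a Ctrl-C arriving during the fetch (including a long blocking wait for work) is dropped rather than deferred; once the handler is restored, the worker reacts to SIGINT normally while it runs the job itself.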

An alternate approach is to actually change multiprocessing.Queue.get() to leave the item on the queue if it is interrupted by a KeyboardInterrupt.

to this is to catch the KeyboardInterrupt and simply continue on; then we can rely on the caller to do the cleanup. This cannot be done by simply


Proposed patch attached.
msg256705 - Author: Thomas Jackson (jacksontj) * Date: 2015-12-18 19:46
It seems I accidentally hit submit, so let me finish the last bit of my message here:


An alternate approach is to actually change multiprocessing.Queue.get() to leave the item on the queue if it is interrupted by a KeyboardInterrupt. The worker process could then handle the exception in a more meaningful way.
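To illustrate the alternate proposal, here is a toy in-process model, not multiprocessing's real Queue: the queue holds raw pickled bytes, and `get()` pushes them back to the head if unpickling is interrupted. `RequeueOnInterruptQueue` and `interrupted_loads` are hypothetical names. A real OS pipe offers no way to push bytes back onto its front, which is one reason this is harder to do inside multiprocessing.Queue itself.

```python
import collections
import pickle


class RequeueOnInterruptQueue:
    """Hypothetical toy: a deque of pickled bytes stands in for the pipe."""

    def __init__(self):
        self._pipe = collections.deque()

    def put(self, obj):
        self._pipe.append(pickle.dumps(obj))

    def get(self, loads=pickle.loads):
        raw = self._pipe.popleft()  # the bytes leave the "pipe" here...
        try:
            return loads(raw)  # ...and are decoded here, where Ctrl-C can land
        except KeyboardInterrupt:
            # Proposed behaviour: put the undecoded item back at the head so no
            # work is lost, then let the worker handle the interrupt itself.
            self._pipe.appendleft(raw)
            raise

    def __len__(self):
        return len(self._pipe)


def interrupted_loads(raw):
    """Simulates a KeyboardInterrupt arriving in the middle of unpickling."""
    raise KeyboardInterrupt
```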

It is also worth noting that if the caller gets a KeyboardInterrupt, there is no `terminate` method that would let you kill jobs before they run. I'm not certain whether that should be included in this issue or filed as a separate ticket, since the two are related but different.
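Absent a `terminate` method, a caller can at least cancel jobs that have not started, using the existing `Future.cancel()`. This is a sketch with hypothetical helper names (`cancel_pending`, `run_all`); it does not stop running jobs, and with ProcessPoolExecutor a few items already moved onto the internal call queue are marked running and can no longer be cancelled. On Python 3.9 and later, `executor.shutdown(wait=False, cancel_futures=True)` covers the same ground.

```python
from concurrent.futures import ProcessPoolExecutor


def cancel_pending(futures):
    """Cancel every future that has not started yet; return how many were cancelled.

    Future.cancel() returns False for futures that are already running or done,
    so only not-yet-started work is affected.
    """
    return sum(1 for f in futures if f.cancel())


def run_all(fn, args, max_workers=4):
    """Hypothetical caller: on Ctrl-C, cancel queued jobs before re-raising."""
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(fn, a) for a in args]
        try:
            return [f.result() for f in futures]
        except KeyboardInterrupt:
            cancel_pending(futures)
            raise  # leaving the with-block still waits for jobs already running
```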
msg256711 - Author: Thomas Jackson (jacksontj) * Date: 2015-12-18 21:45
After some more investigation, it seems that the alternate `Queue` fix is a non-starter. ProcessPoolExecutor assumes that multiprocessing.Queue guarantees delivery, and it does not (because of the pickling step). The worker process drops the message if it is interrupted while unpickling, and the pool has no idea: it assumes that the job is still running. That being said, my attached patch is probably the most reasonable fix short of a major rework of how ProcessPoolExecutor works.
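The failure mode can be modelled with a toy in-process queue. The names are hypothetical and this is not multiprocessing's code, although CPython's Queue.get() likewise receives the raw bytes first and unpickles them afterwards. Once the bytes are removed, a try/except in the worker has nothing left to recover, while the parent still believes the job is in flight.

```python
import collections
import pickle


class ToyCallQueue:
    """Hypothetical stand-in for multiprocessing.Queue: a deque of pickled bytes."""

    def __init__(self):
        self._pipe = collections.deque()

    def put(self, obj):
        self._pipe.append(pickle.dumps(obj))

    def get(self, loads=pickle.loads):
        raw = self._pipe.popleft()  # step 1: bytes are removed from the "pipe"
        return loads(raw)  # step 2: unpickle; an interrupt here loses `raw`

    def __len__(self):
        return len(self._pipe)


def interrupted_loads(raw):
    """Simulates a KeyboardInterrupt arriving in the middle of unpickling."""
    raise KeyboardInterrupt


def demonstrate_loss():
    """Return how many items remain after an interrupted get()."""
    q = ToyCallQueue()
    q.put(("job-1", 42))
    try:
        q.get(loads=interrupted_loads)
    except KeyboardInterrupt:
        pass  # catching it in the worker cannot bring the item back
    return len(q)  # nothing left to re-run, yet the parent still waits on job-1
```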
msg257085 - Author: Davin Potts (davin) * (Python committer) Date: 2015-12-27 17:11
Noting the connection to issue22393.
History
Date                 User         Action  Args
2015-12-27 17:11:24  davin        set     dependencies: + multiprocessing.Pool shouldn't hang forever if a worker process dies unexpectedly
                                          messages: + msg257085
2015-12-27 16:47:57  davin        set     nosy: + davin
2015-12-25 17:40:12  terry.reedy  set     versions: - Python 3.2, Python 3.3, Python 3.4
2015-12-18 21:45:31  jacksontj    set     messages: + msg256711
2015-12-18 19:46:29  jacksontj    set     messages: + msg256705
2015-12-18 19:41:10  jacksontj    create