classification
Title: multiprocessing.Pool hangs when issuing KeyboardInterrupt
Type: behavior Stage: test needed
Components: Library (Lib) Versions: Python 2.7, Python 2.6
process
Status: closed Resolution: duplicate
Dependencies: Superseder:
Assigned To: Nosy List: Albert.Strasheim, asksol, gdb, gkcn, jnoller, mdengler, myint, pitrou, sbt, untitaker, vinay.sajip, vlasovskikh
Priority: normal Keywords: patch

Created on 2010-04-03 03:37 by vlasovskikh, last changed 2014-08-24 01:49 by brian.curtin. This issue is now closed.

Files
File name Uploaded Description
test_map_keyboard_interrput.py vlasovskikh, 2010-04-06 18:21 Unit test
fix-sigint.diff vlasovskikh, 2010-04-06 18:32 First patch fixing the problem
Messages (15)
msg102219 - (view) Author: Andrey Vlasovskikh (vlasovskikh) * Date: 2010-04-03 03:37
The multiprocessing.Pool methods map, imap, etc. are said to handle exceptions normally. But this seems to be true only for synchronous exceptions raised inside the function passed as their first (func) argument.

When a user hits ^C (typically during a long-running parallel map), the asynchronous KeyboardInterrupt isn't handled properly and causes the interpreter to hang. More precisely, the child processes become <defunct> (on Linux), so the only way to terminate the whole program is to send it the KILL signal.

Since stopping a program with ^C while it runs potentially long parallel computations is probably quite a common scenario, the interpreter should not hang in such a case.

I'm using Python 2.6.5 (r265:79063, Mar 23 2010, 04:44:21) [GCC 4.4.3] on linux2. I've also tried to use the current multiprocessing.pool module from the current (2.7) trunk with my 2.6.5 installation, but the bug persists.
msg102220 - (view) Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2010-04-03 03:39
Do you have a test case which can reproduce the issue?
msg102221 - (view) Author: Andrey Vlasovskikh (vlasovskikh) * Date: 2010-04-03 03:45
Yes, here is my test case.
msg102222 - (view) Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2010-04-03 03:54
You might want to take a look here: http://jessenoller.com/2009/01/08/multiprocessingpool-and-keyboardinterrupt/
msg102223 - (view) Author: Andrey Vlasovskikh (vlasovskikh) * Date: 2010-04-03 04:07
Yes, I've come up with the same solution myself, but it cannot cover all the cases of the bug. It works only when ^C is hit during a call to the user's function: http://stackoverflow.com/questions/1408356/keyboard-interrupts-with-pythons-multiprocessing-pool/2561809#2561809

If the user is "lucky", he may hit ^C while data is being fetched from or put into the queues in multiprocessing.pool.worker. To reproduce such a case, you may insert `sleep(10)` before `task = get()` or `put((job, i, result))`, for example. I've encountered such cases just by running test examples several times.
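The kind of workaround discussed above can be sketched roughly as follows: catch KeyboardInterrupt inside the user's function and re-raise it as an ordinary Exception subclass, which the pool worker already reports back to the parent. The names `KeyboardInterruptError` and `interruptible` are hypothetical and chosen for illustration; this is not claimed to be the exact code from either link. As described above, it only helps when ^C arrives while the user's function is running, not while the worker is getting from or putting into its queues.

```python
import functools


class KeyboardInterruptError(Exception):
    """Hypothetical Exception subclass standing in for KeyboardInterrupt."""


def interruptible(func):
    """Wrap func so that ^C inside it surfaces as an ordinary Exception."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except KeyboardInterrupt:
            # Convert the asynchronous interrupt into an Exception subclass,
            # which the pool machinery knows how to pass back to the caller.
            raise KeyboardInterruptError()
    return wrapper
```

A function passed to `Pool.map` would be decorated with `@interruptible` at module level, so that it can still be pickled by name.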
msg102479 - (view) Author: Andrey Vlasovskikh (vlasovskikh) * Date: 2010-04-06 18:21
Despite several workarounds available on the Web, the problem persists. Almost any exception that is raised in the `worker` function while putting or getting tasks from queues results in the Pool hanging. Currently, `worker` is only aware of Exception descendants raised inside the function passed to map.

I've written a unit test that checks if KeyboardInterrupts are handled normally. The source code may be included in `Lib/test/test_multiprocessing.py`.
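The distinction between Exception descendants and other exceptions matters here because KeyboardInterrupt (like SystemExit) derives from BaseException rather than Exception. A minimal illustration, with the helper name `caught_by_except_exception` being hypothetical:

```python
def caught_by_except_exception(exc_type):
    """Return True if an `except Exception` clause catches exc_type."""
    try:
        raise exc_type()
    except Exception:
        return True
    except BaseException:
        # KeyboardInterrupt and SystemExit end up here.
        return False


print(caught_by_except_exception(ValueError))         # True
print(caught_by_except_exception(KeyboardInterrupt))  # False
print(caught_by_except_exception(SystemExit))         # False
```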
msg102481 - (view) Author: Andrey Vlasovskikh (vlasovskikh) * Date: 2010-04-06 18:32
Here is a patch that fixes this problem. Basically, it catches all the BaseExceptions that could happen while: a) getting a task from the `inqueue`, b) calling a user function, c) putting a task into the `outqueue`. The exception handler puts the exception onto the `outqueue`.

It can be cleanly applied on top of revision 78790.
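The approach described in this message can be sketched as below. This is a simplified, single-process stand-in written for Python 3 with `queue.Queue`, not the contents of fix-sigint.diff. The task layout `(job, i, func, args)`, the `None` sentinel, the `(success, value)` result pair, and the choice to keep looping after reporting an exception are all assumptions made for illustration.

```python
import queue


def worker(inqueue, outqueue):
    """Simplified stand-in for a pool worker loop (illustrative only)."""
    while True:
        job = i = None
        try:
            task = inqueue.get()            # (a) getting a task
            if task is None:                # assumed sentinel: no more work
                break
            job, i, func, args = task
            result = (True, func(*args))    # (b) calling the user function
            outqueue.put((job, i, result))  # (c) putting the result
        except BaseException as exc:
            # Report anything raised in (a), (b) or (c), including
            # KeyboardInterrupt, instead of letting the worker die silently.
            outqueue.put((job, i, (False, exc)))
```

Note that `except BaseException` also catches SystemExit, so a sketch like this would report it rather than let the worker exit.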
msg114817 - (view) Author: Albert Strasheim (Albert.Strasheim) Date: 2010-08-24 20:26
Any chance of getting this patch applied? Thanks.
msg114821 - (view) Author: Ask Solem (asksol) (Python committer) Date: 2010-08-24 20:40
This is related to our discussions at #9205 as well (http://bugs.python.org/issue9205), as the final patch there will also fix this issue.
msg114888 - (view) Author: Ask Solem (asksol) (Python committer) Date: 2010-08-25 08:43
On closer inspection, your patch is also ignoring SystemExit. I think it's beneficial to honor SystemExit, so a user could use this as a means to replace the current process with a new one.

If we keep that behavior, the real problem here is that the
result handler hangs if the process that reserved a job is gone, which is going to be handled
by #9205. Should we mark it as a duplicate?
msg114900 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2010-08-25 13:30
> If we keep that behavior, the real problem here is that the
> result handler hangs if the process that reserved a job is gone, which is going to be handled
> by #9205. Should we mark it as a duplicate?

I would tend to agree with your assessment; we're better served just
gracefully handling everything per 9205
msg114976 - (view) Author: Andrey Vlasovskikh (vlasovskikh) * Date: 2010-08-26 13:48
> On closer look your patch is also ignoring SystemExit. I think it's beneficial to honor SystemExit, so a user could use this as a means to replace the current process with a new one.

Yes, SystemExit should cancel all the tasks that are currently in the queue. I guess my patch doesn't handle this properly.

> If we keep that behavior, the real problem here is that the result handler hangs if the process that reserved a job is gone, which is going to be handled by #9205. Should we mark it as a duplicate?

Yes, I think that #9205 covers this issue, so #8296 may be marked as a duplicate.
msg143077 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-08-27 15:24
Closing, as Andrey Vlasovskikh has agreed that this is a duplicate of #9205.
msg143080 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-08-27 15:37
Note that #9205 fixed concurrent.futures, but not multiprocessing.Pool which is a different kettle of fish.
msg221549 - (view) Author: Markus Unterwaditzer (untitaker) Date: 2014-06-25 13:38
Can this issue or #9205 be reopened as this particular instance of the problem doesn't seem to be resolved? I still seem to need the workaround from http://stackoverflow.com/a/1408476
History
Date User Action Args
2014-08-24 01:49:59 brian.curtin set nosy: - brian.curtin
2014-08-22 20:55:07 myint set nosy: + myint
2014-06-25 18:42:24 ned.deily set nosy: + sbt
2014-06-25 13:38:04 untitaker set nosy: + untitaker; messages: + msg221549
2014-05-12 13:58:09 mdengler set nosy: + mdengler
2011-08-27 15:37:11 pitrou set nosy: + pitrou; messages: + msg143080
2011-08-27 15:24:23 vinay.sajip set status: open -> closed; nosy: + vinay.sajip; messages: + msg143077; resolution: duplicate
2011-04-27 14:18:58 gkcn set nosy: + gkcn
2010-08-27 06:44:16 gdb set nosy: + gdb
2010-08-26 13:48:44 vlasovskikh set messages: + msg114976
2010-08-25 13:30:55 jnoller set messages: + msg114900
2010-08-25 08:43:30 asksol set messages: + msg114888
2010-08-24 20:40:45 asksol set messages: + msg114821
2010-08-24 20:29:50 brian.curtin set nosy: + asksol
2010-08-24 20:26:48 Albert.Strasheim set messages: + msg114817
2010-08-24 20:25:14 Albert.Strasheim set nosy: + Albert.Strasheim
2010-04-06 20:35:08 brian.curtin set nosy: + jnoller
2010-04-06 18:32:04 vlasovskikh set files: + fix-sigint.diff; keywords: + patch; messages: + msg102481
2010-04-06 18:23:24 vlasovskikh set files: - test_pool_keyboardinterrupt.py
2010-04-06 18:21:41 vlasovskikh set files: + test_map_keyboard_interrput.py; messages: + msg102479
2010-04-03 04:07:48 vlasovskikh set messages: + msg102223
2010-04-03 03:54:01 brian.curtin set messages: + msg102222
2010-04-03 03:45:53 vlasovskikh set files: + test_pool_keyboardinterrupt.py; messages: + msg102221
2010-04-03 03:39:13 brian.curtin set nosy: + brian.curtin; messages: + msg102220; type: crash -> behavior; stage: test needed
2010-04-03 03:37:14 vlasovskikh create