classification
Title: multiprocessing.pool methods imap()[_unordered()] deadlock
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.5, Python 3.4, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Gregory.Szorc, advance512, davin, josh.r, pitrou, python-dev, sbt, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2014-12-14 22:47 by advance512, last changed 2015-08-10 20:02 by Gregory.Szorc. This issue is now closed.

Files
File name Uploaded Description Edit
Issue_23051_reproducer_v2_7.py advance512, 2014-12-14 22:50 Reproducer of issue for Python 2.7
Issue_23051_fix_v2_7.patch advance512, 2014-12-14 22:53 Fix for Python 2.7
Issue_23051_reproducer_v3_4.py advance512, 2014-12-14 22:57 Reproducer of issue for Python 3.4
Issue_23051_fix_v3_4.patch advance512, 2014-12-14 22:58 Fix for Python 3.4
issue_23051_fix_and_tests_v35_and_v34.patch davin, 2015-02-24 04:53 Ignore -- replaced by newer patch file. review
issue_23051_fix_and_tests_v27.patch davin, 2015-02-24 04:54 Ignore -- replaced by newer patch file. review
issue_23051_revised_fix_and_tests_v35_and_v34.patch davin, 2015-03-06 22:18 Revised, combined patch and unit tests for default/3.5 and 3.4 both (supercedes earlier patch files) review
issue_23051_revised_fix_and_tests_v27.patch davin, 2015-03-06 22:19 Revised, combined patch and unit tests for 2.7 (supercedes earlier patch files) review
issue_23051_4-3.4.patch serhiy.storchaka, 2015-03-07 16:49 review
Messages (13)
msg232643 - (view) Author: Alon Diamant (advance512) * Date: 2014-12-14 22:47
When imap() or imap_unordered() are called with the iterable parameter set as a generator function, and when that generator function raises an exception, then the _task_handler thread (running the method _handle_tasks) dies immediately, without causing the other threads to stop and without reporting the exception to the main thread (that one that called imap()).

I saw this issue in Python 2.7.8, 2.7.9 and 3.4.2. Didn't check other versions, but I assume this is a bug in all Python versions since 2.6.

I will be attaching examples that reproduce this issue, as well as patches for both Python 2.7 and Python 3.4.
msg232644 - (view) Author: Alon Diamant (advance512) * Date: 2014-12-14 23:03
The patches I attached do 2 things:

1. A deadlock is prevented, wherein the main thread waits forever for the Pool thread/s to finish their execution, while they wait for instructions to terminate from the _task_handler thread which has died. Instead, the exception are caught and handled and termination of the pool execution is performed.
2. The exception that was raised is caught and passed to the main thread, and is re-thrown in the context of the main thread - hence the user catch it and handle it, or - at the very least - be aware of the issue.

I tested the patch to the best of my abilities, and am almost certain nothing was changed performance wise nor anything broken. 

Further eyes would, of course, only help for confirming this.
msg234039 - (view) Author: Davin Potts (davin) * (Python committer) Date: 2015-01-14 22:22
Successfully reproduced the behavior playing through variations with the supplied examples (very helpful) on OS X v10.9.5 both in 2.7.9 and the default branch -- adding version 3.5 to issue.

Also successfully verified that the supplied patches do just as advertised on same: OS X v10.9.5, both 2.7.9 and default branch.

The patches' addition of a try around that for block looks like a sensible choice.  At the moment, I do not yet fully understand what the patches' exception block is doing with the 'cache[job]._set' call the way it is, but I would like to spend more time to digest it better -- I hope to do so later this week.
msg234116 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-01-16 08:20
The patch would at least need to add a unit test in order to avoid regressions.
msg236427 - (view) Author: Davin Potts (davin) * (Python committer) Date: 2015-02-23 07:36
For my part, I'm now good with all aspects of the patch supplied by Alon.

I have a working set of tests that will be attached in the next day or two after seeing them behave on more than one platform.
msg236477 - (view) Author: Davin Potts (davin) * (Python committer) Date: 2015-02-24 04:53
Attaching updated patch files that combine both Alon's fix with unit tests for it.  Review of the unit tests especially would be welcomed.

My one misgiving about these unit tests is that if at some future date there were a regression, the unhandled issue would potentially cause this unit test to hang -- it would be nicer if that did not have to be so.

These two combined patches (one for default/3.5 and 3.4, one for 2.7) have been tested on OS X 10.10, Ubuntu Linux 12.04.5 64-bit, and Windows 7 64-bit (though only for the default/3.5 and 3.4 patch on Win7).
msg237358 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-06 14:16
Added comments on Rietveld.
msg237386 - (view) Author: Davin Potts (davin) * (Python committer) Date: 2015-03-06 22:18
Updated (1) the patch for default/3.5 and 3.4 and (2) the patch for 2.7 to reflect recommendations from the review.

Thanks goes to Serhiy for the helpful review and especially the suggestion on better future-proofing in the tests.
msg237458 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-07 16:49
May be the code would cleaner when convert the "for" loop to the "while" loop and wrap in try/except only next()?
msg238003 - (view) Author: Davin Potts (davin) * (Python committer) Date: 2015-03-13 03:05
After pondering it for two days and coming back to it with hopefully "fresh eyes", I believe that changing the for-loop to a while-loop is not overall easier to understand -- I would lean towards keeping the for-loop.

I do think the change to the while-loop very much made the exception handling logic clearer but seeing the while-loop with the "manual" invocation of next and incrementing of the variable i and re-use of i as a signal to break out of a loop (setting i = None) made other things less clear.  My belief is that someone reading this method's code for the first time will read the for-loop version as, "try to loop through the enumerated tasks and if anything goes wrong then set the next position in the cache to 'failed'".  That top-level reading is, I think, not quite as easy with the while-loop.  Without the exception handling that we add in this patch, the original code used the for-loop and would, I think, have looked weird if it had tried to use a while-loop -- I think that's a sign that the for-loop is likely to be more easily understood by a first-time reader.

Though I am not sure it really matters, the while-loop version would only help end the processing of further jobs if an exception occurs in the iterator whereas the for-loop version might help if exceptions occur in a couple of other places.  We do not have a clear motivation for needing that however.
msg238007 - (view) Author: Roundup Robot (python-dev) Date: 2015-03-13 06:32
New changeset 525ccfcc55f7 by Serhiy Storchaka in branch '3.4':
Issue #23051: multiprocessing.Pool methods imap() and imap_unordered() now
https://hg.python.org/cpython/rev/525ccfcc55f7

New changeset 7891d084a9ad by Serhiy Storchaka in branch 'default':
Issue #23051: multiprocessing.Pool methods imap() and imap_unordered() now
https://hg.python.org/cpython/rev/7891d084a9ad

New changeset 311d52878a65 by Serhiy Storchaka in branch '2.7':
Issue #23051: multiprocessing.Pool methods imap() and imap_unordered() now
https://hg.python.org/cpython/rev/311d52878a65
msg238008 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-13 06:33
Thank you for your contribution Alon and Davin.
msg248366 - (view) Author: Gregory Szorc (Gregory.Szorc) Date: 2015-08-10 20:02
For posterity, I think we ran into a similar problem in https://bugzilla.mozilla.org/show_bug.cgi?id=1191877, where our code uses apply_async():

11:09:47     INFO -  Exception in thread Thread-2:
11:09:47     INFO -  Traceback (most recent call last):
11:09:47     INFO -    File "/tools/python/lib/python2.7/threading.py", line 551, in __bootstrap_inner
11:09:47     INFO -      self.run()
11:09:47     INFO -    File "/tools/python/lib/python2.7/threading.py", line 504, in run
11:09:47     INFO -      self.__target(*self.__args, **self.__kwargs)
11:09:47     INFO -    File "/tools/python/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
11:09:47     INFO -      put(task)
11:09:47     INFO -  RuntimeError: dictionary changed size during iteration

This resulted in deadlock, just like this issue.

The added try..except around the iteration of taskseq likely fixes this as well.
History
Date User Action Args
2015-08-10 20:02:30Gregory.Szorcsetnosy: + Gregory.Szorc
messages: + msg248366
2015-03-13 06:33:43serhiy.storchakasetstatus: open -> closed
resolution: fixed
messages: + msg238008

stage: patch review -> resolved
2015-03-13 06:32:27python-devsetnosy: + python-dev
messages: + msg238007
2015-03-13 03:05:56davinsetmessages: + msg238003
2015-03-07 16:49:56serhiy.storchakasetfiles: + issue_23051_4-3.4.patch

messages: + msg237458
2015-03-07 15:41:55serhiy.storchakasetassignee: serhiy.storchaka
2015-03-06 22:19:01davinsetfiles: + issue_23051_revised_fix_and_tests_v27.patch
2015-03-06 22:18:25davinsetfiles: + issue_23051_revised_fix_and_tests_v35_and_v34.patch

messages: + msg237386
2015-03-06 14:16:58serhiy.storchakasetnosy: + serhiy.storchaka

messages: + msg237358
stage: commit review -> patch review
2015-02-24 04:56:51davinsetstage: test needed -> commit review
2015-02-24 04:54:56davinsetfiles: + issue_23051_fix_and_tests_v27.patch
2015-02-24 04:53:58davinsetfiles: + issue_23051_fix_and_tests_v35_and_v34.patch

messages: + msg236477
2015-02-23 07:36:18davinsetmessages: + msg236427
stage: test needed
2015-01-16 08:20:37pitrousetnosy: + pitrou
messages: + msg234116
2015-01-14 22:22:06davinsetmessages: + msg234039
versions: + Python 3.5
2015-01-14 20:22:21davinsetnosy: + davin
2014-12-19 23:48:55josh.rsetnosy: + josh.r
2014-12-19 20:13:30terry.reedysettitle: multiprocessing.pool methods imap() and imap_unordered() cause deadlock -> multiprocessing.pool methods imap()[_unordered()] deadlock
2014-12-15 08:16:35ned.deilysetnosy: + sbt
2014-12-14 23:03:11advance512setmessages: + msg232644
2014-12-14 22:58:55advance512setfiles: + Issue_23051_fix_v3_4.patch
2014-12-14 22:58:20advance512setfiles: - Issue_23051_fix_v3_4.patch
2014-12-14 22:57:52advance512setfiles: + Issue_23051_fix_v3_4.patch
2014-12-14 22:57:34advance512setfiles: + Issue_23051_reproducer_v3_4.py
2014-12-14 22:53:39advance512setfiles: + Issue_23051_fix_v2_7.patch
keywords: + patch
2014-12-14 22:50:55advance512setfiles: + Issue_23051_reproducer_v2_7.py
2014-12-14 22:47:29advance512create