Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multiprocessing.pool methods imap()[_unordered()] deadlock #67240

Closed
advance512 mannequin opened this issue Dec 14, 2014 · 13 comments
Closed

multiprocessing.pool methods imap()[_unordered()] deadlock #67240

advance512 mannequin opened this issue Dec 14, 2014 · 13 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@advance512
Copy link
Mannequin

advance512 mannequin commented Dec 14, 2014

BPO 23051
Nosy @pitrou, @serhiy-storchaka, @MojoVampire, @applio, @indygreg
Files
  • Issue_23051_reproducer_v2_7.py: Reproducer of issue for Python 2.7
  • Issue_23051_fix_v2_7.patch: Fix for Python 2.7
  • Issue_23051_reproducer_v3_4.py: Reproducer of issue for Python 3.4
  • Issue_23051_fix_v3_4.patch: Fix for Python 3.4
  • issue_23051_fix_and_tests_v35_and_v34.patch: Ignore -- replaced by newer patch file.
  • issue_23051_fix_and_tests_v27.patch: Ignore -- replaced by newer patch file.
  • issue_23051_revised_fix_and_tests_v35_and_v34.patch: Revised, combined patch and unit tests for default/3.5 and 3.4 both (supercedes earlier patch files)
  • issue_23051_revised_fix_and_tests_v27.patch: Revised, combined patch and unit tests for 2.7 (supercedes earlier patch files)
  • issue_23051_4-3.4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2015-03-13.06:33:43.680>
    created_at = <Date 2014-12-14.22:47:29.181>
    labels = ['type-bug', 'library']
    title = 'multiprocessing.pool methods imap()[_unordered()] deadlock'
    updated_at = <Date 2015-08-10.20:02:30.690>
    user = 'https://bugs.python.org/advance512'

    bugs.python.org fields:

    activity = <Date 2015-08-10.20:02:30.690>
    actor = 'indygreg'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2015-03-13.06:33:43.680>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2014-12-14.22:47:29.181>
    creator = 'advance512'
    dependencies = []
    files = ['37449', '37450', '37451', '37453', '38220', '38221', '38362', '38363', '38377']
    hgrepos = []
    issue_num = 23051
    keywords = ['patch']
    message_count = 13.0
    messages = ['232643', '232644', '234039', '234116', '236427', '236477', '237358', '237386', '237458', '238003', '238007', '238008', '248366']
    nosy_count = 8.0
    nosy_names = ['pitrou', 'python-dev', 'sbt', 'serhiy.storchaka', 'josh.r', 'davin', 'advance512', 'indygreg']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue23051'
    versions = ['Python 2.7', 'Python 3.4', 'Python 3.5']

    @advance512
    Copy link
    Mannequin Author

    advance512 mannequin commented Dec 14, 2014

    When imap() or imap_unordered() are called with the iterable parameter set as a generator function, and when that generator function raises an exception, then the _task_handler thread (running the method _handle_tasks) dies immediately, without causing the other threads to stop and without reporting the exception to the main thread (that one that called imap()).

    I saw this issue in Python 2.7.8, 2.7.9 and 3.4.2. Didn't check other versions, but I assume this is a bug in all Python versions since 2.6.

    I will be attaching examples that reproduce this issue, as well as patches for both Python 2.7 and Python 3.4.

    @advance512 advance512 mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Dec 14, 2014
    @advance512
    Copy link
    Mannequin Author

    advance512 mannequin commented Dec 14, 2014

    The patches I attached do 2 things:

    1. A deadlock is prevented, wherein the main thread waits forever for the Pool thread/s to finish their execution, while they wait for instructions to terminate from the _task_handler thread which has died. Instead, the exception are caught and handled and termination of the pool execution is performed.
    2. The exception that was raised is caught and passed to the main thread, and is re-thrown in the context of the main thread - hence the user catch it and handle it, or - at the very least - be aware of the issue.

    I tested the patch to the best of my abilities, and am almost certain nothing was changed performance wise nor anything broken.

    Further eyes would, of course, only help for confirming this.

    @terryjreedy terryjreedy changed the title multiprocessing.pool methods imap() and imap_unordered() cause deadlock multiprocessing.pool methods imap()[_unordered()] deadlock Dec 19, 2014
    @applio
    Copy link
    Member

    applio commented Jan 14, 2015

    Successfully reproduced the behavior playing through variations with the supplied examples (very helpful) on OS X v10.9.5 both in 2.7.9 and the default branch -- adding version 3.5 to issue.

    Also successfully verified that the supplied patches do just as advertised on same: OS X v10.9.5, both 2.7.9 and default branch.

    The patches' addition of a try around that for block looks like a sensible choice. At the moment, I do not yet fully understand what the patches' exception block is doing with the 'cache[job]._set' call the way it is, but I would like to spend more time to digest it better -- I hope to do so later this week.

    @pitrou
    Copy link
    Member

    pitrou commented Jan 16, 2015

    The patch would at least need to add a unit test in order to avoid regressions.

    @applio
    Copy link
    Member

    applio commented Feb 23, 2015

    For my part, I'm now good with all aspects of the patch supplied by Alon.

    I have a working set of tests that will be attached in the next day or two after seeing them behave on more than one platform.

    @applio
    Copy link
    Member

    applio commented Feb 24, 2015

    Attaching updated patch files that combine both Alon's fix with unit tests for it. Review of the unit tests especially would be welcomed.

    My one misgiving about these unit tests is that if at some future date there were a regression, the unhandled issue would potentially cause this unit test to hang -- it would be nicer if that did not have to be so.

    These two combined patches (one for default/3.5 and 3.4, one for 2.7) have been tested on OS X 10.10, Ubuntu Linux 12.04.5 64-bit, and Windows 7 64-bit (though only for the default/3.5 and 3.4 patch on Win7).

    @serhiy-storchaka
    Copy link
    Member

    Added comments on Rietveld.

    @applio
    Copy link
    Member

    applio commented Mar 6, 2015

    Updated (1) the patch for default/3.5 and 3.4 and (2) the patch for 2.7 to reflect recommendations from the review.

    Thanks goes to Serhiy for the helpful review and especially the suggestion on better future-proofing in the tests.

    @serhiy-storchaka serhiy-storchaka self-assigned this Mar 7, 2015
    @serhiy-storchaka
    Copy link
    Member

    May be the code would cleaner when convert the "for" loop to the "while" loop and wrap in try/except only next()?

    @applio
    Copy link
    Member

    applio commented Mar 13, 2015

    After pondering it for two days and coming back to it with hopefully "fresh eyes", I believe that changing the for-loop to a while-loop is not overall easier to understand -- I would lean towards keeping the for-loop.

    I do think the change to the while-loop very much made the exception handling logic clearer but seeing the while-loop with the "manual" invocation of next and incrementing of the variable i and re-use of i as a signal to break out of a loop (setting i = None) made other things less clear. My belief is that someone reading this method's code for the first time will read the for-loop version as, "try to loop through the enumerated tasks and if anything goes wrong then set the next position in the cache to 'failed'". That top-level reading is, I think, not quite as easy with the while-loop. Without the exception handling that we add in this patch, the original code used the for-loop and would, I think, have looked weird if it had tried to use a while-loop -- I think that's a sign that the for-loop is likely to be more easily understood by a first-time reader.

    Though I am not sure it really matters, the while-loop version would only help end the processing of further jobs if an exception occurs in the iterator whereas the for-loop version might help if exceptions occur in a couple of other places. We do not have a clear motivation for needing that however.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 13, 2015

    New changeset 525ccfcc55f7 by Serhiy Storchaka in branch '3.4':
    Issue bpo-23051: multiprocessing.Pool methods imap() and imap_unordered() now
    https://hg.python.org/cpython/rev/525ccfcc55f7

    New changeset 7891d084a9ad by Serhiy Storchaka in branch 'default':
    Issue bpo-23051: multiprocessing.Pool methods imap() and imap_unordered() now
    https://hg.python.org/cpython/rev/7891d084a9ad

    New changeset 311d52878a65 by Serhiy Storchaka in branch '2.7':
    Issue bpo-23051: multiprocessing.Pool methods imap() and imap_unordered() now
    https://hg.python.org/cpython/rev/311d52878a65

    @serhiy-storchaka
    Copy link
    Member

    Thank you for your contribution Alon and Davin.

    @indygreg
    Copy link
    Mannequin

    indygreg mannequin commented Aug 10, 2015

    For posterity, I think we ran into a similar problem in https://bugzilla.mozilla.org/show_bug.cgi?id=1191877, where our code uses apply_async():

    11:09:47     INFO -  Exception in thread Thread-2:
    11:09:47     INFO -  Traceback (most recent call last):
    11:09:47     INFO -    File "/tools/python/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    11:09:47     INFO -      self.run()
    11:09:47     INFO -    File "/tools/python/lib/python2.7/threading.py", line 504, in run
    11:09:47     INFO -      self.__target(*self.__args, **self.__kwargs)
    11:09:47     INFO -    File "/tools/python/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
    11:09:47     INFO -      put(task)
    11:09:47     INFO -  RuntimeError: dictionary changed size during iteration

    This resulted in deadlock, just like this issue.

    The added try..except around the iteration of taskseq likely fixes this as well.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants