classification
Title: Multiprocessing maxtasksperchild results in hang
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.3, Python 3.2, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Jimbofbx, asksol, jkeating, jnoller, neologix, pitrou, python-dev
Priority: normal
Keywords: needs review, patch

Created on 2010-11-05 22:22 by Jimbofbx, last changed 2011-10-24 18:06 by neologix. This issue is now closed.

Files
File name                   Uploaded                    Description
pool_lifetime_close-1.diff  neologix, 2011-10-21 19:46
Messages (8)
msg120547 - Author: James Hutchison (Jimbofbx) Date: 2010-11-05 22:22
Python 3.2a3

If the maxtasksperchild argument is used, the program just hangs once that many tasks have been processed, instead of running all of the submitted tasks. Tested on Windows XP 32-bit.

test code:

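import multiprocessing

def f(x):
    return 0

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=2, maxtasksperchild=1)
    results = []
    for i in range(10):
        # apply_async expects an args tuple: (i,) rather than (i), which is just i
        results.append(pool.apply_async(f, (i,)))
    pool.close()
    pool.join()  # on affected versions, the program hangs here
    for r in results:
        print(r.get())  # the task's return value rather than the AsyncResult object
    print("Done")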
msg133003 - Author: Jesse Keating (jkeating) Date: 2011-04-05 05:12
I can reproduce this with python-2.7-8.fc14.1 on Fedora 14, and also when using map_async with the pool.
msg133648 - Author: Charles-François Natali (neologix) (Python committer) Date: 2011-04-13 08:19
This problem arises because the pool's close() method is called before all the tasks have completed. Putting a sleep(1) before pool.close() avoids the lockup.
The root cause is that close() makes the worker handler thread exit. Because the maxtasksperchild argument is used, each worker exits once it has processed its maximum number of tasks. Since the worker handler thread has already exited, nothing maintains the pool of workers any more, so the remaining tasks are never processed, and the task handler thread waits indefinitely (it waits until the cache is empty).
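To make that sequence concrete, here is a toy, single-threaded model of the pre-fix behavior. It rests on simplifying assumptions: time advances in ticks, the worker handler runs once per tick, and each live worker takes at most one task per tick. The names `pending_after_close`, `RUN` and `CLOSE` are hypothetical; this is an illustration, not the real `multiprocessing.pool` code.

```python
RUN, CLOSE = "RUN", "CLOSE"  # hypothetical pool states for this toy model


def pending_after_close(n_tasks, processes, maxtasksperchild, close_after=0,
                        max_ticks=1000):
    """Toy model of the pre-fix Pool; returns how many tasks never ran.

    close() is modelled as happening at tick `close_after`. A nonzero
    return value corresponds to the hang: tasks remain, but no worker
    exists and no thread is left to start new ones.
    """
    pending = n_tasks           # stands in for the pool's result cache
    workers = [0] * processes   # tasks completed by each live worker
    handler_alive = True
    for tick in range(max_ticks):
        if pending == 0:
            break
        state = RUN if tick < close_after else CLOSE
        if handler_alive:
            if state == RUN:
                # Replace workers that exited after reaching maxtasksperchild.
                workers += [0] * (processes - len(workers))
            else:
                handler_alive = False  # pre-fix: the handler exits once closed
        if not workers:
            break  # no workers and nobody to start new ones: the hang
        survivors = []
        for done in workers:
            if pending:
                pending -= 1
                done += 1
            # A worker exits after maxtasksperchild tasks (None = never).
            if maxtasksperchild is None or done < maxtasksperchild:
                survivors.append(done)
        workers = survivors
    return pending
```

With the reporter's parameters (10 tasks, processes=2, maxtasksperchild=1) and close() called immediately, `pending_after_close(10, 2, 1)` leaves 8 tasks unfinished in this model. Calling close() late enough (`close_after=5`) or leaving maxtasksperchild unset (`None`) leaves none, which mirrors the sleep(1) observation above.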
The solution is to prevent the worker handler thread from exiting until the cache has been drained (unless the pool is terminated, in which case it must exit right away).
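The exit condition described above can be written as a pair of predicates for the worker handler's loop, one for the old behavior and one for the proposed behavior. This is a sketch modelled on the description in this message, not a copy of the attached patch; the state constants and function names are hypothetical.

```python
RUN, CLOSE, TERMINATE = 0, 1, 2  # hypothetical pool states for this sketch


def keeps_maintaining_before_fix(state, cache):
    # Old behavior: loop only while the pool is running; `cache` is ignored,
    # so close() stops worker maintenance even if tasks are still pending.
    return state == RUN


def keeps_maintaining_after_fix(state, cache):
    # Proposed behavior: keep replacing workers while the pool is running,
    # or while results are still pending and the pool was not terminated.
    return state == RUN or (bool(cache) and state != TERMINATE)
```

The only case where the two differ is a closed pool with pending results, which is exactly the situation in the reporter's script; terminate() still makes the handler stop immediately.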
Attached is a patch and relevant test.

Note: I noticed that there are some thread-unsafe operations: the cache can be modified from different threads, and so can the thread states. While this isn't an issue with the current CPython implementation (thanks to the GIL), I wonder if this should be fixed.
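One conventional way to make such cross-thread updates explicit is to route every access to the shared mapping through a single lock. The sketch below assumes that approach; `LockedCache` and `fill_and_drain` are hypothetical names for illustration and are not the change that was later made to multiprocessing.

```python
import threading


class LockedCache:
    """Hypothetical wrapper: every access to the shared dict holds one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def add(self, job_id, value):
        with self._lock:
            self._data[job_id] = value

    def pop(self, job_id):
        # Remove and return the entry, or None if it is absent.
        with self._lock:
            return self._data.pop(job_id, None)

    def __len__(self):
        with self._lock:
            return len(self._data)


def fill_and_drain(n_threads=4, per_thread=1000):
    """Fill the cache from several threads, then drain it from several threads."""
    cache = LockedCache()

    def run_all(target):
        threads = [threading.Thread(target=target, args=(t * per_thread,))
                   for t in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    def producer(offset):
        for i in range(per_thread):
            cache.add(offset + i, i)

    def consumer(offset):
        for i in range(per_thread):
            cache.pop(offset + i)

    run_all(producer)
    filled = len(cache)
    run_all(consumer)
    return filled, len(cache)
```

`fill_and_drain()` returns `(4000, 0)`: every entry added by the producer threads is present before draining and gone afterwards. Keeping the lock inside the wrapper means callers cannot forget to take it.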
msg133667 - Author: Jesse Noller (jnoller) (Python committer) Date: 2011-04-13 13:53
> Note: I noticed that there are some thread-unsafe operations (the cache that can be modified from different threads, and thread states are modified also from different threads). While this isn't an issue with the current cPython implementation (GIL), I wonder if this should be fixed.
>

Yes. We should fix those.
msg146119 - Author: Charles-François Natali (neologix) (Python committer) Date: 2011-10-21 19:46
Here's an updated patch.
I'll open a separate issue for the thread-safety.
msg146268 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2011-10-23 23:09
The patch looks good to me, thanks.
msg146308 - Author: Roundup Robot (python-dev) Date: 2011-10-24 16:45
New changeset 3465a9b2d25c by Charles-François Natali in branch '2.7':
Issue #10332: multiprocessing: fix a race condition when a Pool is closed
http://hg.python.org/cpython/rev/3465a9b2d25c

New changeset 52c98a729a71 by Charles-François Natali in branch '3.2':
Issue #10332: multiprocessing: fix a race condition when a Pool is closed
http://hg.python.org/cpython/rev/52c98a729a71

New changeset c2cdabc44665 by Charles-François Natali in branch 'default':
Issue #10332: multiprocessing: fix a race condition when a Pool is closed
http://hg.python.org/cpython/rev/c2cdabc44665
msg146313 - Author: Charles-François Natali (neologix) (Python committer) Date: 2011-10-24 18:06
James, thanks for the report!
History
Date                 User        Action  Args
2012-03-26 13:15:21  neologix    link    issue14404 superseder
2011-10-24 18:06:42  neologix    set     status: open -> closed
                                         resolution: fixed
                                         messages: + msg146313
                                         stage: patch review -> resolved
2011-10-24 16:45:29  python-dev  set     nosy: + python-dev
                                         messages: + msg146308
2011-10-23 23:09:35  pitrou      set     messages: + msg146268
                                         versions: + Python 3.3, - Python 3.1
2011-10-21 19:47:20  neologix    set     files: - pool_lifetime_close.diff
2011-10-21 19:46:57  neologix    set     files: + pool_lifetime_close-1.diff
                                         nosy: + pitrou
                                         messages: + msg146119
                                         keywords: + needs review
                                         stage: patch review
2011-04-13 13:53:23  jnoller     set     messages: + msg133667
2011-04-13 08:19:50  neologix    set     files: + pool_lifetime_close.diff
                                         nosy: + neologix
                                         messages: + msg133648
                                         keywords: + patch
2011-04-05 05:12:09  jkeating    set     nosy: + jkeating
                                         messages: + msg133003
2010-11-05 22:24:31  pitrou      set     nosy: + jnoller, asksol
                                         type: behavior
                                         versions: + Python 3.1, Python 2.7
2010-11-05 22:22:28  Jimbofbx    create