multiprocessing Pool keeps objects (tasks, args, results) alive too long #74047

Closed
pitrou opened this issue Mar 20, 2017 · 14 comments
Labels
3.7 (EOL) end of life performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@pitrou
Member

pitrou commented Mar 20, 2017

BPO 29861
Nosy @pitrou, @vstinner, @applio, @zhangyangyu
PRs
  • bpo-29861: release references to multiprocessing Pool tasks #743
  • bpo-29861: release references to multiprocessing Pool tasks (#743) #800
  • bpo-29861: release references to multiprocessing Pool tasks (#743) #801
  • bpo-29861: release references to multiprocessing Pool tasks (#743) #803
  • Relax test timing (bpo-29861) to avoid sporadic failures #1120
  • [3.6] Relax test timing (bpo-29861) to avoid sporadic failures (#1120) #1132
  • [3.5] Relax test timing (bpo-29861) to avoid sporadic failures (#1120) #1133
  • Relax test timing (bpo-29861) to avoid sporadic failures (#1120) #1472
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-03-24.15:04:38.949>
    created_at = <Date 2017-03-20.18:14:14.443>
    labels = ['3.7', 'library', 'performance']
    title = 'multiprocessing Pool keeps objects (tasks, args, results) alive too long'
    updated_at = <Date 2017-05-05.07:47:13.988>
    user = 'https://github.com/pitrou'

    bugs.python.org fields:

    activity = <Date 2017-05-05.07:47:13.988>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-03-24.15:04:38.949>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2017-03-20.18:14:14.443>
    creator = 'pitrou'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 29861
    keywords = []
    message_count = 14.0
    messages = ['289894', '289895', '290086', '290088', '290089', '290092', '291587', '291628', '291629', '291646', '291648', '291649', '293061', '293064']
    nosy_count = 5.0
    nosy_names = ['pitrou', 'vstinner', 'sbt', 'davin', 'xiang.zhang']
    pr_nums = ['743', '800', '801', '803', '1120', '1132', '1133', '1472']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue29861'
    versions = ['Python 2.7', 'Python 3.5', 'Python 3.6', 'Python 3.7']

    @pitrou
    Member Author

    pitrou commented Mar 20, 2017

    The various workers in multiprocessing.Pool keep a reference to the last encountered task or task result. This means some data may be kept alive even after the caller is done with it, until some other task overwrites the relevant variables.

    Specifically, Pool._handle_tasks(), Pool._handle_results() and the toplevel worker() function fail to clear references at the end of each loop.
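
To make the mechanism concrete, here is a minimal sketch of a handler loop with the same shape. It is not code from multiprocessing itself: the names `Payload`, `handler`, and `payload_survives` are hypothetical, and the liveness check assumes CPython's reference counting, where an object is freed as soon as its last reference goes away.

```python
import queue
import threading
import weakref


class Payload:
    """Hypothetical stand-in for a large task argument or result."""


def handler(inqueue, handled, clear_refs):
    # Same shape as a pool handler loop: block on get(), process, repeat.
    while True:
        task = inqueue.get()   # 'task' stays bound to the old item while blocked here
        if task is None:       # sentinel: stop the loop
            break
        if clear_refs:
            task = None        # drop the reference before waiting again
        handled.release()      # tell the main thread this item is done


def payload_survives(clear_refs):
    """Return True if the payload is still alive once the caller has dropped it."""
    inqueue = queue.Queue()
    handled = threading.Semaphore(0)
    thread = threading.Thread(target=handler, args=(inqueue, handled, clear_refs))
    thread.start()

    payload = Payload()
    ref = weakref.ref(payload)
    inqueue.put(payload)
    handled.acquire()          # the handler has finished with the payload
    del payload                # the caller is done with it too

    alive = ref() is not None  # assumes CPython's immediate refcount-based cleanup
    inqueue.put(None)          # shut the handler down
    thread.join()
    return alive
```

Under those assumptions, `payload_survives(False)` returns `True` because the idle handler's local variable still pins the object, while `payload_survives(True)` returns `False`.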

    Originally reported at dask/distributed#956

    @pitrou pitrou added 3.7 (EOL) end of life stdlib Python modules in the Lib dir performance Performance or resource usage labels Mar 20, 2017
    @pitrou
    Member Author

    pitrou commented Mar 20, 2017

    Quick patch below. I'll make a PR once I have time to :-)

    diff --git a/Lib/multiprocessing/pool.py b/Lib/multiprocessing/pool.py
    index ffdf426..945afa2 100644
    --- a/Lib/multiprocessing/pool.py
    +++ b/Lib/multiprocessing/pool.py
    @@ -128,6 +128,8 @@ def worker(inqueue, outqueue, initializer=None, initargs=(), maxtasks=None,
                 util.debug("Possible encoding error while sending result: %s" % (
                     wrapped))
                 put((job, i, (False, wrapped)))
    +
    +        task = job = result = func = args = kwds = None
             completed += 1
         util.debug('worker exiting after %d tasks' % completed)
     
    @@ -402,6 +404,8 @@ class Pool(object):
                     if set_length:
                         util.debug('doing set_length()')
                         set_length(i+1)
    +            finally:
    +                task = taskseq = job = None
             else:
                 util.debug('task handler got sentinel')
     
    @@ -445,6 +449,7 @@ class Pool(object):
                     cache[job]._set(i, obj)
                 except KeyError:
                     pass
    +            task = job = obj = None
     
             while cache and thread._state != TERMINATE:
                 try:
    @@ -461,6 +466,7 @@ class Pool(object):
                     cache[job]._set(i, obj)
                 except KeyError:
                     pass
    +            task = job = obj = None
     
             if hasattr(outqueue, '_reader'):
                 util.debug('ensuring that outqueue is not full')
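
One way to observe the effect of a change like this from the outside is with weak references. The sketch below is an illustration under assumptions, not the test that accompanied the patch: it uses `multiprocessing.pool.ThreadPool` so that the very same objects pass through the handler threads (no pickling), and the names `Token`, `identity`, `refs_released`, and the 5-second timeout are hypothetical choices.

```python
import gc
import time
import weakref
from multiprocessing.pool import ThreadPool


class Token:
    """Hypothetical stand-in for a task argument; plain instances support weakrefs."""


def identity(x):
    return x


def refs_released(timeout=5.0):
    """Return True if the pool drops every reference to the mapped objects."""
    with ThreadPool(2) as pool:
        objs = [Token() for _ in range(10)]
        refs = [weakref.ref(o) for o in objs]
        pool.map(identity, objs)   # the returned list is discarded immediately
        del objs
        # Handler threads clear their locals asynchronously, so poll
        # until a deadline rather than rely on a single fixed sleep.
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            gc.collect()
            if all(r() is None for r in refs):
                return True
            time.sleep(0.01)
        return False
```

On an interpreter that includes the fix, `refs_released()` is expected to return `True`. Without it, some objects can stay pinned by an idle handler for as long as the pool is alive.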

    @pitrou
    Member Author

    pitrou commented Mar 24, 2017

    New changeset 8988945 by Antoine Pitrou in branch 'master':
    bpo-29861: release references to multiprocessing Pool tasks (#743)
    8988945

    @pitrou
    Member Author

    pitrou commented Mar 24, 2017

    New changeset cc3331f by Antoine Pitrou in branch '3.6':
    bpo-29861: release references to multiprocessing Pool tasks (#743) (#800)
    cc3331f

    @pitrou
    Member Author

    pitrou commented Mar 24, 2017

    New changeset 80cb6ed by Antoine Pitrou in branch '3.5':
    bpo-29861: release references to multiprocessing Pool tasks (#743) (#801)
    80cb6ed

    @pitrou
    Member Author

    pitrou commented Mar 24, 2017

    New changeset 5084ff7 by Antoine Pitrou in branch '2.7':
    bpo-29861: release references to multiprocessing Pool tasks (#743) (#803)
    5084ff7

    @pitrou pitrou closed this as completed Mar 24, 2017
    @zhangyangyu
    Member

    Hi Antoine, after this change I sometimes see tests fail on the 3.5 branch, for example http://buildbot.python.org/all/builders/x86%20Ubuntu%20Shared%203.5/builds/194/steps/test/logs/stdio.

    @pitrou
    Member Author

    pitrou commented Apr 13, 2017

    I can't reproduce here, on Ubuntu 16.04, after running the test 500 times.

    @pitrou
    Member Author

    pitrou commented Apr 13, 2017

    Ok, I can reproduce now.

    @pitrou
    Member Author

    pitrou commented Apr 14, 2017

    New changeset 685cdb9 by Antoine Pitrou in branch 'master':
    Relax test timing (bpo-29861) to avoid sporadic failures (#1120)
    685cdb9

    @pitrou
    Member Author

    pitrou commented Apr 14, 2017

    New changeset 413a891 by Antoine Pitrou in branch '3.6':
    Relax test timing (bpo-29861) to avoid sporadic failures (#1120) (#1132)
    413a891

    @pitrou
    Member Author

    pitrou commented Apr 14, 2017

    New changeset 47f24a0 by Antoine Pitrou in branch '3.5':
    Relax test timing (bpo-29861) to avoid sporadic failures (#1120) (#1133)
    47f24a0

    @vstinner
    Member

    vstinner commented May 5, 2017

    New changeset 685cdb9 by Antoine Pitrou in branch 'master':
    Relax test timing (bpo-29861) to avoid sporadic failures (#1120)

    Oh, this change wasn't backported to 2.7, which caused bpo-30269. I proposed a backport: #1472. I will merge it once tests pass ;-)

    @vstinner
    Member

    vstinner commented May 5, 2017

    New changeset fd6094c by Victor Stinner in branch '2.7':
    Relax test timing (bpo-29861) to avoid sporadic failures (#1120) (#1472)
    fd6094c

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022