Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

concurrent.futures.as_completed() memory inefficiency #71331

Closed
grzgrzgrz3 mannequin opened this issue May 28, 2016 · 15 comments
Closed

concurrent.futures.as_completed() memory inefficiency #71331

grzgrzgrz3 mannequin opened this issue May 28, 2016 · 15 comments
Labels
3.7 (EOL) end of life performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@grzgrzgrz3
Copy link
Mannequin

grzgrzgrz3 mannequin commented May 28, 2016

BPO 27144
Nosy @brianquinlan, @pitrou, @vstinner, @zhangyangyu, @grzgrzgrz3, @robnester, @willvousden
PRs
  • bpo-27144: concurrent.futures as_complete and map iterators do not keep reference to returned object #1560
  • [3.6] bpo-27144: concurrent.futures as_complete and map iterators do not keep reference to returned object (GH-1560) #3266
  • Fix a concurrent.futures.as_completed() refleak previously introduced in bpo-27144 #3270
  • [3.6] Fix a c.f.as_completed() refleak previously introduced in bpo-27144 (GH-3270) #3271
  • Files
  • reproduce.py
  • issue27144.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-09-03.13:31:34.282>
    created_at = <Date 2016-05-28.13:25:35.704>
    labels = ['3.7', 'library', 'performance']
    title = 'concurrent.futures.as_completed() memory inefficiency'
    updated_at = <Date 2017-09-03.20:24:00.641>
    user = 'https://github.com/grzgrzgrz3'

    bugs.python.org fields:

    activity = <Date 2017-09-03.20:24:00.641>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-09-03.13:31:34.282>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2016-05-28.13:25:35.704>
    creator = 'grzgrzgrz3'
    dependencies = []
    files = ['43037', '43038']
    hgrepos = []
    issue_num = 27144
    keywords = ['patch']
    message_count = 15.0
    messages = ['266552', '288708', '293435', '293439', '293563', '301136', '301141', '301142', '301181', '301186', '301187', '301189', '301190', '301191', '301197']
    nosy_count = 8.0
    nosy_names = ['bquinlan', 'pitrou', 'vstinner', 'xiang.zhang', 'grzgrzgrz3', 'rnester', 'willvousden', 'Mark DePristo']
    pr_nums = ['1560', '3266', '3270', '3271']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue27144'
    versions = ['Python 3.6', 'Python 3.7']

    @grzgrzgrz3
    Copy link
    Mannequin Author

    grzgrzgrz3 mannequin commented May 28, 2016

    as_complite generator keeps reference of all passed futures until StopIteration. It may lead to serious memory inefficiency.

    Solution is to remove reference from lists and yield future ad-hoc.

    I have submitted patch and reproduce sample.

    I can create backport for older versions if needed.

    @grzgrzgrz3 grzgrzgrz3 mannequin added stdlib Python modules in the Lib dir performance Performance or resource usage labels May 28, 2016
    @willvousden
    Copy link
    Mannequin

    willvousden mannequin commented Feb 28, 2017

    Is there a reason this hasn't been merged yet?

    Fixing this bug locally (albeit just by setting Future._result = None when I've extracted the result) reduced the memory footprint of my program from 50 GB to 7 GB, so it seems worth it.

    @MarkDePristo
    Copy link
    Mannequin

    MarkDePristo mannequin commented May 10, 2017

    Any update on this reviewing, patching, and then backporting to 2.7? We've just run into the exact same issue here in Google using a ThreadPoolExecutor.map call growing memory usage without bounds due to the Future holding onto its result even after being returned from the map call. There's a more specific patch possible for just map() itself, but this more general one seems like it'd be better.

    @zhangyangyu
    Copy link
    Member

    Any update on this reviewing, patching, and then backporting to 2.7?

    concurrent package was added in 3.2. How backport to 2.7? Nosy myself.

    @zhangyangyu zhangyangyu added the 3.7 (EOL) end of life label May 10, 2017
    @grzgrzgrz3
    Copy link
    Mannequin Author

    grzgrzgrz3 mannequin commented May 12, 2017

    We just ran into the exact same issue here in Google using a ThreadPoolExecutor.map call
    Looks like map got simillar issue. I created GitHub PR with map fixed.

    Concurrent package was added in 3.2. How backport it 2.7?
    There is official, unofficial 2.7 backport.

    Git: https://github.com/agronholm/pythonfutures
    Pip: futures
    Ubuntu package: python-concurrent.futures

    @pitrou
    Copy link
    Member

    pitrou commented Sep 1, 2017

    New changeset 97e1b1c by Antoine Pitrou (Grzegorz Grzywacz) in branch 'master':
    bpo-27144: concurrent.futures as_complete and map iterators do not keep reference to returned object (bpo-1560)
    97e1b1c

    @pitrou
    Copy link
    Member

    pitrou commented Sep 1, 2017

    New changeset ea76791 by Antoine Pitrou in branch '3.6':
    [3.6] bpo-27144: concurrent.futures as_complete and map iterators do not keep reference to returned object (GH-1560) (bpo-3266)
    ea76791

    @pitrou
    Copy link
    Member

    pitrou commented Sep 1, 2017

    I've merged the PR to master and backported it to 3.6. Thank you Grzegorz for contributing this!

    @pitrou pitrou closed this as completed Sep 1, 2017
    @vstinner
    Copy link
    Member

    vstinner commented Sep 3, 2017

    "concurrent.futures.as_completed() memory inefficiency" hum, sadly the commit 97e1b1c introduced a reference leak. Example:

    $ ./python -m test -R 3:3 -test_concurrent_futures -m test.test_concurrent_futures.ProcessPoolAsCompletedTests.test_zero_timeout
    (...)
    test_concurrent_futures leaked [27, 27, 27] references, sum=81
    test_concurrent_futures leaked [16, 17, 16] memory blocks, sum=49
    (...)

    @vstinner vstinner reopened this Sep 3, 2017
    @pitrou
    Copy link
    Member

    pitrou commented Sep 3, 2017

    Thank you Victor. There is indeed a logic error in the new code. I'm opening a new PR.

    @grzgrzgrz3
    Copy link
    Mannequin Author

    grzgrzgrz3 mannequin commented Sep 3, 2017

    Tests are reusing finished futures. _yield_and_decref function do not clear waiters in finished futures.

    In the initial merge i propose to clear waiters but after review we decide it should be removed.

    I am confused now, should we change tests or restore initial _yield_and_decref function.

    @pitrou
    Copy link
    Member

    pitrou commented Sep 3, 2017

    New changeset 2ef3760 by Antoine Pitrou in branch 'master':
    Fix a c.f.as_completed() refleak previously introduced in bpo-27144 (bpo-3270)
    2ef3760

    @pitrou
    Copy link
    Member

    pitrou commented Sep 3, 2017

    New changeset 5cbca02 by Antoine Pitrou in branch '3.6':
    [3.6] Fix a c.f.as_completed() refleak previously introduced in bpo-27144 (GH-3270) (bpo-3271)
    5cbca02

    @pitrou
    Copy link
    Member

    pitrou commented Sep 3, 2017

    The regression should be fixed now.

    @pitrou pitrou closed this as completed Sep 3, 2017
    @vstinner
    Copy link
    Member

    vstinner commented Sep 3, 2017

    Thank you both for this nice enhancement.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants