This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: concurrent.futures.as_completed() memory inefficiency
Type: resource usage Stage: resolved
Components: Library (Lib) Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Mark DePristo, bquinlan, grzgrzgrz3, pitrou, rnester, vstinner, willvousden, xiang.zhang
Priority: normal Keywords: patch

Created on 2016-05-28 13:25 by grzgrzgrz3, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
reproduce.py grzgrzgrz3, 2016-05-28 13:25
issue27144.patch grzgrzgrz3, 2016-05-28 13:26 review
Pull Requests
URL Status Linked Edit
PR 1560 merged grzgrzgrz3, 2017-05-12 18:42
PR 3266 merged pitrou, 2017-09-01 17:01
PR 3270 merged pitrou, 2017-09-03 10:26
PR 3271 merged pitrou, 2017-09-03 13:11
Messages (15)
msg266552 - (view) Author: Grzegorz Grzywacz (grzgrzgrz3) * Date: 2016-05-28 13:25
as_complite generator keeps reference of all passed futures until StopIteration. It may lead to serious memory inefficiency.

Solution is to remove reference from lists and yield future ad-hoc.

I have submitted patch and reproduce sample.

I can create backport for older versions if needed.
msg288708 - (view) Author: Will Vousden (willvousden) Date: 2017-02-28 12:15
Is there a reason this hasn't been merged yet?

Fixing this bug locally (albeit just by setting Future._result = None when I've extracted the result) reduced the memory footprint of my program from 50 GB to 7 GB, so it seems worth it.
msg293435 - (view) Author: Mark DePristo (Mark DePristo) Date: 2017-05-10 16:42
Any update on this reviewing, patching, and then backporting to 2.7? We've just run into the exact same issue here in Google using a ThreadPoolExecutor.map call growing memory usage without bounds due to the Future holding onto its result even after being returned from the map call. There's a more specific patch possible for just map() itself, but this more general one seems like it'd be better.
msg293439 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-05-10 17:01
> Any update on this reviewing, patching, and then backporting to 2.7?

concurrent package was added in 3.2. How backport to 2.7? Nosy myself.
msg293563 - (view) Author: Grzegorz Grzywacz (grzgrzgrz3) * Date: 2017-05-12 18:46
> We just ran into the exact same issue here in Google using a ThreadPoolExecutor.map call
Looks like map got simillar issue. I created GitHub PR with map fixed.

>  Concurrent package was added in 3.2. How backport it 2.7?
There is official, unofficial 2.7 backport.

Git: https://github.com/agronholm/pythonfutures
Pip: futures
Ubuntu package: python-concurrent.futures
msg301136 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-09-01 16:54
New changeset 97e1b1c81458d2109b2ffed32ffa1eb643a6c3b9 by Antoine Pitrou (Grzegorz Grzywacz) in branch 'master':
bpo-27144: concurrent.futures as_complete and map iterators do not keep reference to returned object (#1560)
https://github.com/python/cpython/commit/97e1b1c81458d2109b2ffed32ffa1eb643a6c3b9
msg301141 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-09-01 17:16
New changeset ea767915f7476c1fe97f7b1a53304d57f105bdd2 by Antoine Pitrou in branch '3.6':
[3.6] bpo-27144: concurrent.futures as_complete and map iterators do not keep reference to returned object (GH-1560) (#3266)
https://github.com/python/cpython/commit/ea767915f7476c1fe97f7b1a53304d57f105bdd2
msg301142 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-09-01 17:20
I've merged the PR to master and backported it to 3.6.  Thank you Grzegorz for contributing this!
msg301181 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-09-03 04:18
"concurrent.futures.as_completed() memory inefficiency" hum, sadly the commit 97e1b1c81458d2109b2ffed32ffa1eb643a6c3b9 introduced a reference leak. Example:

$ ./python -m test -R 3:3 -test_concurrent_futures -m test.test_concurrent_futures.ProcessPoolAsCompletedTests.test_zero_timeout
(...)
test_concurrent_futures leaked [27, 27, 27] references, sum=81
test_concurrent_futures leaked [16, 17, 16] memory blocks, sum=49
(...)
msg301186 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-09-03 10:26
Thank you Victor.  There is indeed a logic error in the new code.  I'm opening a new PR.
msg301187 - (view) Author: Grzegorz Grzywacz (grzgrzgrz3) * Date: 2017-09-03 10:53
Tests are reusing finished futures. `_yield_and_decref` function do not clear waiters in finished futures.

In the initial merge i propose to clear waiters but after review we decide it should be removed.

I am confused now, should we change tests or restore initial `_yield_and_decref` function.
msg301189 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-09-03 13:09
New changeset 2ef37607b7aacb7c750d008b9113fe11f96163c0 by Antoine Pitrou in branch 'master':
Fix a c.f.as_completed() refleak previously introduced in bpo-27144 (#3270)
https://github.com/python/cpython/commit/2ef37607b7aacb7c750d008b9113fe11f96163c0
msg301190 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-09-03 13:30
New changeset 5cbca0235b8da07c9454bcaa94f12d59c2df0ad2 by Antoine Pitrou in branch '3.6':
[3.6] Fix a c.f.as_completed() refleak previously introduced in bpo-27144 (GH-3270) (#3271)
https://github.com/python/cpython/commit/5cbca0235b8da07c9454bcaa94f12d59c2df0ad2
msg301191 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-09-03 13:31
The regression should be fixed now.
msg301197 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-09-03 20:24
Thank you both for this nice enhancement.
History
Date User Action Args
2022-04-11 14:58:31adminsetgithub: 71331
2017-09-03 20:24:00vstinnersetmessages: + msg301197
2017-09-03 13:31:34pitrousetstatus: open -> closed
resolution: fixed
messages: + msg301191
2017-09-03 13:30:58pitrousetmessages: + msg301190
2017-09-03 13:11:09pitrousetpull_requests: + pull_request3315
2017-09-03 13:09:27pitrousetmessages: + msg301189
2017-09-03 10:53:22grzgrzgrz3setmessages: + msg301187
2017-09-03 10:26:31pitrousetpull_requests: + pull_request3314
2017-09-03 10:26:09pitrousetmessages: + msg301186
2017-09-03 04:18:31vstinnersetstatus: closed -> open

nosy: + vstinner
messages: + msg301181

resolution: fixed -> (no value)
2017-09-01 17:20:37pitrousetstatus: open -> closed
versions: - Python 3.5
messages: + msg301142

resolution: fixed
stage: patch review -> resolved
2017-09-01 17:16:49pitrousetmessages: + msg301141
2017-09-01 17:01:58pitrousetpull_requests: + pull_request3310
2017-09-01 16:54:02pitrousetnosy: + pitrou
messages: + msg301136
2017-05-12 18:46:28grzgrzgrz3setmessages: + msg293563
2017-05-12 18:42:55grzgrzgrz3setpull_requests: + pull_request1658
2017-05-10 17:01:57xiang.zhangsetnosy: + xiang.zhang

messages: + msg293439
versions: + Python 3.7
2017-05-10 16:42:23Mark DePristosetnosy: + Mark DePristo
messages: + msg293435
2017-02-28 12:15:52willvousdensetmessages: + msg288708
2017-02-28 12:03:23willvousdensetnosy: + willvousden
2016-07-21 18:13:13rnestersetnosy: + rnester
2016-05-28 14:04:30SilentGhostsetstage: patch review
versions: - Python 3.2, Python 3.3, Python 3.4
2016-05-28 13:26:47grzgrzgrz3setfiles: + issue27144.patch
keywords: + patch
2016-05-28 13:25:35grzgrzgrz3create