classification
Title: concurrent.futures.Executor.map() consumes all memory when big generators are used
Type: resource usage Stage: resolved
Components: Library (Lib) Versions: Python 3.7, Python 3.6, Python 3.3, Python 3.4, Python 3.5
process
Status: closed Resolution: duplicate
Dependencies: Superseder: Make Executor.map work with infinite/large inputs correctly
View: 29842
Assigned To: Nosy List: Klamann, josh.r, xiang.zhang
Priority: normal Keywords:

Created on 2017-05-09 21:37 by Klamann, last changed 2017-05-16 09:38 by berker.peksag. This issue is now closed.

Messages (5)
msg293353 - (view) Author: (Klamann) Date: 2017-05-09 21:37
The Executor's map() function accepts a function and an iterable that holds the function arguments for each call to the function that should be made. This iterable could be a generator, and as such it could reference data that won't fit into memory.

The behaviour I would expect is that the Executor requests the next element from the iterable whenever a thread, process or whatever is ready to make the next function call.

But what actually happens is that the entire iterable gets converted into a list right after the map function is called and therefore any underlying generator will load all referenced data into memory. Here's where the list gets built from the iterable:
https://github.com/python/cpython/blob/3.6/Lib/concurrent/futures/_base.py#L548

The way I see it, there's no reason to convert the iterable to a list in the map function (or any other place in the Executor). Just replacing the list comprehension with a generator expression would probably fix that.


Here's an example that illustrates the issue:

    from concurrent.futures import ThreadPoolExecutor
    import time
    
    def generate():
        for i in range(10):
            print("generating input", i)
            yield i
    
    def work(i):
        print("working on input", i)
        time.sleep(1)
    
    with ThreadPoolExecutor(max_workers=2) as executor:
        generator = generate()
        executor.map(work, generator)

The output is:

    generating input 0
    working on input 0
    generating input 1
    working on input 1
    generating input 2
    generating input 3
    generating input 4
    generating input 5
    generating input 6
    generating input 7
    generating input 8
    generating input 9
    working on input 2
    working on input 3
    working on input 4
    working on input 5
    working on input 6
    working on input 7
    working on input 8
    working on input 9

Ideally, the lines should alternate, but currently all input is generated immediately.
msg293420 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-05-10 14:30
IIUC, the meaning of executor.map() is to run the tasks concurrently. So you have to loop the iterable ahead to submit all the tasks. If you make it a generator comprehension, you are actually submitting a task and then getting the result. So you are executing the tasks sequentially. Doesn't it violate the intention and meaning of executor.map()?
msg293539 - (view) Author: (Klamann) Date: 2017-05-12 09:55
Yes, I was wrong in my assumption that simply replacing the list comprehension with a generator expression would fix the issue.

Nevertheless, there is no need to load the *entire* generator into memory by converting it to a list. All we have to read are the first n elements, where n is the number of workers that are currently available.

I've implemented an alternative solution that works for me, using wait() and notify() from threading.Condition, but I'm not quite sure if this would be the best solution for everyone. But I could post it here, if you're intrested.

We should also consider that not strictly evaluating every iterable that is passed to the map() function might break existing code that implicitly relies on that fact that this is happening (although this is not a documented feature of the map function and was probably not the intended behaviour in the first place).
msg293627 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2017-05-13 23:00
This is a strict duplicate of #29842, for which I've had a PR outstanding for a couple months now.
msg293755 - (view) Author: (Klamann) Date: 2017-05-16 09:32
Thanks for pointing this out.
*closed*
History
Date User Action Args
2017-05-16 09:38:41berker.peksagsetsuperseder: Make Executor.map work with infinite/large inputs correctly
components: + Library (Lib)
2017-05-16 09:32:15Klamannsetstatus: open -> closed
resolution: duplicate
messages: + msg293755

stage: resolved
2017-05-13 23:00:05josh.rsetnosy: + josh.r
messages: + msg293627
2017-05-12 09:55:31Klamannsetmessages: + msg293539
2017-05-10 14:30:20xiang.zhangsetnosy: + xiang.zhang
messages: + msg293420
2017-05-09 21:37:02Klamanncreate