classification
Title: multiprocessing memory huge usage
Type: resource usage Stage: resolved
Components: Versions: Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Windson Yang, methane, pitrou, xiang.zhang, zach.ware
Priority: normal Keywords: patch

Created on 2018-07-17 04:03 by Windson Yang, last changed 2019-01-25 12:17 by pitrou. This issue is now closed.

Files
File name Uploaded Description
test.py Windson Yang, 2018-07-17 04:03
Pull Requests
URL Status Linked
PR 8324 merged Windson Yang, 2018-07-18 10:03
PR 11673 merged miss-islington, 2019-01-25 12:02
PR 11674 closed miss-islington, 2019-01-25 12:04
Messages (14)
msg321788 - Author: Windson Yang (Windson Yang) * Date: 2018-07-17 04:03
I'm using macOS and I see huge memory usage when using a generator with multiprocessing (see the attached file).

I think this is caused by the following code (https://github.com/python/cpython/blob/master/Lib/multiprocessing/pool.py#L383):

    if not hasattr(iterable, '__len__'):
        iterable = list(iterable)

    if chunksize is None:
        chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
        if extra:
            chunksize += 1

When the iterable is converted with list(iterable), we lose the advantage of using a generator. I'm not sure how to fix it; maybe we could use a default chunksize for objects that don't have a '__len__' attribute. Any ideas?
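A minimal sketch of the chunksize arithmetic in the snippet above, plus the "default value" idea as a hypothetical helper. `default_chunksize`, `chunksize_for` and the fallback of 1 are illustrative names and values, not CPython API. Note that the real pool code appears to use `len(iterable)` again after this step, so a chunksize fallback on its own would probably not let `map` skip the `list()` call.

```python
def default_chunksize(n_items, n_workers):
    # Same arithmetic as the pool.py snippet: aim for about four chunks
    # per worker, rounding up when the division leaves a remainder.
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


def chunksize_for(iterable, n_workers, fallback=1):
    # Hypothetical helper: use len() only when the object supports it,
    # otherwise return a fixed fallback instead of materializing the
    # iterable with list().
    if hasattr(iterable, "__len__"):
        return default_chunksize(len(iterable), n_workers)
    return fallback


print(default_chunksize(1000, 4))                   # 63
print(chunksize_for(range(1000), 4))                # 63
print(chunksize_for((x for x in range(1000)), 4))   # 1
```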
msg321791 - Author: Inada Naoki (methane) * (Python committer) Date: 2018-07-17 05:16
Have you tried imap or imap_unordered?
They are intended for use with iterators, including generators.
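A small sketch of feeding a generator to `imap` and `imap_unordered`. It uses `multiprocessing.pool.ThreadPool`, which exposes the same `map`/`imap`/`imap_unordered` interface as `Pool` but runs workers as threads, so the example runs without spawning processes; with a real `Pool`, the worker function must be importable and the main code should sit under `if __name__ == "__main__":`. `square` and `numbers` are hypothetical names.

```python
from multiprocessing.pool import ThreadPool


def square(n):
    return n * n


def numbers(limit):
    # A plain generator: it has no __len__, which is exactly the case where
    # Pool.map falls back to list(iterable) before dispatching any work.
    for n in range(limit):
        yield n


with ThreadPool(4) as pool:
    # imap hands results back in input order, one at a time.
    in_order = list(pool.imap(square, numbers(10)))
    # imap_unordered yields each result as soon as a worker finishes it,
    # so the order may differ from run to run.
    any_order = list(pool.imap_unordered(square, numbers(10)))

print(in_order)                        # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(sorted(any_order) == in_order)   # True
```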
msg321796 - Author: Windson Yang (Windson Yang) * Date: 2018-07-17 06:43
Thank you for the hint, INADA. I think we should add something like "if you are using a generator, consider using imap instead" to https://docs.python.org/3.4/library/multiprocessing.html?highlight=process#multiprocessing.pool.Pool.map
msg321797 - Author: Inada Naoki (methane) * (Python committer) Date: 2018-07-17 06:56
> I think we should add something like "if you are using generator, consider use imap instead"

I don't think that's a good hint.
There are short generators, and there are long (or infinite) iterators other than generators too.
Maybe: "if the iterable is not a sequence (e.g. a generator) and can be very long, consider using `imap` or `imap_unordered` with an explicit `chunksize` option for better efficiency."

But I'm not good at writing English.
Someone else could write a better paragraph.
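A sketch of the suggested pattern: a long, non-sequence iterable passed to `imap_unordered` with an explicit `chunksize`, with results consumed as they arrive. `ThreadPool` again stands in for `Pool` (same interface, threads instead of processes); `is_multiple_of_seven`, `count_matches` and `chunksize=256` are hypothetical choices. The input is never copied into a list up front, but the pool may still read it ahead of the workers, so this is not a hard memory bound.

```python
from multiprocessing.pool import ThreadPool


def is_multiple_of_seven(n):
    # Hypothetical per-item work standing in for something expensive.
    return n % 7 == 0


def count_matches(iterable, workers=4, chunksize=256):
    # imap_unordered does not call list() on the input, and sum() consumes
    # each result as it arrives, so neither the whole input nor the whole
    # output is collected into a list. chunksize=256 is an arbitrary value.
    with ThreadPool(workers) as pool:
        return sum(pool.imap_unordered(is_multiple_of_seven, iterable,
                                       chunksize=chunksize))


print(count_matches(iter(range(70_000))))   # 10000
```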
msg321800 - Author: Windson Yang (Windson Yang) * Date: 2018-07-17 07:30
Thank you, I will try to make a pull request and let others edit it.
msg321836 - Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2018-07-17 15:02
One thing that may be worth trying here is replacing `len` with `operator.length_hint`. But I'm not sure it's a good idea; I'm just mentioning it.
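A sketch of what the `operator.length_hint` suggestion could look like for the chunksize step only; it was floated in the thread, not what was merged. `chunksize_from_hint` and its fallback of 1 are hypothetical. `length_hint` returns `len()` when available, then `__length_hint__()`, then the given default, without iterating the object; a plain generator provides neither, so it gets the default.

```python
import operator


def chunksize_from_hint(iterable, n_workers, fallback=1):
    # Estimate the size without consuming the iterable; 0 means "unknown".
    n = operator.length_hint(iterable, 0)
    if n == 0:
        # No usable estimate (e.g. a generator): hypothetical fixed fallback.
        return fallback
    chunksize, extra = divmod(n, n_workers * 4)
    return chunksize + 1 if extra else chunksize


print(operator.length_hint((x for x in range(1000)), 0))   # 0
print(chunksize_from_hint(iter(list(range(1000))), 4))     # 63
print(chunksize_from_hint((x for x in range(1000)), 4))    # 1
```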
msg321837 - Author: Windson Yang (Windson Yang) * Date: 2018-07-17 15:12
Thank you, Xiang Zhang. I found that the code keeps hanging when I use imap; I will try to figure it out tomorrow.
msg321856 - Author: Windson Yang (Windson Yang) * Date: 2018-07-18 03:52
The code didn't work with imap because imap returns a lazy iterator instead of a finished list, so we can't read the results outside the with statement.

    with Pool(os.cpu_count()) as p:
        result = p.imap(clean_up, k, 50)
    for r in result:
        print(r)

In https://docs.python.org/3.4/library/multiprocessing.html?highlight=process#using-a-pool-of-workers I found the correct example. I'm not sure whether we should add an example or a warning to the imap documentation.
msg321857 - Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2018-07-18 04:29
Why access the result outside of the with block? The pool is terminated when the block exits, before the work is done.
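A minimal sketch of the working pattern: consume the `imap` iterator while the pool is still open. Wrapping the with block in a generator keeps results streaming to the caller. `clean_up` here is a hypothetical stand-in for the function in the attached test.py, and `ThreadPool` stands in for `Pool` (same `imap` interface, threads instead of processes).

```python
from multiprocessing.pool import ThreadPool


def clean_up(line):
    # Hypothetical stand-in for the reporter's clean_up() from test.py.
    return line.strip().lower()


def cleaned(lines, workers=4, chunksize=50):
    # Keep the pool open for exactly as long as results are being read.
    # Leaving the with block calls pool.terminate(); an imap iterator that is
    # only read after that point can block forever.
    with ThreadPool(workers) as pool:
        yield from pool.imap(clean_up, lines, chunksize)


for r in cleaned(iter(["  Foo ", "BAR", " baz"])):
    print(r)
# foo
# bar
# baz
```

If the caller stops iterating early, the pool stays alive until the generator is closed or garbage-collected; calling `close()` on the generator runs the with block's exit and releases the pool promptly.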
msg321864 - Author: Windson Yang (Windson Yang) * Date: 2018-07-18 07:43
Yes, we should not. But we can do this with the map function. The documentation gives a good example but doesn't say much about the real differences between map and imap. Maybe we should add some notes like INADA suggested.

The map function converts the iterable to a list if it doesn't implement __len__, so if you are using a generator, you should consider using imap. We could also add a warning not to access the result outside the with statement.
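A short sketch of why reading `map` results after the with block is fine: `map` blocks until every result is ready and returns an ordinary list, at the cost of first copying a generator input into a list. `ThreadPool` stands in for `Pool` (same `map` interface); `double` and `numbers` are hypothetical names.

```python
from multiprocessing.pool import ThreadPool


def double(n):
    return 2 * n


def numbers(limit):
    # A generator, so map() copies it into a list before dispatching work.
    for n in range(limit):
        yield n


with ThreadPool(4) as pool:
    # map() blocks until every result is ready and returns a plain list.
    doubled = pool.map(double, numbers(5))

# Safe after the pool is gone: `doubled` no longer depends on the pool.
print(doubled)  # [0, 2, 4, 6, 8]
```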

But if you guys think the docs are good enough, please close this issue.
msg321867 - Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2018-07-18 08:15
I'm +1 for INADA's change, but not for more examples trying to spell out every detailed difference.
msg334354 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2019-01-25 12:01
New changeset 3bab40db96efda2e127ef84e6501fda0cdc4f5b8 by Antoine Pitrou (Windson yang) in branch 'master':
bpo-34134: Advise to use imap or imap_unordered when handling long iterables. (gh-8324)
https://github.com/python/cpython/commit/3bab40db96efda2e127ef84e6501fda0cdc4f5b8
msg334356 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2019-01-25 12:08
New changeset c2674bf11036af1e06c1be739f0eebcc72dfbf7a by Antoine Pitrou (Miss Islington (bot)) in branch '3.7':
bpo-34134: Advise to use imap or imap_unordered when handling long iterables. (gh-8324) (gh-11673)
https://github.com/python/cpython/commit/c2674bf11036af1e06c1be739f0eebcc72dfbf7a
msg334357 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2019-01-25 12:17
This is basically fixed, except that I'll let the Release Manager choose whether 3.6 gets the fix as well. Thanks!
History
Date | User | Action | Args
2019-01-25 12:17:21 | pitrou | set | status: open -> closed; versions: + Python 3.7; messages: + msg334357; resolution: fixed; stage: patch review -> resolved
2019-01-25 12:08:16 | pitrou | set | messages: + msg334356
2019-01-25 12:04:44 | miss-islington | set | pull_requests: + pull_request11491
2019-01-25 12:04:35 | miss-islington | set | pull_requests: + pull_request11490
2019-01-25 12:04:26 | miss-islington | set | pull_requests: + pull_request11489
2019-01-25 12:02:20 | miss-islington | set | pull_requests: + pull_request11488
2019-01-25 12:02:12 | miss-islington | set | pull_requests: + pull_request11487
2019-01-25 12:01:45 | pitrou | set | messages: + msg334354
2018-07-21 00:35:18 | terry.reedy | set | nosy: + pitrou
2018-07-18 10:03:16 | Windson Yang | set | keywords: + patch; stage: patch review; pull_requests: + pull_request7859
2018-07-18 08:15:01 | xiang.zhang | set | messages: + msg321867
2018-07-18 07:43:49 | Windson Yang | set | messages: + msg321864
2018-07-18 04:29:03 | xiang.zhang | set | messages: + msg321857
2018-07-18 03:52:33 | Windson Yang | set | messages: + msg321856
2018-07-17 15:12:55 | Windson Yang | set | messages: + msg321837
2018-07-17 15:02:52 | xiang.zhang | set | nosy: + xiang.zhang; messages: + msg321836; versions: - Python 2.7, Python 3.4, Python 3.5, Python 3.6, Python 3.7
2018-07-17 07:30:46 | Windson Yang | set | messages: + msg321800
2018-07-17 06:56:04 | methane | set | messages: + msg321797
2018-07-17 06:43:59 | Windson Yang | set | messages: + msg321796
2018-07-17 05:16:04 | methane | set | nosy: + methane; messages: + msg321791
2018-07-17 04:03:50 | Windson Yang | create