In the multiprocessing Pool methods like map, the chunksize argument determines the trade-off between the amount of computation per task and the inter-process communication overhead. Choosing it well has a large effect on efficiency.
However, for users calling the map methods directly, the only way to find an appropriate chunksize is to try different sizes by hand and observe the program's behavior.
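To make that trial-and-error concrete, a tuning session might look something like the sketch below. The work function and the chunksize values tried here are only illustrative, not from any real benchmark:

```python
import multiprocessing
import time

def work(x):
    # A small CPU-bound task whose cost varies with the input.
    return sum(i * i for i in range(x % 100))

def benchmark():
    # Time the same workload with a few hand-picked chunksize values
    # and eyeball which one gives the best throughput.
    with multiprocessing.Pool(2) as pool:
        for chunksize in (1, 10, 100):
            t0 = time.perf_counter()
            pool.map(work, range(1000), chunksize=chunksize)
            print(f"chunksize={chunksize}: {time.perf_counter() - t0:.3f}s")

if __name__ == "__main__":
    benchmark()
```

The "right" value found this way only holds for this machine and this workload, which is exactly the problem.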
For library developers, you have to hope that you picked a reasonable value that will work acceptably across different hardware, operating systems, and task characteristics.
Generally, users of these methods want maximum throughput. It would be great if the map-like methods could adapt their chunksize towards that goal.
Something along the lines of this:
import itertools
import time

# Sketch only: queue, N, iterable, and upper_bound are assumed to be
# provided by the surrounding pool machinery.
i = 0
n_items = 0
chunk_size = 4
multiplier = 2.0  # >1 grows chunk_size, <1 shrinks it
last_rate = 0.0
t0 = time.perf_counter()
queue = Queue(N)
while True:
    chunk = tuple(itertools.islice(iterable, chunk_size))
    if not chunk:
        break
    queue.put(chunk)
    n_items += len(chunk)  # the final chunk may be shorter than chunk_size
    i += 1
    if i % 10 == 0:  # re-evaluate the rate every 10 chunks
        time_delta = max(time.perf_counter() - t0, 0.0001)
        current_rate = n_items / time_delta
        # chunk_size is always either growing or shrinking: if
        # shrinking led to a faster rate, keep shrinking. Same
        # with growing. If the rate decreased, reverse direction.
        if current_rate < last_rate:
            multiplier = 1 / multiplier
        chunk_size = int(min(max(chunk_size * multiplier, 1), upper_bound))
        last_rate = current_rate
        n_items = 0
        t0 = time.perf_counter()
Would such a feature be desirable?