This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: Feature: itertools: add batches
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.11
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: rhettinger, socketpair, tim.peters
Priority: normal Keywords:

Created on 2022-02-11 12:08 by socketpair, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (2)
msg413061 - (view) Author: Марк Коренберг (socketpair) * Date: 2022-02-11 12:08
I want a new function introduced in itertools. Something like this, but more efficient, and in C:

=======================
from itertools import chain, islice
from typing import Iterable, TypeVar

T = TypeVar('T')  # pylint: disable=invalid-name


def batches(items: Iterable[T], num: int) -> Iterable[Iterable[T]]:
    items = iter(items)
    while True:
        try:
            first_item = next(items)
        except StopIteration:
            break
        yield chain((first_item,), islice(items, 0, num - 1))
=======================

Splits a large iterable into lazy chunks of fixed size (except possibly the last one). Similar to `groupby`, but starts a new group based on the group size rather than on a key.
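For example, the helper above splits a range into fixed-size lazy chunks (redefined here so the snippet is self-contained):

```python
from itertools import chain, islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar('T')

def batches(items: Iterable[T], num: int) -> Iterator[Iterator[T]]:
    items = iter(items)
    while True:
        try:
            first_item = next(items)
        except StopIteration:
            break
        # Re-attach the peeked item, then take up to num - 1 more lazily.
        yield chain((first_item,), islice(items, num - 1))

# Each batch is itself lazy; materialize to inspect.
result = [list(batch) for batch in batches(range(7), 3)]
print(result)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Note that the batches share the underlying iterator, so each one must be consumed before advancing to the next.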

For example, when passing many records to a database, inserting them one by one is obviously too slow, while passing all of the records at once may increase latency. A good compromise is to pass, say, 1000 records per transaction. The same applies to REST API batches.

P.S. Yes, I saw the `grouper` recipe at https://docs.python.org/3/library/itertools.html#itertools-recipes, but it is not optimal for large `n` values.
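For reference, the `grouper` recipe being referred to (as it appears in the itertools recipes section) builds n-length tuples via zip_longest and pads the final group with a fill value, which is part of why it scales poorly for large `n`:

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    # Collect data into fixed-length chunks; the last chunk is padded.
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

print(list(grouper('ABCDEFG', 3, fillvalue='x')))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```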
msg413067 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2022-02-11 13:11
For large n, I don't think a C implementation would do much better than your Python version, where most of the work is done by chain() and islice(), which are already in C. The best that could be done is to eliminate the overhead of chain(), which is likely about a third of the cost.

For smaller n, the grouper recipe is already very close to optimal.
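One way to eliminate the chain() wrapper mentioned above is to materialize each batch as a tuple via a single islice() call; this is a sketch only, and the name `batched` is illustrative:

```python
from itertools import islice

def batched(iterable, n):
    # Yield successive n-length tuples; the last one may be shorter.
    # tuple(islice(...)) returns () when the iterator is exhausted,
    # which terminates the loop.
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch

print(list(batched('ABCDEFG', 3)))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G',)]
```

The trade-off is that each batch is eagerly materialized rather than lazy, which may matter for very large `n`.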
History
Date User Action Args
2022-04-11 14:59:56 admin set github: 90874
2022-02-12 11:08:05 rhettinger set status: open -> closed
resolution: rejected
stage: resolved
2022-02-11 13:11:23 rhettinger set assignee: rhettinger
messages: + msg413067
2022-02-11 12:59:13 AlexWaygood set nosy: + tim.peters, rhettinger
title: Feature: iptertools: add batches -> Feature: itertools: add batches
2022-02-11 12:08:56 socketpair create