
itertools.chunks(iterable, size, fill=None) #62062

Closed
techtonik mannequin opened this issue Apr 28, 2013 · 11 comments
Labels: stdlib (Python modules in the Lib dir), type-feature (A feature request or enhancement)


techtonik mannequin commented Apr 28, 2013

BPO 17862
Nosy @rhettinger, @terryjreedy, @ezio-melotti, @serhiy-storchaka, @jstasiak, @MojoVampire
Files
  • iter_chunks.diff
  • itertools.chunk.patch: Implementation handling arbitrary iterables
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2014-04-02.20:50:22.172>
    created_at = <Date 2013-04-28.16:13:10.007>
    labels = ['type-feature', 'library']
    title = 'itertools.chunks(iterable, size, fill=None)'
    updated_at = <Date 2014-04-02.20:50:22.171>
    user = 'https://bugs.python.org/techtonik'

    bugs.python.org fields:

    activity = <Date 2014-04-02.20:50:22.171>
    actor = 'rhettinger'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2014-04-02.20:50:22.172>
    closer = 'rhettinger'
    components = ['Library (Lib)']
    creation = <Date 2013-04-28.16:13:10.007>
    creator = 'techtonik'
    dependencies = []
    files = ['30177', '31565']
    hgrepos = []
    issue_num = 17862
    keywords = ['patch']
    message_count = 11.0
    messages = ['187995', '188333', '188342', '188378', '188482', '188483', '188723', '196818', '196835', '197313', '215400']
    nosy_count = 8.0
    nosy_names = ['rhettinger', 'terry.reedy', 'techtonik', 'ezio.melotti', 'python-dev', 'serhiy.storchaka', 'jstasiak', 'josh.r']
    pr_nums = []
    priority = 'low'
    resolution = 'rejected'
    stage = 'needs patch'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue17862'
    versions = ['Python 3.4']


    techtonik mannequin commented Apr 28, 2013

    The history:
    2007 - http://bugs.python.org/issue1502
    2009 - http://bugs.python.org/issue6021

    I'd like to resurrect this proposal again, but name it:
    itertools.chunks(iterable, size, fill=None)

    Two reasons.

    1. practicality - top itertools request on StackOverflow
      http://stackoverflow.com/search?tab=votes&q=%5bpython%5d%20%2bitertools

    2. performance
      the time should be constant for a fixed-length iterable regardless of the chunk size, but in fact it is proportional to the chunk size

    {{{
    import timeit
    from itertools import izip_longest  # Python 2; the timings below are from Python 2

    # The grouper definition being timed was not included in the original message;
    # presumably it is the grouper() recipe from the itertools docs of the time.
    def grouper(n, iterable, fillvalue=None):
        # building the n-element args list makes the setup cost grow with the chunk size
        args = [iter(iterable)] * n
        return izip_longest(fillvalue=fillvalue, *args)

    print timeit.timeit(
        'grouper(30000, "x"*400000)', setup='from __main__ import grouper', number=1000
    )
    print timeit.timeit(
        'grouper(300000, "x"*400000)', setup='from __main__ import grouper', number=1000
    )
    }}}

    1.52581005407
    14.6219704599

    Addressing the odd-length user stories from msg87745 (a sketch of these semantics follows the list):

    1. no exceptions - an odd end is an easy check if you need it
    2. fill-in value - provided
    3. truncation - just don't set the fill-in value
      3.1. if you really need the fill-in value to be None, then an itertools.TRUNCATE value can be used as a truncation parameter
    4. partially filled-in tuple - not sure what that means
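
    For illustration, here is a minimal sketch of one possible reading of these semantics (the chunks name and the TRUNCATE sentinel come from this proposal, not an existing itertools API; whether the short tail is dropped or yielded as-is is exactly the open question):

        import itertools

        TRUNCATE = object()  # hypothetical sentinel meaning "drop the odd tail"

        def chunks(iterable, size, fill=None):
            it = iter(iterable)
            while True:
                chunk = tuple(itertools.islice(it, size))
                if not chunk:
                    return
                if len(chunk) < size:
                    if fill is TRUNCATE:
                        return  # truncation: discard the incomplete tail
                    chunk += (fill,) * (size - len(chunk))  # pad with the fill value
                yield chunk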

    Raymond, your opinion is critical here. =)

    techtonik (mannequin) added the stdlib (Python modules in the Lib dir) label Apr 28, 2013
@terryjreedy (Member)

    [Anatoly, 'Versions 3.5' is for changes that should *not* happen in 3.4, such as a planned removal for something deprecated in 3.3.]

    rhettinger self-assigned this May 4, 2013
@rhettinger (Contributor)

    The reasons for the previous rejections still hold: more tools make the overall toolset harder to use, not being a primitive operation, dearth of real-world use-cases, not being prevalent in other functional languages, easily expressible with existing tools, non-consensus on what to do with odd-sizes, lack of adoption of the published recipe, and a hard-to-guess function name.

In addition to the previously listed reasons, I have vague feelings that this isn't the right thing to do. The feelings are in part based on the poor user experience with itertools.groupby(), a function that developers said they wanted but that ended up being awkward to fit into real applications, confusing to some users, and rarely used in practice.

Another source of misgivings is that iterators may not be the right tool for this kind of task. For example, when partitioning data into subgroups for a map/reduce operation, iterators won't help because they are evaluated serially, which precludes any chance of parallelization. Even in cases of serial processing, such as reading blocks from a socket, the chunks iterator would be useless or awkward (i.e. we need more versatility than iterator.next() to manage the control flow, time-outs, out-of-band control, separating headers from content, etc.). In other words, I have a sense that the generic concept of "break data into chunks" tends to occur in situations where the iterator protocol would be at odds with a clean solution.

That said, I'll leave this open for a while and do my best to warm up to it. Your recurring enthusiasm for it is a positive point. Another is its faint resemblance to a numpy reshape operation.

    P.S. In prior discussions, the only real use case that ever surfaced was printing long sequences of data across multiple columns. Even that use case was suspect because the desired order is typically in column-major order (for example, look at the output of the *nix "ls" command).

@ezio-melotti (Member)

FWIW I'm +1 on adding grouper(), since I happen to use it and suggest it more often than any other recipe (I think it's the only recipe I use/know), and even more often than some of the current itertools.
The fact that it has been requested several times already is a good indication that it's something useful and not feature creep (I don't think any other recipe has received nearly as many requests).

Regarding use cases, a few days ago I needed it while shuffling a sequence of 20 chars and then splitting it into four 5-char groups.
I also remember writing code like this more than once:
    >>> s = 'aaaaabbbbbcccccddddd'
    >>> n = 5
    >>> [s[x:x+n] for x in range(0,len(s),n)]
    ['aaaaa', 'bbbbb', 'ccccc', 'ddddd']

    (As a side note, I find the grouper(iterable, n, fill) signature more intuitive than grouper(n, iterable, fill) (the recipe uses the latter).)

    ezio-melotti added the type-feature (A feature request or enhancement) label May 4, 2013

    python-dev mannequin commented May 6, 2013

    New changeset 763d260414d1 by Raymond Hettinger in branch '2.7':
    bpo-17862: Improve the signature of itertools grouper() recipe.
    http://hg.python.org/cpython/rev/763d260414d1


    python-dev mannequin commented May 6, 2013

    New changeset 6383d0c8140d by Raymond Hettinger in branch '3.3':
    bpo-17862: Improve the signature of itertools grouper() recipe.
    http://hg.python.org/cpython/rev/6383d0c8140d

@serhiy-storchaka (Member)

A week ago I implemented chunks() in C for bpo-17804. It is equivalent to the following Python code for unbounded sequences:

    import itertools

    def chunks(seq, size, start=0):
        for i in itertools.count(start, size):
            yield seq[i: i + size]

or, more simply, for finite sequences:

        def chunks(seq, size, start=0):
            for i in range(start, len(seq), size):
                yield seq[i: i + size]
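
For a quick illustrative check (not part of the attached patch), the finite-sequence version gives:

    >>> list(chunks('aaaaabbbbbcccccdd', 5))
    ['aaaaa', 'bbbbb', 'ccccc', 'dd']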

I later gave up on the idea when I saw how insignificant the benefits were. Personally, I have these arguments against including it in the stdlib:

1. While the C-implemented chunks() is faster than manual iteration, the speedup of real loops is not worth adding a special function.

2. This idiom is used less often than I expected (about two dozen times in the stdlib, not counting tests and tools), and using chunks() saves very few lines. In any case, a Python implementation is only 2-3 lines.

    3. This function is not very well suited for the itertools module, because it works with sequences and not with iterators.


    jstasiak mannequin commented Sep 3, 2013

I'm just going to leave my implementation of a chunk function here (not sure about the name yet); it's basically what itertools.chunks from the previous patch does, but it works for arbitrary iterables, plus a few tests and documentation. The last chunk is currently truncated if there are not enough elements to fill it completely.

@serhiy-storchaka (Member)

We should distinguish between at least two different functions. One generates slices of an input sequence (especially useful for strings and bytes objects), and the other groups items from an arbitrary iterator into tuples. They have different applications.
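
A compact way to see the difference using only existing builtins (illustrative, not taken from this issue's patches):

    >>> s = 'abcdefg'
    >>> [s[i:i+3] for i in range(0, len(s), 3)]  # slices of the sequence, short tail kept
    ['abc', 'def', 'g']
    >>> list(zip(*[iter(s)] * 3))                # groups from an iterator, short tail dropped
    [('a', 'b', 'c'), ('d', 'e', 'f')]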

@rhettinger (Contributor)

If this is to go forward, it needs to be more interesting, useful, and general than what has been discussed so far. I would be open to some kind of reshape() itertool that can ungroup, flatten, and regroup in at least two dimensions.

    Ideally, it should be inspired by a successful general-purpose tool from another functional or data manipulation language (perhaps APL, Mathematica, Matlab, Numpy, or somesuch).

    Ideally, the proposal will be accompanied by some non-trivial real-world use cases to help validate the design.

    Ideally, there should be demonstrations of reshape() interacting effectively with the other itertools (i.e. a criterion for adding new Lego bricks is whether they work well with all the existing Lego bricks -- that is what makes a good Lego set).
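
As a rough, hypothetical illustration of the "existing Lego bricks" criterion, a flatten-then-regroup step can already be composed from chain.from_iterable() plus islice() (the reshape name and semantics here are only a sketch, not a proposed API):

    from itertools import chain, islice

    def reshape(iterable_of_groups, size):
        # flatten the incoming groups, then regroup the elements into size-length tuples
        flat = chain.from_iterable(iterable_of_groups)
        while True:
            group = tuple(islice(flat, size))
            if not group:
                return
            yield group

    # list(reshape([(1, 2, 3), (4, 5, 6)], 2)) -> [(1, 2), (3, 4), (5, 6)]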

@rhettinger (Contributor)

    Nothing new is happening in this thread, so I'm closing it for the reasons listed in the other posts.

The main issue is that the generic concept of "break data into chunks" tends to occur in situations where the iterator protocol would be at odds with a clean solution. A reshape() method on lists would be much better suited to the task.

    ezio-melotti transferred this issue from another repository Apr 10, 2022