Created on 2009-05-14 17:05 by lieryan, last changed 2012-06-29 22:20 by eric.snow. This issue is now closed.
msg87743 - Author: Lie Ryan (lieryan) - Date: 2009-05-14 17:05
An itertool to group by n:

    >>> lst = range(15)
    >>> itertools.grouper(lst, 5)
    [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14]]

This function is often requested in c.l.py discussions, such as these:
http://comments.gmane.org/gmane.comp.python.general/623377
http://comments.gmane.org/gmane.comp.python.general/622763

There are several open questions. What should be done if the number of items in the original list is not exactly divisible by n?
- raise an error (the default)
- pad with a value given as a third argument
- make the last group shorter, perhaps selected with a keyword argument or a sentinel as the third argument

Or should there be a separate function for each of these? And what about infinite iterables? Most recipes for this function use zip, which breaks with infinite iterables.
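A minimal sketch of the requested behavior, assuming a hypothetical helper named `chunked` (there is no `itertools.grouper`). It uses `itertools.islice` instead of zip, pulls items lazily, and lets the last group be shorter, so it reproduces the example output above and also works with an infinite iterator such as `itertools.count()`.

```python
from itertools import count, islice


def chunked(iterable, n):
    """Yield successive lists of up to n items; the last list may be shorter.

    Hypothetical helper, not part of itertools. islice pulls at most n
    items per step, so the input's length never has to be known.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))
        if not chunk:  # underlying iterator is exhausted
            return
        yield chunk


print(list(chunked(range(15), 5)))  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14]]
print(list(chunked(range(7), 3)))   # [[0, 1, 2], [3, 4, 5], [6]]

# An infinite input is fine because groups are produced on demand.
groups = chunked(count(), 3)
print(next(groups), next(groups))   # [0, 1, 2] [3, 4, 5]
```

This corresponds to the "make the last one shorter" option; the other odd-length policies would need extra handling of the final chunk.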
msg87745 - Author: Raymond Hettinger (rhettinger) * - Date: 2009-05-14 17:13
This has been rejected before.
* It is not a fundamental itertool primitive. The recipes section in the docs shows a clean, fast implementation derived from zip_longest().
* There is some debate on the correct API for odd lengths. Some people want an exception, some want fill-in values, some want truncation, and some want a partially filled-in tuple. That alone is reason enough not to set one behavior in stone.
* There is an issue with having too many itertools. The module taken as a whole becomes more difficult to use as new tools are added.
msg87750 - Author: Lie Ryan (lieryan) - Date: 2009-05-14 18:20
All implementations relying on zip or zip_longest break with infinite iterables (e.g. itertools.count()).

And it is not impossible to define a clean, flexible, and familiar API which will be similar to open()'s mode or unicode error mode. The modes would be 'error' (default), 'pad', 'truncate', and 'partial' (perhaps someone can suggest a better name than 'partial').

> There is an issue with having too many itertools.
> The module taken as a whole becomes more
> difficult to use as new tools are added.

It should also be weighed that a lot of people expect this kind of function in itertools. I think there are other functions in itertools of more questionable value than a grouper, such as starmap.
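The single-function, mode-flag API proposed here can be sketched as follows. This is an illustration only: the name `group_by_n` and its signature are hypothetical, the four mode names are taken from the message, and `islice` is used so the sketch stays lazy on infinite inputs.

```python
from itertools import islice

_MODES = ("error", "pad", "truncate", "partial")


def group_by_n(iterable, n, mode="error", fillvalue=None):
    """Hypothetical mode-flag grouper (not part of itertools).

    mode: 'error' (default) raises ValueError on a short final group,
          'pad' fills it with fillvalue, 'truncate' drops it,
          'partial' yields it unchanged.
    Being a generator, argument checks run when iteration starts.
    """
    if mode not in _MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(iterable)
    while True:
        group = tuple(islice(it, n))  # pulls at most n items lazily
        if not group:
            return  # input ended exactly on a group boundary
        if len(group) < n:  # short final group: apply the odd-length policy
            if mode == "error":
                raise ValueError(f"short final group: {len(group)} of {n} items")
            if mode == "truncate":
                return
            if mode == "pad":
                group += (fillvalue,) * (n - len(group))
        yield group


data = range(7)
print(list(group_by_n(data, 3, mode="pad")))       # [(0, 1, 2), (3, 4, 5), (6, None, None)]
print(list(group_by_n(data, 3, mode="truncate")))  # [(0, 1, 2), (3, 4, 5)]
print(list(group_by_n(data, 3, mode="partial")))   # [(0, 1, 2), (3, 4, 5), (6,)]
```

Note that every mode changes the shape of the last item (or whether it exists at all), which is the "output signature" concern raised in the reply.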
msg87756 - Author: Raymond Hettinger (rhettinger) * - Date: 2009-05-14 19:09
> All implementations relying on zip or zip_longest break
> with infinite iterables (e.g. itertools.count()).

How is it broken? Infinite in, infinite out.

    >>> from itertools import count, zip_longest
    >>> def grouper(n, iterable, fillvalue=None):
    ...     args = [iter(iterable)] * n
    ...     return zip_longest(*args, fillvalue=fillvalue)
    ...
    >>> g = grouper(3, count())
    >>> next(g)
    (0, 1, 2)
    >>> next(g)
    (3, 4, 5)
    >>> next(g)
    (6, 7, 8)
    >>> next(g)

> And it is not impossible to define a clean, flexible,
> and familiar API which will be similar to open()'s mode
> or unicode error mode. The modes would be 'error'
> (default), 'pad', 'truncate', and 'partial'

Of course, it's possible. I find that to be bad design. Generally, we follow Guido's advice and create separate functions rather than overload a single function with flags; that is why we have filterfalse() instead of a flag on filter(). When people suggest an API with multiple flags, it can be a symptom of hyper-generalization, where API complexity gets substituted for writing a simple function that does what you want in the first place.

IMO, it is easier to learn the zip(g, g, g) idiom and customize it to your own needs than to learn a new tool with four flag options that control its output signature.
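The "separate small functions" alternative can be sketched by customizing the zip(g, g, g) idiom. Here `grouper_strict` is a hypothetical name chosen for illustration; it covers the "raise an error" policy using a private sentinel with zip_longest, and like the grouper above it stays lazy on `itertools.count()`.

```python
from itertools import count, zip_longest

# The zip(g, g, g) idiom: one iterator passed three times, so each tuple
# takes three consecutive items. zip stops as soon as an argument is
# exhausted, so a leftover partial group is silently dropped.
g = iter(range(7))
print(list(zip(g, g, g)))  # [(0, 1, 2), (3, 4, 5)]


def grouper_strict(n, iterable):
    """Hypothetical separate function: raise instead of padding or truncating."""
    sentinel = object()  # private marker that cannot occur in the input
    args = [iter(iterable)] * n  # n references to the same iterator
    for group in zip_longest(*args, fillvalue=sentinel):
        # zip_longest only pads trailing slots, so a short group
        # always ends with the sentinel.
        if group[-1] is sentinel:
            raise ValueError("input length is not a multiple of n")
        yield group


print(list(grouper_strict(3, range(6))))  # [(0, 1, 2), (3, 4, 5)]
print(next(grouper_strict(3, count())))   # (0, 1, 2)
```

Each policy then lives in its own short function with a fixed output shape, instead of one function whose results vary with a flag.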
History:
2009-05-14 19:09:48 - rhettinger - set messages: + msg87756
2009-05-14 18:20:50 - lieryan - set messages: + msg87750
2009-05-14 17:13:30 - rhettinger - set status: open -> closed; versions: + Python 3.1, Python 2.7; nosy: + rhettinger; messages: + msg87745