classification
Title: itertools.groupby ungraceful, un-Pythonic
Type:
Stage:
Components: Library (Lib)
Versions: Python 2.4

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: mkc, rhettinger
Priority: normal
Keywords:

Created on 2005-05-31 15:34 by mkc, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (3)
msg25447 - Author: Mike Coleman (mkc) Date: 2005-05-31 15:34
The sharing of the result iterator by itertools.groupby
leads to strange, arguably un-Pythonic behavior.  For
example, suppose we have a list of pairs that we're
about to turn into a dict and we want to check first
for duplicate keys.  We might do something like this:

>>> from itertools import groupby
>>> [(k, list(v)) for (k, v) in groupby([(1,2), (1,3), (2,3), (3,5)], lambda x: x[0])]
[(1, [(1, 2), (1, 3)]), (2, [(2, 3)]), (3, [(3, 5)])]
>>> [(k, list(v)) for (k, v) in list(groupby([(1,2), (1,3), (2,3), (3,5)], lambda x: x[0]))]
[(1, []), (2, []), (3, [(3, 5)])]
>>> [(k, list(v)) for (k, v) in groupby([(1,2), (1,3), (2,3), (3,5)], lambda x: x[0]) if len(list(v)) > 1]
[(1, [])]

The first result looks good, but the second two
silently produce what appear to be bizarre results. 
The second is understandable (sort of) if you know that
the result iterator is shared, and the third I don't
get at all.

This silent failure seems very Perlish.  At a minimum,
if use is made of the "expired" result iterator, an
exception should be thrown.  This is a wonderfully
useful function and ideally, there should be a version
of groupby that behaves as a naive user would expect.
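
The sharing described above can be reproduced in isolation. The following is a minimal sketch, not taken from the report: the variable names are illustrative, and it relies only on the documented behaviour that each group draws from the same underlying iterator as the groupby object.

```python
from itertools import groupby

pairs = [(1, 2), (1, 3), (2, 3), (3, 5)]
groups = groupby(pairs, key=lambda pair: pair[0])

first_key, first_group = next(groups)
# Advancing the outer iterator skips whatever was left in first_group.
second_key, second_group = next(groups)

# first_group is now stale: the outer iterator has already moved past it.
stale = list(first_group)

# A group can be consumed only once; the second list() call sees nothing.
once = list(second_group)
twice = list(second_group)
```

Here `stale` and `twice` both come out empty, which matches the empty lists in the second and third examples: in the third, the `if len(list(v)) > 1` test consumes the group before the result expression runs.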
msg25448 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2005-05-31 16:16
Sorry, this is more of a rant than a bug report.  The tool
is functioning as designed and documented.  The various
design options were discussed on python-dev and this was
what was settled on as the most useful, general purpose tool
(eminently practical, but not idiotproof).

Like other itertools, it can be used in a straightforward
manner or be used to write cryptic, mysterious code.  In
general, if you can't follow your own code (in situations
such as the above), a good first step is to unroll the list
comprehension into a regular for-loop, as that tends to make
the assumptions and control flow more visible.  Also, it can
be taken as a hint that the itertool is not being used as
intended.
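
One possible unrolling of the duplicate-key check, as a sketch only: it assumes the goal is to collect every key that has more than one pair, and that the input is already ordered by key (groupby only groups consecutive items). The names `pairs`, `members` and `duplicates` are illustrative.

```python
from itertools import groupby

pairs = [(1, 2), (1, 3), (2, 3), (3, 5)]

duplicates = []
for key, group in groupby(pairs, key=lambda pair: pair[0]):
    # Materialize the group once, before the outer loop advances,
    # so it can be both measured and kept.
    members = list(group)
    if len(members) > 1:
        duplicates.append((key, members))
```

Written this way, it is visible that the group is read exactly once per iteration, whereas the comprehension form read it once in the filter and again in the result expression.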



msg25449 - Author: Mike Coleman (mkc) Date: 2005-06-03 21:10
I didn't mean it as a rant.  Sorry.

I don't necessarily mind having an optimized version of
groupby with sharp edges for the unwary, but it seems like
a "friendly" version is actually at least as important and
should therefore also be supplied.  (Making an analogy with
Lisp, having 'nconc' doesn't alleviate the need for an
'append'.)  The friendly version of 'groupby' doesn't really
have much to do with itertools--maybe it should be a basic
builtin operator, like 'reduce'.

With due respect, I don't think the examples I'm giving are
at all cryptic or playing fast and loose with comprehension
semantics.  Rather, I'd argue that they demonstrate that the
somewhat surprising semantics of itertools.groupby make it
not entirely suitable for naive users.

I'm really hoping for something here, as I've been copying a
'groupby' function (from the Python recipe collection) into
my scripts now for quite a long time.  I think this is a
powerful and very much needed basic function, and I'd really
like to see a broadly usable version of it incorporated.
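
For illustration, a "friendly" variant along these lines could be a thin eager wrapper. This is a hypothetical sketch, not the recipe mentioned above and not anything in the standard library: the name `grouped` is invented here, and it keeps groupby's rule that only consecutive items with equal keys are grouped.

```python
from itertools import groupby


def grouped(iterable, key=None):
    """Hypothetical eager wrapper around itertools.groupby.

    Returns a list of (key, items) pairs in which every group is already
    a list, so it can be measured, re-read, or stored safely.
    """
    return [(k, list(group)) for k, group in groupby(iterable, key)]


pairs = [(1, 2), (1, 3), (2, 3), (3, 5)]
by_key = grouped(pairs, key=lambda pair: pair[0])

# The filter and the result can both look at the same list.
duplicates = [(k, items) for k, items in by_key if len(items) > 1]
```

The trade-off is memory: every group is held in full, which is the cost the lazy itertools version avoids.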
History
Date                 User   Action  Args
2022-04-11 14:56:11  admin  set     github: 42032
2005-05-31 15:34:25  mkc    create