Issue1212077
Created on 2005-05-31 15:34 by mkc, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (3) | |||
---|---|---|---|
msg25447 - (view) | Author: Mike Coleman (mkc) | Date: 2005-05-31 15:34 | |
The sharing of the result iterator by itertools.groupby leads to strange, arguably un-Pythonic behavior. For example, suppose we have a list of pairs that we're about to turn into a dict and we want to check first for duplicate keys. We might do something like this:

>>> [ (k,list(v)) for (k, v) in groupby([(1,2), (1,3), (2,3), (3,5)], lambda x: x[0]) ]
[(1, [(1, 2), (1, 3)]), (2, [(2, 3)]), (3, [(3, 5)])]
>>> [ (k,list(v)) for (k, v) in list(groupby([(1,2), (1,3), (2,3), (3,5)], lambda x: x[0])) ]
[(1, []), (2, []), (3, [(3, 5)])]
>>> [ (k,list(v)) for (k, v) in groupby([(1,2), (1,3), (2,3), (3,5)], lambda x: x[0]) if len(list(v)) > 1 ]
[(1, [])]

The first result looks good, but the second two silently produce what appear to be bizarre results. The second is understandable (sort of) if you know that the result iterator is shared, and the third I don't get at all.

This silent failure seems very Perlish. At a minimum, if use is made of the "expired" result iterator, an exception should be thrown. This is a wonderfully useful function and ideally, there should be a version of groupby that behaves as a naive user would expect.
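The third result may be easier to follow with a minimal sketch of what happens to a single group. The names `pairs`, `grouped`, `first`, and `second` are illustrative and not part of the report. The sketch only assumes that each group returned by `itertools.groupby` is a one-shot iterator, so a second `list()` call on it finds nothing left.

```python
from itertools import groupby

pairs = [(1, 2), (1, 3), (2, 3), (3, 5)]

# Keep a reference to the groupby object and take only its first group.
grouped = groupby(pairs, lambda x: x[0])
key, group = next(grouped)

first = list(group)   # consumes the group: both pairs with key 1
second = list(group)  # the same iterator again: already exhausted

print(key, first, second)  # 1 [(1, 2), (1, 3)] []
```

In the third expression, the `if len(list(v)) > 1` test plays the role of `first` and the `list(v)` in the result plays the role of `second`, which would explain the `(1, [])` entry.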
|||
msg25448 - (view) | Author: Raymond Hettinger (rhettinger) | Date: 2005-05-31 16:16 | |
Sorry, this is more of a rant than a bug report. The tool is functioning as designed and documented. The various design options were discussed on python-dev and this was what was settled on as the most useful, general purpose tool (eminently practical, but not idiotproof). Like other itertools, it can be used in a straightforward manner or be used to write cryptic, mysterious code. In general, if you can't follow your own code (in situations such as the above), a good first step is to unroll the list comprehension into a regular for-loop as that tends to make the assumptions and control flow more visible. Also, it can be taken as a hint that the itertool is not being used as intended.
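As a rough illustration of the unrolling suggested above, the duplicate-key check from the first message might look like the following. This is only a sketch: the names `pairs`, `items`, and `duplicates` are hypothetical, and it assumes the input is already ordered by key, since `groupby` only groups consecutive items.

```python
from itertools import groupby

pairs = [(1, 2), (1, 3), (2, 3), (3, 5)]

duplicates = []
for key, group in groupby(pairs, lambda x: x[0]):
    # Materialize the group exactly once, before the outer loop advances.
    items = list(group)
    if len(items) > 1:
        duplicates.append((key, items))

print(duplicates)  # [(1, [(1, 2), (1, 3)])]
```

Binding the materialized list to a name makes it visible that the group is read once and then reused, instead of being read twice.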
|||
msg25449 - (view) | Author: Mike Coleman (mkc) | Date: 2005-06-03 21:10 | |
I didn't mean it as a rant. Sorry. I don't necessarily mind having an optimized version of groupby with sharp edges for the unawares, but it seems like a "friendly" version is actually at least as important and should therefore also be supplied. (Making an analogy with Lisp, having 'nconc' doesn't alleviate the need for an 'append'.) The friendly version of 'groupby' doesn't really have much to do with itertools--maybe it should be a basic builtin operator, like 'reduce'.

With due respect, I don't think the examples I'm giving are at all cryptic or playing fast and loose with comprehension semantics. Rather, I'd argue that they demonstrate that the somewhat surprising semantics of itertools.groupby make it not entirely suitable for naive users.

I'm really hoping for something here, as I've been copying a 'groupby' function (from the Python recipe collection) into my scripts now for quite a long time. I think this is a powerful and very much needed basic function, and I'd really like to see a broadly usable version of it incorporated.
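For readers wondering what such a "friendly" version could look like, here is one possible sketch. The helper name `eager_groupby` is hypothetical: it is neither the recipe mentioned above nor anything in the standard library. It simply wraps `itertools.groupby` and stores every group as a list, trading laziness and memory for results that can be read more than once.

```python
from itertools import groupby

def eager_groupby(iterable, key=None):
    """Hypothetical helper: group consecutive items, returning lists.

    Each group is materialized while it is still current, so the result
    can be stored, wrapped in list(), or read repeatedly.
    """
    return [(k, list(g)) for k, g in groupby(iterable, key)]

pairs = [(1, 2), (1, 3), (2, 3), (3, 5)]
first_item = lambda x: x[0]

# The second and third expressions from the report, using the eager helper.
via_list = [(k, list(v)) for (k, v) in list(eager_groupby(pairs, first_item))]
dupes = [(k, list(v)) for (k, v) in eager_groupby(pairs, first_item)
         if len(list(v)) > 1]

print(via_list)  # [(1, [(1, 2), (1, 3)]), (2, [(2, 3)]), (3, [(3, 5)])]
print(dupes)     # [(1, [(1, 2), (1, 3)])]
```

Like `itertools.groupby`, this sketch only groups consecutive items, so unsorted input would still need sorting by key first.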
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:11 | admin | set | github: 42032 |
2005-05-31 15:34:25 | mkc | create |