classification
Title: can't list groupby generator without breaking the sub groups generators
Type: behavior Stage: resolved
Components: Versions: Python 3.5, Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Loïc Le Loarer, eric.smith, rhettinger
Priority: normal Keywords:

Created on 2017-09-27 20:55 by Loïc Le Loarer, last changed 2017-09-28 19:26 by Loïc Le Loarer. This issue is now closed.

Messages (4)
msg303180 - Author: Loïc Le Loarer (Loïc Le Loarer) Date: 2017-09-27 20:55
If I "list" the itertools groupby generator, then the sub-generators for the individual groups are all empty except the last one.

import itertools as i
L = ['azerty','abcd','ac','aaa','z','baba','bitte','rhum','z','y']
g = list(i.groupby(L, lambda x: x[0]))
number_of_groups = len(g)
ans = 0
for k, v in g: # This doesn't work
#for k, v in i.groupby(L, lambda x: x[0]): # This works
    v = list(v)
    print(k,v,len(v))
    ans += 100*len(v)//number_of_groups
print(ans)
assert(ans == 163)
I don't understand why. Is my code broken?

I need to save the groupby results first when I need the number of groups before walking through them, as in the example above.

I have not been able to test with the latest Python versions. Is the problem already fixed?
msg303183 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-09-27 21:29
Except for the display of the last few elements, this is the documented and intended behavior (https://docs.python.org/3/library/itertools.html#itertools.groupby):

'''
The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list:

groups = []
uniquekeys = []
data = sorted(data, key=keyfunc)
for k, g in groupby(data, keyfunc):
    groups.append(list(g))      # Store group iterator as a list
    uniquekeys.append(k)
'''

The display of the last few elements isn't supposed to happen.  That is being fixed so that all of the subiterator results are empty when the parent iterator is exhausted.
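A minimal sketch of the shared-source behavior described above, using the reporter's list. It assumes Python 3.7 or later, the first release with the fix mentioned above, where every group iterator is empty once the parent groupby() object has moved past it; on older versions such as 3.5 the last group could still yield elements, as reported. The key function name first_letter is illustrative only.

```python
from itertools import groupby

words = ['azerty', 'abcd', 'ac', 'aaa', 'z', 'baba', 'bitte', 'rhum', 'z', 'y']

def first_letter(word):
    # Illustrative key function, equivalent to the reporter's lambda x: x[0].
    return word[0]

# Exhausting the parent iterator first: all group iterators share the same
# source, which has already been consumed, so each group comes back empty.
# The keys themselves are still available.
eager = list(groupby(words, first_letter))
stale = [(key, list(group)) for key, group in eager]

# Consuming each group before the parent advances keeps the data,
# which is the pattern the documentation recommends.
fresh = [(key, list(group)) for key, group in groupby(words, first_letter)]

print(stale)  # On Python 3.7+, every group list is empty
print(fresh)
```

The keys survive in both cases because groupby() hands them out directly; only the group contents depend on the shared, single-pass source.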
msg303221 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2017-09-28 11:20
Loïc Le Loarer: Note that your use case isn't possible, anyway. There's no way to know the number of groups until the input is exhausted, at which point you've already iterated through all of the data.
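Knowing the group count up front therefore means paying for a full pass over the data. A minimal sketch, assuming the whole input fits in memory: copy every group into a list in a single pass (the pattern quoted from the documentation), then take len() and loop over the stored groups. The helper name percent_score is hypothetical; it only mirrors the arithmetic of the reporter's loop.

```python
from itertools import groupby

def percent_score(items, key):
    """Hypothetical helper reproducing the reporter's loop arithmetic."""
    # Single pass over the input: each group is copied into a list before
    # groupby() advances, so no group's data is lost.
    groups = [(k, list(g)) for k, g in groupby(items, key)]
    number_of_groups = len(groups)
    total = 0
    for _, members in groups:
        total += 100 * len(members) // number_of_groups
    return total

words = ['azerty', 'abcd', 'ac', 'aaa', 'z', 'baba', 'bitte', 'rhum', 'z', 'y']
print(percent_score(words, lambda w: w[0]))  # 163
```

The trade-off is memory: every element is held in a list until the loop finishes. For empty input the loop body never runs, so there is no division by zero.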
msg303270 - Author: Loïc Le Loarer (Loïc Le Loarer) Date: 2017-09-28 19:26
Thanks a lot for the clear answer, and sorry for not having read the online documentation. I had only read help(itertools.groupby), which has much less detail.

As for my use case, I can compute the number of groups separately, with an explicit extra step.
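A separate counting pass like the one mentioned above could look like this minimal sketch. It assumes the input is a re-iterable sequence such as a list; a one-shot iterator would be used up by the counting pass. The first groupby() only counts groups, and a fresh groupby() drives the real loop, so no group iterator is read after its parent has advanced. The helper name count_groups is hypothetical.

```python
from itertools import groupby

def count_groups(items, key):
    """Hypothetical helper: number of consecutive runs groupby() yields."""
    # Only the parent iterator is advanced; the groups are never read.
    return sum(1 for _ in groupby(items, key))

def first_letter(word):
    return word[0]

words = ['azerty', 'abcd', 'ac', 'aaa', 'z', 'baba', 'bitte', 'rhum', 'z', 'y']

number_of_groups = count_groups(words, first_letter)  # first pass
ans = 0
for k, group in groupby(words, first_letter):  # second pass, fresh groupby()
    size = sum(1 for _ in group)  # consume the group before advancing
    ans += 100 * size // number_of_groups
print(number_of_groups, ans)  # 6 163
```

Compared with storing every group as a list, this keeps memory flat at the cost of calling the key function twice per element.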
History
Date                 User            Action  Args
2017-09-28 19:26:42  Loïc Le Loarer  set     messages: + msg303270
2017-09-28 11:20:51  eric.smith      set     nosy: + eric.smith
                                             messages: + msg303221
2017-09-27 21:29:33  rhettinger      set     status: open -> closed
                                             assignee: rhettinger
                                             nosy: + rhettinger
                                             messages: + msg303183
                                             resolution: not a bug
                                             stage: resolved
2017-09-27 20:55:49  Loïc Le Loarer  create