Message293529
I think that works to solve the problem that I pointed out. In my Stack Overflow question (http://stackoverflow.com/a/43926058/748858) it has been pointed out that there are other opportunities for weirdness here.
Specifically, if I skip processing 2 groups and then I process a third group whose key is the same as the first:
from itertools import groupby
from operator import itemgetter

inputs = [(x > 5, x) for x in range(10)]
inputs += [(False, 10), (True, 11)]
g = groupby(inputs, key=itemgetter(0))
_, a = next(g)
_, b = next(g)
_, c = next(g)
print(list(a))
print(list(b))
Both `a` and `b` should probably be empty at this point, but they aren't.
What if you kept track of the last iterable group and just consumed it whenever `next` is called? I think then you also need to keep track of whether or not the input iterable has been completely consumed, but that's not too bad either:
_sentinel = object()

class groupby:
    # [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
    # [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
    def __init__(self, iterable, key=None):
        if key is None:
            key = lambda x: x
        self.keyfunc = key
        self.it = iter(iterable)
        self.last_group = self.currkey = self.currvalue = _sentinel
        self.empty = False

    def __iter__(self):
        return self

    def __next__(self):
        if self.last_group is not _sentinel:
            for _ in self.last_group:
                pass
        if self.empty:
            raise StopIteration

        if self.currvalue is _sentinel:
            try:
                self.currvalue = next(self.it)
            except StopIteration:
                self.empty = True
                raise
            self.currkey = self.keyfunc(self.currvalue)
        self.last_group = self._grouper(self.currkey, self.currvalue)
        return (self.currkey, self.last_group)

    def _grouper(self, tgtkey, currvalue):
        while self.currkey == tgtkey:
            yield self.currvalue
            try:
                self.currvalue = next(self.it)
            except StopIteration:
                self.empty = True
                return
            self.currkey = self.keyfunc(self.currvalue)
I haven't tested this to make sure it passes the test suite -- I also don't know if this would have major performance implications or anything. If it did have severe performance implications, then it probably isn't worthwhile...
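
For a quick sanity check of the "drain the previous group on `next`" idea, the same behaviour can be sketched as a generator function. This is only an illustration, not the class above and not the itertools implementation: `draining_groupby` and `_done` are made-up names, and the sketch calls the key function more than once on some items, so it says nothing about the performance question.

```python
from operator import itemgetter

_done = object()  # hypothetical marker meaning "input exhausted"


def draining_groupby(iterable, key=None):
    """Sketch: like groupby, but a group is exhausted once the next one is requested."""
    keyfunc = (lambda x: x) if key is None else key
    it = iter(iterable)
    current = next(it, _done)  # first item not yet handed out by any group

    def group(tgtkey):
        nonlocal current
        while current is not _done and keyfunc(current) == tgtkey:
            yield current
            current = next(it, _done)

    while current is not _done:
        tgtkey = keyfunc(current)
        last_group = group(tgtkey)
        yield tgtkey, last_group
        # The caller asked for the next group: drain whatever was left unread,
        # which also leaves the old group iterator permanently empty.
        for _ in last_group:
            pass


inputs = [(x > 5, x) for x in range(10)]
inputs += [(False, 10), (True, 11)]
g = draining_groupby(inputs, key=itemgetter(0))
_, a = next(g)
_, b = next(g)
_, c = next(g)
print(list(a))  # []
print(list(b))  # []
print(list(c))  # [(False, 10)]
```

With this sketch the skipped groups `a` and `b` come back empty, and only the current group `c` yields items.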

Date | User | Action | Args
2017-05-12 07:20:24 | mgilson | set | recipients: + mgilson, rhettinger, Matt Gilson
2017-05-12 07:20:24 | mgilson | set | messageid: <1494573624.14.0.91835268933.issue30346@psf.upfronthosting.co.za>
2017-05-12 07:20:24 | mgilson | link | issue30346 messages
2017-05-12 07:20:23 | mgilson | create |