Author mgilson
Recipients Matt Gilson, mgilson, rhettinger
Date 2017-05-12.07:20:23
Message-id <1494573624.14.0.91835268933.issue30346@psf.upfronthosting.co.za>
In-reply-to
Content
I think that solves the problem I pointed out. In my Stack Overflow question (http://stackoverflow.com/a/43926058/748858), people have pointed out other opportunities for weirdness here.

Specifically, suppose I skip two groups without consuming them and then advance to a third group whose key is the same as the first:


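from itertools import groupby
from operator import itemgetter

# Keys: False for 0-5, True for 6-9, then False (10) and True (11) again.
inputs = [(x > 5, x) for x in range(10)]
inputs += [(False, 10), (True, 11)]

g = groupby(inputs, key=itemgetter(0))
_, a = next(g)  # first group (key False), never consumed
_, b = next(g)  # second group (key True), never consumed
_, c = next(g)  # third group, key False again

print(list(a))
print(list(b))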

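Both `a` and `b` should probably be empty at this point, but they aren't: with the current implementation `list(a)` gives `[(False, 10)]` and `list(b)` gives `[(True, 11)]`, because the stale groups keep drawing items from the shared underlying iterator.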
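For comparison, the `itertools` documentation notes that each group shares its source with the `groupby` object and recommends storing a group as a list if it is needed after the `groupby` object advances. A minimal sketch of that workaround on the same input, using only the stock `itertools.groupby`:

```python
from itertools import groupby
from operator import itemgetter

inputs = [(x > 5, x) for x in range(10)]
inputs += [(False, 10), (True, 11)]

# Turn each group into a list before groupby advances, so no group keeps
# reading from the shared underlying iterator later.
groups = [(k, list(grp)) for k, grp in groupby(inputs, key=itemgetter(0))]

print([k for k, _ in groups])  # [False, True, False, True]
print(groups[2])               # (False, [(False, 10)])
```

This only sidesteps the surprise for callers who know about it; it doesn't change what a stale group does.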

What if you kept track of the most recently returned group and just consumed it whenever `next` is called? I think you'd then also need to track whether the input iterable has been completely consumed, but that's not too bad either:

_sentinel = object()

class groupby:
    # [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B
    # [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
    def __init__(self, iterable, key=None):
        if key is None:
            key = lambda x: x
        self.keyfunc = key
        self.it = iter(iterable)
        # last_group: the most recently returned _grouper generator.
        # currkey/currvalue: the item read from the input but not yet
        # yielded by any group.
        self.last_group = self.currkey = self.currvalue = _sentinel
        # Set once the underlying iterator raises StopIteration.
        self.empty = False

    def __iter__(self):
        return self

    def __next__(self):
        # Drain the previous group, so it is empty if the caller holds on
        # to it and iterates it later.
        if self.last_group is not _sentinel:
            for _ in self.last_group:
                pass
        if self.empty:
            raise StopIteration

        # Only true before the first item has been read.
        if self.currvalue is _sentinel:
            try:
                self.currvalue = next(self.it)
            except StopIteration:
                self.empty = True
                raise
            self.currkey = self.keyfunc(self.currvalue)
        self.last_group = self._grouper(self.currkey)
        return (self.currkey, self.last_group)

    def _grouper(self, tgtkey):
        # Yield items while their key matches this group's key; the first
        # non-matching item stays in currkey/currvalue for the next group.
        while self.currkey == tgtkey:
            yield self.currvalue
            try:
                self.currvalue = next(self.it)
            except StopIteration:
                self.empty = True
                return
            self.currkey = self.keyfunc(self.currvalue)

I haven't run this against the test suite, and I don't know whether it has major performance implications. If it does, it probably isn't worthwhile...
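One hypothetical alternative (not something proposed in this thread) is to have each group check whether it is still the current one, rather than having the parent drain the previous group. The parent still skips leftover items when it advances, the way `itertools.groupby` already does. `GroupbyWithToken` and its attributes are illustrative names, and its speed relative to either version is untested; this is a sketch of the idea, not a drop-in replacement:

```python
import itertools


class GroupbyWithToken:
    """Hypothetical variant: stale groups stop instead of being drained."""

    def __init__(self, iterable, key=None):
        self.keyfunc = key if key is not None else (lambda x: x)
        self.it = iter(iterable)
        self.exhausted = False   # underlying iterator has raised StopIteration
        self.has_value = False   # currkey/currvalue hold an item not yet consumed
        self.currkey = self.currvalue = self.tgtkey = None
        self.token = None        # identifies the only group allowed to yield

    def _advance(self):
        # Read one item into currkey/currvalue; return False at end of input.
        if self.exhausted:
            return False
        try:
            self.currvalue = next(self.it)
        except StopIteration:
            self.exhausted = True
            return False
        self.currkey = self.keyfunc(self.currvalue)
        self.has_value = True
        return True

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            if not self.has_value and not self._advance():
                raise StopIteration
            # Stop skipping at the first item whose key differs from the
            # previous group's key (or immediately on the first call).
            if self.token is None or self.currkey != self.tgtkey:
                break
            self.has_value = False   # discard a leftover item of the previous group
        self.tgtkey = self.currkey
        self.token = object()        # invalidates every earlier group
        return self.currkey, self._grouper(self.token, self.tgtkey)

    def _grouper(self, token, tgtkey):
        # Check the token before every read, so a stale group never pulls
        # items that belong to a newer group.
        while self.token is token:
            if not self.has_value and not self._advance():
                return
            if self.currkey != tgtkey:
                return
            self.has_value = False   # mark the item as consumed
            yield self.currvalue


inputs = [(x > 5, x) for x in range(10)] + [(False, 10), (True, 11)]
g = GroupbyWithToken(inputs, key=lambda item: item[0])
_, a = next(g)
_, b = next(g)
_, c = next(g)
print(list(a), list(b))  # [] []
print(list(c))           # [(False, 10)]
```

For ordinary use, where each group is consumed (or ignored) before advancing, it should produce the same groups as `itertools.groupby`; the checks below compare the two on the docs' example string.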
History
Date                 User     Action  Args
2017-05-12 07:20:24  mgilson  set     recipients: + mgilson, rhettinger, Matt Gilson
2017-05-12 07:20:24  mgilson  set     messageid: <1494573624.14.0.91835268933.issue30346@psf.upfronthosting.co.za>
2017-05-12 07:20:24  mgilson  link    issue30346 messages
2017-05-12 07:20:23  mgilson  create