
Author rhettinger
Recipients rhettinger, thomaspinckney3
Date 2008-12-18.08:38:19
Message-id <1229589501.39.0.58372422351.issue4615@psf.upfronthosting.co.za>
In-reply-to
Content
My inclination is not to include this as a basic C-coded itertool,
because it potentially holds all of the data in memory (generally not a
characteristic of an itertool) and because I don't see it as a basic
building block (itertools are intended to be elemental, composable
components of an iterator algebra).  Also, the pure-Python equivalent of
dedup() is both easy to write and efficient (it would gain little from
being recoded in C).

Instead, I'm thinking of adding two recipes to the itertools docs:

def unique_everseen(iterable, key=None):
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for elem in iterable:
            if elem not in seen:
                seen_add(elem)
                yield elem
    else:
        for elem in iterable:
            k = key(elem)
            if k not in seen:
                seen_add(k)
                yield elem

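from itertools import groupby, imap   # imap exists only in Python 2
from operator import itemgetter

def unique_lastseen(iterable, key=None):
    # unique_lastseen('AAAABBBCCDAABBB') --> A B C D A B
    # unique_lastseen('ABBCcAD', str.lower) --> A B C A D
    # groupby yields (key, group) pairs; keep the first element of each group
    return imap(next, imap(itemgetter(1), groupby(iterable, key)))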
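
For anyone trying these on Python 3, here is a minimal sketch of how the two recipes might look there. It assumes only that the built-in map (which is lazy in Python 3) can stand in for itertools.imap. The unique_everseen below is a condensed variant written for this illustration, not the exact text proposed above, and neither function is a claim about what ends up in the docs.

```python
from itertools import groupby
from operator import itemgetter


def unique_lastseen(iterable, key=None):
    # Python 3 port (assumption: built-in map replaces itertools.imap).
    # groupby yields (key, group) pairs; itemgetter(1) picks the group and
    # next() takes its first element. Only the current group is held.
    return map(next, map(itemgetter(1), groupby(iterable, key)))


def unique_everseen(iterable, key=None):
    # Condensed variant for illustration: a single loop instead of two.
    # The `seen` set grows with the number of distinct keys encountered.
    seen = set()
    for elem in iterable:
        k = elem if key is None else key(elem)
        if k not in seen:
            seen.add(k)
            yield elem


print(list(unique_lastseen('AAAABBBCCDAABBB')))     # ['A', 'B', 'C', 'D', 'A', 'B']
print(list(unique_lastseen('ABBCcAD', str.lower)))  # ['A', 'B', 'C', 'A', 'D']
print(list(unique_everseen('AAAABBBCCDAABBB')))     # ['A', 'B', 'C', 'D']
print(list(unique_everseen('ABBCcAD', str.lower)))  # ['A', 'B', 'C', 'D']
```

The contrast in memory use is the point of the comparison: unique_lastseen only compares each element with its predecessor, so it works on unbounded input, while unique_everseen has to remember every distinct key it has yielded.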
History
Date                 User        Action  Args
2008-12-18 08:38:21  rhettinger  set     recipients: + rhettinger, thomaspinckney3
2008-12-18 08:38:21  rhettinger  set     messageid: <1229589501.39.0.58372422351.issue4615@psf.upfronthosting.co.za>
2008-12-18 08:38:20  rhettinger  link    issue4615 messages
2008-12-18 08:38:19  rhettinger  create