Message 78015 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	rhettinger
Recipients	rhettinger, thomaspinckney3
Date	2008-12-18.08:38:19
SpamBayes Score	1.8272245e-06
Marked as misclassified	No
Message-id	<1229589501.39.0.58372422351.issue4615@psf.upfronthosting.co.za>
In-reply-to

Content

My inclination is to not include this as a basic C coded itertool
because it holds potentially all of the data in memory (generally, not a
characteristic of an itertool) and because I don't see it as a basic
building block (itertools are intended to be elemental, composable
components of an iterator algebra).  Also, the pure Python equivalent of
dedup() is both easy to write and runs efficiently (it gains little from
being recoded in C).
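As a rough illustration of both points, here is a minimal Python 3 sketch. The name `unique_everseen_sketch` is hypothetical, and composing `itertools.filterfalse` with a set is just one possible approach, not the recipe proposed here. It takes only a few lines, yet its `seen` set gains one entry per distinct element, so memory grows with the number of distinct inputs.

```python
from itertools import count, filterfalse, islice

def unique_everseen_sketch(iterable):
    """Yield each element the first time it appears (hypothetical helper).

    The `seen` set grows by one entry per distinct element, which is
    the memory cost of remembering everything ever seen.
    """
    seen = set()
    # filterfalse skips any element already in `seen`; the membership test
    # runs lazily, right before each element would be yielded.
    for elem in filterfalse(seen.__contains__, iterable):
        seen.add(elem)
        yield elem

print(list(unique_everseen_sketch('AAAABBBCCDAABBB')))  # ['A', 'B', 'C', 'D']
# Works lazily on an infinite stream, but the internal set keeps growing:
print(list(islice(unique_everseen_sketch(count()), 5)))  # [0, 1, 2, 3, 4]
```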

Instead, I'm thinking of adding two recipes to the itertools docs:

from itertools import groupby, imap
from operator import itemgetter

def unique_everseen(iterable, key=None):
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D    
    seen = set()
    seen_add = seen.add
    if key is None:
        for elem in iterable:
            if elem not in seen:
                seen_add(elem)
                yield elem
    else:
        for elem in iterable:
            k = key(elem)
            if k not in seen:
                seen_add(k)
                yield elem

def unique_lastseen(iterable, key=None):
    # unique_lastseen('AAAABBBCCDAABBB') --> A B C D A B
    # unique_lastseen('ABBCcAD', str.lower) --> A B C A D
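    # groupby yields (key, group) pairs for each run of equal keys;
    # itemgetter(1) picks the group and next() takes its first element.
    return imap(next, imap(itemgetter(1), groupby(iterable, key)))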
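The `unique_lastseen` recipe relies on `itertools.imap`, which no longer exists in Python 3, where the builtin `map` is already lazy. A minimal Python 3 adaptation, assuming only that substitution, might look like this:

```python
from itertools import groupby
from operator import itemgetter

def unique_lastseen(iterable, key=None):
    # Python 3 port: builtin map() is lazy, so it replaces itertools.imap.
    # groupby yields (key, group) pairs for each run of consecutive equal keys;
    # itemgetter(1) selects the group iterator and next() takes its first element.
    return map(next, map(itemgetter(1), groupby(iterable, key)))

print(''.join(unique_lastseen('AAAABBBCCDAABBB')))    # ABCDAB
print(''.join(unique_lastseen('ABBCcAD', str.lower)))  # ABCAD
```

Because `groupby` only compares each element with the one before it, this version needs no set and runs in constant memory, at the cost of removing only adjacent duplicates.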

History
Date	User	Action	Args
2008-12-18 08:38:21	rhettinger	set	recipients: + rhettinger, thomaspinckney3
2008-12-18 08:38:21	rhettinger	set	messageid: <1229589501.39.0.58372422351.issue4615@psf.upfronthosting.co.za>
2008-12-18 08:38:20	rhettinger	link	issue4615 messages
2008-12-18 08:38:19	rhettinger	create