Message77482
My most recent need for something like this looked as follows:
src1 = db_query(query_1)
src2 = db_query(query_2)
results = deduped(src1 + src2, key=lambda x: x.field2)
Basically, I wanted the data from src1 if it existed and otherwise from
src2, while preserving the order of src1 (I didn't care about the order
of src2).
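A minimal sketch of what a deduped() generator along these lines could look like. The implementation attached to the issue is not shown in this message, so the signature here (an iterable plus an optional key function) is an assumption inferred from the call above, and the Row type standing in for the query results is hypothetical.

```python
from collections import namedtuple


def deduped(iterable, key=None):
    """Yield items in order, skipping any whose key has been seen before."""
    seen = set()
    for item in iterable:
        k = item if key is None else key(item)
        if k not in seen:  # keys must be hashable
            seen.add(k)
            yield item


# Hypothetical stand-ins for the two query results.
Row = namedtuple("Row", ["field1", "field2"])
src1 = [Row("a", 1), Row("b", 2)]
src2 = [Row("x", 2), Row("y", 3)]

# Rows from src1 come first, so they win whenever field2 collides.
results = list(deduped(src1 + src2, key=lambda x: x.field2))
```

Because src1 is placed first in the concatenation, its rows are the ones kept when both sources share a field2 value, and their relative order is unchanged.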
A previous example was reading from a file and wanting to de-dupe lines
based on a field in each line. Again order mattered to me since I wanted
to process the non-duped lines in the file in order.
A final example was generating a bunch of error messages from a variety
of sources and then wanting to make sure there were no duplicate errors.
Instead of:
errors = set(errors)
I find this much clearer:
errors = deduped(errors)
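One practical difference between the two, beyond readability: set() discards the original order, while an order-preserving de-dupe keeps the first occurrence of each message in place. As a rough illustration that does not depend on deduped() itself, dict.fromkeys gives the same first-seen ordering on Python 3.7 and later, where plain dicts preserve insertion order (this was not guaranteed when this message was written). The error strings are made up for the example.

```python
errors = [
    "disk full",
    "timeout contacting host",
    "disk full",
    "permission denied",
    "timeout contacting host",
]

# set() removes duplicates but makes no promise about iteration order.
unique_unordered = set(errors)

# dict.fromkeys keeps the first occurrence of each message, in order
# (guaranteed for plain dicts on Python 3.7+).
unique_ordered = list(dict.fromkeys(errors))
```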
In reality, all of these examples probably do not need to be written as
a generator. The lists being de-duped are probably not so huge in
practice as to preclude instantiating a new list (given the reality of
multi-gigabyte RAM machines, etc.). It just seemed particularly clear to
write this using a yield.
An ordered dictionary would probably work for me too. I don't think a
Bag would, given its lack of ordering.
I do find it very simple to just be able to apply deduped() to any
existing sequence/iterator and not have to be more verbose about
explicitly iterating and filling in an ordered dictionary somehow.
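For comparison, a sketch of the more verbose ordered-dictionary approach, using collections.OrderedDict with the same first-one-wins rule. The records and the choice of key are hypothetical.

```python
from collections import OrderedDict

# Hypothetical (name, value) records with duplicate names.
records = [("alice", 1), ("bob", 2), ("alice", 3), ("carol", 4)]

first_by_name = OrderedDict()
for record in records:
    # setdefault only stores the record if the key is not already present,
    # so the first occurrence wins and insertion order is preserved.
    first_by_name.setdefault(record[0], record)

unique_records = list(first_by_name.values())
```

This gives the same result as a keyed de-dupe, but it builds the whole mapping up front instead of yielding items as they are encountered.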
Date | User | Action | Args
2008-12-10 03:47:20 | thomaspinckney3 | set | recipients: + thomaspinckney3, rhettinger
2008-12-10 03:47:20 | thomaspinckney3 | set | messageid: <1228880840.7.0.942789090503.issue4615@psf.upfronthosting.co.za>
2008-12-10 03:47:14 | thomaspinckney3 | link | issue4615 messages
2008-12-10 03:47:13 | thomaspinckney3 | create |