Message 77482 - Python tracker



Author	thomaspinckney3
Recipients	rhettinger, thomaspinckney3
Date	2008-12-10.03:47:13
SpamBayes Score	9.580556e-09
Marked as misclassified	No
Message-id	<1228880840.7.0.942789090503.issue4615@psf.upfronthosting.co.za>
In-reply-to

Content

My latest need for something like this looked roughly like this:

src1 = db_query(query_1)
src2 = db_query(query_2)
results = deduped(src1 + src2, key=lambda x: x.field2)

Basically, I wanted the data from src1 if it existed and otherwise from
src2, while preserving the order of src1 (I didn't care about the order
of src2).
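The message doesn't show how deduped() itself is implemented. A minimal generator sketch, assuming "first occurrence wins" semantics and hashable keys, might look like this. The Record type and the sample rows are hypothetical stand-ins for whatever db_query() returns.

```python
from collections import namedtuple


def deduped(iterable, key=None):
    """Yield items from iterable, skipping any whose key was already seen.

    The first occurrence of each key wins and input order is preserved.
    Keys must be hashable. (Sketch only; not an actual stdlib function.)
    """
    seen = set()
    for item in iterable:
        k = item if key is None else key(item)
        if k not in seen:
            seen.add(k)
            yield item


# Hypothetical stand-in for rows returned by db_query().
Record = namedtuple("Record", ["source", "field2"])

src1 = [Record("src1", "a"), Record("src1", "b")]
src2 = [Record("src2", "b"), Record("src2", "c")]

# Because src1 comes first in the concatenation, its rows win on field2;
# src2 only contributes rows whose field2 was not already seen.
results = list(deduped(src1 + src2, key=lambda x: x.field2))
print(results)
```

Putting src1 first in the concatenation is what gives it priority: the generator keeps whichever item it sees first for each key.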

A previous example was reading from a file and wanting to de-dupe lines
based on a field in each line. Again, order mattered to me, since I
wanted to process the non-duplicate lines in the file in order.
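For the file case, the same first-seen rule can be applied line by line without reading the whole file into memory. This sketch assumes comma-separated lines with the de-dup key in the second column; io.StringIO stands in for a real file, and the data is made up for illustration.

```python
import io

# Hypothetical file contents: "id,email" lines, de-duped on the email column.
data = io.StringIO(
    "1,alice@example.com\n"
    "2,bob@example.com\n"
    "3,alice@example.com\n"
    "4,carol@example.com\n"
)

seen = set()
kept = []
for line in data:
    line = line.rstrip("\n")
    field = line.split(",")[1]
    if field in seen:
        continue  # skip a later duplicate of a field already processed
    seen.add(field)
    kept.append(line)  # stand-in for "process the line", in file order

print(kept)
```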

A final example was generating a bunch of error messages from a variety 
of sources and then wanting to make sure there were no duplicate errors. 
Instead of: 

errors = set(errors)

I find this much clearer:

errors = deduped(errors)
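Note that set(errors) also discards the original order and returns a set rather than a list. In modern Python (3.7 and later), where dicts preserve insertion order, an order-preserving unkeyed de-dup is available as a one-liner. This postdates the message, and the sample error strings are hypothetical.

```python
errors = ["disk full", "timeout", "disk full", "bad input", "timeout"]

# Dict keys are unique and (since Python 3.7) keep insertion order, so this
# keeps the first occurrence of each message in its original position.
unique_errors = list(dict.fromkeys(errors))
print(unique_errors)  # ['disk full', 'timeout', 'bad input']
```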

In reality, all of these examples probably don't need to be written as a
generator. The lists being de-duped are probably not so large in practice
that building a new list is prohibitive (given multi-gigabyte RAM
machines, etc.). It just seemed particularly clear to write this using
a yield.

An ordered dictionary would probably work for me too. I don't think a
Bag would, given its lack of ordering.

I do find it very convenient to be able to apply deduped() to any
existing sequence or iterator, rather than being more verbose about
explicitly iterating and filling in an ordered dictionary somehow.
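For comparison, the more explicit ordered-dictionary approach might look like this with collections.OrderedDict, which was added later (Python 2.7 and 3.1). setdefault() only inserts when the key is absent, so the first item stored for each key wins. The sample pairs and the key function are illustrative assumptions.

```python
from collections import OrderedDict

items = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]


def key(item):
    return item[0]  # hypothetical key: de-dupe on the first element


ordered = OrderedDict()
for item in items:
    # Insert only if this key hasn't been seen; later duplicates are ignored.
    ordered.setdefault(key(item), item)

result = list(ordered.values())
print(result)  # [('a', 1), ('b', 2), ('c', 4)]
```

This works, but it needs a named container, an explicit loop, and a final conversion back to a list, which is the extra verbosity a one-call helper would avoid.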

History
Date	User	Action	Args
2008-12-10 03:47:20	thomaspinckney3	set	recipients: + thomaspinckney3, rhettinger
2008-12-10 03:47:20	thomaspinckney3	set	messageid: <1228880840.7.0.942789090503.issue4615@psf.upfronthosting.co.za>
2008-12-10 03:47:14	thomaspinckney3	link	issue4615 messages
2008-12-10 03:47:13	thomaspinckney3	create