
Author dschult
Recipients dschult, loewis, rhettinger
Date 2009-04-10.04:54:30
Message-id <666A36BE-5DEB-4A8C-B524-FC6A4D89E071@colgate.edu>
In-reply-to <49DE7646.7040501@v.loewis.de>
Content
Benchmarks:
With cooked-up examples, I don't see any speedup beyond 5-10%.
Function-call overhead seems to swamp everything for small examples
with fast hashes. I don't have a handy example with long hash times
or long lookup times, and that is what it would take to show a large
performance boost with this patch.

I also agree with Martin that there are many reasons not to use
setdefault... but it is part of the API, so we might as well make it
worth using. (That probably means changing the default value to a
factory function that gets called when the key is not found, but
that's a much bigger change...) I'm just suggesting that the code
should not do extra work.
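The factory-function idea in the parenthetical could look roughly like the pure-Python sketch below. `setdefault_factory` is a hypothetical helper, not part of the dict API; the point is only that the default gets built when the key is actually missing. Because it is written in Python on top of a normal dict, it still does a second hash/lookup on a miss, so it illustrates the interface, not the speedup the patch is after.

```python
def setdefault_factory(d, key, factory):
    """Hypothetical variant of dict.setdefault that takes a zero-argument
    factory and only calls it when `key` is missing."""
    try:
        return d[key]          # hit: one lookup, factory never called
    except KeyError:
        value = factory()      # miss: build the default lazily
        d[key] = value         # pure Python pays a second hash/lookup here
        return value


calls = []


def make_list():
    calls.append(1)            # record each time the factory runs
    return []


d = {}
setdefault_factory(d, "a", make_list).append(1)
setdefault_factory(d, "a", make_list).append(2)
print(d, len(calls))  # {'a': [1, 2]} 1
```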

By the way, defaultdict is NOT like setdefault; it is like get().
Missing entries do not get set. Perhaps there should be a
collections entry that does setdefault instead of get...

Next, the second hash gets executed "only the first time" for EACH key.
So if, e.g., you have a lot of entries that get looked up 2 or 3 times,
the second hash happens on 1/2 to 1/3 of the calls. And we don't need
a second hash or lookup, so why do it?
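The 1/2-to-1/3 figure follows from counting: each distinct key misses exactly once, and only a miss pays for the second hash, so a key used k times wastes one hash out of k calls. The sketch below checks that arithmetic with a hypothetical `CountingKey` whose `__hash__` counts its own calls, and a hypothetical `two_step_setdefault` that models the unpatched behaviour (hash and lookup, then a separate insert that hashes again). It is a model built on a modern Python dict, not CPython's actual setdefault code.

```python
_MISSING = object()  # sentinel distinguishing "absent" from a stored None


class CountingKey:
    """Hypothetical key type whose __hash__ counts how often it runs."""
    calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        CountingKey.calls += 1
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.value == other.value


def two_step_setdefault(d, key, default):
    # Model of the unpatched path: hash + lookup first, then on a miss
    # a separate insert that hashes the key a second time.
    value = d.get(key, _MISSING)   # hash #1
    if value is _MISSING:
        d[key] = default           # hash #2, only on a miss
        return default
    return value


def redundant_fraction(distinct_keys, uses_per_key):
    """Fraction of setdefault calls that pay for an unneeded extra hash."""
    d = {}
    keys = [CountingKey(i) for i in range(distinct_keys)]
    CountingKey.calls = 0
    for _ in range(uses_per_key):
        for k in keys:
            two_step_setdefault(d, k, 0)
    total_calls = distinct_keys * uses_per_key
    extra_hashes = CountingKey.calls - total_calls  # beyond one per call
    return extra_hashes / total_calls


print(redundant_fraction(1000, 2))  # 0.5
print(redundant_fraction(1000, 3))  # 0.3333333333333333
```

With keys used only once, every call is a miss and the wasted fraction is 1; it only falls toward zero for keys that are reused many times.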

I understand Raymond's concern about more code using the
data structure directly. There are three basic routines that deal with
the data structure: ma_lookup/lookdict, insertdict and dictresize.
The comments for lookdict encourage you to use the returned "ep" entry
to check whether it is empty and to add the key/value pair if desired.
But as currently implemented, insertdict calls lookdict itself, so
they aren't atomistic in that sense. If atomism is a design goal
(even if it isn't a word :) then insertdict would take "ep" as an input
instead of doing a lookup. I'm not sure atomism is part of the design
in Python, though...
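The "insertdict takes ep as an input" idea can be sketched with a toy table: `_lookup` plays the role of lookdict and returns the slot index (the "ep" entry), and `_insert_at` plays an insertdict that writes into that slot instead of searching again. `ToyDict` and all its method names are hypothetical; this is not CPython's dictobject.c, it uses plain linear probing and never resizes.

```python
class ToyDict:
    """Hypothetical open-addressing table (linear probing, fixed size,
    never resizes) used only to illustrate the lookup/insert split."""

    def __init__(self, size=8):
        self.slots = [None] * size  # each slot: None or (hash, key, value)
        self.hash_calls = 0

    def _hash(self, key):
        self.hash_calls += 1
        return hash(key)

    def _lookup(self, key, h):
        # Plays the role of lookdict: return the index of the slot holding
        # `key`, or of the first empty slot where it would go ("ep").
        n = len(self.slots)
        i = h % n
        for _ in range(n):
            entry = self.slots[i]
            if entry is None or (entry[0] == h and entry[1] == key):
                return i
            i = (i + 1) % n
        raise RuntimeError("table full (this toy never resizes)")

    def _insert_at(self, i, key, h, value):
        # Plays the role of an insertdict that takes "ep" as input:
        # it writes into the given slot without searching again.
        self.slots[i] = (h, key, value)

    def __setitem__(self, key, value):
        h = self._hash(key)
        self._insert_at(self._lookup(key, h), key, h, value)

    def setdefault(self, key, default=None):
        h = self._hash(key)              # hash exactly once per call
        i = self._lookup(key, h)
        if self.slots[i] is None:        # empty slot: key is missing
            self._insert_at(i, key, h, default)
            return default
        return self.slots[i][2]


t = ToyDict()
t.setdefault("x", []).append(1)
t.setdefault("x", []).append(2)
t["y"] = 5
print(t.setdefault("x"), t.setdefault("y", 0), t.hash_calls)  # [1, 2] 5 5
```

Here `setdefault` and `__setitem__` share the same two internal helpers, so neither duplicates the probing loop, and a miss in `setdefault` costs one hash and one probe sequence.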

From my perspective, creating an internal SetItem adds another function
handling the data structure, just as setdefault would; that's why I
inlined it in setdefault. But it does keep the name similar, and thus
it's easier to identify it as writing to the data structure. If this
style is promoted here, then I think there ought to be an internal
insertdict that doesn't do the lookup. Also, shouldn't the parent
functions call these internal versions instead of duplicating code?
Ack, that means more changes. Not difficult ones, though...

What do you think?
Dan
History
Date                 User     Action  Args
2009-04-10 04:54:57  dschult  set     recipients: + dschult, loewis, rhettinger
2009-04-10 04:54:52  dschult  link    issue5730 messages
2009-04-10 04:54:36  dschult  create