This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author dschult
Recipients dschult, loewis, rhettinger
Date 2009-04-13.02:33:41
SpamBayes Score 5.551115e-17
Marked as misclassified No
Message-id <7EF9FC9C-BC98-4E34-8DE2-BF9240B0452D@colgate.edu>
In-reply-to <49DF1236.5020304@v.loewis.de>
Content
On Apr 11, 2009, at 8:15 AM, Martin v. Löwis  
<report@bugs.python.org>@psf.upfronthosting.co.za  
@psf.upfronthosting.co.za> wrote:
> Martin v. Löwis <martin@v.loewis.de> added the comment:
>
>> By the way, defaultdict is NOT like setdefault--it is like get().
>> Missing entries do no get set.
>
> Why do you say that?
>
> __missing__(...)
>     __missing__(key) # Called by __getitem__ for missing key;  
> pseudo-code:
>     if self.default_factory is None: raise KeyError((key,))
>     self[key] = value = self.default_factory()
>     return value
>
> In all cases of setdefault that I know of, replacing this with
> a defaultdict would be appropriate. The only case where it wouldn't
> work is if the default value depends on the key.
The missing key reports the default object, but doesn't set that key  
or value
object in the dict.  So you cannot then call up that object and do  
something
that depends on the key.  So the default value may not depend on the key
but still need to be different for each key due to later changes.

For example, to represent a graph structure, it is helpful to use a  
dict-of-dicts.
The first time a node is looked up you want to add it as a key in the  
graph dict with
an empty "nbr dict" as the value.  the "nbr dict" is keyed by  
neighbor to the edge
weight/object.  So the default object is always a fresh nbr dict...  
but what gets
added to that nbr dict depends on which node it is.  I can come up  
with examples
for lists too...   But as you pointed out before, using a list or  
dict in setdefault
requires creating the object before the call anyway...  I'm beginning to
question whether setdefault should ever be used...  Still comes back  
to--why
have it there inefficiently.   Besides I'm still interested in being
able to do this sort of thing at least through the C-API...

Raymond Hettinger <rhettinger@users.sourceforge.net> added the comment:

>> from my perspective creating an internal SetItem adds another
>> function handling the data structure just as setdefault would
> Incorrect comparison.  Your in-lining manipulated the ep structure
> directly (not a good thing).  In contrast, adding an alternative
> _PyDict_SetItemWithHash uses the insertdict() function, fully  
> isolating
> itself of from the table implementation.  The whole purpose of having
> insertdict() and lookdict() is to isolate the data structure internals
> from all externally visible functions.

Good... I am understanding better what you mean by "handling the data  
structure".
I agree that lookdict, insertdict and resizedict should be the three  
that change the
data structure.  But the way those 3 are currently written, you can't  
change an entry
without looking it up.  insertdict_clean almost does it, but it  
assumes there aren't
any dummy entries.

> The double call to a very simple user defined __hash__ adds .07 to  
> call
> that takes on .05 with the double call to builtin object hash.  So, we
> could save half of the the .07 just by elimating the double call to
> __hash__.  With more complex hash functions the overall speedup is  
> huge
> (essentially cutting the total work almost in half).

This is a good example because it shows how the hash can matter.

I think there are similar examples where the lookup is most of the  
time of dict access.
I don't know enough about what causes collisions and how dicts are  
optimized
to quickly come up with such an example.  But such an example must  
exist or
Tim Peters wouldn't have spent so much effort optimizing lookup....
besides his comments at the top of lookup suggest that we do exactly  
what
I am suggesting to do.  Get the entry from lookup and then
"the caller can (if it wishes) add the <key, value> pair to the  
returned PyDictEntry*.

Presently there is no way to do this with C-API except to write my  
own data structure
manipulation.  Wouldn't it be better to encapsulate this in the C-API  
in a standard way
instead of having everybody writing their own C-extension doing  
setdefault the right way?
(ok..ok.. "everybody" here refers to probably one person (me)...  but  
others have thought
of it.  I've even seen the double lookup as a justification for never  
using setdefault)

I'll look for an example that has lots of collisions.  Maybe that  
would help justify a
lookup-less insertdict.
Dan
History
Date User Action Args
2009-04-13 02:33:46dschultsetrecipients: + dschult, loewis, rhettinger
2009-04-13 02:33:44dschultlinkissue5730 messages
2009-04-13 02:33:41dschultcreate