Message85933
On Apr 11, 2009, at 8:15 AM, Martin v. Löwis
<report@bugs.python.org>@psf.upfronthosting.co.za
@psf.upfronthosting.co.za> wrote:
> Martin v. Löwis <martin@v.loewis.de> added the comment:
>
>> By the way, defaultdict is NOT like setdefault--it is like get().
>> Missing entries do no get set.
>
> Why do you say that?
>
> __missing__(...)
> __missing__(key) # Called by __getitem__ for missing key;
> pseudo-code:
> if self.default_factory is None: raise KeyError((key,))
> self[key] = value = self.default_factory()
> return value
>
> In all cases of setdefault that I know of, replacing this with
> a defaultdict would be appropriate. The only case where it wouldn't
> work is if the default value depends on the key.
The missing key reports the default object, but doesn't set that key
or value
object in the dict. So you cannot then call up that object and do
something
that depends on the key. So the default value may not depend on the key
but still need to be different for each key due to later changes.
For example, to represent a graph structure, it is helpful to use a
dict-of-dicts.
The first time a node is looked up you want to add it as a key in the
graph dict with
an empty "nbr dict" as the value. the "nbr dict" is keyed by
neighbor to the edge
weight/object. So the default object is always a fresh nbr dict...
but what gets
added to that nbr dict depends on which node it is. I can come up
with examples
for lists too... But as you pointed out before, using a list or
dict in setdefault
requires creating the object before the call anyway... I'm beginning to
question whether setdefault should ever be used... Still comes back
to--why
have it there inefficiently. Besides I'm still interested in being
able to do this sort of thing at least through the C-API...
Raymond Hettinger <rhettinger@users.sourceforge.net> added the comment:
>> from my perspective creating an internal SetItem adds another
>> function handling the data structure just as setdefault would
> Incorrect comparison. Your in-lining manipulated the ep structure
> directly (not a good thing). In contrast, adding an alternative
> _PyDict_SetItemWithHash uses the insertdict() function, fully
> isolating
> itself of from the table implementation. The whole purpose of having
> insertdict() and lookdict() is to isolate the data structure internals
> from all externally visible functions.
Good... I am understanding better what you mean by "handling the data
structure".
I agree that lookdict, insertdict and resizedict should be the three
that change the
data structure. But the way those 3 are currently written, you can't
change an entry
without looking it up. insertdict_clean almost does it, but it
assumes there aren't
any dummy entries.
> The double call to a very simple user defined __hash__ adds .07 to
> call
> that takes on .05 with the double call to builtin object hash. So, we
> could save half of the the .07 just by elimating the double call to
> __hash__. With more complex hash functions the overall speedup is
> huge
> (essentially cutting the total work almost in half).
This is a good example because it shows how the hash can matter.
I think there are similar examples where the lookup is most of the
time of dict access.
I don't know enough about what causes collisions and how dicts are
optimized
to quickly come up with such an example. But such an example must
exist or
Tim Peters wouldn't have spent so much effort optimizing lookup....
besides his comments at the top of lookup suggest that we do exactly
what
I am suggesting to do. Get the entry from lookup and then
"the caller can (if it wishes) add the <key, value> pair to the
returned PyDictEntry*.
Presently there is no way to do this with C-API except to write my
own data structure
manipulation. Wouldn't it be better to encapsulate this in the C-API
in a standard way
instead of having everybody writing their own C-extension doing
setdefault the right way?
(ok..ok.. "everybody" here refers to probably one person (me)... but
others have thought
of it. I've even seen the double lookup as a justification for never
using setdefault)
I'll look for an example that has lots of collisions. Maybe that
would help justify a
lookup-less insertdict.
Dan |
|
Date |
User |
Action |
Args |
2009-04-13 02:33:46 | dschult | set | recipients:
+ dschult, loewis, rhettinger |
2009-04-13 02:33:44 | dschult | link | issue5730 messages |
2009-04-13 02:33:41 | dschult | create | |
|