classification
Title: Creating dicts for dict subclasses
Type: Stage:
Components: Interpreter Core Versions: Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: doerwalter, jimjjewett, nnorwitz, rhettinger
Priority: normal Keywords: patch

Created on 2006-12-14 13:08 by doerwalter, last changed 2007-04-17 04:06 by nnorwitz. This issue is now closed.

Files
File name Uploaded Description Edit
diff.txt doerwalter, 2006-12-14 13:08
Messages (11)
msg51522 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2006-12-14 13:08
This patch changes dictobject.c so that creating dicts from mapping like objects only uses the internal dict functions if the argument is a *real* dict, not a subclass. This means that overwritten keys() and __getitem__() methods are now honored. In addition to that the fallback implementation now tries iterkeys() before trying keys(). It also adds a PyMapping_IterKeys() macro.
msg51523 - (view) Author: Jim Jewett (jimjjewett) Date: 2006-12-19 17:50
Why are you using iterkeys instead of iteritems?

It seems like if they've filled out the interface enough to have iterkeys, they've probably filled it out all the way, and you do need the value as soon as you get the key.
msg51524 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2006-12-19 19:23
iteritems() has to create a new tuple for each item, so this might be slower.
msg51525 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2006-12-19 21:07
I'm -1 on making ANY guarantees about which methods underlie others -- that would constitute new and everlasting guarantees about how mappings are implemented.  Subclasses should explicity override/extend the methods withed changed behavior.  If that proves non-trivial, then it is likely there should be a has-a relationship instead of an is-a relationship.  Also, it is likely that the subclass will have Liskov substitutability violations.  Either way, there is probably a design flaw.
msg51526 - (view) Author: Jim Jewett (jimjjewett) Date: 2006-12-19 22:26
As I understand it, the problem is that dict.update is assuming any dict subclass will use the same internal data representation.

Restricting the fast path to exactly builtin dicts (not subclasses) fixes the bug, but makes the fallback more frequent.

The existing fallback is to call keys(), then iterate over it, retrieving the value for each key.  (keys is required for a "minimal mapping" as documented is UserDict, and a few other random places.)

The only potential dependency on other methods is his proposed new intermediate path that avoids creating a list of all keys, by using iterkeys if it exists.  (I suggested using iteritems to avoid the lookups.)  If iter* aren't implemented, the only harm is falling back to the existing fallback of "for k in keys():"
msg51527 - (view) Author: Jim Jewett (jimjjewett) Date: 2006-12-19 22:29
FWIW, I'm not sure I agree on not specifying which methods call share implementation.

If someone hooks __getitem__ but not get, that is just asking for bugs.  (The implementation of get may -- but need not -- make its own call to __getitem__, and not everyone will make the same decision.)
msg51528 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2006-12-19 23:00
It is also asking for bugs if someone hooks __getitem__ and starts to make possibly invalid assumptions about what other changes occur implicitly.

Also, this patch kills the performance of builtin subclasses.  If I subclass dict to add a new method, it would suck to have the performance of all of the other methods drop percariously.

This patch is somewhat overzealous.  It encroaches on the terriority of UserDict.DictMixin which was specifically made for propagating new behaviors.  It unnecessarily exposes implementation details.  It introduces implicit behaviors that should be done through explicit overrides of methods whose behavior is supposed to change.  

And, it is on the top of a slippery slope that we don't want to go down (i.e. do we want to guarantee that list.append is implemented in terms of list.extend, etc).  Python has no shortage of places where builtin subclasses make direct calls to the underlying C code -- this patch leads to a bottomless pit of changes that kill performance and make implicit side-effects the norm instead of the exception.
msg51529 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2006-12-20 00:13
Since update already supports (key, item) changes, I do not see that rationale in trying to expand the definition what is dict-like to include a try-this, then try-that approach.  This is a little too ad-hoc for too little benefit.

Also, I do not see the point of adding PyMapping_Iterkeys to the C API.  It affords no advantage over its macro definition (the current one-way-to-do-it). 
msg51530 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2006-12-20 12:59
To clear up some apparent misunderstandings: This patch does *not* advocate that some dict methods should be implemented by calling other dict methods so that dict subclasses only have to overwrite a few methods to gain a completely consistent implementation.

This patch only fixes the dict constructor (and update()) and consists of two parts:

(1) There are three code paths in dictobject.c::dict_update_common(): (a) if the constructor argument doesn't have a "keys" attribute treat it as a iterable of items; (b) if the argument has a "keys" attribute, but is not a dict (and not an instance of a subclass of dict), use keys() and __getitem__() to make a copy of the mapping-like object. (c) if the argument has a "keys" attribute and is a dict (or an instance of a subclass of dict) bypass any of the overwritten methods that the object might provide and directly use the dict implementation. This patch changes PyDict_Merge() so that code path (b) is used for dict constructor arguments that are subclasses of dict, so that any overwritten methods are honored.

(2) This means that now if a subclass of dict is passed to the constructor or update() the code is IMHO more correct (it honors the reimplemenation of the mapping methods), but slower. To reduce the slowdown instead of using kesY() and __getitem__(), iterkeys() and __getitem__() are used.

I can't see why the current behaviour should be better: Yes, it is faster, but it is also wrong: Without the patch the behaviour of dict() and dict.update() depends on the fact whether the argument happens to subclass dict or not. If it doesn't all is well: the argument is treated as a mapping (i.e. keys() and __getitem__() are used) otherwise the methods are completely ignored.

So can we agree on the fact that (1) is desirable? (At least Guido said as much: http://mail.python.org/pipermail/python-dev/2006-December/070341.html)

BTW, I only added PyMapping_Iterkeys() because it mirrors PyMapping_Keys().
msg51531 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2007-02-07 20:17
Added PyDict_CheckExact() in revisions 53655 and 53656.  A side-effect of this change is to slow-down updates with dict subclasses that are not overriding keys() and __getitem__().  This is especially unfortunate given good existing alternatives and given a lack of real use cases (dict subclasses that aspire to hand-off updates but not use their actual keys and mapped values).

Left out the gratuitous expansion of the API which exposed too much of the internal implementation and sought to introduce more implicit behaviors that would better be handled by explictly passing in an iterable of items to d.update().  For example.  d.update((k(x), g(x)) for x in myweirdmapping).
msg51532 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2007-04-17 04:06
This was reverted on the 2.5 branch due to the change in behavior.  See http://mail.python.org/pipermail/python-dev/2007-April/072565.html for more details.
History
Date User Action Args
2006-12-14 13:08:32doerwaltercreate