classification
Title: Dict copy optimization violates subclass invariant
Type: behavior Stage:
Components: Versions:
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: jab, josh.r, methane, pgjones, rhettinger, serhiy.storchaka
Priority: normal Keywords:

Created on 2021-02-17 22:12 by jab, last changed 2021-02-19 02:54 by methane.

Messages (4)
msg387191 - (view) Author: Joshua Bronson (jab) * Date: 2021-02-17 22:12
If I understand correctly, it should be an invariant that in the code below, for all "Parent" classes, for all "method"s, Child1.method should return the same result as Child2.method:

```
class Parent:
    def __init__(self, value):
        self._value = value

    def method(self):
        return self._value


class Child1(Parent):
    pass


c1 = Child1(42)
result = c1.method()
assert result == 42, result


class Child2(Parent):
    def method(self):
        return super().method()


c2 = Child2(42)
result = c2.method()
assert result == 42, result
```

But when "Parent" is "dict" and method is "__iter__", that is not the case:

```
SHOW_BUG = True

class ChildDict1(dict):
    """Simplification of werkzeug.datastructures.MultiDict."""
    def __init__(self):
        pass
    
    if not SHOW_BUG:
        def __iter__(self):
            return super().__iter__()

    def add(self, key, value):
        self.setdefault(key, []).append(value)
    
    def __setitem__(self, key, value):
        """Like add, but removes any existing key first."""
        super().__setitem__(key, [value])
    
    def getall(self, key) -> list:
        return super().__getitem__(key)

    def __getitem__(self, key):
        """Return the first value for this key."""
        return self.getall(key)[0]

    def items(self, multi=False):
        for (key, values) in super().items():
            if multi:
                yield from ((key, value) for value in values)
            else:
                yield key, values[0]
    
    def values(self):
        return (values[0] for values in super().values())
    
    # Remaining overridden implementations of methods
    # inherited from dict are elided for brevity.


cd1 = ChildDict1()
assert dict(cd1) == {}
cd1[1] = "one"
cd1.add(1, "uno")
assert cd1.getall(1) == ["one", "uno"]
assert list(cd1.items()) == [(1, "one")]
assert list(cd1.values()) == [ "one"]
assert dict(cd1) == {1: "one"}, cd1  # XXX
```

If you run the above as-is, the line marked "XXX" will trigger an AssertionError demonstrating the unexpected behavior. If you change SHOW_BUG to False, it won’t.

Is it intended that toggling the value of SHOW_BUG in this code causes different results?

You can visit https://repl.it/@jab/dict-subclass-copy-surprise to run the examples above directly in your browser.
msg387198 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2021-02-18 01:43
The cause is in dict_merge (see here: https://github.com/python/cpython/blob/master/Objects/dictobject.c ); it has a fast path for when the object being merged in (which is what the dict constructor does; it makes an empty dict, then merges the provided dict-like) is:

1. A dict (or subclass thereof)
2. Which has not overridden __iter__

When that's the case, it assumes it's "dict-compatible" and performs the merge with heavy use of dict-internals. When it's not the case (as in your simple wrapper), it calls .keys() on the object, iterates that, and uses it to pull values via bracket lookup-equivalent code.

I assume the choice of testing __iter__ (really, the C slot for tp_iter, which is equivalent) is for performance; it's more expensive to check if keys was overridden and/or if the __getitem__ implementation (of which there is more than one possibility for slots at the C layer) has been overridden.

What the code is doing is probably logically wrong, but it's significantly faster than doing it the right way, and easy to work around (if you're writing your own dictionary-like thing with wildly different semantics, collections.abc.MutableMapping is probably a better base class to avoid inheriting dict-specific weirdness), so it's probably not worth fixing. Leaving open for others to discuss.
msg387206 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-02-18 09:37
The choice of testing __iter__ was because OrderedDict overrides __iter__, and since dicts are ordered now, the order of insertion does matter.
msg387279 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2021-02-19 02:54
I am not sure this should be fixed.

If so, I think we should use PyDict_CheckExact() instead of PyDict_Check() && (tp_iter is overridden || keys() is overridden || sq_item is overridden).
History
Date User Action Args
2021-02-19 02:54:18methanesetnosy: + rhettinger
messages: + msg387279
2021-02-18 09:37:14serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg387206
2021-02-18 08:58:05pgjonessetnosy: + pgjones
2021-02-18 06:05:24methanesetnosy: + methane
2021-02-18 01:43:42josh.rsetnosy: + josh.r
messages: + msg387198
2021-02-17 22:12:32jabcreate