Issue 43246: Dict copy optimization violates subclass invariant

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/87412

classification

Title:	Dict copy optimization violates subclass invariant
Type:	behavior	Stage:
Components:		Versions:

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	jab, josh.r, methane, pgjones, rhettinger, serhiy.storchaka
Priority:	normal	Keywords:

Created on 2021-02-17 22:12 by jab, last changed 2022-04-11 14:59 by admin.

Messages (4)
msg387191 - (view)	Author: Joshua Bronson (jab) *	Date: 2021-02-17 22:12
If I understand correctly, it should be an invariant that in the code below, for all "Parent" classes, for all "method"s, Child1.method should return the same result as Child2.method: ``` class Parent: def __init__(self, value): self._value = value def method(self): return self._value class Child1(Parent): pass c1 = Child1(42) result = c1.method() assert result == 42, result class Child2(Parent): def method(self): return super().method() c2 = Child2(42) result = c2.method() assert result == 42, result ``` But when "Parent" is "dict" and method is "__iter__", that is not the case: ``` SHOW_BUG = True class ChildDict1(dict): """Simplification of werkzeug.datastructures.MultiDict.""" def __init__(self): pass if not SHOW_BUG: def __iter__(self): return super().__iter__() def add(self, key, value): self.setdefault(key, []).append(value) def __setitem__(self, key, value): """Like add, but removes any existing key first.""" super().__setitem__(key, [value]) def getall(self, key) -> list: return super().__getitem__(key) def __getitem__(self, key): """Return the first value for this key.""" return self.getall(key)[0] def items(self, multi=False): for (key, values) in super().items(): if multi: yield from ((key, value) for value in values) else: yield key, values[0] def values(self): return (values[0] for values in super().values()) # Remaining overridden implementations of methods # inherited from dict are elided for brevity. cd1 = ChildDict1() assert dict(cd1) == {} cd1[1] = "one" cd1.add(1, "uno") assert cd1.getall(1) == ["one", "uno"] assert list(cd1.items()) == [(1, "one")] assert list(cd1.values()) == [ "one"] assert dict(cd1) == {1: "one"}, cd1 # XXX ``` If you run the above as-is, the line marked "XXX" will trigger an AssertionError demonstrating the unexpected behavior. If you change SHOW_BUG to False, it won’t. Is it intended that toggling the value of SHOW_BUG in this code causes different results? You can visit https://repl.it/@jab/dict-subclass-copy-surprise to run the examples above directly in your browser.
msg387198 - (view)	Author: Josh Rosenberg (josh.r) *	Date: 2021-02-18 01:43
The cause is in dict_merge (see here: https://github.com/python/cpython/blob/master/Objects/dictobject.c ); it has a fast path for when the object being merged in (which is what the dict constructor does; it makes an empty dict, then merges the provided dict-like) is: 1. A dict (or subclass thereof) 2. Which has not overridden __iter__ When that's the case, it assumes it's "dict-compatible" and performs the merge with heavy use of dict-internals. When it's not the case (as in your simple wrapper), it calls .keys() on the object, iterates that, and uses it to pull values via bracket lookup-equivalent code. I assume the choice of testing __iter__ (really, the C slot for tp_iter, which is equivalent) is for performance; it's more expensive to check if keys was overridden and/or if the __getitem__ implementation (of which there is more than one possibility for slots at the C layer) has been overridden. What the code is doing is probably logically wrong, but it's significantly faster than doing it the right way, and easy to work around (if you're writing your own dictionary-like thing with wildly different semantics, collections.abc.MutableMapping is probably a better base class to avoid inheriting dict-specific weirdness), so it's probably not worth fixing. Leaving open for others to discuss.
msg387206 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2021-02-18 09:37
The choice of testing __iter__ was because OrderedDict overrides __iter__, and since dicts are ordered now, the order of insertion does matter.
msg387279 - (view)	Author: Inada Naoki (methane) *	Date: 2021-02-19 02:54
I am not sure this should be fixed. If so, I think we should use PyDict_CheckExact() instead of PyDict_Check() && (tp_iter is overridden \|\| keys() is overridden \|\| sq_item is overridden).

History
Date	User	Action	Args
2022-04-11 14:59:41	admin	set	github: 87412
2021-02-19 02:54:18	methane	set	nosy: + rhettinger messages: + msg387279
2021-02-18 09:37:14	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg387206
2021-02-18 08:58:05	pgjones	set	nosy: + pgjones
2021-02-18 06:05:24	methane	set	nosy: + methane
2021-02-18 01:43:42	josh.r	set	nosy: + josh.r messages: + msg387198
2021-02-17 22:12:32	jab	create