Title: OrderedDict has strange behaviour when dict.__setitem__ is used.
Created on 2015-07-26 10:20 by Mark.Shannon, last changed 2022-04-11 14:58 by admin.

Setting an item in an ordered dict via dict.__setitem__, or by using it as an object dictionary and setting an attribute on that object, creates a dictionary whose repr is:


Test case attached.
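
For readers without the attachment, here is a minimal sketch of the bypass described above. It is not the attached test case, and it assumes a CPython version in which the crash has already been fixed. It only compares the size reported by the inherited dict with what iteration over the OrderedDict yields.

```python
from collections import OrderedDict

od = OrderedDict()

# Call the parent class directly; OrderedDict.__setitem__ never runs,
# so the ordering structure is not told about the new key.
dict.__setitem__(od, "a", 1)

print(len(od))    # size of the inherited dict
print(list(od))   # keys known to the ordering structure
print(od["a"])    # lookup still goes through the inherited dict
```

The repr is deliberately not printed here, since its exact form depends on the Python version.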
Linking related issues and
Attached revised file that runs to completion on 2.7 and 3.x.
Marco, #-prefixed issue numbers like this, #24721, #24667, and #24685, are easier to read.
There is a bug in _PyObject_GenericSetAttrWithDict() in Objects/object.c, where calls are made to PyDict_SetItem() and PyDict_DelItem() without first checking PyDict_CheckExact().

* In PEP 372, OrderedDict was consciously specified to be a subclass of regular dicts in order to improve substitutability for dicts in most existing code.  That decision had some negative consequences as well.  It is unavoidable that someone can call the parent class directly and undermine the invariants of the subclass (that is a fact of life for all subclasses that maintain their own state while trying to stay in sync with state in the parent class -- see for an example).

With pure Python code for the subclass, we say, "don't do that". I'll add a note to that effect in the docs for the OD (that said, it is a general rule that applies to all subclasses that have to stay synchronized with state in the parent).
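
The general rule can be illustrated with a small sketch. TrackedDict and key_log are hypothetical names invented for this example; they are not part of the standard library or of any patch on this issue.

```python
class TrackedDict(dict):
    """Hypothetical dict subclass that mirrors its keys in a separate list."""

    def __init__(self):
        super().__init__()
        self.key_log = []  # extra state that must stay in sync with the dict

    def __setitem__(self, key, value):
        if key not in self:
            self.key_log.append(key)
        super().__setitem__(key, value)


d = TrackedDict()
d["a"] = 1                    # goes through the subclass, key_log is updated
dict.__setitem__(d, "b", 2)   # calls the parent directly, key_log is skipped

print(sorted(d))   # both keys are stored in the dict
print(d.key_log)   # only the key set through the subclass was recorded
```

Nothing in the subclass can intercept the second call, which is why the advice for pure Python code is simply not to do that.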

In the C version of the OD subclass, we still can't avoid being bypassed (see and having our subclass invariants violated.  Though the C code can't prevent the invariants from being scrambled, it does have an obligation to not segfault and to not leak something like "OrderedDict([<NULL>])".  Ideally, if it is possible to detect an invalid state (i.e. the linked list being out of sync with the inherited dict), then a RuntimeError or somesuch should be raised.
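
As a rough illustration of what "detect an invalid state" could mean, here is a Python-level sketch. check_in_sync is a hypothetical helper, not an existing API, and it only catches the case where the number of keys reachable by iteration differs from the size of the inherited dict. It assumes iteration over the damaged OrderedDict terminates normally, which holds for the direct dict.__setitem__ case shown.

```python
from collections import OrderedDict


def check_in_sync(od):
    """Hypothetical check: raise RuntimeError if the ordering state and
    the inherited dict disagree about how many keys exist."""
    linked = sum(1 for _ in od)  # keys reachable through OrderedDict iteration
    stored = len(od)             # entries held by the inherited dict
    if linked != stored:
        raise RuntimeError(
            "OrderedDict out of sync: %d ordered keys, %d dict entries"
            % (linked, stored)
        )


good = OrderedDict(a=1)
check_in_sync(good)              # passes silently

bad = OrderedDict()
dict.__setitem__(bad, "a", 1)    # bypass OrderedDict.__setitem__

message = None
try:
    check_in_sync(bad)
except RuntimeError as exc:
    message = str(exc)
print(message)
```

A count comparison is cheap but incomplete: one direct insertion plus one direct deletion would leave the counts equal while the keys differ.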
FTR, this will likely involve more than just fixing odict_repr().
__repr__() allocates a list of size len(od) and fills it by iterating the linked list. If the linked list is shorter than the dict, the rest of the list is not initialized.
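
The failure mode can be modelled in Python. This is only a model of the fill logic described above, not the C code from odict_repr() and not the proposed patch; buggy_fill and checked_fill are hypothetical names, and None stands in for an uninitialized (NULL) slot.

```python
def buggy_fill(dict_size, linked_keys):
    """Preallocate by dict size, then fill from the linked list."""
    pieces = [None] * dict_size        # None models an uninitialized slot
    for i, key in enumerate(linked_keys):
        pieces[i] = key                # IndexError here models a C overrun
    return pieces


def checked_fill(dict_size, linked_keys):
    """One possible defensive variant: never trust the preallocated size."""
    pieces = [None] * dict_size
    count = 0
    for key in linked_keys:
        if count < len(pieces):
            pieces[count] = key
        else:
            pieces.append(key)         # linked list longer than the dict
        count += 1
    del pieces[count:]                 # drop slots that were never filled
    return pieces


print(buggy_fill(2, ["a"]))            # trailing slot left unfilled
print(checked_fill(2, ["a"]))          # unfilled slot trimmed
print(checked_fill(1, ["a", "b"]))     # extra key appended, no overrun
```

Either variant of the mismatch shows why the preallocated size cannot be treated as authoritative once the dict has been modified behind the OrderedDict's back.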

Even worse things happen when the linked list is longer than the dict. The following example causes a crash:

from collections import OrderedDict
od = OrderedDict()
class K(str):
    def __hash__(self):
        return 1

od[K('a')] = 1
od[K('b')] = 2
print(len(od), len(list(od)))
K.__eq__ = lambda self, other: True
dict.__delitem__(od, K('a'))
print(len(od), len(list(od)))

Proposed patch fixes both issues.
Review posted. Aside from a couple minor comments, LGTM.  Thanks for doing this.

Incidentally, it should be possible to auto-detect independent changes to the underlying dict and sync the odict with those changes.  However, doing so likely isn't worth it.
Thanks for your review Eric.

test_delitem_2 was not added because it fails in the just-added TestCase for the COrderedDict subclass. Added tests for direct calls of other dict methods, as Eric suggested.

While writing new tests for direct calls of other dict methods, I found yet another bug. The following code makes Python hang and eat memory.

from collections import OrderedDict
od = OrderedDict()
for i in range(10):
    od[str(i)] = i

for i in range(9):
    dict.__delitem__(od, str(i))

Here is a patch that fixes the infinite loop reported in msg254071. Maybe this is not the best solution. It makes the behavior of the Python and C implementations differ (the former just iterates a linked list, the latter raises an error). But to reproduce the Python implementation's behavior we would need to add refcounters to linked list nodes.
There may still be some holes remaining in OrderedDict, but they don't seem to have been relevant in practice and will become even less so now that regular dicts are ordered and compact.

If an issue does arise with someone setting OrderedDict values via dict.__setitem__, we should probably just document "don't do that" rather than performing brain surgery on the current implementation, which was known in advance to be vulnerable to exactly this sort of trickery.

If there are no objections, I recommend closing this as out-of-date.   IMO this would be better than risking introducing new problems, getting the C version further out of sync with the Python version, or altering how existing code is working.
See also:
