classification
Title: OrderedDict has strange behaviour when dict.__setitem__ is used.
Type: crash
Stage: patch review
Components: Extension Modules
Versions: Python 3.6, Python 3.5
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: rhettinger
Nosy List: Mark.Shannon, eric.snow, mpaolini, python-dev, rhettinger, serhiy.storchaka, terry.reedy
Priority: normal
Keywords: patch

Created on 2015-07-26 10:20 by Mark.Shannon, last changed 2015-12-23 14:16 by serhiy.storchaka.

Files
File name | Uploaded
test.py | Mark.Shannon, 2015-07-26 10:20
tem2.py | terry.reedy, 2015-07-31 22:29
odict_repr_after_dict_setitem_delitem.patch | serhiy.storchaka, 2015-10-21 19:18
odict_delitem_iter_hung.patch | serhiy.storchaka, 2015-12-23 14:16
Messages (14)
msg247421 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2015-07-26 10:20
Setting an item in an ordered dict via dict.__setitem__, or by using it as an object dictionary and setting an attribute on that object, creates a dictionary whose repr is:

OrderedDict([<NULL>])

Test case attached.
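The two bypass routes described in this message can be sketched as follows. This is a minimal illustration, not the attached test.py; the class `Plain` and both function names are hypothetical. What `repr()` and iteration show afterwards depends on the Python version and on whether the C or pure Python OrderedDict is in use, so no expected output is given.

```python
from collections import OrderedDict


class Plain:
    """Hypothetical helper class; any ordinary class would do."""


def bypass_via_dict_setitem():
    od = OrderedDict()
    # Call the base-class method directly, skipping OrderedDict.__setitem__.
    dict.__setitem__(od, "a", 1)
    return od


def bypass_via_instance_dict():
    obj = Plain()
    od = OrderedDict()
    obj.__dict__ = od   # use the OrderedDict as the instance dictionary
    obj.x = 1           # attribute assignment stores into that dictionary
    return od


if __name__ == "__main__":
    for od in (bypass_via_dict_setitem(), bypass_via_instance_dict()):
        # The hash table holds one entry; the ordering structure may not
        # know about it, so len() and iteration can disagree.
        print(len(od), list(od))
        # repr() is what this report is about; its output varies by version.
        print(repr(od))
```

On an affected build the final `repr()` call is where the `<NULL>` entry would be expected to surface.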
msg247426 - (view) Author: Marco Paolini (mpaolini) * Date: 2015-07-26 12:14
Linking related issues http://bugs.python.org/issue24721 http://bugs.python.org/issue24667 and http://bugs.python.org/issue24685
msg247781 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-07-31 22:29
Attached revised file that runs to completion on 2.7 and 3.x.
msg247782 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-07-31 22:32
Marco, #-prefixed issue numbers like this, #24721, #24667, and #24685, are easier to read.
msg247792 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-07-31 23:52
There is a bug in _PyObject_GenericSetAttrWithDict() in Objects/object.c, where calls are made to PyDict_SetItem() and PyDict_DelItem() without first checking PyDict_CheckExact().

* In PEP 372, OrderedDict was consciously specified to be a subclass of regular dicts in order to improve substitutability for dicts in most existing code.  That decision had some negative consequences as well.  It is unavoidable that someone can call the parent class directly and undermine the invariants of the subclass (that is a fact of life for all subclasses that maintain their own state while trying to stay in sync with state in the parent class -- see http://bugs.python.org/msg247358 for an example).

With pure Python code for the subclass, we say, "don't do that". I'll add a note to that effect in the docs for the OD (that said, it is a general rule that applies to all subclasses that have to stay synchronized with state in the parent).

In the C version of the OD subclass, we still can't avoid being bypassed (see http://bugs.python.org/issue10977) and having our subclass invariants violated.  Though the C code can't prevent the invariants from being scrambled, it does have an obligation to not segfault and to not leak something like "OrderedDict([<NULL>])".  Ideally, if it is possible to detect an invalid state (i.e. the linked list being out of sync with the inherited dict), then a RuntimeError or some such should be raised.
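The general rule about subclasses that keep their own state can be illustrated with a small pure Python sketch. `KeyLoggingDict` and its `_order` list are hypothetical and are not how OrderedDict is implemented; the size comparison is just one cheap way to notice that the two structures have drifted apart.

```python
class KeyLoggingDict(dict):
    """Hypothetical dict subclass that mirrors its keys in a side list."""

    def __init__(self):
        super().__init__()
        self._order = []  # subclass state that must track the dict

    def __setitem__(self, key, value):
        if key not in self:
            self._order.append(key)
        super().__setitem__(key, value)

    def __delitem__(self, key):
        super().__delitem__(key)
        self._order.remove(key)

    def ordered_keys(self):
        # Detect the out-of-sync state instead of returning stale data.
        if len(self._order) != len(self):
            raise RuntimeError("ordering state is out of sync with the dict")
        return list(self._order)


d = KeyLoggingDict()
d["a"] = 1
d["b"] = 2
in_sync = d.ordered_keys()   # normal path: both structures agree

dict.__setitem__(d, "c", 3)  # bypasses KeyLoggingDict.__setitem__
try:
    d.ordered_keys()
    detected = False
except RuntimeError:
    detected = True
```

A size comparison alone would miss a bypassed deletion followed by a bypassed insertion, so a real check would likely need to be stricter.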
msg253294 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2015-10-21 15:33
FTR, this will likely involve more than just fixing odict_repr().
msg253310 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-10-21 19:18
__repr__() allocates a list of size len(od) and fills it by iterating over the linked list. If the linked list is shorter than the dict, the rest of the list is left uninitialized.

Even worse things happen when the linked list is longer than the dict. The following example causes a crash:

from collections import OrderedDict
od = OrderedDict()
class K(str):
    def __hash__(self):
        return 1

od[K('a')] = 1
od[K('b')] = 2
print(len(od), len(list(od)))
K.__eq__ = lambda self, other: True
dict.__delitem__(od, K('a'))
print(len(od), len(list(od)))
print(repr(od))

Proposed patch fixes both issues.
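The two failure modes described in this message can be modelled in Python. This is only a model under stated assumptions: `NULL`, `fill_unchecked` and `fill_checked` are hypothetical names, a plain list of keys stands in for the linked list, and the guard shown is one possible approach that may well differ from what the attached patch does.

```python
NULL = object()  # stand-in for an uninitialized C pointer slot


def fill_unchecked(dict_size, linked_keys):
    """Model of the flawed approach: trust len(od) as the node count."""
    pieces = [NULL] * dict_size
    for i, key in enumerate(linked_keys):
        # C performs no bounds check here; Python raises IndexError instead.
        pieces[i] = key
    return pieces


def fill_checked(dict_size, linked_keys):
    """One possible guard: stop at the allocated size, drop unused slots."""
    pieces = [NULL] * dict_size
    count = 0
    for key in linked_keys:
        if count >= dict_size:
            break  # linked list is longer than the dict
        pieces[count] = key
        count += 1
    return pieces[:count]  # linked list is shorter than the dict
```

With a dict size of 2 and a single linked key, `fill_unchecked` leaves a `NULL` slot behind, which corresponds to the `<NULL>` seen in the repr; with a dict size of 1 and two linked keys it writes past the end, which in C is undefined behaviour.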
msg254045 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-04 10:17
Ping.
msg254064 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2015-11-04 16:48
Review posted. Aside from a couple minor comments, LGTM.  Thanks for doing this.

Incidentally, it should be possible to auto-detect independent changes to the underlying dict and sync the odict with those changes.  However, doing so likely isn't worth it.
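A rough sketch of what such a sync could look like, assuming the order is kept as a plain list of keys. The function `resync` is hypothetical, and appending keys that were added behind the subclass's back is an arbitrary choice made for this illustration.

```python
def resync(order, mapping):
    """Return a key order that is consistent with `mapping`.

    Keys already in `order` keep their relative position if they are still
    present; keys that exist only in `mapping` are appended in the order
    the mapping yields them.
    """
    seen = set()
    synced = []
    for key in order:
        if key in mapping and key not in seen:
            synced.append(key)
            seen.add(key)
    for key in mapping:
        if key not in seen:
            synced.append(key)
            seen.add(key)
    return synced
```

For example, `resync(["a", "b", "c"], {"b": 2, "c": 3, "d": 4})` drops the stale "a" and appends the unknown "d". In this sketch every sync is a full pass over both structures.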
msg254069 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-11-04 20:39
New changeset 88d97cd99d16 by Serhiy Storchaka in branch '3.5':
Issue #24726: Fixed issue number for previous changeset 59c7615ea921.
https://hg.python.org/cpython/rev/88d97cd99d16

New changeset 965109e81ffa by Serhiy Storchaka in branch 'default':
Issue #24726: Fixed issue number for previous changeset 76e848554b5d.
https://hg.python.org/cpython/rev/965109e81ffa
msg254071 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-04 21:09
Thanks for your review Eric.

test_delitem_2 was not added because it fails in the just-added TestCase for the COrderedDict subclass. Added tests for direct calls of other dict methods, as Eric suggested.

While writing new tests for direct calls of other dict methods, I found one more bug. The following code makes Python hang and eat memory.

from collections import OrderedDict
od = OrderedDict()
for i in range(10):
    od[str(i)] = i

for i in range(9):
    dict.__delitem__(od, str(i))

list(od)
msg254174 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-11-06 09:21
New changeset 1594c23d8c2f by Serhiy Storchaka in branch '3.5':
Issue #24726: Revert setting the value on the dict if
https://hg.python.org/cpython/rev/1594c23d8c2f

New changeset b391e97ccfe5 by Serhiy Storchaka in branch 'default':
Issue #24726: Revert setting the value on the dict if
https://hg.python.org/cpython/rev/b391e97ccfe5
msg254178 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-06 09:27
Wrong issue. The correct one is issue25410.
msg256912 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-23 14:16
Here is a patch that fixes the infinite loop reported in msg254071. Maybe this is not the best solution. It makes the behavior of the Python and C implementations differ (the former just iterates over the linked list, the latter raises an error). But to reproduce the Python implementation's behavior we would need to add refcounters to the linked list nodes.
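The difference between the two behaviours can be modelled as below. Both generator names are hypothetical, a plain list stands in for the linked list, and raising KeyError is an assumption made for this sketch; the message only says that an error is raised.

```python
def iter_linked_plain(keys_in_order):
    """Model of 'just iterates the linked list': the dict is never consulted."""
    for key in keys_in_order:
        yield key


def iter_linked_checked(keys_in_order, mapping):
    """Model of 'raises an error': stop at the first key the dict has lost."""
    for key in keys_in_order:
        if key not in mapping:
            # Assumed error type; the patch may use a different one.
            raise KeyError(key)
        yield key


# Same shape as the msg254071 example: ten nodes, nine keys deleted
# directly from the underlying dict.
order = [str(i) for i in range(10)]
mapping = {"9": 9}

plain_result = list(iter_linked_plain(order))
try:
    list(iter_linked_checked(order, mapping))
    raised = False
except KeyError:
    raised = True
```

The plain walk still yields all ten keys even though the dict holds one, while the checked walk fails on the first stale key.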
History
Date | User | Action | Args
2015-12-23 14:16:28 | serhiy.storchaka | set | files: + odict_delitem_iter_hung.patch; messages: + msg256912; stage: commit review -> patch review
2015-11-06 09:27:22 | serhiy.storchaka | set | messages: + msg254178
2015-11-06 09:21:56 | python-dev | set | messages: + msg254174
2015-11-04 21:09:06 | serhiy.storchaka | set | messages: + msg254071
2015-11-04 20:39:05 | python-dev | set | nosy: + python-dev; messages: + msg254069
2015-11-04 16:48:09 | eric.snow | set | messages: + msg254064; stage: patch review -> commit review
2015-11-04 10:17:38 | serhiy.storchaka | set | messages: + msg254045
2015-10-21 19:18:33 | serhiy.storchaka | set | files: + odict_repr_after_dict_setitem_delitem.patch; components: + Extension Modules, - Library (Lib); versions: - Python 2.7, Python 3.4; keywords: + patch; type: behavior -> crash; messages: + msg253310; stage: test needed -> patch review
2015-10-21 16:04:53 | serhiy.storchaka | set | nosy: + serhiy.storchaka
2015-10-21 15:33:24 | eric.snow | set | messages: + msg253294
2015-07-31 23:52:43 | rhettinger | set | messages: + msg247792
2015-07-31 23:15:07 | rhettinger | set | assignee: rhettinger
2015-07-31 22:32:04 | terry.reedy | set | messages: + msg247782
2015-07-31 22:29:22 | terry.reedy | set | files: + tem2.py; versions: + Python 2.7, Python 3.4, Python 3.5; nosy: + terry.reedy; messages: + msg247781; stage: test needed
2015-07-31 22:24:15 | terry.reedy | set | nosy: + rhettinger
2015-07-26 12:14:05 | mpaolini | set | nosy: + mpaolini; messages: + msg247426
2015-07-26 10:20:52 | Mark.Shannon | create |