Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OrderedDict has strange behaviour when dict.__setitem__ is used. #68914

Closed
markshannon opened this issue Jul 26, 2015 · 16 comments
Closed

OrderedDict has strange behaviour when dict.__setitem__ is used. #68914

markshannon opened this issue Jul 26, 2015 · 16 comments
Labels
extension-modules C modules in the Modules dir type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@markshannon
Copy link
Member

BPO 24726
Nosy @rhettinger, @terryjreedy, @markshannon, @ericsnowcurrently, @serhiy-storchaka, @mpaolini
Files
  • test.py
  • tem2.py
  • odict_repr_after_dict_setitem_delitem.patch
  • odict_delitem_iter_hung.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-08-29.09:39:06.686>
    created_at = <Date 2015-07-26.10:20:52.176>
    labels = ['extension-modules', 'type-crash']
    title = 'OrderedDict has strange behaviour when dict.__setitem__ is used.'
    updated_at = <Date 2019-08-29.09:39:06.685>
    user = 'https://github.com/markshannon'

    bugs.python.org fields:

    activity = <Date 2019-08-29.09:39:06.685>
    actor = 'rhettinger'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-08-29.09:39:06.686>
    closer = 'rhettinger'
    components = ['Extension Modules']
    creation = <Date 2015-07-26.10:20:52.176>
    creator = 'Mark.Shannon'
    dependencies = []
    files = ['40028', '40089', '40834', '41398']
    hgrepos = []
    issue_num = 24726
    keywords = ['patch']
    message_count = 16.0
    messages = ['247421', '247426', '247781', '247782', '247792', '253294', '253310', '254045', '254064', '254069', '254071', '254174', '254178', '256912', '350238', '350258']
    nosy_count = 7.0
    nosy_names = ['rhettinger', 'terry.reedy', 'Mark.Shannon', 'python-dev', 'eric.snow', 'serhiy.storchaka', 'mpaolini']
    pr_nums = []
    priority = 'low'
    resolution = 'wont fix'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue24726'
    versions = ['Python 3.5', 'Python 3.6']

    @markshannon
    Copy link
    Member Author

    Setting an item in an ordered dict via dict.__setitem__, or by using it as an object dictionary and setting an attribute on that object, creates a dictionary whose repr is:

    OrderedDict([<NULL>])

    Test case attached.

    @markshannon markshannon added type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir labels Jul 26, 2015
    @mpaolini
    Copy link
    Mannequin

    mpaolini mannequin commented Jul 26, 2015

    @terryjreedy
    Copy link
    Member

    Attached revised file that runs to completion on 2.7 and 3.x.

    @terryjreedy
    Copy link
    Member

    Marco, #-prefixed issue numbers like this, bpo-24721, bpo-24667, and bpo-24685, are easier to read.

    @rhettinger rhettinger self-assigned this Jul 31, 2015
    @rhettinger
    Copy link
    Contributor

    There is a bug in _PyObject_GenericSetAttrWithDict() Objects/object.c where a calls are made to PyDict_SetItem() and PyDict_DelItem() without checking first checking for PyDict_CheckExact().

    • In PEP-372, OrderedDict was consciously specified to be a subclass of regular dicts in order to improve substitutability for dicts in most existing code. That decision had some negative consequences as well. It is unavoidable the someone can call the parent class directly and undermine the invariants of the subclass (that is a fact of life for all subclasses that maintain their own state while trying to stay in-sync with state in the parent class -- see http://bugs.python.org/msg247358 for an example).

    With pure python code for the subclass, we say, "don't do that". I'll add a note to that effect in the docs for the OD (that said, it is a general rule that applies to all subclasses that have to stay synchronized to state in the parent).

    In C version of the OD subclass, we still can't avoid being bypassed (see http://bugs.python.org/issue10977) and having our subclass invariants violated. Though the C code can't prevent the invariants from being scrambled it does have an obligation to not segfault and to not leak something like "OrderedDict([<NULL>])". Ideally, if is possible to detect an invalid state (i.e. the linked link being out of sync with the inherited dict), then a RuntimeError or somesuch should be raised.

    @ericsnowcurrently
    Copy link
    Member

    FTR, this will likely involve more than just fixing odict_repr().

    @serhiy-storchaka
    Copy link
    Member

    __repr__() allocates a list with the size len(od) and fills it iterating linked list. If the size of linked list is less then the size of the dict, the rest of the list is not initialized.

    Even worse things happened when the size of linked list is greater then the size of the dict. Following example causes a crash:

    from collections import OrderedDict
    od = OrderedDict()
    class K(str):
        def __hash__(self):
            return 1

    od[K('a')] = 1
    od[K('b')] = 2
    print(len(od), len(list(od)))
    K.__eq__ = lambda self, other: True
    dict.__delitem__(od, K('a'))
    print(len(od), len(list(od)))
    print(repr(od))

    Proposed patch fixes both issues.

    @serhiy-storchaka serhiy-storchaka added extension-modules C modules in the Modules dir type-crash A hard crash of the interpreter, possibly with a core dump and removed stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Oct 21, 2015
    @serhiy-storchaka
    Copy link
    Member

    Ping.

    @ericsnowcurrently
    Copy link
    Member

    Review posted. Aside from a couple minor comments, LGTM. Thanks for doing this.

    Incidentally, it should be possible to auto-detect independent changes to the underlying dict and sync the odict with those changes. However, doing so likely isn't worth it.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 4, 2015

    New changeset 88d97cd99d16 by Serhiy Storchaka in branch '3.5':
    Issue bpo-24726: Fixed issue number for previous changeset 59c7615ea921.
    https://hg.python.org/cpython/rev/88d97cd99d16

    New changeset 965109e81ffa by Serhiy Storchaka in branch 'default':
    Issue bpo-24726: Fixed issue number for previous changeset 76e848554b5d.
    https://hg.python.org/cpython/rev/965109e81ffa

    @serhiy-storchaka
    Copy link
    Member

    Thanks for your review Eric.

    test_delitem_2 was not added because it fails in just added TestCase for COrderedDict subclass. Added tests for direct calls of other dict methods as Eric suggested.

    During writing new tests for direct calls of other dict methods I found yet one bug. Following code makes Python to hang and eat memory.

    from collections import OrderedDict
    od = OrderedDict()
    for i in range(10):
        od[str(i)] = i
    
    for i in range(9):
        dict.__delitem__(od, str(i))
    
    list(od)

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 6, 2015

    New changeset 1594c23d8c2f by Serhiy Storchaka in branch '3.5':
    Issue bpo-24726: Revert setting the value on the dict if
    https://hg.python.org/cpython/rev/1594c23d8c2f

    New changeset b391e97ccfe5 by Serhiy Storchaka in branch 'default':
    Issue bpo-24726: Revert setting the value on the dict if
    https://hg.python.org/cpython/rev/b391e97ccfe5

    @serhiy-storchaka
    Copy link
    Member

    Wrong issue. The correct one is bpo-25410.

    @serhiy-storchaka
    Copy link
    Member

    Here is a patch that fixes an infinite loop reported in msg254071. May be this is not the best solution. It makes the behavior of Python and C implementation differ (the former just iterates a linked list, the latter raises an error). But to reproduce Python implementation behavior we need to add refcounters to linked list nodes.

    @rhettinger
    Copy link
    Contributor

    There may still be some holes still remaining in OrderedDict but it doesn't seem to have been relevant in practice and will become even less so now that regular dicts are ordered and compact.

    If an issue does are arise with someone setting OrderedDict values via dict.__setitem__ we should probably just document "don't do that" rather than performing brain surgery on the current implementation which was known in advance to be vulnerable to exactly this sort of trickery.

    If there are no objections, I recommend closing this as out-of-date. IMO this would be better than risking introducing new problems are getting the C version further out of sync with the Python version or altering how existing code is working.

    @rhettinger
    Copy link
    Contributor

    See also: https://bugs.python.org/msg131551

    @rhettinger rhettinger removed their assignment Aug 24, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants