Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

collections.OrderedDict and weakref.ref raises "refcount is too small" assertion #83959

Closed
leezu mannequin opened this issue Feb 27, 2020 · 21 comments
Closed

collections.OrderedDict and weakref.ref raises "refcount is too small" assertion #83959

leezu mannequin opened this issue Feb 27, 2020 · 21 comments
Assignees
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes extension-modules C modules in the Modules dir type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@leezu
Copy link
Mannequin

leezu mannequin commented Feb 27, 2020

BPO 39778
Nosy @tim-one, @rhettinger, @pitrou, @ericsnowcurrently, @leezu, @pablogsal, @miss-islington, @iritkatriel
PRs
  • bpo-39778: Don't traverse weak-reference lists OrderedDict's tp_traverse and tp_clear #18749
  • bpo-39778: Add clarification about tp_traverse and ownership #18754
  • [3.7] bpo-39778: Don't traverse weak-reference lists OrderedDict's tp_traverse and tp_clear (GH-18749) #18755
  • [3.8] bpo-39778: Don't traverse weak-reference lists OrderedDict's tp_traverse and tp_clear (GH-18749) #18756
  • [3.8] bpo-39778: Add clarification about tp_traverse and ownership (GH-18754) #18762
  • [3.7] bpo-39778: Add clarification about tp_traverse and ownership (GH-18754) #18763
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/pablogsal'
    closed_at = <Date 2021-09-20.19:40:09.511>
    created_at = <Date 2020-02-27.21:52:09.045>
    labels = ['extension-modules', '3.7', '3.8', '3.9', 'type-crash']
    title = 'collections.OrderedDict and weakref.ref raises "refcount is too small" assertion'
    updated_at = <Date 2021-09-20.19:40:09.510>
    user = 'https://github.com/leezu'

    bugs.python.org fields:

    activity = <Date 2021-09-20.19:40:09.510>
    actor = 'rhettinger'
    assignee = 'pablogsal'
    closed = True
    closed_date = <Date 2021-09-20.19:40:09.511>
    closer = 'rhettinger'
    components = ['Extension Modules']
    creation = <Date 2020-02-27.21:52:09.045>
    creator = 'leezu'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 39778
    keywords = ['patch']
    message_count = 21.0
    messages = ['362846', '362862', '362866', '363177', '363189', '363197', '363198', '363214', '363219', '363220', '363236', '363237', '363238', '363246', '402173', '402181', '402191', '402192', '402221', '402222', '402275']
    nosy_count = 8.0
    nosy_names = ['tim.peters', 'rhettinger', 'pitrou', 'eric.snow', 'leezu', 'pablogsal', 'miss-islington', 'iritkatriel']
    pr_nums = ['18749', '18754', '18755', '18756', '18762', '18763']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue39778'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']

    @leezu
    Copy link
    Mannequin Author

    leezu mannequin commented Feb 27, 2020

    Below sample program, will raise "Modules/gcmodule.c:110: gc_decref: Assertion "gc_get_refs(g) > 0" failed: refcount is too small" on Python 3.8.2 debug build.
    On 3.7.6 debug build, "Modules/gcmodule.c:277: visit_decref: Assertion `_PyGCHead_REFS(gc) != 0' failed." is raised.

    import collections
    import gc
    import weakref
    
    hooks_dict = collections.OrderedDict()
    hooks_dict_ref = weakref.ref(hooks_dict)
    gc.collect()
    
    print('Hello world')
    

    The complete error message on 3.8.2 debug build is

    Modules/gcmodule.c:110: gc_decref: Assertion "gc_get_refs(g) > 0" failed: refcount is too small
    Memory block allocated at (most recent call first):
      File "/home/$USER/test.py", line 6
    
    object  : <weakref at 0x7ff788208a70; to 'collections.OrderedDict' at 0x7ff7881fab90>
    type    : weakref
    refcount: 1
    address : 0x7ff788208a70
    Fatal Python error: _PyObject_AssertFailed
    Python runtime state: initialized
    
    Current thread 0x00007ff789f9c080 (most recent call first):
      File "/home/$USER/test.py", line 7 in <module>
    zsh: abort      PYTHONTRACEMALLOC=1 python ~/test.py
    

    @leezu leezu mannequin added topic-C-API 3.8 only security fixes type-crash A hard crash of the interpreter, possibly with a core dump labels Feb 27, 2020
    @tim-one
    Copy link
    Member

    tim-one commented Feb 28, 2020

    Thanks for the succinct example! Although you don't need the print() ;-)

    I can confirm this crashes the same way under a current master build (albeit on Windows 64-bit).

    gc is calculating how many references in the current generation are accounted for by intra-generation references, and gc's visit_decref() is getting called back by odictobject.c's odict_traverse(), on line

        Py_VISIT(od->od_weakreflist);

    gc has found a second pointer to od->od_weakreflist, which is more than its refcount (1) claims exist.

    Python's weakref implementation gives me headaches, so I'm adding Raymond to the nosy list under the hope the problem will be obvious to him.

    @tim-one tim-one added extension-modules C modules in the Modules dir 3.7 (EOL) end of life 3.9 only security fixes and removed topic-C-API labels Feb 28, 2020
    @tim-one
    Copy link
    Member

    tim-one commented Feb 28, 2020

    I'm suspecting that maybe we shouldn't be doing

        Py_VISIT(od->od_weakreflist);

    at all - best I can tell from a quick (non-exhaustive!) scan, the objects in the weakref list aren't incref'ed to begin with. And even if they were, that line would only be looking at the head of the list, ignoring all the non-head weakrefs after the head.

    @pitrou
    Copy link
    Member

    pitrou commented Mar 2, 2020

    Yes, I don't think other weakref-supporting objects traverse the weakreflist in their tp_traverse.

    @pablogsal
    Copy link
    Member

    I concur with Antoine and Tim. The GC already has the machinery to deal with weak references in the correct way (even more after recent bugfixes regarding callbacks). Traversing the weak reference list in incorrect because the object does not "own" the weak references to it, as the weak references can die even if the object is alive. Also, as Tim mentions, the traverse will be called on the head of the list, in the same way if you do object.__weakref__ you will only get the HEAD of the list:

    >>> import weakref
    >>> class A: ...
    >>> a = A()
    >>> w1 = weakref.ref(a)
    >>> w2 = weakref.ref(a, lambda *args: None) # Use a callback to avoid re-using the original weakref
    >>> id(w1)
    4328697104
    >>> id(w2)
    4328758864
    >>> id(a.__weakref__)
    4328697104

    I think that this is not very well documented, as there is no pointers on what should and should not be traversed in https://docs.python.org/3.8/c-api/typeobj.html#c.PyTypeObject.tp_traverse.

    I will prepare a PR to the documentation if everybody agrees and another one removing the traverse unless someone sees that something else is at play.

    @tim-one
    Copy link
    Member

    tim-one commented Mar 2, 2020

    After some thought, I'm sure the diagnosis is correct: the weakref list must be made invisible to gc. That is, simply don't traverse it at all. The crash is exactly what's expected from traversing objects a container doesn't own references to.

    I agree the tp_traverse docs should point out that weakref lists are special this way, but I think the problem is unique to them - can't think of another case where a container points to an object it doesn't own a reference to.

    @pablogsal
    Copy link
    Member

    I agree the tp_traverse docs should point out that weakref lists are special this way, but I think the problem is unique to them - can't think of another case where a container points to an object it doesn't own a reference to.

    Agreed, I was thinking a small warning or something because this is not the first time I see similar errors or users confused about what they should and should not track (with this I mention that we should say that "only objects that are *owned* should be tracked, and then the weakref warning or something similar). I will prepare a PR soon.

    @pablogsal
    Copy link
    Member

    New changeset 0c2b509 by Pablo Galindo in branch 'master':
    bpo-39778: Don't traverse weak-reference lists OrderedDict's tp_traverse and tp_clear (GH-18749)
    0c2b509

    @miss-islington
    Copy link
    Contributor

    New changeset 69ded39 by Miss Islington (bot) in branch '3.7':
    bpo-39778: Don't traverse weak-reference lists OrderedDict's tp_traverse and tp_clear (GH-18749)
    69ded39

    @pablogsal
    Copy link
    Member

    New changeset 9ddcb91 by Pablo Galindo in branch '3.8':
    [3.8] bpo-39778: Don't traverse weak-reference lists OrderedDict's tp_traverse and tp_clear (GH-18749) (GH-18756)
    9ddcb91

    @miss-islington
    Copy link
    Contributor

    New changeset 6df421f by Pablo Galindo in branch 'master':
    bpo-39778: Add clarification about tp_traverse and ownership (GH-18754)
    6df421f

    @miss-islington
    Copy link
    Contributor

    New changeset 72fff60 by Miss Islington (bot) in branch '3.7':
    bpo-39778: Add clarification about tp_traverse and ownership (GH-18754)
    72fff60

    @miss-islington
    Copy link
    Contributor

    New changeset 1827fc3 by Miss Islington (bot) in branch '3.8':
    bpo-39778: Add clarification about tp_traverse and ownership (GH-18754)
    1827fc3

    @vstinner
    Copy link
    Member

    vstinner commented Mar 3, 2020

    It seems like OrderedDict was the only type which visited weak references:

    $ scm.py grep 'VISIT.*weak'
    master: Grep 'VISIT.*weak' -- <4284 filenames>

    ==============================================

    Objects/odictobject.c:1457: Py_VISIT(od->od_weakreflist);

    @iritkatriel
    Copy link
    Member

    I could not reproduce the crash and from the discussion it seems resolved. Is there anything left here?

    @pablogsal
    Copy link
    Member

    I could not reproduce the crash and from the discussion it seems resolved. Is there anything left here?

    No, this should be fixed by now. I just forgot to close the issue. Thanks for the ping!

    @rhettinger
    Copy link
    Contributor

    I'm thinking that edit to tp_dealloc was incorrect. The PyClear call should have been replaced (not removed) with the pattern shown in the C APIs docs ( https://docs.python.org/3/extending/newtypes.html?highlight=pyobject_clearw#weak-reference-support ):

        if (self->weakreflist != NULL)
            PyObject_ClearWeakRefs((PyObject *) self);

    @rhettinger
    Copy link
    Contributor

    Also a test should be added to make sure the weakref callbacks are still occurring.

    @pablogsal
    Copy link
    Member

    I'm thinking that edit to tp_dealloc was incorrect.

    What edit are you referring to? PR 18749 only touches tp_clear and tp_traverse, not tp_dealloc

    @pablogsal
    Copy link
    Member

    Also, the ordered dict dealloc is already doing that:

    cpython/Objects/odictobject.c

    Lines 1372 to 1373 in a856364

    if (self->od_weakreflist != NULL)
    PyObject_ClearWeakRefs((PyObject *)self);

    @rhettinger
    Copy link
    Contributor

    False alarm. I misread the diff. All is well.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes extension-modules C modules in the Modules dir type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants