Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reference loss for local classes #78223

Closed
kayhayen mannequin opened this issue Jul 4, 2018 · 16 comments
Closed

Reference loss for local classes #78223

kayhayen mannequin opened this issue Jul 4, 2018 · 16 comments
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

@kayhayen
Copy link
Mannequin

kayhayen mannequin commented Jul 4, 2018

BPO 34042
Nosy @pitrou, @vstinner, @serhiy-storchaka, @1st1, @pablogsal, @miss-islington
PRs
  • bpo-34042: Fix dict.copy() to maintain correct total refcount #8119
  • [3.7] bpo-34042: Fix dict.copy() to maintain correct total refcount (GH-8119) #8143
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-07-06.16:41:53.314>
    created_at = <Date 2018-07-04.11:42:51.318>
    labels = ['interpreter-core', '3.7', 'performance']
    title = 'Reference loss for local classes'
    updated_at = <Date 2018-07-06.16:41:53.313>
    user = 'https://bugs.python.org/kayhayen'

    bugs.python.org fields:

    activity = <Date 2018-07-06.16:41:53.313>
    actor = 'yselivanov'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-07-06.16:41:53.314>
    closer = 'yselivanov'
    components = ['Interpreter Core']
    creation = <Date 2018-07-04.11:42:51.318>
    creator = 'kayhayen'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 34042
    keywords = ['patch']
    message_count = 16.0
    messages = ['321037', '321043', '321090', '321104', '321120', '321122', '321123', '321124', '321128', '321129', '321130', '321159', '321178', '321179', '321182', '321183']
    nosy_count = 7.0
    nosy_names = ['pitrou', 'vstinner', 'kayhayen', 'serhiy.storchaka', 'yselivanov', 'pablogsal', 'miss-islington']
    pr_nums = ['8119', '8143']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue34042'
    versions = ['Python 3.7']

    @kayhayen
    Copy link
    Mannequin Author

    kayhayen mannequin commented Jul 4, 2018

    I have a test in Nuitka, designed to detect reference counting problems with Python code. It basically takes a snapshot of the refcount, runs a function until it stabilizes, then says PASS, or else reports the last diff. Obviously for CPython it's supposed to pass.

    Testing with self compiled 3.7.0 and 3.7.0-1 as in Debian testing (buster) currently, this happens for me:

    This is a cutout, there are more than 100 functions, I am listing the ones that report:

    simpleFunction16: FAILED 118414 118412 leaked -2
    simpleFunction17: FAILED 118395 118393 leaked -2
    simpleFunction18: FAILED 118375 118373 leaked -2
    ...

    simpleFunction21: FAILED 118337 118333 leaked -4
    ...
    simpleFunction25: FAILED 118315 118313 leaked -2
    simpleFunction26: FAILED 118295 118293 leaked -2
    ...
    simpleFunction38: FAILED 118257 118253 leaked -4
    simpleFunction39: FAILED 118235 118233 leaked -2
    ...
    simpleFunction43: FAILED 118215 118213 leaked -2
    simpleFunction48: FAILED 118195 118193 leaked -2
    ...
    simpleFunction76: FAILED 118422 118418 leaked -4
    ...
    simpleFunction88: FAILED 118400 118398 leaked -2

    This is really bad, because normally values are positive, merely preventing a release. A negative value indicates that references have been lost. This will normally result in corruption of the memory allocator for Python, although I have not yet seen that. I have a few remaining cases, where compiled code causes negative leaks too, there is happens. But I didn't force the issue yet.

    Notice that this is of course with the debug Python version and let me express, pure CPython is used to run the test code.

    When compiling with Nuitka and 3.7, my private code gives the same ref counts for there, but it also pretty much does
    the same things, these are some of the functions:

    def simpleFunction16():
        class EmptyClass:
            pass
    
        return EmptyClass
    
    def simpleFunction39():
        class Parent(object):
            pass

    All the other cases also use locally defined classes and only test cases using local classes are failing for me here.

    To reproduce is easy:

    git clone --branch develop http://git.nuitka.net/Nuitka.git
    python3.7-dbg Nuitka/tests/basics/Referencing.py

    Thanks,
    Kay Hayen

    @kayhayen kayhayen mannequin added the 3.7 (EOL) end of life label Jul 4, 2018
    @serhiy-storchaka serhiy-storchaka added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Jul 4, 2018
    @serhiy-storchaka
    Copy link
    Member

    The simplest reproducer:

    import gc, sys
    for i in range(100):
        class A:
            pass
        del A
        c = gc.collect()
        del c
        print(sys.gettotalrefcount())

    @kayhayen
    Copy link
    Mannequin Author

    kayhayen mannequin commented Jul 5, 2018

    Just to confirm, this is also happening on Windows as well, with both 32 and 64 bits.

    @pablogsal
    Copy link
    Member

    I ran git bisect with @serhiy reproducer and it yielded commit 3929499 as the first bad commit that has this problem. I did not look more into that but the bisect is reproducible (yields the same commit if you run it several times).

    @pablogsal
    Copy link
    Member

    I investigated that commit but I cannot find anything :(.

    @pablogsal
    Copy link
    Member

    I think I found it. After doing the bisect manually (as redirecting stdout/stderr) seems to affect somehow the test) I found commit b0a7a03 as the culprit. Reverting this commit on the current master seem to solve the problem. This is the diff of the revert:

    diff --git a/Objects/dictobject.c b/Objects/dictobject.c
    index 40d7d8af6e..f7d792232e 100644
    --- a/Objects/dictobject.c
    +++ b/Objects/dictobject.c
    @@ -613,52 +613,6 @@ new_dict_with_shared_keys(PyDictKeysObject *keys)
    return new_dict(keys, values);
    }

    -
    -static PyObject *
    -clone_combined_dict(PyDictObject *orig)
    -{
    -    assert(PyDict_CheckExact(orig));
    -    assert(orig->ma_values == NULL);
    -    assert(orig->ma_keys->dk_refcnt == 1);
    -
    -    Py_ssize_t keys_size = _PyDict_KeysSize(orig->ma_keys);
    -    PyDictKeysObject *keys = PyObject_Malloc(keys_size);
    -    if (keys == NULL) {
    -        PyErr_NoMemory();
    -        return NULL;
    -    }
    -
    -    memcpy(keys, orig->ma_keys, keys_size);
    -
    -    /* After copying key/value pairs, we need to incref all
    -       keys and values and they are about to be co-owned by a
    -       new dict object. */
    -    PyDictKeyEntry *ep0 = DK_ENTRIES(keys);
    -    Py_ssize_t n = keys->dk_nentries;
    -    for (Py_ssize_t i = 0; i < n; i++) {
    -        PyDictKeyEntry *entry = &ep0[i];
    -        PyObject *value = entry->me_value;
    -        if (value != NULL) {
    -            Py_INCREF(value);
    -            Py_INCREF(entry->me_key);
    -        }
    -    }
    -
    -    PyDictObject *new = (PyDictObject *)new_dict(keys, NULL);
    -    if (new == NULL) {
    -        /* In case of an error, `new_dict()` takes care of
    -           cleaning up `keys`. */
    -        return NULL;
    -    }
    -    new->ma_used = orig->ma_used;
    -    assert(_PyDict_CheckConsistency(new));
    -    if (_PyObject_GC_IS_TRACKED(orig)) {
    -        /* Maintain tracking. */
    -        _PyObject_GC_TRACK(new);
    -    }
    -    return (PyObject *)new;
    -}
    -
     PyObject *
     PyDict_New(void)
     {
    @@ -2527,13 +2481,7 @@ PyDict_Copy(PyObject *o)
             PyErr_BadInternalCall();
             return NULL;
         }
    -
         mp = (PyDictObject *)o;
    -    if (mp->ma_used == 0) {
    -        /* The dict is empty; just return a new dict. */
    -        return PyDict_New();
    -    }
    -
         if (_PyDict_HasSplitTable(mp)) {
             PyDictObject *split_copy;
             Py_ssize_t size = USABLE_FRACTION(DK_SIZE(mp->ma_keys));
    @@ -2560,27 +2508,6 @@ PyDict_Copy(PyObject *o)
                 _PyObject_GC_TRACK(split_copy);
             return (PyObject *)split_copy;
         }
    -
    -    if (PyDict_CheckExact(mp) && mp->ma_values == NULL &&
    -            (mp->ma_used >= (mp->ma_keys->dk_nentries * 2) / 3))
    -    {
    -        /* Use fast-copy if:
    -
    -           (1) 'mp' is an instance of a subclassed dict; and
    -
    -           (2) 'mp' is not a split-dict; and
    -
    -           (3) if 'mp' is non-compact ('del' operation does not resize dicts),
    -               do fast-copy only if it has at most 1/3 non-used keys.
    -
    -           The last condition (3) is important to guard against a pathological
    -           case when a large dict is almost emptied with multiple del/pop
    -           operations and copied after that.  In cases like this, we defer to
    -           PyDict_Merge, which produces a compacted copy.
    -        */
    -        return clone_combined_dict(mp);
    -    }
    -
         copy = PyDict_New();
         if (copy == NULL)
             return NULL;

    @pitrou
    Copy link
    Member

    pitrou commented Jul 5, 2018

    I've bisected myself and the culprit seems to be b0a7a03 ("""bpo-31179: Make dict.copy() up to 5.5 times faster.""").

    @pablogsal
    Copy link
    Member

    @antoine Wow, it seems that we both foundb0a7a037b8fde56b62f886d5188bced7776777b4 with one minute of difference :D

    I added Yuri to the noise list as is the author of that PR.

    @1st1
    Copy link
    Member

    1st1 commented Jul 5, 2018

    This isn't a real reference bug, but rather a bug in total refs accountability. It seems that I missed the fact that we track refs to the keys table with a DK_INCREF macro.

    The new clone_combined_dict uses memcpy to clone the keys table (along with its dk_refcnt field, but it doesn't register the fact that we have a new keys table after copying. The bug can be solved with the following diff:

    diff --git a/Objects/dictobject.c b/Objects/dictobject.c
    index 7a1bcebec6..3ac6a54415 100644
    --- a/Objects/dictobject.c
    +++ b/Objects/dictobject.c
    @@ -656,6 +656,7 @@ clone_combined_dict(PyDictObject *orig)
             /* Maintain tracking. */
             _PyObject_GC_TRACK(new);
         }
    +    _Py_INC_REFTOTAL;
         return (PyObject *)new;
     }

    I don't think this is a critical-level bug that needs an emergency 3.7.1 release, but I'll submit a PR right now. Would appreciate if you guys can review it.

    @pitrou
    Copy link
    Member

    pitrou commented Jul 5, 2018

    Agreed, if it was a real reference bug, the interpreter should crash before the total reference count gets negative ;-)

    @1st1
    Copy link
    Member

    1st1 commented Jul 5, 2018

    Agreed, if it was a real reference bug, the interpreter should crash before the total reference count gets negative ;-)

    First thing I checked with Serhiy's script :)

    A PR with a fix: #8119

    @kayhayen
    Copy link
    Mannequin Author

    kayhayen mannequin commented Jul 6, 2018

    Hello,

    so it's harmless and it explains the other reference counting issue, where a change in call convention could make a reference counting bug show or go away:

    codecs.open(TESTFN, encoding='cp949')
    This was showing it, where as
    codecs.open(TESTFN, "rb", 'cp949')

    was not. With all the difference being a dictionary passed into a function call or not. I am now assuming that getting your fix this will also go away. Caused me a bit of head scratching already.

    If you could do what you often you, and make this what distributions like Debian pull from, it would be good enough in terms of release for me, as it blocks Nuitka tests from passing on them.

    Yours,
    Kay

    @1st1
    Copy link
    Member

    1st1 commented Jul 6, 2018

    If you could do what you often you, and make this what distributions like Debian pull from, it would be good enough in terms of release for me, as it blocks Nuitka tests from passing on them.

    I'm not sure what you mean by "what you often do". I'll push a fix to master and 3.7 branches in a few minutes, so perhaps you can open a PR to ask Linux vendors to add a patch for 3.7.0 to their packages. An easier solution would be for you to disable refcounting tests that rely on gettotalrefcount just for 3.7.0. And I apologize for the hassle :(

    @1st1
    Copy link
    Member

    1st1 commented Jul 6, 2018

    New changeset 0b75228 by Yury Selivanov in branch 'master':
    bpo-34042: Fix dict.copy() to maintain correct total refcount (GH-8119)
    0b75228

    @miss-islington
    Copy link
    Contributor

    New changeset 127bd9b by Miss Islington (bot) in branch '3.7':
    bpo-34042: Fix dict.copy() to maintain correct total refcount (GH-8119)
    127bd9b

    @1st1
    Copy link
    Member

    1st1 commented Jul 6, 2018

    Thank you for reporting the issue, Kay. And huge thanks to Antoine and Pablo for bisecting.

    @1st1 1st1 closed this as completed Jul 6, 2018
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants