classification
Title: Reference loss for local classes
Type: resource usage Stage: resolved
Components: Interpreter Core Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: kayhayen, miss-islington, pablogsal, pitrou, serhiy.storchaka, vstinner, yselivanov
Priority: normal Keywords: patch

Created on 2018-07-04 11:42 by kayhayen, last changed 2018-07-06 16:41 by yselivanov. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 8119 merged yselivanov, 2018-07-05 19:52
PR 8143 merged miss-islington, 2018-07-06 16:21
Messages (16)
msg321037 - (view) Author: Kay Hayen (kayhayen) Date: 2018-07-04 11:42
I have a test in Nuitka, designed to detect reference counting problems with Python code. It basically takes a snapshot of the refcount, runs a function until it stabilizes, then says PASS, or else reports the last diff. Obviously for CPython it's supposed to pass. 

Testing with self compiled 3.7.0 and 3.7.0-1 as in Debian testing (buster) currently, this happens for me:

This is a cutout, there are more than 100 functions, I am listing the ones that report:

simpleFunction16: FAILED 118414 118412 leaked -2
simpleFunction17: FAILED 118395 118393 leaked -2
simpleFunction18: FAILED 118375 118373 leaked -2
...

simpleFunction21: FAILED 118337 118333 leaked -4
...
simpleFunction25: FAILED 118315 118313 leaked -2
simpleFunction26: FAILED 118295 118293 leaked -2
...
simpleFunction38: FAILED 118257 118253 leaked -4
simpleFunction39: FAILED 118235 118233 leaked -2
...
simpleFunction43: FAILED 118215 118213 leaked -2
simpleFunction48: FAILED 118195 118193 leaked -2
...
simpleFunction76: FAILED 118422 118418 leaked -4
...
simpleFunction88: FAILED 118400 118398 leaked -2

This is really bad, because normally values are positive, merely preventing a release. A negative value indicates that references have been lost. This will normally result in corruption of the memory allocator for Python, although I have not yet seen that. I have a few remaining cases, where compiled code causes negative leaks too, there is happens. But I didn't force the issue yet.

Notice that this is of course with the debug Python version and let me express, pure CPython is used to run the test code.

When compiling with Nuitka and 3.7, my private code gives the same ref counts for there, but it also pretty much does
the same things, these are some of the functions:

def simpleFunction16():
    class EmptyClass:
        pass

    return EmptyClass

def simpleFunction39():
    class Parent(object):
        pass

All the other cases also use locally defined classes and only test cases using local classes are failing for me here.

To reproduce is easy:

git clone --branch develop http://git.nuitka.net/Nuitka.git
python3.7-dbg Nuitka/tests/basics/Referencing.py

Thanks,
Kay Hayen
msg321043 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-07-04 13:29
The simplest reproducer:

import gc, sys
for i in range(100):
    class A:
        pass
    del A
    c = gc.collect()
    del c
    print(sys.gettotalrefcount())
msg321090 - (view) Author: Kay Hayen (kayhayen) Date: 2018-07-05 08:55
Just to confirm, this is also happening on Windows as well, with both 32 and 64 bits.
msg321104 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018-07-05 14:50
I ran git bisect with @Serhiy reproducer and it yielded commit 3929499914d47365ae744df312e16da8955c90ac as the first bad commit that has this problem. I did not look more into that but the bisect is reproducible (yields the same commit if you run it several times).
msg321120 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018-07-05 17:15
I investigated that commit but I cannot find anything :(.
msg321122 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018-07-05 18:03
I think I found it. After doing the bisect manually (as redirecting stdout/stderr) seems to affect somehow the test) I found commit b0a7a037b8fde56b62f886d5188bced7776777b4 as the culprit. Reverting this commit on the current master seem to solve the problem. This is the diff of the revert:

diff --git a/Objects/dictobject.c b/Objects/dictobject.c
index 40d7d8af6e..f7d792232e 100644
--- a/Objects/dictobject.c
+++ b/Objects/dictobject.c
@@ -613,52 +613,6 @@ new_dict_with_shared_keys(PyDictKeysObject *keys)
     return new_dict(keys, values);
 }

-
-static PyObject *
-clone_combined_dict(PyDictObject *orig)
-{
-    assert(PyDict_CheckExact(orig));
-    assert(orig->ma_values == NULL);
-    assert(orig->ma_keys->dk_refcnt == 1);
-
-    Py_ssize_t keys_size = _PyDict_KeysSize(orig->ma_keys);
-    PyDictKeysObject *keys = PyObject_Malloc(keys_size);
-    if (keys == NULL) {
-        PyErr_NoMemory();
-        return NULL;
-    }
-
-    memcpy(keys, orig->ma_keys, keys_size);
-
-    /* After copying key/value pairs, we need to incref all
-       keys and values and they are about to be co-owned by a
-       new dict object. */
-    PyDictKeyEntry *ep0 = DK_ENTRIES(keys);
-    Py_ssize_t n = keys->dk_nentries;
-    for (Py_ssize_t i = 0; i < n; i++) {
-        PyDictKeyEntry *entry = &ep0[i];
-        PyObject *value = entry->me_value;
-        if (value != NULL) {
-            Py_INCREF(value);
-            Py_INCREF(entry->me_key);
-        }
-    }
-
-    PyDictObject *new = (PyDictObject *)new_dict(keys, NULL);
-    if (new == NULL) {
-        /* In case of an error, `new_dict()` takes care of
-           cleaning up `keys`. */
-        return NULL;
-    }
-    new->ma_used = orig->ma_used;
-    assert(_PyDict_CheckConsistency(new));
-    if (_PyObject_GC_IS_TRACKED(orig)) {
-        /* Maintain tracking. */
-        _PyObject_GC_TRACK(new);
-    }
-    return (PyObject *)new;
-}
-
 PyObject *
 PyDict_New(void)
 {
@@ -2527,13 +2481,7 @@ PyDict_Copy(PyObject *o)
         PyErr_BadInternalCall();
         return NULL;
     }
-
     mp = (PyDictObject *)o;
-    if (mp->ma_used == 0) {
-        /* The dict is empty; just return a new dict. */
-        return PyDict_New();
-    }
-
     if (_PyDict_HasSplitTable(mp)) {
         PyDictObject *split_copy;
         Py_ssize_t size = USABLE_FRACTION(DK_SIZE(mp->ma_keys));
@@ -2560,27 +2508,6 @@ PyDict_Copy(PyObject *o)
             _PyObject_GC_TRACK(split_copy);
         return (PyObject *)split_copy;
     }
-
-    if (PyDict_CheckExact(mp) && mp->ma_values == NULL &&
-            (mp->ma_used >= (mp->ma_keys->dk_nentries * 2) / 3))
-    {
-        /* Use fast-copy if:
-
-           (1) 'mp' is an instance of a subclassed dict; and
-
-           (2) 'mp' is not a split-dict; and
-
-           (3) if 'mp' is non-compact ('del' operation does not resize dicts),
-               do fast-copy only if it has at most 1/3 non-used keys.
-
-           The last condition (3) is important to guard against a pathological
-           case when a large dict is almost emptied with multiple del/pop
-           operations and copied after that.  In cases like this, we defer to
-           PyDict_Merge, which produces a compacted copy.
-        */
-        return clone_combined_dict(mp);
-    }
-
     copy = PyDict_New();
     if (copy == NULL)
         return NULL;
msg321123 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-07-05 18:04
I've bisected myself and the culprit seems to be b0a7a037b8fde56b62f886d5188bced7776777b4 ("""bpo-31179: Make dict.copy() up to 5.5 times faster.""").
msg321124 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018-07-05 18:07
@Antoine Wow, it seems that we both  foundb0a7a037b8fde56b62f886d5188bced7776777b4 with one minute of difference :D

I added Yuri to the noise list as is the author of that PR.
msg321128 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-07-05 19:43
This isn't a real reference bug, but rather a bug in total refs accountability.  It seems that I missed the fact that we track refs to the keys table with a DK_INCREF macro.

The new `clone_combined_dict` uses `memcpy` to clone the keys table (along with its `dk_refcnt` field, but it doesn't register the fact that we have a new keys table after copying.  The bug can be solved with the following diff:

diff --git a/Objects/dictobject.c b/Objects/dictobject.c
index 7a1bcebec6..3ac6a54415 100644
--- a/Objects/dictobject.c
+++ b/Objects/dictobject.c
@@ -656,6 +656,7 @@ clone_combined_dict(PyDictObject *orig)
         /* Maintain tracking. */
         _PyObject_GC_TRACK(new);
     }
+    _Py_INC_REFTOTAL;
     return (PyObject *)new;
 }


I don't think this is a critical-level bug that needs an emergency 3.7.1 release, but I'll submit a PR right now. Would appreciate if you guys can review it.
msg321129 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-07-05 19:44
Agreed, if it was a real reference bug, the interpreter should crash before the total reference count gets negative ;-)
msg321130 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-07-05 19:54
> Agreed, if it was a real reference bug, the interpreter should crash before the total reference count gets negative ;-)

First thing I checked with Serhiy's script :)

A PR with a fix: https://github.com/python/cpython/pull/8119
msg321159 - (view) Author: Kay Hayen (kayhayen) Date: 2018-07-06 09:03
Hello,

so it's harmless and it explains the other reference counting issue, where a change in call convention could make a reference counting bug show or go away:

codecs.open(TESTFN, encoding='cp949')
This was showing it, where as
codecs.open(TESTFN, "rb", 'cp949')

was not. With all the difference being a dictionary passed into a function call or not. I am now assuming that getting your fix this will also go away. Caused me a bit of head scratching already.

If you could do what you often you, and make this what distributions like Debian pull from, it would be good enough in terms of release for me, as it blocks Nuitka tests from passing on them.

Yours,
Kay
msg321178 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-07-06 16:03
> If you could do what you often you, and make this what distributions like Debian pull from, it would be good enough in terms of release for me, as it blocks Nuitka tests from passing on them.

I'm not sure what you mean by "what you often do". I'll push a fix to master and 3.7 branches in a few minutes, so perhaps you can open a PR to ask Linux vendors to add a patch for 3.7.0 to their packages.  An easier solution would be for you to disable refcounting tests that rely on gettotalrefcount just for 3.7.0.  And I apologize for the hassle :(
msg321179 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-07-06 16:20
New changeset 0b75228700e0077d8bf2636e74733389514b4b2f by Yury Selivanov in branch 'master':
bpo-34042: Fix dict.copy() to maintain correct total refcount (GH-8119)
https://github.com/python/cpython/commit/0b75228700e0077d8bf2636e74733389514b4b2f
msg321182 - (view) Author: miss-islington (miss-islington) Date: 2018-07-06 16:40
New changeset 127bd9bfd591c8ec1a97eb7f4037c8b884eef973 by Miss Islington (bot) in branch '3.7':
bpo-34042: Fix dict.copy() to maintain correct total refcount (GH-8119)
https://github.com/python/cpython/commit/127bd9bfd591c8ec1a97eb7f4037c8b884eef973
msg321183 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-07-06 16:41
Thank you for reporting the issue, Kay.  And huge thanks to Antoine and Pablo for bisecting.
History
Date User Action Args
2018-07-06 16:41:53yselivanovsetstatus: open -> closed
resolution: fixed
messages: + msg321183

stage: patch review -> resolved
2018-07-06 16:40:19miss-islingtonsetnosy: + miss-islington
messages: + msg321182
2018-07-06 16:21:26miss-islingtonsetpull_requests: + pull_request7715
2018-07-06 16:20:11yselivanovsetmessages: + msg321179
2018-07-06 16:03:58yselivanovsetmessages: + msg321178
2018-07-06 09:03:44kayhayensetmessages: + msg321159
2018-07-05 19:54:19yselivanovsetmessages: + msg321130
2018-07-05 19:52:29yselivanovsetkeywords: + patch
stage: patch review
pull_requests: + pull_request7704
2018-07-05 19:44:53pitrousetmessages: + msg321129
2018-07-05 19:43:52yselivanovsetmessages: + msg321128
2018-07-05 18:07:02pablogsalsetmessages: + msg321124
2018-07-05 18:04:26pitrousetnosy: + yselivanov
messages: + msg321123
2018-07-05 18:03:58pablogsalsetmessages: + msg321122
2018-07-05 17:15:55pablogsalsetmessages: + msg321120
2018-07-05 14:50:12pablogsalsetnosy: + pablogsal
messages: + msg321104
2018-07-05 08:55:09kayhayensetmessages: + msg321090
2018-07-04 13:29:17serhiy.storchakasetnosy: + pitrou, vstinner
messages: + msg321043
2018-07-04 13:07:28serhiy.storchakasetnosy: + serhiy.storchaka
type: resource usage
components: + Interpreter Core
2018-07-04 11:42:51kayhayencreate