Reference loss for local classes #78223

kayhayen · 2018-07-04T11:42:51Z

BPO	34042
Nosy	@pitrou, @vstinner, @serhiy-storchaka, @1st1, @pablogsal, @miss-islington
PRs	bpo-34042: Fix dict.copy() to maintain correct total refcount #8119 [3.7] bpo-34042: Fix dict.copy() to maintain correct total refcount (GH-8119) #8143

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2018-07-06.16:41:53.314>
created_at = <Date 2018-07-04.11:42:51.318>
labels = ['interpreter-core', '3.7', 'performance']
title = 'Reference loss for local classes'
updated_at = <Date 2018-07-06.16:41:53.313>
user = 'https://bugs.python.org/kayhayen'

bugs.python.org fields:

activity = <Date 2018-07-06.16:41:53.313>
actor = 'yselivanov'
assignee = 'none'
closed = True
closed_date = <Date 2018-07-06.16:41:53.314>
closer = 'yselivanov'
components = ['Interpreter Core']
creation = <Date 2018-07-04.11:42:51.318>
creator = 'kayhayen'
dependencies = []
files = []
hgrepos = []
issue_num = 34042
keywords = ['patch']
message_count = 16.0
messages = ['321037', '321043', '321090', '321104', '321120', '321122', '321123', '321124', '321128', '321129', '321130', '321159', '321178', '321179', '321182', '321183']
nosy_count = 7.0
nosy_names = ['pitrou', 'vstinner', 'kayhayen', 'serhiy.storchaka', 'yselivanov', 'pablogsal', 'miss-islington']
pr_nums = ['8119', '8143']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'resource usage'
url = 'https://bugs.python.org/issue34042'
versions = ['Python 3.7']

kayhayen · 2018-07-04T11:42:51Z

I have a test in Nuitka, designed to detect reference counting problems with Python code. It basically takes a snapshot of the refcount, runs a function until it stabilizes, then says PASS, or else reports the last diff. Obviously for CPython it's supposed to pass.

Testing with self compiled 3.7.0 and 3.7.0-1 as in Debian testing (buster) currently, this happens for me:

This is a cutout, there are more than 100 functions, I am listing the ones that report:

simpleFunction16: FAILED 118414 118412 leaked -2
simpleFunction17: FAILED 118395 118393 leaked -2
simpleFunction18: FAILED 118375 118373 leaked -2
...

simpleFunction21: FAILED 118337 118333 leaked -4
...
simpleFunction25: FAILED 118315 118313 leaked -2
simpleFunction26: FAILED 118295 118293 leaked -2
...
simpleFunction38: FAILED 118257 118253 leaked -4
simpleFunction39: FAILED 118235 118233 leaked -2
...
simpleFunction43: FAILED 118215 118213 leaked -2
simpleFunction48: FAILED 118195 118193 leaked -2
...
simpleFunction76: FAILED 118422 118418 leaked -4
...
simpleFunction88: FAILED 118400 118398 leaked -2

This is really bad, because normally values are positive, merely preventing a release. A negative value indicates that references have been lost. This will normally result in corruption of the memory allocator for Python, although I have not yet seen that. I have a few remaining cases, where compiled code causes negative leaks too, there is happens. But I didn't force the issue yet.

Notice that this is of course with the debug Python version and let me express, pure CPython is used to run the test code.

When compiling with Nuitka and 3.7, my private code gives the same ref counts for there, but it also pretty much does
the same things, these are some of the functions:

def simpleFunction16():
    class EmptyClass:
        pass

    return EmptyClass

def simpleFunction39():
    class Parent(object):
        pass

All the other cases also use locally defined classes and only test cases using local classes are failing for me here.

To reproduce is easy:

git clone --branch develop http://git.nuitka.net/Nuitka.git
python3.7-dbg Nuitka/tests/basics/Referencing.py

Thanks,
Kay Hayen

serhiy-storchaka · 2018-07-04T13:29:17Z

The simplest reproducer:

import gc, sys
for i in range(100):
    class A:
        pass
    del A
    c = gc.collect()
    del c
    print(sys.gettotalrefcount())

kayhayen · 2018-07-05T08:55:09Z

Just to confirm, this is also happening on Windows as well, with both 32 and 64 bits.

pablogsal · 2018-07-05T14:50:12Z

I ran git bisect with @serhiy reproducer and it yielded commit 3929499 as the first bad commit that has this problem. I did not look more into that but the bisect is reproducible (yields the same commit if you run it several times).

pablogsal · 2018-07-05T17:15:55Z

I investigated that commit but I cannot find anything :(.

pablogsal · 2018-07-05T18:03:59Z

I think I found it. After doing the bisect manually (as redirecting stdout/stderr) seems to affect somehow the test) I found commit b0a7a03 as the culprit. Reverting this commit on the current master seem to solve the problem. This is the diff of the revert:

diff --git a/Objects/dictobject.c b/Objects/dictobject.c
index 40d7d8af6e..f7d792232e 100644
--- a/Objects/dictobject.c
+++ b/Objects/dictobject.c
@@ -613,52 +613,6 @@ new_dict_with_shared_keys(PyDictKeysObject *keys)
return new_dict(keys, values);
}

-
-static PyObject *
-clone_combined_dict(PyDictObject *orig)
-{
-    assert(PyDict_CheckExact(orig));
-    assert(orig->ma_values == NULL);
-    assert(orig->ma_keys->dk_refcnt == 1);
-
-    Py_ssize_t keys_size = _PyDict_KeysSize(orig->ma_keys);
-    PyDictKeysObject *keys = PyObject_Malloc(keys_size);
-    if (keys == NULL) {
-        PyErr_NoMemory();
-        return NULL;
-    }
-
-    memcpy(keys, orig->ma_keys, keys_size);
-
-    /* After copying key/value pairs, we need to incref all
-       keys and values and they are about to be co-owned by a
-       new dict object. */
-    PyDictKeyEntry *ep0 = DK_ENTRIES(keys);
-    Py_ssize_t n = keys->dk_nentries;
-    for (Py_ssize_t i = 0; i < n; i++) {
-        PyDictKeyEntry *entry = &ep0[i];
-        PyObject *value = entry->me_value;
-        if (value != NULL) {
-            Py_INCREF(value);
-            Py_INCREF(entry->me_key);
-        }
-    }
-
-    PyDictObject *new = (PyDictObject *)new_dict(keys, NULL);
-    if (new == NULL) {
-        /* In case of an error, `new_dict()` takes care of
-           cleaning up `keys`. */
-        return NULL;
-    }
-    new->ma_used = orig->ma_used;
-    assert(_PyDict_CheckConsistency(new));
-    if (_PyObject_GC_IS_TRACKED(orig)) {
-        /* Maintain tracking. */
-        _PyObject_GC_TRACK(new);
-    }
-    return (PyObject *)new;
-}
-
 PyObject *
 PyDict_New(void)
 {
@@ -2527,13 +2481,7 @@ PyDict_Copy(PyObject *o)
         PyErr_BadInternalCall();
         return NULL;
     }
-
     mp = (PyDictObject *)o;
-    if (mp->ma_used == 0) {
-        /* The dict is empty; just return a new dict. */
-        return PyDict_New();
-    }
-
     if (_PyDict_HasSplitTable(mp)) {
         PyDictObject *split_copy;
         Py_ssize_t size = USABLE_FRACTION(DK_SIZE(mp->ma_keys));
@@ -2560,27 +2508,6 @@ PyDict_Copy(PyObject *o)
             _PyObject_GC_TRACK(split_copy);
         return (PyObject *)split_copy;
     }
-
-    if (PyDict_CheckExact(mp) && mp->ma_values == NULL &&
-            (mp->ma_used >= (mp->ma_keys->dk_nentries * 2) / 3))
-    {
-        /* Use fast-copy if:
-
-           (1) 'mp' is an instance of a subclassed dict; and
-
-           (2) 'mp' is not a split-dict; and
-
-           (3) if 'mp' is non-compact ('del' operation does not resize dicts),
-               do fast-copy only if it has at most 1/3 non-used keys.
-
-           The last condition (3) is important to guard against a pathological
-           case when a large dict is almost emptied with multiple del/pop
-           operations and copied after that.  In cases like this, we defer to
-           PyDict_Merge, which produces a compacted copy.
-        */
-        return clone_combined_dict(mp);
-    }
-
     copy = PyDict_New();
     if (copy == NULL)
         return NULL;

pitrou · 2018-07-05T18:04:26Z

I've bisected myself and the culprit seems to be b0a7a03 ("""bpo-31179: Make dict.copy() up to 5.5 times faster.""").

pablogsal · 2018-07-05T18:07:02Z

@antoine Wow, it seems that we both foundb0a7a037b8fde56b62f886d5188bced7776777b4 with one minute of difference :D

I added Yuri to the noise list as is the author of that PR.

1st1 · 2018-07-05T19:43:53Z

This isn't a real reference bug, but rather a bug in total refs accountability. It seems that I missed the fact that we track refs to the keys table with a DK_INCREF macro.

The new clone_combined_dict uses memcpy to clone the keys table (along with its dk_refcnt field, but it doesn't register the fact that we have a new keys table after copying. The bug can be solved with the following diff:

diff --git a/Objects/dictobject.c b/Objects/dictobject.c
index 7a1bcebec6..3ac6a54415 100644
--- a/Objects/dictobject.c
+++ b/Objects/dictobject.c
@@ -656,6 +656,7 @@ clone_combined_dict(PyDictObject *orig)
         /* Maintain tracking. */
         _PyObject_GC_TRACK(new);
     }
+    _Py_INC_REFTOTAL;
     return (PyObject *)new;
 }

I don't think this is a critical-level bug that needs an emergency 3.7.1 release, but I'll submit a PR right now. Would appreciate if you guys can review it.

pitrou · 2018-07-05T19:44:53Z

Agreed, if it was a real reference bug, the interpreter should crash before the total reference count gets negative ;-)

1st1 · 2018-07-05T19:54:20Z

Agreed, if it was a real reference bug, the interpreter should crash before the total reference count gets negative ;-)

First thing I checked with Serhiy's script :)

A PR with a fix: #8119

kayhayen · 2018-07-06T09:03:44Z

Hello,

so it's harmless and it explains the other reference counting issue, where a change in call convention could make a reference counting bug show or go away:

codecs.open(TESTFN, encoding='cp949')
This was showing it, where as
codecs.open(TESTFN, "rb", 'cp949')

was not. With all the difference being a dictionary passed into a function call or not. I am now assuming that getting your fix this will also go away. Caused me a bit of head scratching already.

If you could do what you often you, and make this what distributions like Debian pull from, it would be good enough in terms of release for me, as it blocks Nuitka tests from passing on them.

Yours,
Kay

1st1 · 2018-07-06T16:03:59Z

If you could do what you often you, and make this what distributions like Debian pull from, it would be good enough in terms of release for me, as it blocks Nuitka tests from passing on them.

I'm not sure what you mean by "what you often do". I'll push a fix to master and 3.7 branches in a few minutes, so perhaps you can open a PR to ask Linux vendors to add a patch for 3.7.0 to their packages. An easier solution would be for you to disable refcounting tests that rely on gettotalrefcount just for 3.7.0. And I apologize for the hassle :(

1st1 · 2018-07-06T16:20:11Z

New changeset 0b75228 by Yury Selivanov in branch 'master':
bpo-34042: Fix dict.copy() to maintain correct total refcount (GH-8119)
0b75228

miss-islington · 2018-07-06T16:40:20Z

New changeset 127bd9b by Miss Islington (bot) in branch '3.7':
bpo-34042: Fix dict.copy() to maintain correct total refcount (GH-8119)
127bd9b

1st1 · 2018-07-06T16:41:53Z

Thank you for reporting the issue, Kay. And huge thanks to Antoine and Pablo for bisecting.

kayhayen mannequin added the 3.7 (EOL) end of life label Jul 4, 2018

serhiy-storchaka added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Jul 4, 2018

1st1 closed this as completed Jul 6, 2018

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reference loss for local classes #78223

Reference loss for local classes #78223

kayhayen mannequin commented Jul 4, 2018

kayhayen mannequin commented Jul 4, 2018

serhiy-storchaka commented Jul 4, 2018

kayhayen mannequin commented Jul 5, 2018

pablogsal commented Jul 5, 2018

pablogsal commented Jul 5, 2018

pablogsal commented Jul 5, 2018

pitrou commented Jul 5, 2018

pablogsal commented Jul 5, 2018

1st1 commented Jul 5, 2018

pitrou commented Jul 5, 2018

1st1 commented Jul 5, 2018

kayhayen mannequin commented Jul 6, 2018

1st1 commented Jul 6, 2018

1st1 commented Jul 6, 2018

miss-islington commented Jul 6, 2018

1st1 commented Jul 6, 2018

Reference loss for local classes #78223

Reference loss for local classes #78223

Comments

kayhayen mannequin commented Jul 4, 2018

kayhayen mannequin commented Jul 4, 2018

serhiy-storchaka commented Jul 4, 2018

kayhayen mannequin commented Jul 5, 2018

pablogsal commented Jul 5, 2018

pablogsal commented Jul 5, 2018

pablogsal commented Jul 5, 2018

pitrou commented Jul 5, 2018

pablogsal commented Jul 5, 2018

1st1 commented Jul 5, 2018

pitrou commented Jul 5, 2018

1st1 commented Jul 5, 2018

kayhayen mannequin commented Jul 6, 2018

1st1 commented Jul 6, 2018

1st1 commented Jul 6, 2018

miss-islington commented Jul 6, 2018

1st1 commented Jul 6, 2018