classification
Title: Assertion failed in gc with __del__ and weakref
Type: crash Stage: resolved
Components: Interpreter Core Versions: Python 3.7, Python 3.6, Python 3.5, Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Frens Jan Rumph, Yang Ke, asvetlov, balloob, fdrake, guojiahua, gvanrossum, haypo, larry, ned.deily, pitrou, python-dev, serhiy.storchaka, thehesiod, tim.peters, xiang.zhang
Priority: normal Keywords:

Created on 2016-03-23 06:08 by guojiahua, last changed 2017-07-31 18:34 by thehesiod. This issue is now closed.

Files
File name Uploaded Description Edit
crash.py haypo, 2016-03-23 07:58
patch guojiahua, 2016-05-18 15:40
Pull Requests
URL Status Linked Edit
PR 2695 merged serhiy.storchaka, 2017-07-13 12:52
Messages (20)
msg262242 - (view) Author: JIahua Guo (guojiahua) * Date: 2016-03-23 06:08
$ python3.5-dbg crash.py
python3.5-dbg: ../Modules/gcmodule.c:364: update_refs: Assertion `((gc)->gc.gc_refs >> (1)) != 0' failed.


$ python3.5-dbg crash-multithread.py
python3.5-dbg: ../Modules/gcmodule.c:364: update_refs: Assertion `((gc)->gc.gc_refs >> (1)) != 0' failed.


============ crash.py ==========
import weakref
import gc


def callback(o):
    gc.collect()


class C:
    def __del__(self):
        pass

def main():
    c = C()
    cref = weakref.ref(c, callback)
    del c


main()


============ crash-multithread.py ==========
import threading
import weakref
import gc


ei = threading.Event()
eo = threading.Event()


def gc_worker():
    ei.wait()
    gc.collect()
    eo.set()


def callback(o):
    ei.set()
    eo.wait()


class C:
    def __del__(self):
        pass


def main():
    t = threading.Thread(target=gc_worker)
    t.start()
    c = C()
    cref = weakref.ref(c, callback)
    del c
    t.join()


main()
msg262243 - (view) Author: Yang Ke (Yang Ke) Date: 2016-03-23 06:49
The crash may be raised on both Windows OS and Linux. We got crashed under Ubuntu 14.04, while a similar issue was found https://bugs.python.org/issue23187#msg238139 last year under Windows OS.

We actually raise such bug working on a program with asyncio and multithread(use asyncio.BaseEventLoop.run_in_executor with its default threadpool).
The asyncio.Task class creates a weakref.WeakSet(https://hg.python.org/cpython/file/3.5/Lib/asyncio/tasks.py#l38) containing alive tasks. WeakSet is implemented as a list containing weakref.ref with CALLBACK function(https://hg.python.org/cpython/file/3.5/Lib/_weakrefset.py#l84). And that may lead to the situation described in sample crash code:
Once a asyncio.Task.__del__ function is called, and callback of the weakref is called, then the interpreter switches to other thread, which may trigger gc.collect(the thread allocate a huge number of objects, for example), and leaves the task object in gc_list with refcount==0.
msg262245 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-03-23 07:58
Oh, I confirm that the bug occurs on Python 3.5.1+ (3.5 branch) and Python 3.6 (default branch).

Call stack: _Py_Dealloc() => subtype_dealloc() => PyObject_ClearWeakRefs() => handle_callback() of weakref => ... => gc_collect => ... => assertion error

Full gdb tracebacks:

python: Modules/gcmodule.c:364: update_refs: Assertion `((gc)->gc.gc_refs >> (1)) != 0' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff7129a98 in raise () from /lib64/libc.so.6

(gdb) where
#0  0x00007ffff7129a98 in raise () from /lib64/libc.so.6
#1  0x00007ffff712b69a in abort () from /lib64/libc.so.6
#2  0x00007ffff7122227 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007ffff71222d2 in __assert_fail () from /lib64/libc.so.6
#4  0x0000000000441b74 in update_refs (containers=0x939c60 <generations+64>) at Modules/gcmodule.c:364
#5  0x0000000000442cb1 in collect (generation=2, n_collected=0x7fffffff87f0, n_uncollectable=0x7fffffff87e8, nofail=0) at Modules/gcmodule.c:950
#6  0x00000000004432dd in collect_with_callback (generation=2) at Modules/gcmodule.c:1119
#7  0x00000000004434cb in gc_collect (self=<module at remote 0x7ffff05e2a58>, args=(), kws=0x0) at Modules/gcmodule.c:1213
#8  0x00000000004d5731 in PyCFunction_Call (func=<built-in method collect of module object at remote 0x7ffff05e2a58>, args=(), kwds=0x0)
    at Objects/methodobject.c:98
#9  0x00000000005d0fd3 in call_function (pp_stack=0x7fffffff8a18, oparg=0) at Python/ceval.c:4732
#10 0x00000000005c8cc0 in PyEval_EvalFrameEx (f=Frame 0x7ffff7eaa858, for file crash.py, line 7, in callback (o=<weakref at remote 0x7ffff05eb658>), 
    throwflag=0) at Python/ceval.c:3207
#11 0x00000000005ce4ec in _PyEval_EvalCodeWithName (_co=<code at remote 0x7ffff7e8f340>, 
    globals={'__package__': None, '__name__': '__main__', 'gc': <module at remote 0x7ffff05e2a58>, 'weakref': <module at remote 0x7ffff05cae58>, '__builtins__': <module at remote 0x7ffff7f5fc58>, 'C': <type at remote 0xac3978>, '__doc__': None, '__spec__': None, 'main': <function at remote 0x7ffff05c1788>, 'callback': <function at remote 0x7ffff05c1338>, '__loader__': <SourceFileLoader(name='__main__', path='crash.py') at remote 0x7ffff7e7de28>, '__file__': 'crash.py', '__cached__': None}, locals=0x0, args=0x7ffff05d5b78, argcount=1, kws=0x0, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0)
    at Python/ceval.c:4045
#12 0x00000000005ce61e in PyEval_EvalCodeEx (_co=<code at remote 0x7ffff7e8f340>, 
    globals={'__package__': None, '__name__': '__main__', 'gc': <module at remote 0x7ffff05e2a58>, 'weakref': <module at remote 0x7ffff05cae58>, '__builtins__': <module at remote 0x7ffff7f5fc58>, 'C': <type at remote 0xac3978>, '__doc__': None, '__spec__': None, 'main': <function at remote 0x7ffff05c1788>, 'callback': <function at remote 0x7ffff05c1338>, '__loader__': <SourceFileLoader(name='__main__', path='crash.py') at remote 0x7ffff7e7de28>, '__file__': 'crash.py', '__cached__': None}, locals=0x0, args=0x7ffff05d5b78, argcount=1, kws=0x0, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at Python/ceval.c:4066
#13 0x0000000000688276 in function_call (func=<function at remote 0x7ffff05c1338>, arg=(<weakref at remote 0x7ffff05eb658>,), kw=0x0)
    at Objects/funcobject.c:604
#14 0x000000000046c4f3 in PyObject_Call (func=<function at remote 0x7ffff05c1338>, arg=(<weakref at remote 0x7ffff05eb658>,), kw=0x0)
    at Objects/abstract.c:2170
#15 0x000000000046d61d in PyObject_CallFunctionObjArgs (callable=<function at remote 0x7ffff05c1338>) at Objects/abstract.c:2450
#16 0x0000000000576063 in handle_callback (ref=0x7ffff05eb658, callback=<function at remote 0x7ffff05c1338>) at Objects/weakrefobject.c:868
#17 0x0000000000576257 in PyObject_ClearWeakRefs (object=<C at remote 0x7ffff05d5ae8>) at Objects/weakrefobject.c:913
#18 0x00000000004f3bee in subtype_dealloc (self=<C at remote 0x7ffff05d5ae8>) at Objects/typeobject.c:1162
#19 0x00000000004dd7d6 in _Py_Dealloc (op=<C at remote 0x7ffff05d5ae8>) at Objects/object.c:1783
(...)

(gdb) py-bt
Traceback (most recent call first):
  Garbage-collecting
  <built-in method collect of module object at remote 0x7ffff05e2a58>
  File "crash.py", line 7, in callback
    gc.collect()
  File "crash.py", line 17, in main
    del c
  File "crash.py", line 20, in <module>
    main()
msg277825 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-10-01 17:56
Ben Bangert reported to me that this crash caused instabilities in an app using asyncio (https://github.com/home-assistant/home-assistant/issues/3453). This hack made his crashes go away:
https://github.com/home-assistant/home-assistant/commit/922dbba8814b81b69471dc4f1bf5c5a3b2bfe4ed

What are the chances of getting the crash fixed in 3.5.3 and 3.6b2?
msg277935 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-10-03 07:59
I don't think that the crash is a release blocker, but I just sent an email to python-dev to ask for reviews.
msg277941 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2016-10-03 09:04
I was unable to reproduce a crash but the patch looks straightforward and obvious.
After applying test suite for aiohttp still works without problems at least.
msg277952 - (view) Author: JIahua Guo (guojiahua) * Date: 2016-10-03 11:28
Hi asvetlov, debug mode of python interpreter should be enabled to reproduce this bug, cause assertion is disabled in release mode.
(https://hg.python.org/cpython/file/104259/Modules/gcmodule.c#l365)

One environment that can reproduce this bug:

$ cat /etc/issue
Ubuntu 14.04.3 LTS \n \l

$ uname -r
3.13.0-32-generic

$ uname -m
x86_64

$ dpkg -s python3.4-dbg | grep Version
Version: 3.4.3-1ubuntu1~14.04.4

$ python3.4-dbg crash.py
python3.4-dbg: ../Modules/gcmodule.c:364: update_refs: Assertion `((gc)->gc.gc_refs >> (1)) != 0' failed.
Aborted
msg278015 - (view) Author: Roundup Robot (python-dev) Date: 2016-10-04 07:01
New changeset c9b7272e2553 by Benjamin Peterson in branch '3.5':
ensure gc tracking is off when invoking weakref callbacks (closes #26617)
https://hg.python.org/cpython/rev/c9b7272e2553

New changeset 520cb70ecb90 by Benjamin Peterson in branch '3.6':
merge 3.5 (#26617)
https://hg.python.org/cpython/rev/520cb70ecb90

New changeset c1d0df056c19 by Benjamin Peterson in branch 'default':
merge 3.6 (#26617)
https://hg.python.org/cpython/rev/c1d0df056c19
msg278642 - (view) Author: Frens Jan Rumph (Frens Jan Rumph) Date: 2016-10-14 11:27
Would be nice if 3.4 could also be patched, not just 3.5 and 3.6, since python in EPEL currently is python34-3.4.3-7.el7.x86_64. The patch can be applied without conflict and resolves some serious cases of segfaults.
msg278644 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-10-14 12:13
> Would be nice if 3.4 could also be patched, not just 3.5 and 3.6, since python in EPEL currently is python34-3.4.3-7.el7.x86_64. The patch can be applied without conflict and resolves some serious cases of segfaults.

Sorry but Python 3.4 is not more supported upstream:
https://docs.python.org/devguide/#status-of-python-branches

You should ask for a downstream backport.
msg278646 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2016-10-14 12:30
> You should ask for a downstream backport.

I created the issue: https://bugzilla.redhat.com/show_bug.cgi?id=1384957
msg298281 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2017-07-13 12:38
I'll accept this as a backport for 3.4.  I'm convinced that, although the sample programs don't crash in release builds, the behavior is wrong and could be exploited.

Will someone have time to create the PR in the next day or two?  (It's not that it's necessary, but it seems that Victor and Serhiy are very quick, which is nice.  And adding it sooner will make it more comfortable to tag and release 3.4.7rc1 next weekend, which I think I'm going to do.)
msg298282 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-13 12:42
I'll do this.
msg298455 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2017-07-17 03:32
If it's also reproducible with 3.3, I'd take a 3.3 PR for it.
msg298456 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-17 04:42
I wait merging PR 2695 for cherry-picking it to 3.3.
msg298861 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2017-07-22 19:07
New changeset 34fae03cd6c9e304e02c571b3bf9e8df0cfe76be by larryhastings (Serhiy Storchaka) in branch '3.4':
[3.4] bpo-26617: Ensure gc tracking is off when invoking weakref callbacks. (#2695)
https://github.com/python/cpython/commit/34fae03cd6c9e304e02c571b3bf9e8df0cfe76be
msg298863 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2017-07-22 19:18
Sorry for the delay in merging your PR.  I believe Ned is planning to tag the next 3.3 rc1 tomorrow (Sunday).  If you see this, and you have time, it'd be great if you could make a PR for the 3.3 backport.

Note that 3.3 is probably *not* getting converted to use "blurb".  So if you do make the backport, please just edit Misc/NEWS directly.

Thanks!
msg298869 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-22 20:11
Seems this isn't reproducible with 3.3.
msg299526 - (view) Author: Alexander Mohr (thehesiod) * Date: 2017-07-30 21:57
I'm tracking something very similar issue to this in bug: http://bugs.python.org/issue31061  Given its similarities, anyone have any ideas?  Based on the second callstack I'm starting to think this is an issue with defaultdict
msg299576 - (view) Author: Alexander Mohr (thehesiod) * Date: 2017-07-31 18:34
so I just discovered that the object that has the zero refcount has the same tp_dealloc:

(gdb) print *FROM_GC(gc)->ob_type
$8 = {ob_base = {ob_base = {_ob_next = 0x7f80c8aafc88, _ob_prev = 0x7f80c8aafd00, ob_refcnt = 7, ob_type = 0x7f80cd8c86e0 <PyType_Type>}, ob_size = 0}, tp_name = 0x7f80c8aa5c38 "_GatheringFuture", tp_basicsize = 104, tp_itemsize = 0,
  tp_dealloc = 0x7f80cd41bee7 <subtype_dealloc>, tp_print = 0x0, tp_getattr = 0x0, tp_setattr = 0x0, tp_as_async = 0x556ba4342d58, tp_repr = 0x7f80c8870a9b <FutureObj_repr>, tp_as_number = 0x556ba4342d70, tp_as_sequence = 0x556ba4342ea8,
  tp_as_mapping = 0x556ba4342e90, tp_hash = 0x7f80cd54c995 <_Py_HashPointer>, tp_call = 0x0, tp_str = 0x7f80cd423d05 <object_str>, tp_getattro = 0x7f80cd3fec8b <PyObject_GenericGetAttr>, tp_setattro = 0x7f80cd3ff0b5 <PyObject_GenericSetAttr>,
  tp_as_buffer = 0x556ba4342ef8, tp_flags = 808449, tp_doc = 0x7f80c8cd7380 "Helper for gather().\n\n    This overrides cancel() to cancel all the children and act more\n    like Task.cancel(), which doesn't immediately mark itself as\n    cancelled.\n    ",
  tp_traverse = 0x7f80cd41baae <subtype_traverse>, tp_clear = 0x7f80cd41bd5c <subtype_clear>, tp_richcompare = 0x7f80cd423d42 <object_richcompare>, tp_weaklistoffset = 96, tp_iter = 0x7f80c8871d4f <future_new_iter>,
  tp_iternext = 0x7f80cd3fe6d6 <_PyObject_NextNotImplemented>, tp_methods = 0x0, tp_members = 0x556ba4342f28, tp_getset = 0x0, tp_base = 0x7f80c8a7b3c0 <FutureType>, tp_dict = 0x7f80c8aafc88, tp_descr_get = 0x0, tp_descr_set = 0x0, tp_dictoffset = 88,
  tp_init = 0x7f80cd431000 <slot_tp_init>, tp_alloc = 0x7f80cd41b7f8 <PyType_GenericAlloc>, tp_new = 0x7f80cd41b9c6 <PyType_GenericNew>, tp_free = 0x7f80cd583d8b <PyObject_GC_Del>, tp_is_gc = 0x0, tp_bases = 0x7f80c8ab20c0, tp_mro = 0x7f80c8aafd00, 
  tp_cache = 0x0, tp_subclasses = 0x0, tp_weaklist = 0x7f80c8aae5d8, tp_del = 0x0, tp_version_tag = 791, tp_finalize = 0x7f80c8870ddb <FutureObj_finalize>}

This is for a GatheringFuture, something tells me perhaps there is more to this function that needs to be resolved?
History
Date User Action Args
2017-07-31 18:34:50thehesiodsetmessages: + msg299576
2017-07-30 21:57:27thehesiodsetnosy: + thehesiod
messages: + msg299526
2017-07-22 20:11:28serhiy.storchakasetpull_requests: - pull_request953
2017-07-22 20:11:05serhiy.storchakasetpriority: release blocker -> normal
status: open -> closed
messages: + msg298869
2017-07-22 19:18:05larrysetmessages: + msg298863
2017-07-22 19:07:12larrysetmessages: + msg298861
2017-07-17 04:42:21serhiy.storchakasetmessages: + msg298456
2017-07-17 03:32:55ned.deilysetmessages: + msg298455
2017-07-13 12:52:46serhiy.storchakasetpull_requests: + pull_request2759
2017-07-13 12:42:40serhiy.storchakasetstatus: closed -> open
nosy: + serhiy.storchaka
messages: + msg298282

2017-07-13 12:38:03larrysetmessages: + msg298281
2017-03-31 16:36:21dstufftsetpull_requests: + pull_request953
2016-10-14 12:30:15hayposetmessages: + msg278646
2016-10-14 12:13:12hayposetmessages: + msg278644
2016-10-14 11:27:28Frens Jan Rumphsetnosy: + Frens Jan Rumph
messages: + msg278642
2016-10-04 07:01:14python-devsetstatus: open -> closed

nosy: + python-dev
messages: + msg278015

resolution: fixed
stage: resolved
2016-10-03 11:28:13guojiahuasetmessages: + msg277952
2016-10-03 09:04:53asvetlovsetnosy: + asvetlov
messages: + msg277941
2016-10-03 07:59:32hayposetpriority: normal -> release blocker
nosy: + larry, ned.deily
messages: + msg277935

2016-10-03 07:55:33hayposetversions: + Python 3.6, Python 3.7
2016-10-03 06:27:52balloobsetnosy: + balloob
2016-10-01 17:56:00gvanrossumsetnosy: + gvanrossum
messages: + msg277825
2016-05-18 15:40:13guojiahuasetfiles: + patch
2016-03-23 07:58:19hayposetnosy: + tim.peters
2016-03-23 07:58:12hayposetfiles: + crash.py

messages: + msg262245
2016-03-23 07:12:07serhiy.storchakasetnosy: + fdrake, pitrou, haypo
2016-03-23 06:59:39xiang.zhangsetnosy: + xiang.zhang
2016-03-23 06:54:34guojiahuasetnosy: + guojiahua
2016-03-23 06:49:42Yang Kesetnosy: + Yang Ke, - guojiahua
messages: + msg262243
2016-03-23 06:08:59guojiahuacreate