This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Never ending loop in in update_refs Modules/gcmodule.c
Type: Stage:
Components: Versions: Python 2.7
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: David.Butler, amaury.forgeotdarc, jcea
Priority: normal Keywords:

Created on 2011-12-16 23:27 by David.Butler, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
cpu_usage.png David.Butler, 2011-12-19 18:07
Messages (13)
msg149641 - (view) Author: David Butler (David.Butler) Date: 2011-12-16 23:27
CPU will sit a 100% indefinitely, this is also pretty tough (takes several hours) to reproduce

if you step through the program it is stuck between 290 and 292


Modules/gcmodule.c:

/* Set all gc_refs = ob_refcnt.  After this, gc_refs is > 0 for all objects
 * in containers, and is GC_REACHABLE for all tracked gc objects not in
 * containers.
 */
static void
update_refs(PyGC_Head *containers)
{
    PyGC_Head *gc = containers->gc.gc_next;
    for (; gc != containers; gc = gc->gc.gc_next) {      <-- line 290
        assert(gc->gc.gc_refs == GC_REACHABLE);
        gc->gc.gc_refs = Py_REFCNT(FROM_GC(gc));         <-- line 292
        /* Python's cyclic gc should never see an incoming refcount
         * of 0:  if something decref'ed to 0, it should have been
         * deallocated immediately at that time.
         * Possible cause (if the assert triggers):  a tp_dealloc
         * routine left a gc-aware object tracked during its teardown
         * phase, and did something-- or allowed something to happen --
         * that called back into Python.  gc can trigger then, and may
         * see the still-tracked dying object.  Before this assert
         * was added, such mistakes went on to allow gc to try to
         * delete the object again.  In a debug build, that caused
         * a mysterious segfault, when _Py_ForgetReference tried
         * to remove the object from the doubly-linked list of all
         * objects a second time.  In a release build, an actual
         * double deallocation occurred, which leads to corruption
         * of the allocator's internal bookkeeping pointers.  That's
         * so serious that maybe this should be a release-build
         * check instead of an assert?
         */
        assert(gc->gc.gc_refs != 0);
    }
}



GDB backtrace:

#0  0xb7750fcb in update_refs (generation=2) at Modules/gcmodule.c:290
#1  collect (generation=2) at Modules/gcmodule.c:873
#2  0xb77515e3 in collect_generations (basicsize=20) at Modules/gcmodule.c:996
#3  _PyObject_GC_Malloc (basicsize=20) at Modules/gcmodule.c:1457
#4  0xb775163e in _PyObject_GC_NewVar (tp=0xb77a0640, nitems=2) at Modules/gcmodule.c:1477
#5  0xb76df4b7 in PyTuple_New (size=2) at Objects/tupleobject.c:90
#6  0xb77230fd in PyEval_EvalFrameEx (f=
    Frame 0x889acdc, for file /usr/lib/python2.7/site-packages/twisted/python/failure.py, line 443, in __getstate__ (self=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages/twisted/internet/defer.py', 542, [('chain', [<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<...>, _runningCallbacks=False) at remote 0x9f1cb2c>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type...(truncated), throwflag=0) at Python/ceval.c:2178
#7  0xb772489a in fast_function (f=
    Frame 0x889ab84, for file /usr/lib/python2.7/site-packages/twisted/python/failure.py, line 452, in cleanFailure (self=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages/twisted/internet/defer.py', 542, [('chain', [<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<...>, _runningCallbacks=False) at remote 0x9f1cb2c>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type...(truncated), throwflag=0) at Python/ceval.c:4099
#8  call_function (f=
    Frame 0x889ab84, for file /usr/lib/python2.7/site-packages/twisted/python/failure.py, line 452, in cleanFailure (self=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages/twisted/internet/defer.py', 542, [('chain', [<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<...>, _runningCallbacks=False) at remote 0x9f1cb2c>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type...(truncated), throwflag=0) at Python/ceval.c:4034
#9  PyEval_EvalFrameEx (f=
    Frame 0x889ab84, for file /usr/lib/python2.7/site-packages/twisted/python/failure.py, line 452, in cleanFailure (self=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages/twisted/internet/defer.py', 542, [('chain', [<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<...>, _runningCallbacks=False) at remote 0x9f1cb2c>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type...(truncated), throwflag=0) at Python/ceval.c:2666
#10 0xb772489a in fast_function (f=
    Frame 0x9e4fea4, for file /usr/lib/python2.7/site-packages/twisted/internet/defer.py, line 578, in _runCallbacks (self=<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages/twisted/internet/defer.py', 542, [('chain', [<...>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type at remote 0x8860f54>), ('DebugInfo', <classob...(truncated), throwflag=0) at Python/ceval.c:4099
#11 call_function (f=
    Frame 0x9e4fea4, for file /usr/lib/python2.7/site-packages/twisted/internet/defer.py, line 578, in _runCallbacks (self=<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages---Type <return> to continue, or q <return> to quit---
/twisted/internet/defer.py', 542, [('chain', [<...>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type at remote 0x8860f54>), ('DebugInfo', <classob...(truncated), throwflag=0) at Python/ceval.c:4034
#12 PyEval_EvalFrameEx (f=
    Frame 0x9e4fea4, for file /usr/lib/python2.7/site-packages/twisted/internet/defer.py, line 578, in _runCallbacks (self=<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages/twisted/internet/defer.py', 542, [('chain', [<...>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type at remote 0x8860f54>), ('DebugInfo', <classob...(truncated), throwflag=0) at Python/ceval.c:2666
#13 0xb772489a in fast_function (f=
    Frame 0x9b745cc, for file /usr/lib/python2.7/site-packages/twisted/internet/defer.py, line 455, in _startRunCallbacks (self=<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages/twisted/internet/defer.py', 542, [('chain', [<...>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type at remote 0x8860f54>), ('DebugInfo', <cl...(truncated), throwflag=0) at Python/ceval.c:4099
#14 call_function (f=
    Frame 0x9b745cc, for file /usr/lib/python2.7/site-packages/twisted/internet/defer.py, line 455, in _startRunCallbacks (self=<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages/twisted/internet/defer.py', 542, [('chain', [<...>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type at remote 0x8860f54>), ('DebugInfo', <cl...(truncated), throwflag=0) at Python/ceval.c:4034
#15 PyEval_EvalFrameEx (f=
    Frame 0x9b745cc, for file /usr/lib/python2.7/site-packages/twisted/internet/defer.py, line 455, in _startRunCallbacks (self=<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages/twisted/internet/defer.py', 542, [('chain', [<...>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type at remote 0x8860f54>), ('DebugInfo', <cl...(truncated), throwflag=0) at Python/ceval.c:2666
#16 0xb772489a in fast_function (f=
    Frame 0x9ee4c04, for file /usr/lib/python2.7/site-packages/twisted/internet/defer.py, line 361, in callback (self=<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages/twisted/internet/defer.py', 542, [('chain', [<...>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type at remote 0x8860f54>), ('DebugInfo', <classobj at ...(truncated), throwflag=0) at Python/ceval.c:4099
#17 call_function (f=
    Frame 0x9ee4c04, for file /usr/lib/python2.7/site-packages/twisted/internet/defer.py, line 361, in callback (self=<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages/twisted/internet/defer.py', 542, [('chain', [<...>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type at remote 0x8860f54>), ('DebugInfo', <classobj at ...(truncated), throwflag=0) at Python/ceval.c:4034
msg149644 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-12-16 23:43
Are you sure that the program is really stuck in the gc module?
The loop you mention has to go through all objects of the process.  It's possible that it allocated many objects, that one garbage collection takes a few seconds, and even that most of the time is spent inside the garbage collector.
But can you check whether this function terminates?  For example, you could put a breakpoint in the collect() function.
msg149690 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2011-12-17 18:38
The only way of this thing actually happening is if the GC link list has actually a cycle.

Without a testcase to try to reproduce, it can't be debugged.

David, can you reproduce this consistently, even if it takes a few hours?.

As Amaury pointed out, if you have tons of objects, GC can take a while. And if your memory is paged out to swap, this can actually be VERY slow.

To know if this is actually a infinite loop, when you think it is stuck in the loop, just check current value of "gc" and set a breakpoint for that value in the loop. So GDB will stop again if "gc" gets the same value again and so we are actually in a loop.

In the meantime, check your page-in rate. If you are paging-in furiously, you are not in a loop, you are (mechanically slow) bringing objects back to memory.

Can you tell us how big your python process is (in memory), how much physical RAM you have, and if some other processes are competing for memory?.

If you can confirm that we are actually in an infinite loop... there is little that we can do without some testcase we can reproduce. The key here is that if you can reproduce, we can slowly move forward.
msg149882 - (view) Author: David Butler (David.Butler) Date: 2011-12-19 18:07
sorry for the delay, I had to wait until the problem occurred again...

I gdb'ed into the process again, the backtrace is a little different this time...

(gdb) bt
#0  0xb76adfc6 in update_refs (containers=<optimized out>) at Modules/gcmodule.c:292
#1  collect (generation=2) at Modules/gcmodule.c:873
#2  0xb76ae5e3 in collect_generations () at Modules/gcmodule.c:996
#3  _PyObject_GC_Malloc (basicsize=16) at Modules/gcmodule.c:1457
#4  0xb76ae684 in _PyObject_GC_New (tp=0xb76f6aa0) at Modules/gcmodule.c:1467
#5  0xb761a8bd in list_iter (seq=
    [((None, u'to'), u'aves.rule'), ((None, u'rel'), u'ManyToOneRel'), ((None, u'name'), u'rule')]) at Objects/listobject.c:2873
#6  0xb75eef45 in PyObject_GetIter (o=
    [((None, u'to'), u'aves.rule'), ((None, u'rel'), u'ManyToOneRel'), ((None, u'name'), u'rule')]) at Objects/abstract.c:3083
#7  0xb7680f35 in PyEval_EvalFrameEx (f=
    Frame 0x9bcc0c4, for file /usr/lib/python2.7/xml/dom/pulldom.py, line 88, in startElementNS (self=<PullDOM(documentFactory=<DOMImplementation() at remote 0x9c217cc>, _ns_contexts=[{'http://www.w3.org/XML/1998/namespace': 'xml'}], pending_events=None, _current_context={...}, elementStack=[<Document(implementation=<...>, _elem_info={}, doctype=None, _id_search_stack=None, childNodes=<NodeList at remote 0xa2fac6c>, _id_cache={}) at remote 0xa2fac4c>, <Element(ownerDocument=<...>, nodeName=u'django-objects', nextSibling=None, parentNode=<...>, namespaceURI=None, prefix=None, _attrsNS={(None, u'version'): <Attr(ownerDocument=<...>, nodeName=u'version', ownerElement=<...>, value=u'1.0', namespaceURI=None, prefix=None, nodeValue=u'1.0', childNodes=<NodeList at remote 0xa2faccc>, name=u'version') at remote 0xa2facac>}, tagName=u'django-objects', childNodes=<NodeList at remote 0xa2fabcc>, _attrs={u'version': <...>}) at remote 0xa2faa8c>, <Element(ownerDocument=<...>, nodeName=u'object', namespaceURI=None, prefix=None,...(truncated), throwflag=0) at Python/ceval.c:2483
#8  0xb768189a in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, 
    pp_stack=<optimized out>, func=<optimized out>) at Python/ceval.c:4099
#9  call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4034
#10 PyEval_EvalFrameEx (f=
    Frame 0x9ed8f74, for file /usr/lib/python2.7/xml/sax/expatreader.py, line 338, in start_element_ns (self=<ExpatParser(_namespaces=1, _parser=<pyexpat.xmlparser at remote 0xa22fe2c>, _external_ges=1, _source=<InputSource(_InputSource__charfile=None, _InputSource__bytefile=None, _InputSource__public_id=None, _InputSource__system_id=None, _InputSource__encoding=None) at remote 0xa2fabac>, _decl_handler_prop=None, _bufsize=65516, _cont_handler=<PullDOM(documentFactory=<DOMImplementation() at remote 0x9c217cc>, _ns_contexts=[{'http://www.w3.org/XML/1998/namespace': 'xml'}], pending_events=None, _current_context={...}, elementStack=[<Document(implementation=<...>, _elem_info={}, doctype=None, _id_search_stack=None, childNodes=<NodeList at remote 0xa2fac6c>, _id_cache={}) at remote 0xa2fac4c>, <Element(ownerDocument=<...>, nodeName=u'django-objects', nextSibling=None, parentNode=<...>, namespaceURI=None, prefix=None, _attrsNS={(None, u'version'): <Attr(ownerDocument=<...>, nodeName=u'version', ownerElement=<...>, va...(truncated), throwflag=0) at Python/ceval.c:2666
#11 0xb7683508 in PyEval_EvalCodeEx (co=0x9dd6c80, globals=
    {'SAXNotRecognizedException': <type at remote 0x9c6967c>, 'expat': <module at remote 0x96cbd94>, '__name__': 'xml.sax.expatreader', 'feature_namespace_prefixes': 'http://xml.org/sax/features/namespace-prefixes', 'AttributesNSImpl': <classobj at remote 0x9d8db0c>, 'feature_string_interning': 'http://xml.org/sax/features/string-interning', '__package__': 'xml.sax', 'versi---Type <return> to continue, or q <return> to quit---step
on': '0.20', 'AttributesImpl': <classobj at remote 0x9d8dadc>, 'feature_namespaces': 'http://xml.org/sax/features/namespaces', '__doc__': "\nSAX driver for the pyexpat C module.  This driver works with\npyexpat.__version__ == '2.22'.\n", 'feature_validation': 'http://xml.org/sax/features/validation', 'create_parser': <function at remote 0x9ddba74>, '__builtins__': {'bytearray': <type at remote 0xb76ed660>, 'IndexError': <type at remote 0xb76f1520>, 'all': <built-in function all>, 'help': <_Helper at remote 0xb73643ac>, 'vars': <built-in function vars>, 'SyntaxError': <type at remote 0xb76f11a0>, 'unicode': <type at remote 0xb7701340>, 'UnicodeD...(truncated), locals=0x0, args=0x9d94510, argcount=3, kws=0x0, kwcount=0, defs=0x0, defcount=0, 
    closure=0x0) at Python/ceval.c:3253
#12 0xb7614f2a in function_call (func=<function at remote 0x9dd502c>, arg=
    (<ExpatParser(_namespaces=1, _parser=<pyexpat.xmlparser at remote 0xa22fe2c>, _external_ges=1, _source=<InputSource(_InputSource__charfile=None, _InputSource__bytefile=None, _InputSource__public_id=None, _InputSource__system_id=None, _InputSource__encoding=None) at remote 0xa2fabac>, _decl_handler_prop=None, _bufsize=65516, _cont_handler=<PullDOM(documentFactory=<DOMImplementation() at remote 0x9c217cc>, _ns_contexts=[{'http://www.w3.org/XML/1998/namespace': 'xml'}], pending_events=None, _current_context={...}, elementStack=[<Document(implementation=<...>, _elem_info={}, doctype=None, _id_search_stack=None, childNodes=<NodeList at remote 0xa2fac6c>, _id_cache={}) at remote 0xa2fac4c>, <Element(ownerDocument=<...>, nodeName=u'django-objects', nextSibling=None, parentNode=<...>, namespaceURI=None, prefix=None, _attrsNS={(None, u'version'): <Attr(ownerDocument=<...>, nodeName=u'version', ownerElement=<...>, value=u'1.0', namespaceURI=None, prefix=None, nodeValue=u'1.0', childNodes=<NodeList at remote 0xa2faccc>,...(truncated), kw=0x0) at Objects/funcobject.c:526

--- truncated ---

(gdb) b collect
Breakpoint 1 at 0xb76ade3c: file Modules/gcmodule.c, line 822.
(gdb) cont
Continuing.

( here i let it run for a about a minute )

^C
Program received signal SIGINT, Interrupt.
0xb76adfcb in update_refs (containers=<optimized out>) at Modules/gcmodule.c:290
290     in Modules/gcmodule.c
(gdb) print gc
$1 = (PyGC_Head *) 0xb770df00
(gdb) step
292     in Modules/gcmodule.c
(gdb) info locals
gc = 0xb770df00
(gdb) step
290     in Modules/gcmodule.c
(gdb) info locals
gc = 0xb770df00

I was going to leave the process running, and the debugger up, but I forgot that a reboot script was still enabled, and it caused the machine to reboot... so i am recompiling python with CFLAGS="-O0 ..." to try to fix some "value optimized out" issue in gdb... need to try to figure out what object its getting caught on... 

Could this be caused by one of the loadable libraries that I am importing? ( pyside, psycopg2, ... ) and one of those is making an object that isn't playing nice with the gc?

Also, I have attached a histogram of the cpu usage

I'll try to answer more of your questions when it happens next
msg149895 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2011-12-20 00:23
It seems to be a real infinite loop. Bad thing. Could be a bug in an extension, difficult to say.

This is going to be VERY difficult to debug without a reproductible case we can try.

Could you possibly check the object type of the "infinite loop" object?. If the loop is from an object to itself, maybe you can instrumentalize python to detect this loop when created.

Add to your code "gc.collect()" is a frequent loop. It is going to suck CPU, but could reproduce the issue faster and make it easier to diagnose.
msg149900 - (view) Author: David Butler (David.Butler) Date: 2011-12-20 02:12
2011/12/19 Jesús Cea Avión <report@bugs.python.org>

I am willing to work toward a simplified test case, but its going to be
difficult, I am hoping that I can narrow down the source of the problem...

Forgive me, I'm gdb is actually a new thing to me... how could I check the
object type?
Also what do you mean exactly when you say "instrumentalize"

I started some valgrind testing today, (to try to figure out whats causing
some segfaults in a possibly unrelated bug, will this interfere with
testing for this bug?)

> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue13616>
> _______________________________________
>
msg149901 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2011-12-20 02:27
Instrumentalize: check for this pathological case (an object with a GC pointer back to itself) in the code that modify the GC pointers. Lets say, everytime code change the pointers, you test for this. Luckily you can learn the codepath creating this situation. Change and recompile python code for checking this.

Read Include/object.h. You will see that ANY python object has, at least, two components: a reference counter and a pointer to its type. Follow that pointer to type and get its name.

Let me try an example...

"""
gdb python
GNU gdb (GDB) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-pc-solaris2.10".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/local/bin/python...done.
(gdb) br PyTuple_New
Function "PyTuple_New" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (PyTuple_New) pending.
(gdb) r
Starting program: /usr/local/bin/python 
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[Switching to Thread 1 (LWP 1)]

Breakpoint 1, PyTuple_New (size=0) at Objects/tupleobject.c:50
50      Objects/tupleobject.c: No such file or directory.
        in Objects/tupleobject.c
(gdb) fin
Run till exit from #0  PyTuple_New (size=0) at Objects/tupleobject.c:50
0xfee650e4 in PyType_Ready (type=0xfef5cd00) at Objects/typeobject.c:4077
4077    Objects/typeobject.c: No such file or directory.
        in Objects/typeobject.c
Value returned is $1 = (PyObject *) 0x806202c
(gdb) print $1->ob_type
$2 = (struct _typeobject *) 0xfef5cb40
(gdb) print *($1->ob_type)
$3 = {ob_refcnt = 1, ob_type = 0xfef5cde0, ob_size = 0, tp_name = 0xfef07542 "tuple", tp_basicsize = 12, tp_itemsize = 4, 
  tp_dealloc = 0xfee568f0 <tupledealloc>, tp_print = 0xfee56c80 <tupleprint>, tp_getattr = 0, tp_setattr = 0, tp_compare = 0, 
  tp_repr = 0xfee57380 <tuplerepr>, tp_as_number = 0x0, tp_as_sequence = 0xfef52500, tp_as_mapping = 0xfef52528, 
  tp_hash = 0xfee56850 <tuplehash>, tp_call = 0, tp_str = 0, tp_getattro = 0xfee416d0 <PyObject_GenericGetAttr>, tp_setattro = 0, 
  tp_as_buffer = 0x0, tp_flags = 67519979, 
  tp_doc = 0xfef3b580 "tuple() -> empty tuple\ntuple(iterable) -> tuple initialized from iterable's items\n\nIf the argument is a tuple, the return value is the same object.", tp_traverse = 0xfee56440 <tupletraverse>, tp_clear = 0, 
  tp_richcompare = 0xfee56b10 <tuplerichcompare>, tp_weaklistoffset = 0, tp_iter = 0xfee56a60 <tuple_iter>, tp_iternext = 0, 
  tp_methods = 0xfef52540, tp_members = 0x0, tp_getset = 0x0, tp_base = 0x0, tp_dict = 0x0, tp_descr_get = 0, tp_descr_set = 0, 
  tp_dictoffset = 0, tp_init = 0, tp_alloc = 0, tp_new = 0xfee57680 <tuple_new>, tp_free = 0xfeede6a0 <PyObject_GC_Del>, 
  tp_is_gc = 0, tp_bases = 0x0, tp_mro = 0x0, tp_cache = 0x0, tp_subclasses = 0x0, tp_weaklist = 0x0, tp_del = 0, 
  tp_version_tag = 0}
"""

I break in the tuple creation routine. Then I let it finish. It is going to return a PyObject, a generic python object. I follow the type pointer and get the name of the type: 'tp_name = 0xfef07542 "tuple"'. I already knew the result was going to be a tuple, but it is nice to cross-check :)

Sometimes it is more complicated, and you have to cast pointers. Use the source code as your map.

Sorry, it is difficult to teach how to do such a low level debug remotely :).

PS: Remember to add "gc.collect()" in your mainloop. In this way you will hit the problem faster, because you don't have to wait for Python to invoke the collector implicitly. If you have tons of objects this can be actually slower. Try.
msg149902 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2011-12-20 02:36
David, if you get desperate, let us know. If you can deal with Mercurial and compiling Python code, I could post a mercurial repository/branch with code modifications to help you to debug this.

But it is almost Christmas and I am VERY busy and have to do a few long distance trips yet, so that better be the last option. First try to advance a bit yourself. At least you will learn a lot.

But don't let this bug to languish...

If you can manage to reproduce the issue more easily, that would be VERY useful. Tell us too what external modules are you using.

Good luck.
msg149933 - (view) Author: David Butler (David.Butler) Date: 2011-12-20 22:20
I have 10 identical test machines running this this code ( operating systems are cloned ). I am not usning valgrind in these tests, it was causing various issues...

(gdb) info sharedlibrary
From        To          Syms Read   Shared Object Library
0xb75a02b0  0xb76baa38  Yes         /usr/lib/libpython2.7.so.1.0
0xb75695a0  0xb7575878  Yes         /lib/libpthread.so.0
0xb7422060  0xb7529374  Yes         /lib/libc.so.6
0xb7404bd0  0xb7405b28  Yes         /lib/libdl.so.2
0xb73ffbc0  0xb7400498  Yes         /lib/libutil.so.1
0xb73dd360  0xb73f7e58  Yes         /lib/libm.so.6
0xb77288d0  0xb773f97f  Yes         /lib/ld-linux.so.2
0xb7721310  0xb77226d8  Yes         /usr/lib/python2.7/lib-dynload/time.so
0xb728c130  0xb728e4b8  Yes         /usr/lib/python2.7/site-packages/zope/interface/_zope_interface_coptimizations.so
0xb7284060  0xb7286cf8  Yes         /usr/lib/python2.7/lib-dynload/strop.so
0xb72772a0  0xb727d648  Yes         /usr/lib/python2.7/lib-dynload/_socket.so
0xb7271c50  0xb72729e8  Yes         /usr/lib/python2.7/lib-dynload/_functools.so
0xb726a7d0  0xb726deb8  Yes         /usr/lib/python2.7/lib-dynload/_ssl.so
0xb721ec00  0xb7253f28  Yes         /usr/lib/libssl.so.1.0.0
0xb70f47c0  0xb71d3dd8  Yes         /usr/lib/libcrypto.so.1.0.0
0xb70a3b20  0xb70af7b8  Yes         /lib/libz.so.1
0xb7264130  0xb72653c8  Yes         /usr/lib/python2.7/lib-dynload/cStringIO.so
0xb709aab0  0xb709e108  Yes         /usr/lib/python2.7/lib-dynload/_struct.so
0xb70934c0  0xb70958a8  Yes         /usr/lib/python2.7/lib-dynload/operator.so
0xb708b7e0  0xb708e868  Yes         /usr/lib/python2.7/lib-dynload/_collections.so
0xb707db00  0xb7083dd8  Yes         /usr/lib/python2.7/lib-dynload/itertools.so
0xb725f7d0  0xb725ff98  Yes         /usr/lib/python2.7/lib-dynload/_bisect.so
0xb7077920  0xb7078ee8  Yes         /usr/lib/python2.7/lib-dynload/_heapq.so
0xb7073970  0xb7074028  Yes         /usr/lib/python2.7/lib-dynload/grp.so
0xb771d510  0xb771d6f8  Yes         /usr/lib/python2.7/site-packages/twisted/python/_initgroups.so
0xb7062bf0  0xb706d3f8  Yes         /usr/lib/python2.7/lib-dynload/datetime.so
0xb705ab90  0xb705d358  Yes         /usr/lib/python2.7/lib-dynload/binascii.so
0xb7055a00  0xb7057018  Yes         /usr/lib/python2.7/lib-dynload/fcntl.so
0xb704d9a0  0xb7051258  Yes         /usr/lib/python2.7/lib-dynload/math.so
0xb7046e80  0xb7047d48  Yes         /usr/lib/python2.7/lib-dynload/_hashlib.so
0xb7042d90  0xb7043d78  Yes         /usr/lib/python2.7/lib-dynload/_random.so
0xb703c460  0xb703e548  Yes         /usr/lib/python2.7/lib-dynload/select.so
0xb7036360  0xb7036e18  Yes         /usr/lib/python2.7/lib-dynload/termios.so
0xb7032660  0xb7032968  Yes         /usr/lib/python2.7/site-packages/twisted/internet/_sigchld.so
0xb6ce7930  0xb6ed3188  Yes         /usr/lib/python2.7/site-packages/PySide/QtCore.so
0xb6c903b0  0xb6ca3b58  Yes         /usr/lib/libpyside-python2.7.so.1.0
0xb7016230  0xb7023058  Yes         /usr/lib/libshiboken-python2.7.so.1.0
0xb69ba370  0xb6b4c358  Yes         /usr/lib/qt4/libQtCore.so.4
0xb68cd7e0  0xb693dac8  Yes         /usr/lib/gcc/i686-pc-linux-gnu/4.5.3/libstdc++.so.6
0xb686a390  0xb6880a18  Yes         /usr/lib/gcc/i686-pc-linux-gnu/4.5.3/libgcc_s.so.1
0xb6860e90  0xb68655c8  Yes         /lib/librt.so.1
---Type <return> to continue, or q <return> to quit---
0xb5d81c30  0xb65d9cf8  Yes         /usr/lib/python2.7/site-packages/PySide/QtGui.so
0xb54d9e50  0xb5ba8a38  Yes         /usr/lib/qt4/libQtGui.so.4
0xb5394b80  0xb53b0a58  Yes         /usr/lib/libpng15.so.15
0xb5315a80  0xb5377748  Yes         /usr/lib/libfreetype.so.6
0xb5305910  0xb530a278  Yes         /usr/lib/libSM.so.6
0xb52eee00  0xb52fddd8  Yes         /usr/lib/libICE.so.6
0xb52e3740  0xb52e8be8  Yes         /usr/lib/libXrender.so.1
0xb52db5c0  0xb52dfa48  Yes         /usr/lib/libXrandr.so.2
0xb52b0f50  0xb52ca548  Yes         /usr/lib/libfontconfig.so.1
0xb529f220  0xb52a8e98  Yes         /usr/lib/libXext.so.6
0xb519b580  0xb5220388  Yes         /usr/lib/libX11.so.6
0xb5173200  0xb5181a38  Yes         /lib/libbz2.so.1
0xb5151580  0xb5167338  Yes         /usr/lib/libexpat.so.1
0xb513e0c0  0xb514b988  Yes         /usr/lib/libxcb.so.1
0xb5131b60  0xb51328b8  Yes         /usr/lib/libXau.so.6
0xb512c0f0  0xb512dab8  Yes         /usr/lib/libXdmcp.so.6
0xb50417e0  0xb50f0cf8  Yes         /usr/lib/python2.7/site-packages/PySide/QtNetwork.so
0xb4f13ff0  0xb4fd6f18  Yes         /usr/lib/qt4/libQtNetwork.so.4
0xb702d1e0  0xb702ebd8  Yes         /usr/lib/python2.7/lib-dynload/_locale.so
0xb4ed2ab0  0xb4ee4fa8  Yes         /usr/lib/python2.7/site-packages/psycopg2/_psycopg.so
0xb4ea7d90  0xb4ebe638  Yes (*)     /usr/lib/libpq.so.5
0xb4e8fc90  0xb4e9da98  Yes         /usr/lib/python2.7/lib-dynload/cPickle.so
0xb4e84c70  0xb4e89498  Yes         /usr/lib/python2.7/lib-dynload/array.so
0xb4e69d90  0xb4e7cb08  Yes         /usr/lib/python2.7/lib-dynload/_ctypes.so
0xb4e5f310  0xb4e62638  Yes         /usr/lib/libffi.so.5
0xb4ec8100  0xb4ec9ab8  Yes         /lib/libuuid.so.1
0xb6fb2200  0xb6fb4618  Yes         /usr/lib/libXfixes.so
0xb6fa93e0  0xb6fae248  Yes         /usr/lib/libXcursor.so.1
0xb6f9ad70  0xb6fa4808  Yes         /usr/lib/libXi.so
0xb4e316d0  0xb4e56308  Yes         /usr/lib/python2.7/site-packages/PIL/_imaging.so
0xb4ddfff0  0xb4e134c8  Yes         /usr/lib/libjpeg.so.8
0xb6f79d00  0xb6f8d748  Yes         /usr/lib/python2.7/lib-dynload/_io.so
0xb6f6fef0  0xb6f724d8  Yes         /usr/lib/python2.7/lib-dynload/zlib.so
0xb4d4c190  0xb4db81e8  Yes         /usr/lib/python2.7/site-packages/PySide/QtWebKit.so
0xb4049e70  0xb4ade738  Yes         /usr/lib/qt4/libQtWebKit.so.4
0xb3e22850  0xb3e90648  Yes         /usr/lib/libsqlite3.so.0
0xb3de9c20  0xb3e13f28  Yes         /usr/lib/qt4/libphonon.so.4
0xb3dcd0d0  0xb3dd0548  Yes         /usr/lib/python2.7/site-packages/alsaaudio.so
0xb3d302c0  0xb3db2648  Yes         /usr/lib/libasound.so.2
0xb6fb9b60  0xb6fbcef8  Yes         /usr/lib/qt4/plugins/imageformats/libqgif.so
0xb3509f30  0xb350cd88  Yes         /usr/lib/qt4/plugins/imageformats/libqico.so
0xaebf0710  0xaebf3c28  Yes         /usr/lib/qt4/plugins/imageformats/libqjpeg.so
0xade9ad30  0xadea6eb8  Yes (*)     /usr/lib/libresolv.so
0xade8dfa0  0xade955c8  Yes         /lib/libnss_files.so.2
0xade86da0  0xade89e18  Yes         /lib/libnss_dns.so.2
---Type <return> to continue, or q <return> to quit---
0xad63b080  0xad63c6e8  Yes         /usr/lib/qt4/plugins/imageformats/libqsvg.so
0xad601ac0  0xad633038  Yes         /usr/lib/qt4/libQtSvg.so.4
0xad4a5610  0xad4ab728  Yes         /usr/lib/python2.7/lib-dynload/_json.so
0xad499640  0xad49f578  Yes         /usr/lib/python2.7/lib-dynload/pyexpat.so
0xad018350  0xad01a708  Yes         /usr/lib/python2.7/lib-dynload/unicodedata.so

I am importing twisted, pyside, django, psycopg2, and datetuil 


3 of the machines had a stall much like the last two backtraces, except now originating from my gc.collect() call:

#0  0xb769e9ff in update_refs (containers=0xb7709328) at Modules/gcmodule.c:290
#1  0xb769f55f in collect (generation=2) at Modules/gcmodule.c:873
#2  0xb769f98d in gc_collect (self=0x0, args=(), kws=0x0) at Modules/gcmodule.c:1067
#3  0xb75f0e91 in PyCFunction_Call (func=<built-in function collect>, arg=(), kw=0x0)
    at Objects/methodobject.c:85
#4  0xb7666287 in call_function (pp_stack=0xbfa61960, oparg=0) at Python/ceval.c:4013
#5  0xb7662dff in PyEval_EvalFrameEx (f=
    Frame 0xb0b5b9c, for file /usr/lib/python2.7/site-packages/qt4reactor.py, line 261, in doIteration (self=<QtReactor(waker=<_UnixWaker(i=13, fileno=<function at remote 0xa688994>, reactor=<...>, o=14) at remote 0xa684dac>, threadCallQueue=[], _cancellations=0, _justStopped=False, _newTimedCalls=[], _pendingTimedCalls=[<DelayedCall(resetter=<instancemethod at remote 0xb00016c>, seconds=<built-in function time>, args=(), canceller=<instancemethod at remote 0xad81e64>, delayed_time=0, kw={}, func=<LoopingCall(a=(...), interval=2, clock=<...>, _expectNextCallAt=<float at remote 0xb03ac1c>, f=<instancemethod at remote 0xa97234c>, deferred=<Deferred(_canceller=None, callbacks=[((<function at remote 0x9e59f44>, None, None), (<instancemethod at remote 0xa972374>, (...), {}))]) at remote 0xa9710cc>, running=True, kw={}, starttime=<float at remote 0x9ed5f3c>, call=<...>, _runAtStart=True) at remote 0xa96e76c>, time=<float at remote 0xac30db4>, cancelled=0, called=0) at remote 0xb03482c>, <DelayedCall(resetter=<instancem...(truncated), throwflag=0) at Python/ceval.c:2666

this is weird, but it seems like the type pointer is 0x0 (no type?):

(gdb) print {PyObject} ((PyObject *)(((PyGC_Head *)*gc)+1))
$1 = {ob_refcnt = 700, ob_type = 0x0}

gc gets stuck again in update_refs, and if you look at its members, gc.gc_next == gc.gc_prev == gc ....
(gdb) print {PyGC_Head} gc         
$3 = {gc = {gc_next = 0xb77a3300, gc_prev = 0xb77a3300, gc_refs = 700}, 
  dummy = 1.2675072995965765876244656259641832e-4721}

Two of the machines produced core files, first core:
(Incidentally, this machine, once reset was one of the ones that was counted in the previous list that locked up at update_refs)

Program terminated with signal 6, Aborted.
#0  0xb76fe424 in __kernel_vsyscall ()

(gdb) info thread
  Id   Target Id         Frame 
  12   Thread 0xb0afab70 (LWP 2752) 0xb76fe424 in __kernel_vsyscall ()
  11   Thread 0xb12fbb70 (LWP 2751) 0xb76fe424 in __kernel_vsyscall ()
  10   Thread 0xaf2f7b70 (LWP 2755) 0xb76fe424 in __kernel_vsyscall ()
  9    Thread 0xb22fdb70 (LWP 2749) 0xb76fe424 in __kernel_vsyscall ()
  8    Thread 0xb2afeb70 (LWP 2748) 0xb76fe424 in __kernel_vsyscall ()
  7    Thread 0xb1afcb70 (LWP 2750) 0xb76fe424 in __kernel_vsyscall ()
  6    Thread 0xb32ffb70 (LWP 2747) 0xb76fe424 in __kernel_vsyscall ()
  5    Thread 0xaeaeeb70 (LWP 2757) 0xb76fe424 in __kernel_vsyscall ()
  4    Thread 0xb02f9b70 (LWP 2753) 0xb76fe424 in __kernel_vsyscall ()
  3    Thread 0xb3ce6b70 (LWP 2740) 0xb76fe424 in __kernel_vsyscall ()
  2    Thread 0xafaf8b70 (LWP 2754) 0xb7039cda in __i686.get_pc_thunk.bx ()
   from /usr/lib/python2.7/lib-dynload/datetime.so
* 1    Thread 0xb73af6c0 (LWP 2597) 0xb76fe424 in __kernel_vsyscall ()

I'm not at all sure whats going on here... but the file for the service ends with this:
Fatal Python error: GC object already tracked

The other core file is different:

Program terminated with signal 11, Segmentation fault.
#0  0xb766f3ee in call_function (pp_stack=0xbfda38c0, oparg=0) at Python/ceval.c:4047
4047    Python/ceval.c: No such file or directory.
        in Python/ceval.c
#0  0xb766f3ee in call_function (pp_stack=0xbfda38c0, oparg=0) at Python/ceval.c:4047
#1  0xb766bdff in PyEval_EvalFrameEx (f=<unknown at remote 0xa0bf31c>, throwflag=0)
    at Python/ceval.c:2666
#2  0xb766f542 in fast_function (func=<function at remote 0x922733c>, pp_stack=0xbfda3ac0, 
    n=1, na=1, nk=0) at Python/ceval.c:4099
#3  0xb766f383 in call_function (pp_stack=0xbfda3ac0, oparg=0) at Python/ceval.c:4034
#4  0xb766bdff in PyEval_EvalFrameEx (f=<unknown at remote 0xa0a69ac>, throwflag=0)
    at Python/ceval.c:2666
#5  0xb766f542 in fast_function (func=<function at remote 0x92272cc>, pp_stack=0xbfda3cc0, 
    n=2, na=2, nk=0) at Python/ceval.c:4099
#6  0xb766f383 in call_function (pp_stack=0xbfda3cc0, oparg=1) at Python/ceval.c:4034
#7  0xb766bdff in PyEval_EvalFrameEx (f=<unknown at remote 0xa0c27a4>, throwflag=0)
    at Python/ceval.c:2666
#8  0xb766d7ad in PyEval_EvalCodeEx (co=0x9212cc8, globals=
    {'DeferredFilesystemLock': <type at remote 0x92691b4>, 'AlreadyCalledError': <type at remote 0x912420c>, 'returnValue': <function at remote 0x9227684>, '_deferGenerator': <function at remote 0x92275a4>, 'DebugInfo': <classobj at remote 0x922032c>, '_parseDListResult': <function at remote 0x922702c>, 'FirstError': <type at remote 0x91253a4>, 'deferredGenerator': <function at remote 0x922764c>, 'fail': <function at remote 0x9217e64>, 'waitForDeferred': <classobj at remote 0x922044c>, 'inlineCallbacks': <function at remote 0x922772c>, 'logError': <function at remote 0x9217ca4>, 'log': <module at remote 0xb6ff5a64>, '__all__': ['Deferred', 'DeferredList', 'succeed', 'fail', 'FAILURE', 'SUCCESS', 'AlreadyCalledError', 'TimeoutError', 'gatherResults', 'maybeDeferred', 'waitForDeferred', 'deferredGenerator', 'inlineCallbacks', 'returnValue', 'DeferredLock', 'DeferredSemaphore', 'DeferredQueue', 'DeferredFilesystemLock', 'AlreadyTryingToLockError'], 'CancelledError': <type at remote 0x91243e4>, '__package__': 'twiste...(truncated), locals=0x0, args=0xa2fd8d8, argcount=2, kws=0xb73a1038, kwcount=0, defs=0x0, 
    defcount=0, closure=0x0) at Python/ceval.c:3253
(past this point in the backtrace, you get alot of python tracebacks ( something in the core file is messing with the python-gdb extension )

na = 0
nk = 0
n = 0
pfunc = 0xa0bf47c
func = <classobj at remote 0x931c9bc>
x = 
    <Failure(count=60, tb=None, value=exceptions.SystemError('Python/traceback.c:89: bad argument to internal function',), parents=['exceptions.SystemError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.SystemError'], frames=[], type=<type at remote 0xb76ee1e0>, stack=[]) at remote 0xa2fdb8c>
w = 0x0



I don't know if this will be helpful?
http://mail.python.org/pipermail/python-dev/2001-October/018002.html
msg149934 - (view) Author: David Butler (David.Butler) Date: 2011-12-20 22:37
This also looks familiar:
http://bytes.com/topic/python/answers/36901-fatal-error-gc-object-already-tracked
"semi-randomly locks somewhere outside my C-code (after returning to
python), and semi-randomly issues a "Fatal error: GC object already
tracked... "
Someone replies with:
"... should be calling PyObject_GC_UnTrack before doing anything else..."

So, perhaps one of the extensions is not calling PyObject_GC_UnTrack? but I need to know which one to go further...
msg149935 - (view) Author: David Butler (David.Butler) Date: 2011-12-20 22:50
I think I have found the problem! I took a closer look at the Fatal error core:

#0  0xb76fe424 in __kernel_vsyscall ()
#1  0xb740ccb1 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#2  0xb740e3f2 in *__GI_abort () at abort.c:92
#3  0xb766193f in Py_FatalError (msg=0xb76949de "GC object already tracked")
    at Python/pythonrun.c:1670
#4  0xb7595a3a in PyMethod_New (func=<function at remote 0x8d649cc>, self=
    <QtReactor(waker=<_UnixWaker(i=13, fileno=<function at remote 0x93f6994>, reactor=<...>, o=14) at remote 0x93f3dac>, threadCallQueue=[(<instancemethod at remote 0x9b5639c>, (0,), {})], _cancellations=0, _justStopped=False, _newTimedCalls=[], _pendingTimedCalls=[<DelayedCall(resetter=<instancemethod at remote 0x9b5d504>, seconds=<built-in function time>, args=(), canceller=<instancemethod at remote 0x9cb48ec>, delayed_time=0, kw={}, func=<LoopingCall(a=(...), interval=2, clock=<...>, _expectNextCallAt=<float at remote 0x9ab7b44>, f=<instancemethod at remote 0x96db34c>, deferred=<Deferred(_canceller=None, callbacks=[((<function at remote 0x8bc6f44>, None, None), (<instancemethod at remote 0x96db374>, (...), {}))]) at remote 0x96da0cc>, running=True, kw={}, starttime=<float at remote 0x8c3f0f4>, call=<...>, _runAtStart=True) at remote 0x96d776c>, time=<float at remote 0x9d4cb74>, cancelled=0, called=0) at remote 0x9d1184c>, <DelayedCall(resetter=<instancemethod at remote 0xa2c4edc>, seconds=<built-in function ti...(truncated), klass=<type at remote 0x8e5a3dc>) at Objects/classobject.c:2250
#5  0xb6c71225 in PySide::DynamicSlotDataV2::callback (this=0x96b6440)
    at /var/tmp/portage/dev-python/pyside-1.0.9/work/pyside-qt4.7+1.0.9/libpyside/globalreceiverv2.cpp:129
#6  0xb6c718fc in PySide::GlobalReceiverV2::qt_metacall (this=0x9519b88, 
    call=QMetaObject::InvokeMetaMethod, id=5, args=0xbffbeef4)
    at /var/tmp/portage/dev-python/pyside-1.0.9/work/pyside-qt4.7+1.0.9/libpyside/globalreceiverv2.cpp:289
#7  0xb6ab76a7 in QMetaObject::metacall (object=0x9519b88, cl=QMetaObject::InvokeMetaMethod, 
    idx=5, argv=0xbffbeef4) at kernel/qmetaobject.cpp:237
#8  0xb6ac9b8c in QMetaObject::activate (sender=0x9760e40, m=0xb6c5d644, 
    local_signal_index=<optimized out>, argv=<optimized out>) at kernel/qobject.cpp:3278
#9  0xb6b21b27 in QTimer::timeout (this=0x9760e40) at .moc/debug-shared/moc_qtimer.cpp:128
#10 0xb6acfb01 in QTimer::timerEvent (this=0x9760e40, e=0xbffbf41c) at kernel/qtimer.cpp:271
#11 0xb6e35efb in QTimerWrapper::timerEvent (this=0x9760e40, arg__1=0xbffbf41c)
    at /var/tmp/portage/dev-python/pyside-1.0.9/work/pyside-1.0.9_build/PySide/QtCore/PySide/QtCore/qtimer_wrapper.cpp:233
#12 0xb6ac2e67 in QObject::event (this=0x9760e40, e=0x6) at kernel/qobject.cpp:1181
#13 0xb6e37717 in QTimerWrapper::event (this=0x9760e40, arg__1=0xbffbf41c)
    at /var/tmp/portage/dev-python/pyside-1.0.9/work/pyside-1.0.9_build/PySide/QtCore/PySide/QtCore/qtimer_wrapper.cpp:164


If this would explain the update_refs lock up then I will close this bug... (i also opening on on pyside)
msg149936 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-12-20 23:13
Hey, you found it!
PySide::DynamicSlotDataV2::callback() calls PyMethod_New() without getting the GIL.  The Python allocator is not thread-safe, operations are supposed to be serialized by this Global Interpreter Lock.

I suggest to modify this DynamicSlotDataV2::callback() and add the line "Shiboken::GilState gil;" at the beginning of the function.
msg149937 - (view) Author: David Butler (David.Butler) Date: 2011-12-20 23:18
resolved as wont fix, because its not python's fault :)
History
Date User Action Args
2022-04-11 14:57:24adminsetgithub: 57825
2011-12-20 23:18:43David.Butlersetstatus: open -> closed
resolution: wont fix
messages: + msg149937
2011-12-20 23:13:20amaury.forgeotdarcsetmessages: + msg149936
2011-12-20 22:50:03David.Butlersetmessages: + msg149935
2011-12-20 22:37:37David.Butlersetmessages: + msg149934
2011-12-20 22:20:31David.Butlersetmessages: + msg149933
2011-12-20 02:36:07jceasetmessages: + msg149902
2011-12-20 02:27:47jceasetmessages: + msg149901
2011-12-20 02:12:24David.Butlersetmessages: + msg149900
2011-12-20 00:23:28jceasetmessages: + msg149895
2011-12-19 18:07:14David.Butlersetfiles: + cpu_usage.png

messages: + msg149882
2011-12-17 18:38:20jceasetnosy: + jcea
messages: + msg149690
2011-12-16 23:43:33amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg149644
2011-12-16 23:27:12David.Butlercreate