Issue13616
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2011-12-16 23:27 by David.Butler, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
cpu_usage.png | David.Butler, 2011-12-19 18:07 |
Messages (13) | |||
---|---|---|---|
msg149641 - (view) | Author: David Butler (David.Butler) | Date: 2011-12-16 23:27 | |
CPU will sit a 100% indefinitely, this is also pretty tough (takes several hours) to reproduce if you step through the program it is stuck between 290 and 292 Modules/gcmodule.c: /* Set all gc_refs = ob_refcnt. After this, gc_refs is > 0 for all objects * in containers, and is GC_REACHABLE for all tracked gc objects not in * containers. */ static void update_refs(PyGC_Head *containers) { PyGC_Head *gc = containers->gc.gc_next; for (; gc != containers; gc = gc->gc.gc_next) { <-- line 290 assert(gc->gc.gc_refs == GC_REACHABLE); gc->gc.gc_refs = Py_REFCNT(FROM_GC(gc)); <-- line 292 /* Python's cyclic gc should never see an incoming refcount * of 0: if something decref'ed to 0, it should have been * deallocated immediately at that time. * Possible cause (if the assert triggers): a tp_dealloc * routine left a gc-aware object tracked during its teardown * phase, and did something-- or allowed something to happen -- * that called back into Python. gc can trigger then, and may * see the still-tracked dying object. Before this assert * was added, such mistakes went on to allow gc to try to * delete the object again. In a debug build, that caused * a mysterious segfault, when _Py_ForgetReference tried * to remove the object from the doubly-linked list of all * objects a second time. In a release build, an actual * double deallocation occurred, which leads to corruption * of the allocator's internal bookkeeping pointers. That's * so serious that maybe this should be a release-build * check instead of an assert? */ assert(gc->gc.gc_refs != 0); } } GDB backtrace: #0 0xb7750fcb in update_refs (generation=2) at Modules/gcmodule.c:290 #1 collect (generation=2) at Modules/gcmodule.c:873 #2 0xb77515e3 in collect_generations (basicsize=20) at Modules/gcmodule.c:996 #3 _PyObject_GC_Malloc (basicsize=20) at Modules/gcmodule.c:1457 #4 0xb775163e in _PyObject_GC_NewVar (tp=0xb77a0640, nitems=2) at Modules/gcmodule.c:1477 #5 0xb76df4b7 in PyTuple_New (size=2) at Objects/tupleobject.c:90 #6 0xb77230fd in PyEval_EvalFrameEx (f= Frame 0x889acdc, for file /usr/lib/python2.7/site-packages/twisted/python/failure.py, line 443, in __getstate__ (self=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages/twisted/internet/defer.py', 542, [('chain', [<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<...>, _runningCallbacks=False) at remote 0x9f1cb2c>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type...(truncated), throwflag=0) at Python/ceval.c:2178 #7 0xb772489a in fast_function (f= Frame 0x889ab84, for file /usr/lib/python2.7/site-packages/twisted/python/failure.py, line 452, in cleanFailure (self=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages/twisted/internet/defer.py', 542, [('chain', [<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<...>, _runningCallbacks=False) at remote 0x9f1cb2c>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type...(truncated), throwflag=0) at Python/ceval.c:4099 #8 call_function (f= Frame 0x889ab84, for file /usr/lib/python2.7/site-packages/twisted/python/failure.py, line 452, in cleanFailure (self=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages/twisted/internet/defer.py', 542, [('chain', [<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<...>, _runningCallbacks=False) at remote 0x9f1cb2c>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type...(truncated), throwflag=0) at Python/ceval.c:4034 #9 PyEval_EvalFrameEx (f= Frame 0x889ab84, for file /usr/lib/python2.7/site-packages/twisted/python/failure.py, line 452, in cleanFailure (self=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages/twisted/internet/defer.py', 542, [('chain', [<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<...>, _runningCallbacks=False) at remote 0x9f1cb2c>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type...(truncated), throwflag=0) at Python/ceval.c:2666 #10 0xb772489a in fast_function (f= Frame 0x9e4fea4, for file /usr/lib/python2.7/site-packages/twisted/internet/defer.py, line 578, in _runCallbacks (self=<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages/twisted/internet/defer.py', 542, [('chain', [<...>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type at remote 0x8860f54>), ('DebugInfo', <classob...(truncated), throwflag=0) at Python/ceval.c:4099 #11 call_function (f= Frame 0x9e4fea4, for file /usr/lib/python2.7/site-packages/twisted/internet/defer.py, line 578, in _runCallbacks (self=<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages---Type <return> to continue, or q <return> to quit--- /twisted/internet/defer.py', 542, [('chain', [<...>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type at remote 0x8860f54>), ('DebugInfo', <classob...(truncated), throwflag=0) at Python/ceval.c:4034 #12 PyEval_EvalFrameEx (f= Frame 0x9e4fea4, for file /usr/lib/python2.7/site-packages/twisted/internet/defer.py, line 578, in _runCallbacks (self=<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages/twisted/internet/defer.py', 542, [('chain', [<...>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type at remote 0x8860f54>), ('DebugInfo', <classob...(truncated), throwflag=0) at Python/ceval.c:2666 #13 0xb772489a in fast_function (f= Frame 0x9b745cc, for file /usr/lib/python2.7/site-packages/twisted/internet/defer.py, line 455, in _startRunCallbacks (self=<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages/twisted/internet/defer.py', 542, [('chain', [<...>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type at remote 0x8860f54>), ('DebugInfo', <cl...(truncated), throwflag=0) at Python/ceval.c:4099 #14 call_function (f= Frame 0x9b745cc, for file /usr/lib/python2.7/site-packages/twisted/internet/defer.py, line 455, in _startRunCallbacks (self=<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages/twisted/internet/defer.py', 542, [('chain', [<...>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type at remote 0x8860f54>), ('DebugInfo', <cl...(truncated), throwflag=0) at Python/ceval.c:4034 #15 PyEval_EvalFrameEx (f= Frame 0x9b745cc, for file /usr/lib/python2.7/site-packages/twisted/internet/defer.py, line 455, in _startRunCallbacks (self=<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages/twisted/internet/defer.py', 542, [('chain', [<...>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type at remote 0x8860f54>), ('DebugInfo', <cl...(truncated), throwflag=0) at Python/ceval.c:2666 #16 0xb772489a in fast_function (f= Frame 0x9ee4c04, for file /usr/lib/python2.7/site-packages/twisted/internet/defer.py, line 361, in callback (self=<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages/twisted/internet/defer.py', 542, [('chain', [<...>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type at remote 0x8860f54>), ('DebugInfo', <classobj at ...(truncated), throwflag=0) at Python/ceval.c:4099 #17 call_function (f= Frame 0x9ee4c04, for file /usr/lib/python2.7/site-packages/twisted/internet/defer.py, line 361, in callback (self=<Deferred(_chainedTo=None, called=True, _canceller=None, callbacks=[], result=<Failure(count=3460, tb=<traceback at remote 0x9884fa4>, value=exceptions.AttributeError("DeferredList instance has no attribute 'resultList'",), parents=['exceptions.AttributeError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.AttributeError'], frames=[['_runCallbacks', '/usr/lib/python2.7/site-packages/twisted/internet/defer.py', 542, [('chain', [<...>]), ('callback', <instancemethod at remote 0x9dbf98c>), ('self', <...>), ('args', (1, True)), ('current', <...>), ('item', ((<instancemethod at remote 0x9dbf98c>, (...), None), (<instancemethod at remote 0x9f1e11c>, (1, False), None))), ('finished', True), ('kw', {})], [('AlreadyCalledError', <type at remote 0x885dd6c>), ('DeferredFilesystemLock', <type at remote 0x8860f54>), ('DebugInfo', <classobj at ...(truncated), throwflag=0) at Python/ceval.c:4034 |
|||
msg149644 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * | Date: 2011-12-16 23:43 | |
Are you sure that the program is really stuck in the gc module? The loop you mention has to go through all objects of the process. It's possible that it allocated many objects, that one garbage collection takes a few seconds, and even that most of the time is spent inside the garbage collector. But can you check whether this function terminates? For example, you could put a breakpoint in the collect() function. |
|||
msg149690 - (view) | Author: Jesús Cea Avión (jcea) * | Date: 2011-12-17 18:38 | |
The only way of this thing actually happening is if the GC link list has actually a cycle. Without a testcase to try to reproduce, it can't be debugged. David, can you reproduce this consistently, even if it takes a few hours?. As Amaury pointed out, if you have tons of objects, GC can take a while. And if your memory is paged out to swap, this can actually be VERY slow. To know if this is actually a infinite loop, when you think it is stuck in the loop, just check current value of "gc" and set a breakpoint for that value in the loop. So GDB will stop again if "gc" gets the same value again and so we are actually in a loop. In the meantime, check your page-in rate. If you are paging-in furiously, you are not in a loop, you are (mechanically slow) bringing objects back to memory. Can you tell us how big your python process is (in memory), how much physical RAM you have, and if some other processes are competing for memory?. If you can confirm that we are actually in an infinite loop... there is little that we can do without some testcase we can reproduce. The key here is that if you can reproduce, we can slowly move forward. |
|||
msg149882 - (view) | Author: David Butler (David.Butler) | Date: 2011-12-19 18:07 | |
sorry for the delay, I had to wait until the problem occurred again... I gdb'ed into the process again, the backtrace is a little different this time... (gdb) bt #0 0xb76adfc6 in update_refs (containers=<optimized out>) at Modules/gcmodule.c:292 #1 collect (generation=2) at Modules/gcmodule.c:873 #2 0xb76ae5e3 in collect_generations () at Modules/gcmodule.c:996 #3 _PyObject_GC_Malloc (basicsize=16) at Modules/gcmodule.c:1457 #4 0xb76ae684 in _PyObject_GC_New (tp=0xb76f6aa0) at Modules/gcmodule.c:1467 #5 0xb761a8bd in list_iter (seq= [((None, u'to'), u'aves.rule'), ((None, u'rel'), u'ManyToOneRel'), ((None, u'name'), u'rule')]) at Objects/listobject.c:2873 #6 0xb75eef45 in PyObject_GetIter (o= [((None, u'to'), u'aves.rule'), ((None, u'rel'), u'ManyToOneRel'), ((None, u'name'), u'rule')]) at Objects/abstract.c:3083 #7 0xb7680f35 in PyEval_EvalFrameEx (f= Frame 0x9bcc0c4, for file /usr/lib/python2.7/xml/dom/pulldom.py, line 88, in startElementNS (self=<PullDOM(documentFactory=<DOMImplementation() at remote 0x9c217cc>, _ns_contexts=[{'http://www.w3.org/XML/1998/namespace': 'xml'}], pending_events=None, _current_context={...}, elementStack=[<Document(implementation=<...>, _elem_info={}, doctype=None, _id_search_stack=None, childNodes=<NodeList at remote 0xa2fac6c>, _id_cache={}) at remote 0xa2fac4c>, <Element(ownerDocument=<...>, nodeName=u'django-objects', nextSibling=None, parentNode=<...>, namespaceURI=None, prefix=None, _attrsNS={(None, u'version'): <Attr(ownerDocument=<...>, nodeName=u'version', ownerElement=<...>, value=u'1.0', namespaceURI=None, prefix=None, nodeValue=u'1.0', childNodes=<NodeList at remote 0xa2faccc>, name=u'version') at remote 0xa2facac>}, tagName=u'django-objects', childNodes=<NodeList at remote 0xa2fabcc>, _attrs={u'version': <...>}) at remote 0xa2faa8c>, <Element(ownerDocument=<...>, nodeName=u'object', namespaceURI=None, prefix=None,...(truncated), throwflag=0) at Python/ceval.c:2483 #8 0xb768189a in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=<optimized out>, func=<optimized out>) at Python/ceval.c:4099 #9 call_function (oparg=<optimized out>, pp_stack=<optimized out>) at Python/ceval.c:4034 #10 PyEval_EvalFrameEx (f= Frame 0x9ed8f74, for file /usr/lib/python2.7/xml/sax/expatreader.py, line 338, in start_element_ns (self=<ExpatParser(_namespaces=1, _parser=<pyexpat.xmlparser at remote 0xa22fe2c>, _external_ges=1, _source=<InputSource(_InputSource__charfile=None, _InputSource__bytefile=None, _InputSource__public_id=None, _InputSource__system_id=None, _InputSource__encoding=None) at remote 0xa2fabac>, _decl_handler_prop=None, _bufsize=65516, _cont_handler=<PullDOM(documentFactory=<DOMImplementation() at remote 0x9c217cc>, _ns_contexts=[{'http://www.w3.org/XML/1998/namespace': 'xml'}], pending_events=None, _current_context={...}, elementStack=[<Document(implementation=<...>, _elem_info={}, doctype=None, _id_search_stack=None, childNodes=<NodeList at remote 0xa2fac6c>, _id_cache={}) at remote 0xa2fac4c>, <Element(ownerDocument=<...>, nodeName=u'django-objects', nextSibling=None, parentNode=<...>, namespaceURI=None, prefix=None, _attrsNS={(None, u'version'): <Attr(ownerDocument=<...>, nodeName=u'version', ownerElement=<...>, va...(truncated), throwflag=0) at Python/ceval.c:2666 #11 0xb7683508 in PyEval_EvalCodeEx (co=0x9dd6c80, globals= {'SAXNotRecognizedException': <type at remote 0x9c6967c>, 'expat': <module at remote 0x96cbd94>, '__name__': 'xml.sax.expatreader', 'feature_namespace_prefixes': 'http://xml.org/sax/features/namespace-prefixes', 'AttributesNSImpl': <classobj at remote 0x9d8db0c>, 'feature_string_interning': 'http://xml.org/sax/features/string-interning', '__package__': 'xml.sax', 'versi---Type <return> to continue, or q <return> to quit---step on': '0.20', 'AttributesImpl': <classobj at remote 0x9d8dadc>, 'feature_namespaces': 'http://xml.org/sax/features/namespaces', '__doc__': "\nSAX driver for the pyexpat C module. This driver works with\npyexpat.__version__ == '2.22'.\n", 'feature_validation': 'http://xml.org/sax/features/validation', 'create_parser': <function at remote 0x9ddba74>, '__builtins__': {'bytearray': <type at remote 0xb76ed660>, 'IndexError': <type at remote 0xb76f1520>, 'all': <built-in function all>, 'help': <_Helper at remote 0xb73643ac>, 'vars': <built-in function vars>, 'SyntaxError': <type at remote 0xb76f11a0>, 'unicode': <type at remote 0xb7701340>, 'UnicodeD...(truncated), locals=0x0, args=0x9d94510, argcount=3, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3253 #12 0xb7614f2a in function_call (func=<function at remote 0x9dd502c>, arg= (<ExpatParser(_namespaces=1, _parser=<pyexpat.xmlparser at remote 0xa22fe2c>, _external_ges=1, _source=<InputSource(_InputSource__charfile=None, _InputSource__bytefile=None, _InputSource__public_id=None, _InputSource__system_id=None, _InputSource__encoding=None) at remote 0xa2fabac>, _decl_handler_prop=None, _bufsize=65516, _cont_handler=<PullDOM(documentFactory=<DOMImplementation() at remote 0x9c217cc>, _ns_contexts=[{'http://www.w3.org/XML/1998/namespace': 'xml'}], pending_events=None, _current_context={...}, elementStack=[<Document(implementation=<...>, _elem_info={}, doctype=None, _id_search_stack=None, childNodes=<NodeList at remote 0xa2fac6c>, _id_cache={}) at remote 0xa2fac4c>, <Element(ownerDocument=<...>, nodeName=u'django-objects', nextSibling=None, parentNode=<...>, namespaceURI=None, prefix=None, _attrsNS={(None, u'version'): <Attr(ownerDocument=<...>, nodeName=u'version', ownerElement=<...>, value=u'1.0', namespaceURI=None, prefix=None, nodeValue=u'1.0', childNodes=<NodeList at remote 0xa2faccc>,...(truncated), kw=0x0) at Objects/funcobject.c:526 --- truncated --- (gdb) b collect Breakpoint 1 at 0xb76ade3c: file Modules/gcmodule.c, line 822. (gdb) cont Continuing. ( here i let it run for a about a minute ) ^C Program received signal SIGINT, Interrupt. 0xb76adfcb in update_refs (containers=<optimized out>) at Modules/gcmodule.c:290 290 in Modules/gcmodule.c (gdb) print gc $1 = (PyGC_Head *) 0xb770df00 (gdb) step 292 in Modules/gcmodule.c (gdb) info locals gc = 0xb770df00 (gdb) step 290 in Modules/gcmodule.c (gdb) info locals gc = 0xb770df00 I was going to leave the process running, and the debugger up, but I forgot that a reboot script was still enabled, and it caused the machine to reboot... so i am recompiling python with CFLAGS="-O0 ..." to try to fix some "value optimized out" issue in gdb... need to try to figure out what object its getting caught on... Could this be caused by one of the loadable libraries that I am importing? ( pyside, psycopg2, ... ) and one of those is making an object that isn't playing nice with the gc? Also, I have attached a histogram of the cpu usage I'll try to answer more of your questions when it happens next |
|||
msg149895 - (view) | Author: Jesús Cea Avión (jcea) * | Date: 2011-12-20 00:23 | |
It seems to be a real infinite loop. Bad thing. Could be a bug in an extension, difficult to say. This is going to be VERY difficult to debug without a reproductible case we can try. Could you possibly check the object type of the "infinite loop" object?. If the loop is from an object to itself, maybe you can instrumentalize python to detect this loop when created. Add to your code "gc.collect()" is a frequent loop. It is going to suck CPU, but could reproduce the issue faster and make it easier to diagnose. |
|||
msg149900 - (view) | Author: David Butler (David.Butler) | Date: 2011-12-20 02:12 | |
2011/12/19 Jesús Cea Avión <report@bugs.python.org> I am willing to work toward a simplified test case, but its going to be difficult, I am hoping that I can narrow down the source of the problem... Forgive me, I'm gdb is actually a new thing to me... how could I check the object type? Also what do you mean exactly when you say "instrumentalize" I started some valgrind testing today, (to try to figure out whats causing some segfaults in a possibly unrelated bug, will this interfere with testing for this bug?) > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue13616> > _______________________________________ > |
|||
msg149901 - (view) | Author: Jesús Cea Avión (jcea) * | Date: 2011-12-20 02:27 | |
Instrumentalize: check for this pathological case (an object with a GC pointer back to itself) in the code that modify the GC pointers. Lets say, everytime code change the pointers, you test for this. Luckily you can learn the codepath creating this situation. Change and recompile python code for checking this. Read Include/object.h. You will see that ANY python object has, at least, two components: a reference counter and a pointer to its type. Follow that pointer to type and get its name. Let me try an example... """ gdb python GNU gdb (GDB) 7.3.1 Copyright (C) 2011 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "i386-pc-solaris2.10". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>... Reading symbols from /usr/local/bin/python...done. (gdb) br PyTuple_New Function "PyTuple_New" not defined. Make breakpoint pending on future shared library load? (y or [n]) y Breakpoint 1 (PyTuple_New) pending. (gdb) r Starting program: /usr/local/bin/python [Thread debugging using libthread_db enabled] [New Thread 1 (LWP 1)] [Switching to Thread 1 (LWP 1)] Breakpoint 1, PyTuple_New (size=0) at Objects/tupleobject.c:50 50 Objects/tupleobject.c: No such file or directory. in Objects/tupleobject.c (gdb) fin Run till exit from #0 PyTuple_New (size=0) at Objects/tupleobject.c:50 0xfee650e4 in PyType_Ready (type=0xfef5cd00) at Objects/typeobject.c:4077 4077 Objects/typeobject.c: No such file or directory. in Objects/typeobject.c Value returned is $1 = (PyObject *) 0x806202c (gdb) print $1->ob_type $2 = (struct _typeobject *) 0xfef5cb40 (gdb) print *($1->ob_type) $3 = {ob_refcnt = 1, ob_type = 0xfef5cde0, ob_size = 0, tp_name = 0xfef07542 "tuple", tp_basicsize = 12, tp_itemsize = 4, tp_dealloc = 0xfee568f0 <tupledealloc>, tp_print = 0xfee56c80 <tupleprint>, tp_getattr = 0, tp_setattr = 0, tp_compare = 0, tp_repr = 0xfee57380 <tuplerepr>, tp_as_number = 0x0, tp_as_sequence = 0xfef52500, tp_as_mapping = 0xfef52528, tp_hash = 0xfee56850 <tuplehash>, tp_call = 0, tp_str = 0, tp_getattro = 0xfee416d0 <PyObject_GenericGetAttr>, tp_setattro = 0, tp_as_buffer = 0x0, tp_flags = 67519979, tp_doc = 0xfef3b580 "tuple() -> empty tuple\ntuple(iterable) -> tuple initialized from iterable's items\n\nIf the argument is a tuple, the return value is the same object.", tp_traverse = 0xfee56440 <tupletraverse>, tp_clear = 0, tp_richcompare = 0xfee56b10 <tuplerichcompare>, tp_weaklistoffset = 0, tp_iter = 0xfee56a60 <tuple_iter>, tp_iternext = 0, tp_methods = 0xfef52540, tp_members = 0x0, tp_getset = 0x0, tp_base = 0x0, tp_dict = 0x0, tp_descr_get = 0, tp_descr_set = 0, tp_dictoffset = 0, tp_init = 0, tp_alloc = 0, tp_new = 0xfee57680 <tuple_new>, tp_free = 0xfeede6a0 <PyObject_GC_Del>, tp_is_gc = 0, tp_bases = 0x0, tp_mro = 0x0, tp_cache = 0x0, tp_subclasses = 0x0, tp_weaklist = 0x0, tp_del = 0, tp_version_tag = 0} """ I break in the tuple creation routine. Then I let it finish. It is going to return a PyObject, a generic python object. I follow the type pointer and get the name of the type: 'tp_name = 0xfef07542 "tuple"'. I already knew the result was going to be a tuple, but it is nice to cross-check :) Sometimes it is more complicated, and you have to cast pointers. Use the source code as your map. Sorry, it is difficult to teach how to do such a low level debug remotely :). PS: Remember to add "gc.collect()" in your mainloop. In this way you will hit the problem faster, because you don't have to wait for Python to invoke the collector implicitly. If you have tons of objects this can be actually slower. Try. |
|||
msg149902 - (view) | Author: Jesús Cea Avión (jcea) * | Date: 2011-12-20 02:36 | |
David, if you get desperate, let us know. If you can deal with Mercurial and compiling Python code, I could post a mercurial repository/branch with code modifications to help you to debug this. But it is almost Christmas and I am VERY busy and have to do a few long distance trips yet, so that better be the last option. First try to advance a bit yourself. At least you will learn a lot. But don't let this bug to languish... If you can manage to reproduce the issue more easily, that would be VERY useful. Tell us too what external modules are you using. Good luck. |
|||
msg149933 - (view) | Author: David Butler (David.Butler) | Date: 2011-12-20 22:20 | |
I have 10 identical test machines running this this code ( operating systems are cloned ). I am not usning valgrind in these tests, it was causing various issues... (gdb) info sharedlibrary From To Syms Read Shared Object Library 0xb75a02b0 0xb76baa38 Yes /usr/lib/libpython2.7.so.1.0 0xb75695a0 0xb7575878 Yes /lib/libpthread.so.0 0xb7422060 0xb7529374 Yes /lib/libc.so.6 0xb7404bd0 0xb7405b28 Yes /lib/libdl.so.2 0xb73ffbc0 0xb7400498 Yes /lib/libutil.so.1 0xb73dd360 0xb73f7e58 Yes /lib/libm.so.6 0xb77288d0 0xb773f97f Yes /lib/ld-linux.so.2 0xb7721310 0xb77226d8 Yes /usr/lib/python2.7/lib-dynload/time.so 0xb728c130 0xb728e4b8 Yes /usr/lib/python2.7/site-packages/zope/interface/_zope_interface_coptimizations.so 0xb7284060 0xb7286cf8 Yes /usr/lib/python2.7/lib-dynload/strop.so 0xb72772a0 0xb727d648 Yes /usr/lib/python2.7/lib-dynload/_socket.so 0xb7271c50 0xb72729e8 Yes /usr/lib/python2.7/lib-dynload/_functools.so 0xb726a7d0 0xb726deb8 Yes /usr/lib/python2.7/lib-dynload/_ssl.so 0xb721ec00 0xb7253f28 Yes /usr/lib/libssl.so.1.0.0 0xb70f47c0 0xb71d3dd8 Yes /usr/lib/libcrypto.so.1.0.0 0xb70a3b20 0xb70af7b8 Yes /lib/libz.so.1 0xb7264130 0xb72653c8 Yes /usr/lib/python2.7/lib-dynload/cStringIO.so 0xb709aab0 0xb709e108 Yes /usr/lib/python2.7/lib-dynload/_struct.so 0xb70934c0 0xb70958a8 Yes /usr/lib/python2.7/lib-dynload/operator.so 0xb708b7e0 0xb708e868 Yes /usr/lib/python2.7/lib-dynload/_collections.so 0xb707db00 0xb7083dd8 Yes /usr/lib/python2.7/lib-dynload/itertools.so 0xb725f7d0 0xb725ff98 Yes /usr/lib/python2.7/lib-dynload/_bisect.so 0xb7077920 0xb7078ee8 Yes /usr/lib/python2.7/lib-dynload/_heapq.so 0xb7073970 0xb7074028 Yes /usr/lib/python2.7/lib-dynload/grp.so 0xb771d510 0xb771d6f8 Yes /usr/lib/python2.7/site-packages/twisted/python/_initgroups.so 0xb7062bf0 0xb706d3f8 Yes /usr/lib/python2.7/lib-dynload/datetime.so 0xb705ab90 0xb705d358 Yes /usr/lib/python2.7/lib-dynload/binascii.so 0xb7055a00 0xb7057018 Yes /usr/lib/python2.7/lib-dynload/fcntl.so 0xb704d9a0 0xb7051258 Yes /usr/lib/python2.7/lib-dynload/math.so 0xb7046e80 0xb7047d48 Yes /usr/lib/python2.7/lib-dynload/_hashlib.so 0xb7042d90 0xb7043d78 Yes /usr/lib/python2.7/lib-dynload/_random.so 0xb703c460 0xb703e548 Yes /usr/lib/python2.7/lib-dynload/select.so 0xb7036360 0xb7036e18 Yes /usr/lib/python2.7/lib-dynload/termios.so 0xb7032660 0xb7032968 Yes /usr/lib/python2.7/site-packages/twisted/internet/_sigchld.so 0xb6ce7930 0xb6ed3188 Yes /usr/lib/python2.7/site-packages/PySide/QtCore.so 0xb6c903b0 0xb6ca3b58 Yes /usr/lib/libpyside-python2.7.so.1.0 0xb7016230 0xb7023058 Yes /usr/lib/libshiboken-python2.7.so.1.0 0xb69ba370 0xb6b4c358 Yes /usr/lib/qt4/libQtCore.so.4 0xb68cd7e0 0xb693dac8 Yes /usr/lib/gcc/i686-pc-linux-gnu/4.5.3/libstdc++.so.6 0xb686a390 0xb6880a18 Yes /usr/lib/gcc/i686-pc-linux-gnu/4.5.3/libgcc_s.so.1 0xb6860e90 0xb68655c8 Yes /lib/librt.so.1 ---Type <return> to continue, or q <return> to quit--- 0xb5d81c30 0xb65d9cf8 Yes /usr/lib/python2.7/site-packages/PySide/QtGui.so 0xb54d9e50 0xb5ba8a38 Yes /usr/lib/qt4/libQtGui.so.4 0xb5394b80 0xb53b0a58 Yes /usr/lib/libpng15.so.15 0xb5315a80 0xb5377748 Yes /usr/lib/libfreetype.so.6 0xb5305910 0xb530a278 Yes /usr/lib/libSM.so.6 0xb52eee00 0xb52fddd8 Yes /usr/lib/libICE.so.6 0xb52e3740 0xb52e8be8 Yes /usr/lib/libXrender.so.1 0xb52db5c0 0xb52dfa48 Yes /usr/lib/libXrandr.so.2 0xb52b0f50 0xb52ca548 Yes /usr/lib/libfontconfig.so.1 0xb529f220 0xb52a8e98 Yes /usr/lib/libXext.so.6 0xb519b580 0xb5220388 Yes /usr/lib/libX11.so.6 0xb5173200 0xb5181a38 Yes /lib/libbz2.so.1 0xb5151580 0xb5167338 Yes /usr/lib/libexpat.so.1 0xb513e0c0 0xb514b988 Yes /usr/lib/libxcb.so.1 0xb5131b60 0xb51328b8 Yes /usr/lib/libXau.so.6 0xb512c0f0 0xb512dab8 Yes /usr/lib/libXdmcp.so.6 0xb50417e0 0xb50f0cf8 Yes /usr/lib/python2.7/site-packages/PySide/QtNetwork.so 0xb4f13ff0 0xb4fd6f18 Yes /usr/lib/qt4/libQtNetwork.so.4 0xb702d1e0 0xb702ebd8 Yes /usr/lib/python2.7/lib-dynload/_locale.so 0xb4ed2ab0 0xb4ee4fa8 Yes /usr/lib/python2.7/site-packages/psycopg2/_psycopg.so 0xb4ea7d90 0xb4ebe638 Yes (*) /usr/lib/libpq.so.5 0xb4e8fc90 0xb4e9da98 Yes /usr/lib/python2.7/lib-dynload/cPickle.so 0xb4e84c70 0xb4e89498 Yes /usr/lib/python2.7/lib-dynload/array.so 0xb4e69d90 0xb4e7cb08 Yes /usr/lib/python2.7/lib-dynload/_ctypes.so 0xb4e5f310 0xb4e62638 Yes /usr/lib/libffi.so.5 0xb4ec8100 0xb4ec9ab8 Yes /lib/libuuid.so.1 0xb6fb2200 0xb6fb4618 Yes /usr/lib/libXfixes.so 0xb6fa93e0 0xb6fae248 Yes /usr/lib/libXcursor.so.1 0xb6f9ad70 0xb6fa4808 Yes /usr/lib/libXi.so 0xb4e316d0 0xb4e56308 Yes /usr/lib/python2.7/site-packages/PIL/_imaging.so 0xb4ddfff0 0xb4e134c8 Yes /usr/lib/libjpeg.so.8 0xb6f79d00 0xb6f8d748 Yes /usr/lib/python2.7/lib-dynload/_io.so 0xb6f6fef0 0xb6f724d8 Yes /usr/lib/python2.7/lib-dynload/zlib.so 0xb4d4c190 0xb4db81e8 Yes /usr/lib/python2.7/site-packages/PySide/QtWebKit.so 0xb4049e70 0xb4ade738 Yes /usr/lib/qt4/libQtWebKit.so.4 0xb3e22850 0xb3e90648 Yes /usr/lib/libsqlite3.so.0 0xb3de9c20 0xb3e13f28 Yes /usr/lib/qt4/libphonon.so.4 0xb3dcd0d0 0xb3dd0548 Yes /usr/lib/python2.7/site-packages/alsaaudio.so 0xb3d302c0 0xb3db2648 Yes /usr/lib/libasound.so.2 0xb6fb9b60 0xb6fbcef8 Yes /usr/lib/qt4/plugins/imageformats/libqgif.so 0xb3509f30 0xb350cd88 Yes /usr/lib/qt4/plugins/imageformats/libqico.so 0xaebf0710 0xaebf3c28 Yes /usr/lib/qt4/plugins/imageformats/libqjpeg.so 0xade9ad30 0xadea6eb8 Yes (*) /usr/lib/libresolv.so 0xade8dfa0 0xade955c8 Yes /lib/libnss_files.so.2 0xade86da0 0xade89e18 Yes /lib/libnss_dns.so.2 ---Type <return> to continue, or q <return> to quit--- 0xad63b080 0xad63c6e8 Yes /usr/lib/qt4/plugins/imageformats/libqsvg.so 0xad601ac0 0xad633038 Yes /usr/lib/qt4/libQtSvg.so.4 0xad4a5610 0xad4ab728 Yes /usr/lib/python2.7/lib-dynload/_json.so 0xad499640 0xad49f578 Yes /usr/lib/python2.7/lib-dynload/pyexpat.so 0xad018350 0xad01a708 Yes /usr/lib/python2.7/lib-dynload/unicodedata.so I am importing twisted, pyside, django, psycopg2, and datetuil 3 of the machines had a stall much like the last two backtraces, except now originating from my gc.collect() call: #0 0xb769e9ff in update_refs (containers=0xb7709328) at Modules/gcmodule.c:290 #1 0xb769f55f in collect (generation=2) at Modules/gcmodule.c:873 #2 0xb769f98d in gc_collect (self=0x0, args=(), kws=0x0) at Modules/gcmodule.c:1067 #3 0xb75f0e91 in PyCFunction_Call (func=<built-in function collect>, arg=(), kw=0x0) at Objects/methodobject.c:85 #4 0xb7666287 in call_function (pp_stack=0xbfa61960, oparg=0) at Python/ceval.c:4013 #5 0xb7662dff in PyEval_EvalFrameEx (f= Frame 0xb0b5b9c, for file /usr/lib/python2.7/site-packages/qt4reactor.py, line 261, in doIteration (self=<QtReactor(waker=<_UnixWaker(i=13, fileno=<function at remote 0xa688994>, reactor=<...>, o=14) at remote 0xa684dac>, threadCallQueue=[], _cancellations=0, _justStopped=False, _newTimedCalls=[], _pendingTimedCalls=[<DelayedCall(resetter=<instancemethod at remote 0xb00016c>, seconds=<built-in function time>, args=(), canceller=<instancemethod at remote 0xad81e64>, delayed_time=0, kw={}, func=<LoopingCall(a=(...), interval=2, clock=<...>, _expectNextCallAt=<float at remote 0xb03ac1c>, f=<instancemethod at remote 0xa97234c>, deferred=<Deferred(_canceller=None, callbacks=[((<function at remote 0x9e59f44>, None, None), (<instancemethod at remote 0xa972374>, (...), {}))]) at remote 0xa9710cc>, running=True, kw={}, starttime=<float at remote 0x9ed5f3c>, call=<...>, _runAtStart=True) at remote 0xa96e76c>, time=<float at remote 0xac30db4>, cancelled=0, called=0) at remote 0xb03482c>, <DelayedCall(resetter=<instancem...(truncated), throwflag=0) at Python/ceval.c:2666 this is weird, but it seems like the type pointer is 0x0 (no type?): (gdb) print {PyObject} ((PyObject *)(((PyGC_Head *)*gc)+1)) $1 = {ob_refcnt = 700, ob_type = 0x0} gc gets stuck again in update_refs, and if you look at its members, gc.gc_next == gc.gc_prev == gc .... (gdb) print {PyGC_Head} gc $3 = {gc = {gc_next = 0xb77a3300, gc_prev = 0xb77a3300, gc_refs = 700}, dummy = 1.2675072995965765876244656259641832e-4721} Two of the machines produced core files, first core: (Incidentally, this machine, once reset was one of the ones that was counted in the previous list that locked up at update_refs) Program terminated with signal 6, Aborted. #0 0xb76fe424 in __kernel_vsyscall () (gdb) info thread Id Target Id Frame 12 Thread 0xb0afab70 (LWP 2752) 0xb76fe424 in __kernel_vsyscall () 11 Thread 0xb12fbb70 (LWP 2751) 0xb76fe424 in __kernel_vsyscall () 10 Thread 0xaf2f7b70 (LWP 2755) 0xb76fe424 in __kernel_vsyscall () 9 Thread 0xb22fdb70 (LWP 2749) 0xb76fe424 in __kernel_vsyscall () 8 Thread 0xb2afeb70 (LWP 2748) 0xb76fe424 in __kernel_vsyscall () 7 Thread 0xb1afcb70 (LWP 2750) 0xb76fe424 in __kernel_vsyscall () 6 Thread 0xb32ffb70 (LWP 2747) 0xb76fe424 in __kernel_vsyscall () 5 Thread 0xaeaeeb70 (LWP 2757) 0xb76fe424 in __kernel_vsyscall () 4 Thread 0xb02f9b70 (LWP 2753) 0xb76fe424 in __kernel_vsyscall () 3 Thread 0xb3ce6b70 (LWP 2740) 0xb76fe424 in __kernel_vsyscall () 2 Thread 0xafaf8b70 (LWP 2754) 0xb7039cda in __i686.get_pc_thunk.bx () from /usr/lib/python2.7/lib-dynload/datetime.so * 1 Thread 0xb73af6c0 (LWP 2597) 0xb76fe424 in __kernel_vsyscall () I'm not at all sure whats going on here... but the file for the service ends with this: Fatal Python error: GC object already tracked The other core file is different: Program terminated with signal 11, Segmentation fault. #0 0xb766f3ee in call_function (pp_stack=0xbfda38c0, oparg=0) at Python/ceval.c:4047 4047 Python/ceval.c: No such file or directory. in Python/ceval.c #0 0xb766f3ee in call_function (pp_stack=0xbfda38c0, oparg=0) at Python/ceval.c:4047 #1 0xb766bdff in PyEval_EvalFrameEx (f=<unknown at remote 0xa0bf31c>, throwflag=0) at Python/ceval.c:2666 #2 0xb766f542 in fast_function (func=<function at remote 0x922733c>, pp_stack=0xbfda3ac0, n=1, na=1, nk=0) at Python/ceval.c:4099 #3 0xb766f383 in call_function (pp_stack=0xbfda3ac0, oparg=0) at Python/ceval.c:4034 #4 0xb766bdff in PyEval_EvalFrameEx (f=<unknown at remote 0xa0a69ac>, throwflag=0) at Python/ceval.c:2666 #5 0xb766f542 in fast_function (func=<function at remote 0x92272cc>, pp_stack=0xbfda3cc0, n=2, na=2, nk=0) at Python/ceval.c:4099 #6 0xb766f383 in call_function (pp_stack=0xbfda3cc0, oparg=1) at Python/ceval.c:4034 #7 0xb766bdff in PyEval_EvalFrameEx (f=<unknown at remote 0xa0c27a4>, throwflag=0) at Python/ceval.c:2666 #8 0xb766d7ad in PyEval_EvalCodeEx (co=0x9212cc8, globals= {'DeferredFilesystemLock': <type at remote 0x92691b4>, 'AlreadyCalledError': <type at remote 0x912420c>, 'returnValue': <function at remote 0x9227684>, '_deferGenerator': <function at remote 0x92275a4>, 'DebugInfo': <classobj at remote 0x922032c>, '_parseDListResult': <function at remote 0x922702c>, 'FirstError': <type at remote 0x91253a4>, 'deferredGenerator': <function at remote 0x922764c>, 'fail': <function at remote 0x9217e64>, 'waitForDeferred': <classobj at remote 0x922044c>, 'inlineCallbacks': <function at remote 0x922772c>, 'logError': <function at remote 0x9217ca4>, 'log': <module at remote 0xb6ff5a64>, '__all__': ['Deferred', 'DeferredList', 'succeed', 'fail', 'FAILURE', 'SUCCESS', 'AlreadyCalledError', 'TimeoutError', 'gatherResults', 'maybeDeferred', 'waitForDeferred', 'deferredGenerator', 'inlineCallbacks', 'returnValue', 'DeferredLock', 'DeferredSemaphore', 'DeferredQueue', 'DeferredFilesystemLock', 'AlreadyTryingToLockError'], 'CancelledError': <type at remote 0x91243e4>, '__package__': 'twiste...(truncated), locals=0x0, args=0xa2fd8d8, argcount=2, kws=0xb73a1038, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3253 (past this point in the backtrace, you get alot of python tracebacks ( something in the core file is messing with the python-gdb extension ) na = 0 nk = 0 n = 0 pfunc = 0xa0bf47c func = <classobj at remote 0x931c9bc> x = <Failure(count=60, tb=None, value=exceptions.SystemError('Python/traceback.c:89: bad argument to internal function',), parents=['exceptions.SystemError', 'exceptions.StandardError', 'exceptions.Exception', 'exceptions.BaseException', '__builtin__.object', 'exceptions.SystemError'], frames=[], type=<type at remote 0xb76ee1e0>, stack=[]) at remote 0xa2fdb8c> w = 0x0 I don't know if this will be helpful? http://mail.python.org/pipermail/python-dev/2001-October/018002.html |
|||
msg149934 - (view) | Author: David Butler (David.Butler) | Date: 2011-12-20 22:37 | |
This also looks familiar: http://bytes.com/topic/python/answers/36901-fatal-error-gc-object-already-tracked "semi-randomly locks somewhere outside my C-code (after returning to python), and semi-randomly issues a "Fatal error: GC object already tracked... " Someone replies with: "... should be calling PyObject_GC_UnTrack before doing anything else..." So, perhaps one of the extensions is not calling PyObject_GC_UnTrack? but I need to know which one to go further... |
|||
msg149935 - (view) | Author: David Butler (David.Butler) | Date: 2011-12-20 22:50 | |
I think I have found the problem! I took a closer look at the Fatal error core: #0 0xb76fe424 in __kernel_vsyscall () #1 0xb740ccb1 in *__GI_raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64 #2 0xb740e3f2 in *__GI_abort () at abort.c:92 #3 0xb766193f in Py_FatalError (msg=0xb76949de "GC object already tracked") at Python/pythonrun.c:1670 #4 0xb7595a3a in PyMethod_New (func=<function at remote 0x8d649cc>, self= <QtReactor(waker=<_UnixWaker(i=13, fileno=<function at remote 0x93f6994>, reactor=<...>, o=14) at remote 0x93f3dac>, threadCallQueue=[(<instancemethod at remote 0x9b5639c>, (0,), {})], _cancellations=0, _justStopped=False, _newTimedCalls=[], _pendingTimedCalls=[<DelayedCall(resetter=<instancemethod at remote 0x9b5d504>, seconds=<built-in function time>, args=(), canceller=<instancemethod at remote 0x9cb48ec>, delayed_time=0, kw={}, func=<LoopingCall(a=(...), interval=2, clock=<...>, _expectNextCallAt=<float at remote 0x9ab7b44>, f=<instancemethod at remote 0x96db34c>, deferred=<Deferred(_canceller=None, callbacks=[((<function at remote 0x8bc6f44>, None, None), (<instancemethod at remote 0x96db374>, (...), {}))]) at remote 0x96da0cc>, running=True, kw={}, starttime=<float at remote 0x8c3f0f4>, call=<...>, _runAtStart=True) at remote 0x96d776c>, time=<float at remote 0x9d4cb74>, cancelled=0, called=0) at remote 0x9d1184c>, <DelayedCall(resetter=<instancemethod at remote 0xa2c4edc>, seconds=<built-in function ti...(truncated), klass=<type at remote 0x8e5a3dc>) at Objects/classobject.c:2250 #5 0xb6c71225 in PySide::DynamicSlotDataV2::callback (this=0x96b6440) at /var/tmp/portage/dev-python/pyside-1.0.9/work/pyside-qt4.7+1.0.9/libpyside/globalreceiverv2.cpp:129 #6 0xb6c718fc in PySide::GlobalReceiverV2::qt_metacall (this=0x9519b88, call=QMetaObject::InvokeMetaMethod, id=5, args=0xbffbeef4) at /var/tmp/portage/dev-python/pyside-1.0.9/work/pyside-qt4.7+1.0.9/libpyside/globalreceiverv2.cpp:289 #7 0xb6ab76a7 in QMetaObject::metacall (object=0x9519b88, cl=QMetaObject::InvokeMetaMethod, idx=5, argv=0xbffbeef4) at kernel/qmetaobject.cpp:237 #8 0xb6ac9b8c in QMetaObject::activate (sender=0x9760e40, m=0xb6c5d644, local_signal_index=<optimized out>, argv=<optimized out>) at kernel/qobject.cpp:3278 #9 0xb6b21b27 in QTimer::timeout (this=0x9760e40) at .moc/debug-shared/moc_qtimer.cpp:128 #10 0xb6acfb01 in QTimer::timerEvent (this=0x9760e40, e=0xbffbf41c) at kernel/qtimer.cpp:271 #11 0xb6e35efb in QTimerWrapper::timerEvent (this=0x9760e40, arg__1=0xbffbf41c) at /var/tmp/portage/dev-python/pyside-1.0.9/work/pyside-1.0.9_build/PySide/QtCore/PySide/QtCore/qtimer_wrapper.cpp:233 #12 0xb6ac2e67 in QObject::event (this=0x9760e40, e=0x6) at kernel/qobject.cpp:1181 #13 0xb6e37717 in QTimerWrapper::event (this=0x9760e40, arg__1=0xbffbf41c) at /var/tmp/portage/dev-python/pyside-1.0.9/work/pyside-1.0.9_build/PySide/QtCore/PySide/QtCore/qtimer_wrapper.cpp:164 If this would explain the update_refs lock up then I will close this bug... (i also opening on on pyside) |
|||
msg149936 - (view) | Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * | Date: 2011-12-20 23:13 | |
Hey, you found it! PySide::DynamicSlotDataV2::callback() calls PyMethod_New() without getting the GIL. The Python allocator is not thread-safe, operations are supposed to be serialized by this Global Interpreter Lock. I suggest to modify this DynamicSlotDataV2::callback() and add the line "Shiboken::GilState gil;" at the beginning of the function. |
|||
msg149937 - (view) | Author: David Butler (David.Butler) | Date: 2011-12-20 23:18 | |
resolved as wont fix, because its not python's fault :) |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:24 | admin | set | github: 57825 |
2011-12-20 23:18:43 | David.Butler | set | status: open -> closed resolution: wont fix messages: + msg149937 |
2011-12-20 23:13:20 | amaury.forgeotdarc | set | messages: + msg149936 |
2011-12-20 22:50:03 | David.Butler | set | messages: + msg149935 |
2011-12-20 22:37:37 | David.Butler | set | messages: + msg149934 |
2011-12-20 22:20:31 | David.Butler | set | messages: + msg149933 |
2011-12-20 02:36:07 | jcea | set | messages: + msg149902 |
2011-12-20 02:27:47 | jcea | set | messages: + msg149901 |
2011-12-20 02:12:24 | David.Butler | set | messages: + msg149900 |
2011-12-20 00:23:28 | jcea | set | messages: + msg149895 |
2011-12-19 18:07:14 | David.Butler | set | files:
+ cpu_usage.png messages: + msg149882 |
2011-12-17 18:38:20 | jcea | set | nosy:
+ jcea messages: + msg149690 |
2011-12-16 23:43:33 | amaury.forgeotdarc | set | nosy:
+ amaury.forgeotdarc messages: + msg149644 |
2011-12-16 23:27:12 | David.Butler | create |