classification
Title: Segfault in PyTrash_destroy_chain
Type: crash Stage: resolved
Components: Interpreter Core Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Aaron.Staley, amaury.forgeotdarc, asvetlov, benjamin.peterson, cupcicm, fvisconte, georg.brandl, neologix, pitrou, python-dev
Priority: release blocker Keywords: patch

Created on 2012-02-11 12:55 by Aaron.Staley, last changed 2012-09-09 09:18 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
thread_all_apply_bt.txt fvisconte, 2012-08-07 08:12 thread all apply bt output
trashcan-2.7.patch pitrou, 2012-08-30 21:05
Messages (31)
msg153123 - (view) Author: Aaron Staley (Aaron.Staley) Date: 2012-02-11 12:55
Am frequently getting segmentation faults with my python program.  I am utilizing heavy use of threading and sockets.

All sorts of ones are showing in syslog:

kernel: [7763578.475590] python[10097]: segfault at 88e5a0 ip 000000000088e5a0 sp 00007f640efd4028 error 15 in python2.7[833000+69000]
Feb 11 11:39:41 ip-10-8-197-187 kernel: [7762910.806782] python[6228]: segfault at 16 ip 0000000000000016 sp 00007fe488fb0028 error 14 in python2.7[400000+233000]
Feb 11 07:36:09 ip-10-8-197-187 kernel: [7748298.747145] python[11376]: segfault at 16 ip 0000000000000016 sp 00007f28767ab028 error 14 in python2.7[400000+233000]

Managed to capture one coredump:

Program terminated with signal 11, Segmentation fault.
#0  0x0000000000000009 in ?? ()
(gdb) where
#0  0x0000000000000009 in ?? ()
#1  0x000000000046670f in _PyTrash_destroy_chain () at ../Objects/object.c:2448
#2  0x00000000004bd0df in PyEval_EvalCodeEx (co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, closure=
    (<cell at remote 0x293ef68>, <cell at remote 0x21c78d8>, <cell at remote 0x239e168>, <cell at remote 0x219ef30>, <cell at remote 0x2397a28>, <cell at remote 0x2129868>, <cell at remote 0x2397590>, <cell at remote 0x2397be8>, <cell at remote 0x2397fa0>)) at ../Python/ceval.c:3264
#3  0x00000000004b6a5b in fast_function (nk=<optimized out>, na=0, n=<optimized out>, pp_stack=0x7f778ef9c1b0, func=<function at remote 0x20afa28>) at ../Python/ceval.c:4117
#4  call_function (oparg=<optimized out>, pp_stack=0x7f778ef9c1b0) at ../Python/ceval.c:4042
#5  PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:2666
#6  0x00000000004bcd2d in PyEval_EvalCodeEx (co=0x168f6b0, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x169f260, defcount=2, closure=
    0x0) at ../Python/ceval.c:3253
#7  0x00000000004b6a5b in fast_function (nk=<optimized out>, na=3, n=<optimized out>, pp_stack=0x7f778ef9c3a0, func=<function at remote 0x16a00c8>) at ../Python/ceval.c:4117
#8  call_function (oparg=<optimized out>, pp_stack=0x7f778ef9c3a0) at ../Python/ceval.c:4042
#9  PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:2666
#10 0x00000000004bd325 in PyEval_EvalCodeEx (co=0x168f8b0, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=5, defs=0x169a6c8, defcount=5, closure=
    0x0) at ../Python/ceval.c:3253
#11 0x00000000004b6a5b in fast_function (nk=<optimized out>, na=1, n=<optimized out>, pp_stack=0x7f778ef9c590, func=<function at remote 0x16a0140>) at ../Python/ceval.c:4117
#12 call_function (oparg=<optimized out>, pp_stack=0x7f778ef9c590) at ../Python/ceval.c:4042
#13 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:2666
#14 0x00000000004bcd2d in PyEval_EvalCodeEx (co=0x168a130, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x1684d28, defcount=1, closure=
    0x0) at ../Python/ceval.c:3253
#15 0x00000000004b6a5b in fast_function (nk=<optimized out>, na=2, n=<optimized out>, pp_stack=0x7f778ef9c780, func=<function at remote 0x1763758>) at ../Python/ceval.c:4117
#16 call_function (oparg=<optimized out>, pp_stack=0x7f778ef9c780) at ../Python/ceval.c:4042
#17 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:2666
#18 0x00000000004b6d77 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7f778ef9c8c0, func=<function at remote 0x1ebf050>) at ../Python/ceval.c:4107
#19 call_function (oparg=<optimized out>, pp_stack=0x7f778ef9c8c0) at ../Python/ceval.c:4042
#20 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:2666
#21 0x00000000004b6d77 in fast_function (nk=<optimized out>, na=<optimized out>, n=<optimized out>, pp_stack=0x7f778ef9ca00, func=<function at remote 0xc5f5f0>) at ../Python/ceval.c:4107
#22 call_function (oparg=<optimized out>, pp_stack=0x7f778ef9ca00) at ../Python/ceval.c:4042
#23 PyEval_EvalFrameEx (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:2666
#24 0x00000000004bcd2d in PyEval_EvalCodeEx (co=0xc0d9b0, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, closure=0x0)
    at ../Python/ceval.c:3253
#25 0x0000000000448edf in function_call (func=<function at remote 0xc5f500>, arg=
    (<ProvisionThread(_Thread__ident=140151476573952, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33d2fd0>, acquire=<built-in method acquire of thread.lock object at remote 0x33d2fd0>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33d2fd0>) at remote 0x327a790>, _Thread__name='Thread-193', _Thread__daemonic=True, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33d2b10>, acquire=<built-in method acquire of thread.lock object at remote 0x33d2b10>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33d2b10>) at remote 0x327ab10>) at remote 0x33a6350>, _Thread__stderr=<file at remote 0x7f77cffb0270>, instance=<Instance(kernel=u'aki-825ea7eb', root_device_type=u'ebs', private_dns_name=u'ip-10-64-50-6.ec2.internal', previous_state=None, spot_instance_request_id=None, hypervis...(truncated), kw=0x0) at ../Objects/funcobject.c:526
#26 0x000000000041ad2a in PyObject_Call (func=<function at remote 0xc5f500>, arg=<optimized out>, kw=<optimized out>) at ../Objects/abstract.c:2529
#27 0x000000000043074e in instancemethod_call (func=<function at remote 0xc5f500>, arg=
    (<ProvisionThread(_Thread__ident=140151476573952, _Thread__block=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33d2fd0>, acquire=<built-in method acquire of thread.lock object at remote 0x33d2fd0>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33d2fd0>) at remote 0x327a790>, _Thread__name='Thread-193', _Thread__daemonic=True, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x33d2b10>, acquire=<built-in method acquire of thread.lock object at remote 0x33d2b10>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x33d2b10>) at remote 0x327ab10>) at remote 0x33a6350>, _Thread__stderr=<file at remote 0x7f77cffb0270>, instance=<Instance(kernel=u'aki-825ea7eb', root_device_type=u'ebs', private_dns_name=u'ip-10-64-50-6.ec2.internal', previous_state=None, spot_instance_request_id=None, hypervis...(truncated), kw=0x0) at ../Objects/classobject.c:2578
#28 0x000000000041ad2a in PyObject_Call (func=<instancemethod at remote 0x34fa320>, arg=<optimized out>, kw=<optimized out>) at ../Objects/abstract.c:2529
#29 0x00000000004b5d76 in PyEval_CallObjectWithKeywords (func=<instancemethod at remote 0x34fa320>, arg=(), kw=<optimized out>) at ../Python/ceval.c:3890
#30 0x00000000004f06c0 in t_bootstrap (boot_raw=<optimized out>) at ../Modules/threadmodule.c:614
#31 0x00007f77cfbafefc in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#32 0x00007f77cea4789d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#33 0x0000000000000000 in ?? ()
msg153125 - (view) Author: Aaron Staley (Aaron.Staley) Date: 2012-02-11 13:48
For some more context:

Python 2.7.2
Running on Amazon EC2
Linux 3.0.0-14-virtual x86_64 in ubuntu 11.10
170 threads in this particular core dump (another similar crash has 135)
msg153128 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-02-11 14:54
The crash occurs in an oject's destructor.
Are you using any third-party module (lsof/pmap can help)?

Also, it'd help if you tried reproducing the crash with a debug build (./configure --with-pydebug && make).
msg153196 - (view) Author: Aaron Staley (Aaron.Staley) Date: 2012-02-12 11:05
Active extension modules are MySQL-python, numpy, and crypto.

Here is the output from the non-optimized debug build. Slightly different trace, but still some sort of deallocator crashing AFAIK:


#0  0x000000000046247c in _Py_ForgetReference (op=
    <Channel(origin_addr=None, in_window_size=65536, in_window_threshold=6553, lock=<thread.lock at remote 0x571bf90>, _pipe=None, eof_received=0, in_max_packet_size=34816, out_buffer_cv=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x571bf90>, acquire=<built-in method acquire of thread.lock object at remote 0x571bf90>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x571bf90>) at remote 0x593d3e0>, event=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x59138b0>, acquire=<built-in method acquire of thread.lock object at remote 0x59138b0>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x59138b0>) at remote 0x5a333e0>) at remote 0x593ded0>, transport=<Transport(_Thread__ident=140009885591296, host_key_type='ssh-rsa', _channels=<ChannelMap(_lock=<thread.lock at remote 0x5928f90>, _map=<WeakValueDictionary(_re...(truncated)) at Objects/object.c:2220
#1  0x00000000004624ed in _Py_Dealloc (op=
    <Channel(origin_addr=None, in_window_size=65536, in_window_threshold=6553, lock=<thread.lock at remote 0x571bf90>, _pipe=None, eof_received=0, in_max_packet_size=34816, out_buffer_cv=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x571bf90>, acquire=<built-in method acquire of thread.lock object at remote 0x571bf90>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x571bf90>) at remote 0x593d3e0>, event=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x59138b0>, acquire=<built-in method acquire of thread.lock object at remote 0x59138b0>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x59138b0>) at remote 0x5a333e0>) at remote 0x593ded0>, transport=<Transport(_Thread__ident=140009885591296, host_key_type='ssh-rsa', _channels=<ChannelMap(_lock=<thread.lock at remote 0x5928f90>, _map=<WeakValueDictionary(_re...(truncated)) at Objects/object.c:2240
#2  0x0000000000442244 in list_dealloc (op=0x66d7ab0) at Objects/listobject.c:309
#3  0x00000000004624fa in _Py_Dealloc (op=
    [<Channel(origin_addr=None, in_window_size=65536, in_window_threshold=6553, lock=<thread.lock at remote 0x571bf90>, _pipe=None, eof_received=0, in_max_packet_size=34816, out_buffer_cv=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x571bf90>, acquire=<built-in method acquire of thread.lock object at remote 0x571bf90>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x571bf90>) at remote 0x593d3e0>, event=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x59138b0>, acquire=<built-in method acquire of thread.lock object at remote 0x59138b0>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x59138b0>) at remote 0x5a333e0>) at remote 0x593ded0>, transport=<Transport(_Thread__ident=140009885591296, host_key_type='ssh-rsa', _channels=<ChannelMap(_lock=<thread.lock at remote 0x5928f90>, _map=<WeakValueDictionary(_r...(truncated)) at Objects/object.c:2241
#4  0x0000000000448bc4 in listiter_next (it=0x5d1c530) at Objects/listobject.c:2917
#5  0x00000000004ce425 in PyEval_EvalFrameEx (f=
    Frame 0x7f56a050aea0, for file /usr/lib/python2.7/dist-packages/paramiko/transport.py, line 1586, in run (self=<Transport(_Thread__ident=140009885591296, host_key_type='ssh-rsa', _channels=<ChannelMap(_lock=<thread.lock at remote 0x5928f90>, _map=<WeakValueDictionary(_remove=<function at remote 0x56355a0>, data={}) at remote 0x5939588>) at remote 0x5912bc0>, lock=<thread.lock at remote 0x5928d60>, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x521ff40>, acquire=<built-in method acquire of thread.lock object at remote 0x521ff40>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x521ff40>) at remote 0x5223300>) at remote 0x47f56f0>, _channel_counter=22, active=False, _preferred_compression=('none',), server_object=None, kex_engine=None, log_name='paramiko.transport', _x11_handler=None, remote_compression='none', _Thread__initialized=True, server_accepts=[], s...(truncated), throwflag=0) at Python/ceval.c:2497
#6  0x00000000004d41c3 in fast_function (func=<function at remote 0x4716300>, pp_stack=0x7f56977ed400, n=1, na=1, nk=0) at Python/ceval.c:4099
#7  0x00000000004d3ed3 in call_function (pp_stack=0x7f56977ed400, oparg=0) at Python/ceval.c:4034
#8  0x00000000004cec2b in PyEval_EvalFrameEx (f=
    Frame 0x5756c40, for file /root/python2.7-2.7.2/Lib/threading.py, line 552, in __bootstrap_inner (self=<Transport(_Thread__ident=140009885591296, host_key_type='ssh-rsa', _channels=<ChannelMap(_lock=<thread.lock at remote 0x5928f90>, _map=<WeakValueDictionary(_remove=<function at remote 0x56355a0>, data={}) at remote 0x5939588>) at remote 0x5912bc0>, lock=<thread.lock at remote 0x5928d60>, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x521ff40>, acquire=<built-in method acquire of thread.lock object at remote 0x521ff40>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x521ff40>) at remote 0x5223300>) at remote 0x47f56f0>, _channel_counter=22, active=False, _preferred_compression=('none',), server_object=None, kex_engine=None, log_name='paramiko.transport', _x11_handler=None, remote_compression='none', _Thread__initialized=True, server_accepts=[], saved_exc...(truncated), throwflag=0) at Python/ceval.c:2666
#9  0x00000000004d41c3 in fast_function (func=<function at remote 0x16ff060>, pp_stack=0x7f56977ed760, n=1, na=1, nk=0) at Python/ceval.c:4099
#10 0x00000000004d3ed3 in call_function (pp_stack=0x7f56977ed760, oparg=0) at Python/ceval.c:4034
#11 0x00000000004cec2b in PyEval_EvalFrameEx (f=
    Frame 0x53b9380, for file /root/python2.7-2.7.2/Lib/threading.py, line 525, in __bootstrap (self=<Transport(_Thread__ident=140009885591296, host_key_type='ssh-rsa', _channels=<ChannelMap(_lock=<thread.lock at remote 0x5928f90>, _map=<WeakValueDictionary(_remove=<function at remote 0x56355a0>, data={}) at remote 0x5939588>) at remote 0x5912bc0>, lock=<thread.lock at remote 0x5928d60>, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x521ff40>, acquire=<built-in method acquire of thread.lock object at remote 0x521ff40>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x521ff40>) at remote 0x5223300>) at remote 0x47f56f0>, _channel_counter=22, active=False, _preferred_compression=('none',), server_object=None, kex_engine=None, log_name='paramiko.transport', _x11_handler=None, remote_compression='none', _Thread__initialized=True, server_accepts=[], saved_exception...(truncated), throwflag=0) at Python/ceval.c:2666
#12 0x00000000004d1533 in PyEval_EvalCodeEx (co=0x16cb510, globals=
    {'current_thread': <function at remote 0x16ffe28>, '_BoundedSemaphore': <type at remote 0x1680cd0>, 'currentThread': <function at remote 0x16ffe28>, '_Timer': <type at remote 0x1682640>, '_format_exc': <function at remote 0x1538e28>, 'Semaphore': <function at remote 0x16fc840>, 'activeCount': <function at remote 0x1700108>, '_profile_hook': None, '_sleep': <built-in function sleep>, '_trace_hook': None, 'ThreadError': <type at remote 0x165f7f0>, '_enumerate': <function at remote 0x17001b0>, '_start_new_thread': <built-in function start_new_thread>, 'BoundedSemaphore': <function at remote 0x16fcf78>, '_shutdown': <instancemethod at remote 0x14a2960>, '__all__': ['activeCount', 'active_count', 'Condition', 'currentThread', 'current_thread', 'enumerate', 'Event', 'Lock', 'RLock', 'Semaphore', 'BoundedSemaphore', 'Thread', 'Timer', 'setprofile', 'settrace', 'local', 'stack_size'], '_Event': <type at remote 0x1681730>, 'active_count': <function at remote 0x1700108>, '__package__': None, '_Condition': <type at remo...(truncated), locals=0x0, args=0x475f088, argcount=1, kws=0x0, kwcount=0, defs=0x0, 
    defcount=0, closure=0x0) at Python/ceval.c:3253
#13 0x000000000055c7e9 in function_call (func=<function at remote 0x16feed0>, arg=
    (<Transport(_Thread__ident=140009885591296, host_key_type='ssh-rsa', _channels=<ChannelMap(_lock=<thread.lock at remote 0x5928f90>, _map=<WeakValueDictionary(_remove=<function at remote 0x56355a0>, data={}) at remote 0x5939588>) at remote 0x5912bc0>, lock=<thread.lock at remote 0x5928d60>, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x521ff40>, acquire=<built-in method acquire of thread.lock object at remote 0x521ff40>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x521ff40>) at remote 0x5223300>) at remote 0x47f56f0>, _channel_counter=22, active=False, _preferred_compression=('none',), server_object=None, kex_engine=None, log_name='paramiko.transport', _x11_handler=None, remote_compression='none', _Thread__initialized=True, server_accepts=[], saved_exception=exceptions.EOFError(), packetizer=<Packetizer(_Packetizer__remainder='', _Packetizer__block_siz...(truncated), kw=0x0) at Objects/funcobject.c:526
#14 0x0000000000420ae3 in PyObject_Call (func=<function at remote 0x16feed0>, arg=
    (<Transport(_Thread__ident=140009885591296, host_key_type='ssh-rsa', _channels=<ChannelMap(_lock=<thread.lock at remote 0x5928f90>, _map=<WeakValueDictionary(_remove=<function at remote 0x56355a0>, data={}) at remote 0x5939588>) at remote 0x5912bc0>, lock=<thread.lock at remote 0x5928d60>, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x521ff40>, acquire=<built-in method acquire of thread.lock object at remote 0x521ff40>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x521ff40>) at remote 0x5223300>) at remote 0x47f56f0>, _channel_counter=22, active=False, _preferred_compression=('none',), server_object=None, kex_engine=None, log_name='paramiko.transport', _x11_handler=None, remote_compression='none', _Thread__initialized=True, server_accepts=[], saved_exception=exceptions.EOFError(), packetizer=<Packetizer(_Packetizer__remainder='', _Packetizer__block_siz...(truncated), kw=0x0) at Objects/abstract.c:2529
#15 0x000000000042b35f in instancemethod_call (func=<function at remote 0x16feed0>, arg=
    (<Transport(_Thread__ident=140009885591296, host_key_type='ssh-rsa', _channels=<ChannelMap(_lock=<thread.lock at remote 0x5928f90>, _map=<WeakValueDictionary(_remove=<function at remote 0x56355a0>, data={}) at remote 0x5939588>) at remote 0x5912bc0>, lock=<thread.lock at remote 0x5928d60>, _Thread__started=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x521ff40>, acquire=<built-in method acquire of thread.lock object at remote 0x521ff40>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x521ff40>) at remote 0x5223300>) at remote 0x47f56f0>, _channel_counter=22, active=False, _preferred_compression=('none',), server_object=None, kex_engine=None, log_name='paramiko.transport', _x11_handler=None, remote_compression='none', _Thread__initialized=True, server_accepts=[], saved_exception=exceptions.EOFError(), packetizer=<Packetizer(_Packetizer__remainder='', _Packetizer__block_siz...(truncated), kw=0x0) at Objects/classobject.c:2578
#16 0x0000000000420ae3 in PyObject_Call (func=<instancemethod at remote 0x59377e0>, arg=(), kw=0x0) at Objects/abstract.c:2529
#17 0x00000000004d3496 in PyEval_CallObjectWithKeywords (func=<instancemethod at remote 0x59377e0>, arg=(), kw=0x0) at Python/ceval.c:3882
#18 0x000000000051e193 in t_bootstrap (boot_raw=0x3828988) at ./Modules/threadmodule.c:614
#19 0x00007f56b1f17efc in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#20 0x00007f56b15c789d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#21 0x0000000000000000 in ?? ()


The op object is empty, which caused the segfault:

(gdb) f 0
2220            op->_ob_prev->_ob_next != op || op->_ob_next->_ob_prev != op)
(gdb) p *op
$5 = {_ob_next = 0x0, _ob_prev = 0x0, ob_refcnt = 0, ob_type = 0x3c6d9f0}


I believe this is the python decoded version?:
(gdb) p op
$18 = 
    <Channel(origin_addr=None, in_window_size=65536, in_window_threshold=6553, lock=<thread.lock at remote 0x571bf90>, _pipe=None, eof_received=0, in_max_packet_size=34816, out_buffer_cv=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x571bf90>, acquire=<built-in method acquire of thread.lock object at remote 0x571bf90>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x571bf90>) at remote 0x593d3e0>, event=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x59138b0>, acquire=<built-in method acquire of thread.lock object at remote 0x59138b0>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x59138b0>) at remote 0x5a333e0>) at remote 0x593ded0>, transport=<Transport(_Thread__ident=140009885591296, host_key_type='ssh-rsa', _channels=<ChannelMap(_lock=<thread.lock at remote 0x5928f90>, _map=<WeakValueDictionary(_re...(truncated

Possibly useful?

(gdb) p (void*)op
$31 = (void *) 0x593bd80
(gdb) p &refchain
$32 = <unknown at remote 0x7e8ca0>


Possibly a double free?
msg153207 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-02-12 15:47
> Possibly a double free?

> (gdb) p *op
> $5 = {_ob_next = 0x0, _ob_prev = 0x0, ob_refcnt = 0, ob_type = 0x3c6d9f0}
>

It indeed looks like a double free (see the previous and next pointers
have already been set to NULL).

Could you check if there's not another thread active inside the
interpreter at the same time ("thread apply all bt")?

But I'm afraid it'll be hard to debug with just a core dump.
Do you have any test script that could be used to reproduce the crash?
msg153241 - (view) Author: Aaron Staley (Aaron.Staley) Date: 2012-02-13 01:01
As far as I can tell, no other thread is active.  Their backtraces are all either:


#0  0x00007f283dedd300 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x0000000000519295 in PyThread_acquire_lock (lock=0xe7dd50, waitflag=1) at Python/thread_pthread.h:321
#2  0x00000000004c734f in PyEval_RestoreThread (tstate=0xfbfb40) at Python/ceval.c:357


or

Thread 77 (Thread 0x7f2808fb1700 (LWP 30292)):
#0  0x00007f283d57a773 in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f283c458739 in internal_select (s=0x7f2831432ca0, writing=0) at /root/python2.7-2.7.2/Modules/socketmodule.c:692
#2  0x00007f283c45acdf in sock_recv_guts (s=0x7f2831432ca0, cbuf=0x339bf7c "\313\313\313\313\313\313\313\313\313\313\313\313\313\313\313", <incomplete sequence \313>, len=16, flags=0) at /root/python2.7-2.7.2/Modules/socketmodule.c:2356
(More stack frames follow...)

or

Thread 17 (Thread 0x7f2829ff3700 (LWP 19252)):
#0  0x00007f283d57f913 in select () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f283d2a1fac in floatsleep (secs=0) at /root/python2.7-2.7.2/Modules/timemodule.c:943
#2  0x00007f283d2a0cd9 in time_sleep (self=0x0, args=(<float at remote 0xb3ed40>,)) at /root/python2.7-2.7.2/Modules/timemodule.c:206
(More stack frames follow...)

or
Thread 12 (Thread 0x7f2834e5f700 (LWP 19223)):
#0  0x00007f283d57f913 in select () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f2839cb275e in select_select (self=0x0, args=([<_socketobject at remote 0x7f2830f4d1a0>, <_socketobject at remote 0x2106ec0>], [], [], <float at remote 0x7f283034d510>)) at /root/python2.7-2.7.2/Modules/selectmodule.c:278
#2  0x000000000055f0ee in PyCFunction_Call (func=<built-in function select>, arg=([<_socketobject at remote 0x7f2830f4d1a0>, <_socketobject at remote 0x2106ec0>], [], [], <float at remote 0x7f283034d510>), kw=0x0) at Objects/methodobject.c:81
(More stack frames follow...)


Unfortunately, I've only been able to produce the segfault with this program (an EC2 server monitor and deployer).  It's very site specific and couldn't run on another machine.

That said, I can trigger these segfaults with high reliability, so I can certainly add additional debugging logs if requested.
msg153243 - (view) Author: Aaron Staley (Aaron.Staley) Date: 2012-02-13 02:57
I should note that my program is also affected by this bug:
http://bugs.python.org/issue13817
(couldn't isolate it until I used the pydebug configure info).
msg153247 - (view) Author: Aaron Staley (Aaron.Staley) Date: 2012-02-13 04:19
Used latest mercurial 2.7 branch and segfault still occurs.
msg153248 - (view) Author: Aaron Staley (Aaron.Staley) Date: 2012-02-13 04:22
Also, the only particular things by code does is have a lot of threads (100+), holds open many ssh connections (with paramiko) and I do occasionally use PyThreadState_SetAsyncExc to asynchronously terminate threads.
msg153254 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-02-13 09:10
(I'm adding Antoine to the noisy list).

You could maybe try building with '--without-pymalloc', and run under valgrind (--suppressions=Misc/valgrind-python.supp), see http://hg.python.org/cpython/file/6653328a02d0/Misc/README.valgrind
msg153263 - (view) Author: Aaron Staley (Aaron.Staley) Date: 2012-02-13 12:31
Unfortunately, my application is running so slow under valgrind that behavior is changing, namely the sockets are all timing out.
msg153268 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-02-13 13:51
> Unfortunately, my application is running so slow under valgrind that 
> behavior is changing, namely the sockets are all timing out.

Did you see any invalid read/write?

By the way, a google search returned me this:
https://github.com/paramiko/paramiko/issues/17#issuecomment-2382818

The commenter says the crashes don't happen with Python 2.6.7.
If that's true, then you could try bisecting the commit (but it'll take some time).
msg153269 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2012-02-13 13:54
> and I do occasionally use PyThreadState_SetAsyncExc to asynchronously
> terminate threads.
Did you write an extension module for this?  It is possible that there is a refcount error there.
msg153312 - (view) Author: Aaron Staley (Aaron.Staley) Date: 2012-02-14 00:28
I see no invalid read/write. Only warnings are some conditional jump depends on uninitialized value in pycrypto's libraries very early on though.

The link seems to be a different bug from mine.  I did test in python 2.6.7 and the crash still occurs.  (I also never get a segfault on my system with the link's test case - perhaps that only occurs in osx?)

To do the async exception, I use ctypes.pythonapi.PyThreadState_SetAsyncExc in python itself.  This is relatively rarely used however.
msg153322 - (view) Author: Aaron Staley (Aaron.Staley) Date: 2012-02-14 04:01
BTW, I take back what I said about using PyThreadState_SetAsyncExc.  Turns out I had a bug where this function would never succeed (was always returning 0).
msg153355 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-02-14 18:10
I'm running out of ideas to debug this, maybe Antoine or Amaury can help :-)

One last idea (not sure it will work though):
If Channel's finalizer gets called twice, inside Channel.__del__, you
could save a string representation of the current backtrace as an
attribute, and print the previous and current backtrace is there is
already one such attribute (i.e. print the backtrace if __del__ is
called twice).
There may be false positives (for example a finalizer resurecting the
object would end up in __del__ being called twice), but it might yield
useful info.
You might also want to save the thread ID and time.

But this is really strange...
msg167095 - (view) Author: Manu Cupcic (cupcicm) Date: 2012-08-01 07:39
I am also hitting this issue.

I have been able to reproduce it in python 2.5 and 2.7, on debian 6 (64 bits)
Exactly like the reporter, I am making extensive use of threads. I am also using SQLalchemy.

All my threads but one are waiting (probably for the GIL) :
(gdb) info threads
  15 Thread 0x7fffe3ff7700 (LWP 419)  0x00007ffff7bd03c0 in sem_wait () from /lib/libpthread.so.0
  14 Thread 0x7fffe47f8700 (LWP 418)  0x00007ffff7bd03c0 in sem_wait () from /lib/libpthread.so.0
  13 Thread 0x7fffe4ff9700 (LWP 417)  0x00007ffff7bd03c0 in sem_wait () from /lib/libpthread.so.0
  12 Thread 0x7fffe57fa700 (LWP 416)  0x00007ffff7bd03c0 in sem_wait () from /lib/libpthread.so.0
  11 Thread 0x7fffe5ffb700 (LWP 415)  0x00007ffff7bd03c0 in sem_wait () from /lib/libpthread.so.0
* 10 Thread 0x7fffe67fc700 (LWP 414)  0x000000000046479f in _Py_ForgetReference (op=0x4dc7bc0)
    at Objects/object.c:2222
  9 Thread 0x7fffe6ffd700 (LWP 413)  0x00007ffff7bd03c0 in sem_wait () from /lib/libpthread.so.0
  8 Thread 0x7fffe77fe700 (LWP 412)  0x00007ffff7bd03c0 in sem_wait () from /lib/libpthread.so.0
  7 Thread 0x7fffe7fff700 (LWP 411)  0x00007ffff7bd03c0 in sem_wait () from /lib/libpthread.so.0
  6 Thread 0x7fffec81c700 (LWP 410)  0x00007ffff72a11a3 in select () from /lib/libc.so.6
  5 Thread 0x7fffed01d700 (LWP 409)  0x00007ffff7bd03c0 in sem_wait () from /lib/libpthread.so.0
  4 Thread 0x7fffed81e700 (LWP 408)  0x00007ffff7bd03c0 in sem_wait () from /lib/libpthread.so.0
  3 Thread 0x7fffee01f700 (LWP 407)  0x00007ffff7bd03c0 in sem_wait () from /lib/libpthread.so.0
  2 Thread 0x7fffee820700 (LWP 406)  0x00007ffff729c9d3 in poll () from /lib/libc.so.6
  1 Thread 0x7ffff7fec700 (LWP 393)  0x00007ffff72a11a3 in select () from /lib/libc.so.6

And the segfault happens in _Py_ForgetReference :

(gdb) bt
#0  0x000000000046479f in _Py_ForgetReference (op=0x4dc7bc0) at Objects/object.c:2222
#1  0x0000000000464810 in _Py_Dealloc (op=0x4dc7bc0) at Objects/object.c:2242
#2  0x0000000000559a68 in frame_dealloc (f=0x4997ab0) at Objects/frameobject.c:458
#3  0x000000000046481d in _Py_Dealloc (op=0x4997ab0) at Objects/object.c:2243
...

probably because something else zeroed out the op object :
(gdb) select-frame 0
(gdb) print *op
$6 = {_ob_next = 0x0, _ob_prev = 0x0, ob_refcnt = 0, ob_type = 0x2364020}

Is there something else I can do to help resolve this bug ? I can help debugging this, but I am not sure how to proceed. I have a --with-pydebug interpreter stopped at the segfault in gdb.
msg167114 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-08-01 11:48
> Is there something else I can do to help resolve this bug ? I can help debugging this, but I am not sure how to proceed. I have a --with-pydebug interpreter stopped at the segfault in gdb.

Could you post the output of :
"""
p op
"""

and

"""
thread apply all bt
"""
msg167115 - (view) Author: Manu Cupcic (cupcicm) Date: 2012-08-01 11:53
(gdb) p op
$12 = (PyObject *) 0x4dc7bc0

thread all apply bt is going to be very long no ? Could you tell me what you want to know that is not in info threads maybe ? I can definitely post it here, but it's going to be a lot of output.

I can be available on IRC if you want to debug it live.

Thanks !
msg167295 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-08-03 07:59
> (gdb) p op
> $12 = (PyObject *) 0x4dc7bc0

Don't you have GDB 7, to decode Python objects ?
You can check the type of the object with :
"""
p op->ob_type.tp_name
"""

>
> thread all apply bt is going to be very long no ? Could you tell me what you want to know that is not in info threads maybe ? I can definitely post it here, but it's going to be a lot of output.

You can add it as an attachment.
msg167609 - (view) Author: Francois VISCONTE (fvisconte) Date: 2012-08-07 08:12
Hi, 

Following Manu's information I complete with what you asked. 
I repeated the bug described by Manu, with the same software in exactly the same conditions.

The corrupted object is one of our SQLAlchemy mapped object. Chances are that the bug came from SQLAlchemy.

Find joined the "thread all apply bt" output. I will investigate today on this bug and will try to find a quicker way to reproduce it in order to use an hardware watchpoint on object refcount.
msg169491 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-30 21:05
Can someone witnessing the problem try to apply the attached patch?
msg169492 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-30 21:08
(for context about the patch, see my hypothesis at http://mail.python.org/pipermail/python-dev/2012-August/121533.html )
msg169540 - (view) Author: Manu Cupcic (cupcicm) Date: 2012-08-31 14:18
I am testing this patch right now. Will report back.
msg169750 - (view) Author: Manu Cupcic (cupcicm) Date: 2012-09-03 07:40
Hi,

Antoine, thank you for the patch you proposed. It indeed fixed the bug I was seeing (I ran a job that usually crashes in one to two hours over the week end and it is still running).

I am not sure what we should do now. Can this patch be integrated as is ? What can I do to help ?

Thanks

Manu
msg169753 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-03 10:06
> I am not sure what we should do now. Can this patch be integrated as is 
> ? What can I do to help ?

Well, you already helped by testing it, thank you.
I have to add a test case to the patch, then hopefully it can be committed.
msg169889 - (view) Author: Roundup Robot (python-dev) Date: 2012-09-05 23:03
New changeset 5a2ef447b80d by Antoine Pitrou in branch '2.7':
Issue #13992: The trashcan mechanism is now thread-safe.  This eliminates
http://hg.python.org/cpython/rev/5a2ef447b80d
msg169890 - (view) Author: Roundup Robot (python-dev) Date: 2012-09-05 23:21
New changeset e72cda3f40a2 by Antoine Pitrou in branch '3.2':
Issue #13992: The trashcan mechanism is now thread-safe.  This eliminates
http://hg.python.org/cpython/rev/e72cda3f40a2

New changeset 6bcaba9e8df2 by Antoine Pitrou in branch 'default':
Issue #13992: The trashcan mechanism is now thread-safe.  This eliminates
http://hg.python.org/cpython/rev/6bcaba9e8df2
msg169893 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-05 23:30
Oops, forgot to make Georg nosy.
Georg, I think this should go in 3.3.0, what do you think?
msg170034 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-09-08 06:06
Picked as 5aa1bc2f00ad.
msg170082 - (view) Author: Roundup Robot (python-dev) Date: 2012-09-09 09:18
New changeset 5aa1bc2f00ad by Antoine Pitrou in branch 'default':
Issue #13992: The trashcan mechanism is now thread-safe.  This eliminates
http://hg.python.org/cpython/rev/5aa1bc2f00ad
History
Date User Action Args
2012-09-09 09:18:50python-devsetmessages: + msg170082
2012-09-08 06:06:53georg.brandlsetstatus: open -> closed

messages: + msg170034
2012-09-05 23:45:26Arfreversetpriority: normal -> release blocker
nosy: + benjamin.peterson
2012-09-05 23:30:52pitrousetmessages: - msg169891
2012-09-05 23:30:46pitrousetnosy: + georg.brandl
messages: + msg169893
2012-09-05 23:22:28pitrousetresolution: fixed
messages: + msg169891
stage: patch review -> resolved
2012-09-05 23:21:41python-devsetmessages: + msg169890
2012-09-05 23:03:49python-devsetnosy: + python-dev
messages: + msg169889
2012-09-03 10:06:16pitrousetstage: patch review
messages: + msg169753
versions: + Python 3.2, Python 3.3
2012-09-03 07:40:02cupcicmsetmessages: + msg169750
2012-08-31 14:18:50cupcicmsetmessages: + msg169540
2012-08-31 10:26:25asvetlovsetnosy: + asvetlov
2012-08-30 21:08:56pitrousetmessages: + msg169492
2012-08-30 21:05:53pitrousetfiles: + trashcan-2.7.patch
keywords: + patch
messages: + msg169491
2012-08-07 08:12:15fviscontesetfiles: + thread_all_apply_bt.txt
nosy: + fvisconte
messages: + msg167609

2012-08-03 07:59:58neologixsetmessages: + msg167295
2012-08-01 11:53:35cupcicmsetmessages: + msg167115
2012-08-01 11:48:34neologixsetmessages: + msg167114
2012-08-01 07:39:36cupcicmsetnosy: + cupcicm
messages: + msg167095
2012-02-14 18:10:31neologixsetmessages: + msg153355
2012-02-14 04:01:54Aaron.Staleysetmessages: + msg153322
2012-02-14 00:28:52Aaron.Staleysetmessages: + msg153312
2012-02-13 13:54:39amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg153269
2012-02-13 13:51:27neologixsetmessages: + msg153268
2012-02-13 12:31:16Aaron.Staleysetmessages: + msg153263
2012-02-13 09:10:24neologixsetnosy: + pitrou
messages: + msg153254
2012-02-13 04:22:22Aaron.Staleysetmessages: + msg153248
2012-02-13 04:19:15Aaron.Staleysetmessages: + msg153247
2012-02-13 02:57:33Aaron.Staleysetmessages: + msg153243
2012-02-13 01:01:29Aaron.Staleysetmessages: + msg153241
2012-02-12 15:47:29neologixsetmessages: + msg153207
2012-02-12 11:05:11Aaron.Staleysetmessages: + msg153196
2012-02-11 14:54:10neologixsetnosy: + neologix
messages: + msg153128
2012-02-11 13:48:48Aaron.Staleysetmessages: + msg153125
2012-02-11 12:55:52Aaron.Staleycreate