Issue28673
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2016-11-12 08:31 by pwp333, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
core_python2.7.10.gz | pwp333, 2016-11-15 22:07 | |||
28673-reproduce.py | methane, 2016-11-24 11:58 | |||
28673-reproduce2.py | methane, 2016-11-25 03:07 | |||
finish-deadlock.py | methane, 2016-11-25 13:39 | |||
28673-fix-deadlock.patch | methane, 2016-11-27 07:38 | review |
Messages (19) | |||
---|---|---|---|
msg280640 - (view) | Author: Michael Hu (pwp333) | Date: 2016-11-12 08:31 | |
When using pyro4 with more than 15 threads, python 2.7.12 cores frequently (>60% time) Note "v" (op in frame 1) in frame 2 is NULL which has some value in frame 3. So some other thread cleans it. === gdb === Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `python /etc/remoting/remoting_agent.zip run --system'. Program terminated with signal SIGSEGV, Segmentation fault. #0 PyDict_SetItem (op=op@entry=0x0, key=key@entry=0x7f0aa11cd420, value=value@entry=0x920ce0 <_PyExc_AttributeError.12478>) at ../Objects/dictobject.c:832 warning: Source file is more recent than executable. 832 if (!PyDict_Check(op)) { (gdb) bt #0 PyDict_SetItem (op=op@entry=0x0, key=key@entry='exc_type', value=value@entry=<type at remote 0x920ce0>) at ../Objects/dictobject.c:832 #1 0x0000000000529e86 in PyDict_SetItemString (item=<type at remote 0x920ce0>, key=0x61d02a "exc_type", v=0x0) at ../Objects/dictobject.c:2469 #2 PySys_SetObject (name=name@entry=0x61d02a "exc_type", v=v@entry=<type at remote 0x920ce0>) at ../Python/sysmodule.c:83 #3 0x000000000055acc0 in set_exc_info (tb=<traceback at remote 0x7f0aa1232878>, value=exceptions.AttributeError("'NoneType' object has no attribute 'SHUT_RDWR'",), type=<type at remote 0x920ce0>, tstate=0x95ad10) at ../Python/ceval.c:3736 #4 PyEval_EvalFrameEx (f=f@entry=Frame 0x7f0a9e929b60, for file /etc/remoting/remoting_agent.zip/Pyro4/socketutil.py, line 463, in close (self=<SocketConnection at remote 0x7f0a9e924510>), throwflag=throwflag@entry=0) at ../Python/ceval.c:3251 #5 0x0000000000559c27 in fast_function (nk=<optimized out>, na=<optimized out>, n=1, pp_stack=0x7ffce435ab00, func=<function at remote 0x7f0a9f6c0230>) at ../Python/ceval.c:4435 #6 call_function (oparg=<optimized out>, pp_stack=0x7ffce435ab00) at ../Python/ceval.c:4370 #7 PyEval_EvalFrameEx (f=f@entry=Frame 0x7f0a40000b50, for file /etc/remoting/remoting_agent.zip/Pyro4/socketutil.py, line 453, in __del__ (self=<SocketConnection at remote 0x7f0a9e924510>), throwflag=throwflag@entry=0) at ../Python/ceval.c:2987 #8 0x000000000056aa8a in PyEval_EvalCodeEx (closure=<optimized out>, defcount=<optimized out>, defs=0x0, kwcount=<optimized out>, kws=<optimized out>, argcount=1073744720, args=<optimized out>, locals=0x0, globals=<optimized out>, co=<optimized out>) at ../Python/ceval.c:3582 #9 function_call (func=func@entry=<function at remote 0x7f0a9f6c00c8>, arg=arg@entry=(<SocketConnection at remote 0x7f0a9e924510>,), kw=kw@entry=0x0) at ../Objects/funcobject.c:523 #10 0x00000000004be724 in PyObject_Call (kw=0x0, arg=(<SocketConnection at remote 0x7f0a9e924510>,), func=<function at remote 0x7f0a9f6c00c8>) at ../Objects/abstract.c:2546 #11 instancemethod_call.8803 (func=<function at remote 0x7f0a9f6c00c8>, arg=(<SocketConnection at remote 0x7f0a9e924510>,), kw=0x0) at ../Objects/classobject.c:2602 #12 0x00000000004c4683 in PyObject_Call (kw=0x0, arg=(), func=<instancemethod at remote 0x7f0a9f7b5b90>) at ../Objects/abstract.c:2546 #13 PyEval_CallObjectWithKeywords (kw=0x0, arg=(), func=<instancemethod at remote 0x7f0a9f7b5b90>) at ../Python/ceval.c:4219 #14 slot_tp_del.25647 (self=self@entry=<SocketConnection at remote 0x7f0a9e924510>) at ../Objects/typeobject.c:5844 #15 0x00000000004c5880 in subtype_dealloc.25650 (self=<SocketConnection at remote 0x7f0a9e924510>) at ../Objects/typeobject.c:1002 #16 0x00000000005336cf in dict_dealloc.18423 (mp=0x7f0a9e93c398) at ../Objects/dictobject.c:1040 #17 0x00000000004c56e7 in subtype_dealloc.25650 ( self=<ClientConnectionJob(daemon=<Daemon(_Daemon__mustshutdown=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x7f0a9f759790>, acquire=<built-in method acquire of thread.lock object at remote 0x7f0a9f759790>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x7f0a9f759790>) at remote 0x7f0a9e9241d0>) at remote 0x7f0a9e924190>, objectsById={'root': <RootService(_scheduler=<Scheduler(_lock=<thread.lock at remote 0x7f0aa1211d50>, _operations={'cookie_9de0e8d93ed84452a1ce8cc92268979e': <Operation(_exception=None, _result='obj_450f0972022e41b2b7c1f29f49bdca83', _action=<function at remote 0x7f0a9f7b89b0>, _lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7f0a9f759370>, _RLock__count=0) at remote 0x7f0a9e936190>, _started_at=<float at remote 0xab90f8>, cookie='cookie_9de0e8d93ed84452a1ce8cc92268979e', _complete=<_Condition(_Condition__lock=...(truncated)) at ../Objects/typeobject.c:1035 #18 0x0000000000534b93 in frame_dealloc.14921 (f=Frame 0x7f0a9f6d49d8, for file /etc/remoting/remoting_agent.zip/Pyro4/socketserver/threadpool.py, line 39, in run ()) at ../Objects/frameobject.c:458 #19 0x00000000004d86c6 in tb_dealloc.46270 (tb=0x7f0aa1278f38) at ../Python/traceback.c:28 #20 0x00000000004d87e1 in tb_dealloc.46270 (tb=0x7f0aa1278ea8) at ../Python/traceback.c:27 #21 0x00000000005336cf in dict_dealloc.18423 (mp=0x7f0aa1277280) at ../Objects/dictobject.c:1040 #22 0x00000000005a52db in PyInterpreterState_Clear (interp=0x95ac80) at ../Python/pystate.c:111 #23 0x0000000000423aea in Py_Finalize () at ../Python/pythonrun.c:500 #24 0x00000000004672c5 in Py_Main (argc=<optimized out>, argv=0x7ffce435b2d8) at ../Modules/main.c:665 #25 0x00007f0aa0aa7f45 in __libc_start_main (main=0x4672f4 <main>, argc=4, argv=0x7ffce435b2d8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffce435b2c8) at libc-start.c:287 #26 0x000000000057c581 in _start () ==== Google shows similar trace has been reported back in 2009 for python 2.6 https://bugs.launchpad.net/ubuntu/+source/system-config-printer/+bug/478071 |
|||
msg280664 - (view) | Author: Eric V. Smith (eric.smith) * ![]() |
Date: 2016-11-12 17:30 | |
Can you provide a short sample that causes this error? Without a way to reproduce it, there's not a lot we can do. |
|||
msg280750 - (view) | Author: Inada Naoki (methane) * ![]() |
Date: 2016-11-14 11:04 | |
Traceback tells: #22 0x00000000005a52db in PyInterpreterState_Clear (interp=0x95ac80) at ../Python/pystate.c:111 Py_CLEAR(interp->sysdict); While cleaning up interp->sysdict... #19 0x00000000004d86c6 in tb_dealloc.46270 (tb=0x7f0aa1278f38) at ../Python/traceback.c:28 #20 0x00000000004d87e1 in tb_dealloc.46270 (tb=0x7f0aa1278ea8) at ../Python/traceback.c:27 Removing some traceback from sysdict. Maybe, sys.exc_traceback. #17 0x00000000004c56e7 in subtype_dealloc.25650 ( self=<ClientConnectionJob(daemon=<Daemon(_Daemon__mustshutdown=<_Event(_Verbose__verbose=False, _Event__flag=True, _Event__cond=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x7f0a9f759790>, acquire=<built-in method acquire of thread.lock object at remote 0x7f0a9f759790>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x7f0a9f759790>) at remote 0x7f0a9e9241d0>) at remote 0x7f0a9e924190>, objectsById={'root': <RootService(_scheduler=<Scheduler(_lock=<thread.lock at remote 0x7f0aa1211d50>, _operations={'cookie_9de0e8d93ed84452a1ce8cc92268979e': <Operation(_exception=None, _result='obj_450f0972022e41b2b7c1f29f49bdca83', _action=<function at remote 0x7f0a9f7b89b0>, _lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7f0a9f759370>, _RLock__count=0) at remote 0x7f0a9e936190>, _started_at=<float at remote 0xab90f8>, cookie='cookie_9de0e8d93ed84452a1ce8cc92268979e', _complete=<_Condition(_Condition__lock=...(truncated)) at ../Objects/typeobject.c:1035 #18 0x0000000000534b93 in frame_dealloc.14921 (f=Frame 0x7f0a9f6d49d8, for file /etc/remoting/remoting_agent.zip/Pyro4/socketserver/threadpool.py, line 39, in run ()) at ../Objects/frameobject.c:458 Removing frame object owned by the traceback. ... #7 PyEval_EvalFrameEx (f=f@entry=Frame 0x7f0a40000b50, for file /etc/remoting/remoting_agent.zip/Pyro4/socketutil.py, line 453, in __del__ (self=<SocketConnection at remote 0x7f0a9e924510>), throwflag=throwflag@entry=0) at ../Python/ceval.c:2987 #4 PyEval_EvalFrameEx (f=f@entry=Frame 0x7f0a9e929b60, for file /etc/remoting/remoting_agent.zip/Pyro4/socketutil.py, line 463, in close (self=<SocketConnection at remote 0x7f0a9e924510>), throwflag=throwflag@entry=0) at ../Python/ceval.c:3251 Calling SocketConnection.__del__. It calls SocketConnection.close. #3 0x000000000055acc0 in set_exc_info (tb=<traceback at remote 0x7f0aa1232878>, value=exceptions.AttributeError("'NoneType' object has no attribute 'SHUT_RDWR'",), type=<type at remote 0x920ce0>, tstate=0x95ad10) at ../Python/ceval.c:3736 SocketConnection.close raise AttributeError. #2 PySys_SetObject (name=name@entry=0x61d02a "exc_type", v=v@entry=<type at remote 0x920ce0>) at ../Python/sysmodule.c:83 Trying to register the exception to sys. #0 PyDict_SetItem (op=op@entry=0x0, key=key@entry='exc_type', value=value@entry=<type at remote 0x920ce0>) at ../Objects/dictobject.c:832 But interp->sysdict is NULL already. |
|||
msg280896 - (view) | Author: Michael Hu (pwp333) | Date: 2016-11-15 22:07 | |
Core is uploaded for python 2.7.10 to assist debugging. |
|||
msg280897 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2016-11-15 22:11 | |
Try to get the Python traceback of thread on the crash: try the faulthandler module. |
|||
msg280909 - (view) | Author: Michael Hu (pwp333) | Date: 2016-11-16 02:31 | |
(gdb) py-bt #4 Frame 0x7f0ab7a2db60, for file /etc/remoting/remoting_agent.zip/Pyro4/socketutil.py, line 463, in close (self=<SocketConnection at remote 0x7f0ab7a24190>) self.sock.shutdown(socket.SHUT_RDWR) #7 Frame 0x7f0ab0001760, for file /etc/remoting/remoting_agent.zip/Pyro4/socketutil.py, line 453, in __del__ (self=<SocketConnection at remote 0x7f0ab7a24190>) self.close() (gdb) py-list 458 def recv(self, size): 459 return receiveData(self.sock, size) 460 461 def close(self): 462 try: >463 self.sock.shutdown(socket.SHUT_RDWR) 464 except (OSError, socket.error): 465 pass 466 try: 467 self.sock.close() 468 except AttributeError: (gdb) f 7 #7 PyEval_EvalFrameEx ( f=f@entry=Frame 0x7f0ab0001760, for file /etc/remoting/remoting_agent.zip/Pyro4/socketutil.py, line 453, in __del__ (self=<SocketConnection at remote 0x7f0ab7a24190>), throwflag=throwflag@entry=0) at ../Python/ceval.c:2681 2681 in ../Python/ceval.c (gdb) py-list 448 def __init__(self, sock, objectId=None): 449 self.sock = sock 450 self.objectId = objectId 451 452 def __del__(self): >453 self.close() 454 455 def send(self, data): 456 sendData(self.sock, data) 457 458 def recv(self, size): (gdb) |
|||
msg280911 - (view) | Author: Inada Naoki (methane) * ![]() |
Date: 2016-11-16 04:51 | |
Can you check backtrace of main thread? |
|||
msg280921 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2016-11-16 07:13 | |
Ok but can you also try faulthandler to see all frames of all threads? |
|||
msg280922 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2016-11-16 07:15 | |
By the way, you can try your application on Python 3.6 using PYTHONMALLOC=debug env var. Or retry your application on Python 2.7 recompiled using ./configure --with-pydebug. I would like to know if Python sees a buffer overflow. |
|||
msg281067 - (view) | Author: Michael Hu (pwp333) | Date: 2016-11-17 23:39 | |
Could you please look into the core (attached into this bug) to figure out? This issue does not happen always. We are tight on resource to reproduce this. |
|||
msg281579 - (view) | Author: Irmen de Jong (irmen) ![]() |
Date: 2016-11-23 19:00 | |
Due to lack of example code to reproduce the issue, and because I'm mildly interested in this bug because it mentions Pyro4 (because I'm the author of that) I've tried to crash my system myself using Pyro4 and a simple torture test but it trucked on just fine. Pyro4 is not doing any "strange" things as far as I am aware. It does have its own (simple) thread pool of regular Python threads that are handling incoming proxy connections. (Had Pyro4 been doing weird things I suppose Python itself still should never core dump on the user but rather raise a regular exception if something was wrong) |
|||
msg281630 - (view) | Author: Inada Naoki (methane) * ![]() |
Date: 2016-11-24 11:58 | |
I can't get same C traceback yet. But attached script segfaults by different C traceback. I think this issue is dup of #1856 |
|||
msg281665 - (view) | Author: Irmen de Jong (irmen) ![]() |
Date: 2016-11-24 22:28 | |
The 28673-reproduce.py didn't crash on any of the systems I've tried it on. Are you sure it is complete? It looks like a part is missing. |
|||
msg281676 - (view) | Author: Inada Naoki (methane) * ![]() |
Date: 2016-11-25 03:07 | |
I'm sorry, I had uploaded script with wrong parameter. Additionally, this script fails by chance. $ while true; do python2 28673-reproduce2.py || break; echo -n .; done .....Segmentation fault (core dumped) |
|||
msg281680 - (view) | Author: Inada Naoki (methane) * ![]() |
Date: 2016-11-25 06:52 | |
Python 3 would have #21963 too. If we can fix it, we can fix this issue too. |
|||
msg281718 - (view) | Author: Inada Naoki (methane) * ![]() |
Date: 2016-11-25 13:39 | |
This script can cause deadlock on Python 3.6. 1. Other threads can register traceback to sys module after sys module dict is cleaned up. 2. In Py_Finalize, __del__ method of arbitrary object in the traceback can be called. 3. The __del__ method calls threading.Thread.start() 4. Thread.start() waits self.__started event which is set by Thread.__bootstrap 5. While finalizing, threads other than main thread exits when getting GIL. So Thread.__bootstrap() won't be called. 6. Thread.start() in main thread waits self.__started event forever. I suggest thread_PyThread_start_new_thread() raise RuntimeError("can't start thread: finalizing") |
|||
msg281721 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2016-11-25 14:25 | |
> I suggest thread_PyThread_start_new_thread() raise RuntimeError("can't start thread: finalizing") Where *exactly*? Py_FinalizeEx() first calls wait_for_thread_shutdown() and *then* sets _Py_Finalizing to tstate. Does wait_for_thread_shutdown() complete on this bug? Note: It's not the first time that I noticed a deadlock around this code :-/ I recall a silly bug when you press CTRL+c while wait_for_thread_shutdown() is running. See for example threading_shutdown_interrupted.py attached to the issue #20526. Run it and quickly press CTRL+c. Sometime, it does crash Python with a segfault... --- haypo@selma$ ./python threading_shutdown_interrupted.py ..........................................................................................................................................................................................................Exception ignored in: <module 'threading' from '/home/haypo/prog/python/default/Lib/threading.py'> Traceback (most recent call last): File "/home/haypo/prog/python/default/Lib/threading.py", line 1290, in _shutdown t.join() File "/home/haypo/prog/python/default/Lib/threading.py", line 1056, in join . self._wait_for_tstate_lock() File "/home/haypo/prog/python/default/Lib/threading.py", line 1072, in _wait_for_tstate_lock ... elif lock.acquire(block, timeout): File "threading_shutdown_interrupted.py", line 29, in killer .. raise KeyboardInterrupt() KeyboardInterrupt: ^CSegmentation fault (core dumped) --- Python finalization is a very fragile and complex task :-/ |
|||
msg281811 - (view) | Author: Inada Naoki (methane) * ![]() |
Date: 2016-11-27 07:38 | |
> Where *exactly*? See attached patch. But python with this patch can deadlock in other state. Main thread wait GIL but no other living threads have GIL. It seems caused by other issue around finalization and multithreading. > Py_FinalizeEx() first calls wait_for_thread_shutdown() and *then* sets _Py_Finalizing to tstate. > Does wait_for_thread_shutdown() complete on this bug? Yes. wait_for_thread_shutdown() waits only daemon threads. After _Py_Finalizing = tstate, main thread calls many __del__ methods while shutdown. If one of them starts new thread, this deadlock happens. |
|||
msg335594 - (view) | Author: Inada Naoki (methane) * ![]() |
Date: 2019-02-15 09:56 | |
I think this issue is fixed by #21963. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:58:39 | admin | set | github: 72859 |
2019-02-15 09:56:24 | methane | set | status: open -> closed resolution: fixed messages: + msg335594 stage: test needed -> resolved |
2016-11-27 07:38:24 | methane | set | files:
+ 28673-fix-deadlock.patch keywords: + patch messages: + msg281811 |
2016-11-25 14:25:22 | vstinner | set | messages: + msg281721 |
2016-11-25 13:39:14 | methane | set | files:
+ finish-deadlock.py messages: + msg281718 |
2016-11-25 06:52:40 | methane | set | messages: + msg281680 |
2016-11-25 03:07:22 | methane | set | files:
+ 28673-reproduce2.py messages: + msg281676 |
2016-11-24 22:28:24 | irmen | set | messages: + msg281665 |
2016-11-24 11:58:59 | methane | set | files:
+ 28673-reproduce.py messages: + msg281630 |
2016-11-23 19:00:53 | irmen | set | nosy:
+ irmen messages: + msg281579 |
2016-11-18 17:35:55 | terry.reedy | set | title: When using pyro4 with more than 15 threads, python 2.7.12 cores frequently -> pyro4 with more than 15 threads often crashes 2.7.12 stage: test needed |
2016-11-17 23:39:05 | pwp333 | set | messages: + msg281067 |
2016-11-16 07:15:23 | vstinner | set | messages: + msg280922 |
2016-11-16 07:13:30 | vstinner | set | messages: + msg280921 |
2016-11-16 04:51:16 | methane | set | messages: + msg280911 |
2016-11-16 02:31:06 | pwp333 | set | messages: + msg280909 |
2016-11-15 22:11:00 | vstinner | set | nosy:
+ vstinner messages: + msg280897 |
2016-11-15 22:07:44 | pwp333 | set | files:
+ core_python2.7.10.gz messages: + msg280896 |
2016-11-14 11:04:36 | methane | set | nosy:
+ methane messages: + msg280750 |
2016-11-12 17:30:22 | eric.smith | set | nosy:
+ eric.smith messages: + msg280664 |
2016-11-12 08:31:38 | pwp333 | create |