Issue17263
Created on 2013-02-21 02:37 by Albert.Zeyer, last changed 2022-04-11 14:57 by admin.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
thread_local_concurrent.diff | neologix, 2013-02-26 08:49 |
Messages (28) | |||
---|---|---|---|
msg182577 - (view) | Author: Albert Zeyer (Albert.Zeyer) * | Date: 2013-02-21 02:37 | |
If you have a Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS pair in some tp_dealloc and you use such objects in thread-local storage, you might get crashes, depending on which thread tries to clean up such an object and when. I haven't fully figured out the details, but I have a somewhat reduced test case. Note that I encountered this in practice because the sqlite connection object does that (while it disconnects, the GIL is released). This is the C code with a dummy type whose tp_dealloc just sleeps for a few seconds while the GIL is released: https://github.com/albertz/playground/blob/master/testcrash_python_threadlocal.c This is the Python code: https://github.com/albertz/playground/blob/master/testcrash_python_threadlocal_py.py The Python code also contains a code path with a workaround which I'm currently using to avoid such crashes in my application. |
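For readers without access to the linked files, here is a minimal Python-only sketch of one way such a workaround could look: the thread-local resource is detached and closed explicitly before the owning thread exits, so its cleanup does not happen implicitly while the thread state is being torn down. This is an assumption about the general approach, not a copy of the linked test script; `Resource`, `worker` and `events` are hypothetical names.

```python
import threading


# Hypothetical stand-in for an object whose cleanup may release the GIL
# (such as a database connection). All names here are illustrative.
class Resource:
    def __init__(self, name, log):
        self.name = name
        self.log = log

    def close(self):
        # Record which thread performed the cleanup.
        self.log.append((self.name, threading.current_thread().name))


local = threading.local()
events = []


def worker():
    local.resource = Resource("db", events)
    try:
        pass  # ... real work using local.resource would go here ...
    finally:
        # Detach the resource from thread-local storage and close it
        # explicitly, while this thread is still fully alive.
        resource = local.resource
        del local.resource
        resource.close()


thread = threading.Thread(target=worker, name="worker-1")
thread.start()
thread.join()
print(events)
```

The cleanup then runs in the worker thread itself, at a well-defined point, rather than during thread-state teardown.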
|||
msg182657 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-02-22 07:41 | |
Could you try with a recent checkout of Python 2.7? I wonder if this could be an occurrence of issue #13992, fixed by Antoine a couple of months ago. |
|||
msg182720 - (view) | Author: Albert Zeyer (Albert.Zeyer) * | Date: 2013-02-23 07:27 | |
The latest 2.7 hg still crashes. |
|||
msg182721 - (view) | Author: Albert Zeyer (Albert.Zeyer) * | Date: 2013-02-23 07:35 | |
The backtrace: Thread 0:: Dispatch queue: com.apple.main-thread 0 libsystem_kernel.dylib 0x00007fff8a54e386 __semwait_signal + 10 1 libsystem_c.dylib 0x00007fff85e30800 nanosleep + 163 2 libsystem_c.dylib 0x00007fff85e30717 usleep + 54 3 testcrash_python_threadlocal.so 0x00000001002ddd40 test_dealloc + 48 4 python.exe 0x00000001000400a9 dict_dealloc + 153 (dictobject.c:1010) 5 python.exe 0x00000001000432d3 PyDict_DelItem + 259 (dictobject.c:855) 6 python.exe 0x00000001000d7f27 _localdummy_destroyed + 71 (threadmodule.c:585) 7 python.exe 0x0000000100006c61 PyObject_Call + 97 (abstract.c:2529) 8 python.exe 0x0000000100006e42 PyObject_CallFunctionObjArgs + 370 (abstract.c:2761) 9 python.exe 0x000000010006b2e6 PyObject_ClearWeakRefs + 534 (weakrefobject.c:892) 10 python.exe 0x00000001000d746b localdummy_dealloc + 27 (threadmodule.c:231) 11 python.exe 0x00000001000400a9 dict_dealloc + 153 (dictobject.c:1010) 12 python.exe 0x00000001000c003b PyThreadState_Clear + 139 (pystate.c:240) 13 python.exe 0x00000001000c02c8 PyInterpreterState_Clear + 56 (pystate.c:104) 14 python.exe 0x00000001000c1c68 Py_Finalize + 344 (pythonrun.c:504) 15 python.exe 0x00000001000d5891 Py_Main + 3041 (main.c:665) 16 python.exe 0x0000000100000a74 start + 52 Thread 1: 0 libsystem_kernel.dylib 0x00007fff8a54e386 __semwait_signal + 10 1 libsystem_c.dylib 0x00007fff85e30800 nanosleep + 163 2 libsystem_c.dylib 0x00007fff85e30717 usleep + 54 3 testcrash_python_threadlocal.so 0x00000001002ddd40 test_dealloc + 48 4 python.exe 0x00000001000400a9 dict_dealloc + 153 (dictobject.c:1010) 5 python.exe 0x00000001000432d3 PyDict_DelItem + 259 (dictobject.c:855) 6 python.exe 0x00000001000d7f27 _localdummy_destroyed + 71 (threadmodule.c:585) 7 python.exe 0x0000000100006c61 PyObject_Call + 97 (abstract.c:2529) 8 python.exe 0x0000000100006e42 PyObject_CallFunctionObjArgs + 370 (abstract.c:2761) 9 python.exe 0x000000010006b2e6 PyObject_ClearWeakRefs + 534 (weakrefobject.c:892) 10 python.exe 0x00000001000d746b 
localdummy_dealloc + 27 (threadmodule.c:231) 11 python.exe 0x00000001000400a9 dict_dealloc + 153 (dictobject.c:1010) 12 python.exe 0x00000001000c003b PyThreadState_Clear + 139 (pystate.c:240) 13 python.exe 0x00000001000d7ec4 t_bootstrap + 372 (threadmodule.c:643) 14 libsystem_c.dylib 0x00007fff85da6742 _pthread_start + 327 15 libsystem_c.dylib 0x00007fff85d93181 thread_start + 13 Thread 2: 0 libsystem_kernel.dylib 0x00007fff8a54e322 __select + 10 1 time.so 0x00000001002fb01b time_sleep + 139 (timemodule.c:948) 2 python.exe 0x000000010009fcfb PyEval_EvalFrameEx + 18011 (ceval.c:4021) 3 python.exe 0x00000001000a30f3 fast_function + 179 (ceval.c:4107) 4 python.exe 0x000000010009fdad PyEval_EvalFrameEx + 18189 (ceval.c:4042) 5 python.exe 0x00000001000a2fb7 PyEval_EvalCodeEx + 2103 (ceval.c:3253) 6 python.exe 0x000000010002f8cb function_call + 347 (funcobject.c:526) 7 python.exe 0x0000000100006c61 PyObject_Call + 97 (abstract.c:2529) 8 python.exe 0x00000001000a066a PyEval_EvalFrameEx + 20426 (ceval.c:4334) 9 python.exe 0x00000001000a30f3 fast_function + 179 (ceval.c:4107) 10 python.exe 0x000000010009fdad PyEval_EvalFrameEx + 18189 (ceval.c:4042) 11 python.exe 0x00000001000a30f3 fast_function + 179 (ceval.c:4107) 12 python.exe 0x000000010009fdad PyEval_EvalFrameEx + 18189 (ceval.c:4042) 13 python.exe 0x00000001000a2fb7 PyEval_EvalCodeEx + 2103 (ceval.c:3253) 14 python.exe 0x000000010002f8cb function_call + 347 (funcobject.c:526) 15 python.exe 0x0000000100006c61 PyObject_Call + 97 (abstract.c:2529) 16 python.exe 0x0000000100018b07 instancemethod_call + 439 (classobject.c:2603) 17 python.exe 0x0000000100006c61 PyObject_Call + 97 (abstract.c:2529) 18 python.exe 0x000000010009aaa4 PyEval_CallObjectWithKeywords + 180 (ceval.c:3891) 19 python.exe 0x00000001000d7d96 t_bootstrap + 70 (threadmodule.c:614) 20 libsystem_c.dylib 0x00007fff85da6742 _pthread_start + 327 21 libsystem_c.dylib 0x00007fff85d93181 thread_start + 13 Thread 3 Crashed: 0 python.exe 0x00000001000a2329 
PyEval_EvalFrameEx + 27785 (ceval.c:2995) 1 python.exe 0x00000001000a2fb7 PyEval_EvalCodeEx + 2103 (ceval.c:3253) 2 python.exe 0x000000010002f8cb function_call + 347 (funcobject.c:526) 3 python.exe 0x0000000100006c61 PyObject_Call + 97 (abstract.c:2529) 4 python.exe 0x00000001000a066a PyEval_EvalFrameEx + 20426 (ceval.c:4334) 5 python.exe 0x00000001000a30f3 fast_function + 179 (ceval.c:4107) 6 python.exe 0x000000010009fdad PyEval_EvalFrameEx + 18189 (ceval.c:4042) 7 python.exe 0x00000001000a30f3 fast_function + 179 (ceval.c:4107) 8 python.exe 0x000000010009fdad PyEval_EvalFrameEx + 18189 (ceval.c:4042) 9 python.exe 0x00000001000a2fb7 PyEval_EvalCodeEx + 2103 (ceval.c:3253) 10 python.exe 0x000000010002f8cb function_call + 347 (funcobject.c:526) 11 python.exe 0x0000000100006c61 PyObject_Call + 97 (abstract.c:2529) 12 python.exe 0x0000000100018b07 instancemethod_call + 439 (classobject.c:2603) 13 python.exe 0x0000000100006c61 PyObject_Call + 97 (abstract.c:2529) 14 python.exe 0x000000010009aaa4 PyEval_CallObjectWithKeywords + 180 (ceval.c:3891) 15 python.exe 0x00000001000d7d96 t_bootstrap + 70 (threadmodule.c:614) 16 libsystem_c.dylib 0x00007fff85da6742 _pthread_start + 327 17 libsystem_c.dylib 0x00007fff85d93181 thread_start + 13 |
|||
msg182730 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-02-23 10:30 | |
Alright, here's what's going on. When the main thread exits, it triggers the interpreter shutdown, which clears all the tstates in PyInterpreterState_Clear():
"""
void
PyInterpreterState_Clear(PyInterpreterState *interp)
{
    PyThreadState *p;
    HEAD_LOCK();
    for (p = interp->tstate_head; p != NULL; p = p->next)
        PyThreadState_Clear(p);
"""
PyThreadState_Clear() clears the TLS dict:
"""
void
PyThreadState_Clear(PyThreadState *tstate)
{
    if (Py_VerboseFlag && tstate->frame != NULL)
        fprintf(stderr,
          "PyThreadState_Clear: warning: thread still has a frame\n");

    Py_CLEAR(tstate->frame);

    Py_CLEAR(tstate->dict);
"""
This deallocates the TLS dict. But when the TLS object is deallocated, if it releases the GIL, this can make other threads runnable while the interpreter is shutting down (and the tstates are in an unusable state), so all bets are off. Note that this can only happen if there are daemon threads, which is the case in your testcase.

Basically, the problem is that arbitrary code can be run while the interpreter is shutting down because of the TLS deallocation. I'm not sure about how to handle it, but one possibility to limit such problems would be to not deallocate the tstate if a thread is currently still active:
"""
diff --git a/Python/pystate.c b/Python/pystate.c
--- a/Python/pystate.c
+++ b/Python/pystate.c
@@ -230,9 +230,12 @@
 void
 PyThreadState_Clear(PyThreadState *tstate)
 {
-    if (Py_VerboseFlag && tstate->frame != NULL)
-        fprintf(stderr,
-          "PyThreadState_Clear: warning: thread still has a frame\n");
+    if (tstate->frame != NULL) {
+        if (Py_VerboseFlag)
+            fprintf(stderr,
+              "PyThreadState_Clear: warning: thread still has a frame\n");
+        return;
+    }

     Py_CLEAR(tstate->frame);
"""
But this would lead to memory leaks in some cases... |
|||
msg182731 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-02-23 10:38 | |
Albert, this happens because daemon threads continue running during interpreter shutdown. I suppose the problem goes away if you make the thread non-daemonic? This shouldn't be a problem in Python 3 where Python threads cannot switch during shutdown. |
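A minimal sketch of the suggestion above, assuming the background work can be made to stop on request: the thread is non-daemonic and is joined before the main thread returns, so no Python thread is still running when the interpreter starts shutting down. The `stop_requested` event and the `worker` function are hypothetical names, not taken from the reporter's test case.

```python
import threading
import time

stop_requested = threading.Event()


def worker():
    # Loop until the main thread asks us to stop.
    while not stop_requested.is_set():
        time.sleep(0.01)


# daemon=False (the default): the interpreter waits for this thread
# instead of tearing down its state while it is still running.
thread = threading.Thread(target=worker, daemon=False)
thread.start()

time.sleep(0.05)       # let the worker run briefly
stop_requested.set()   # ask it to finish
thread.join()          # wait until it has exited
```

With this shape, any thread-local objects owned by the worker are released when the worker exits, before interpreter shutdown begins.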
|||
msg182732 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-02-23 10:55 | |
> This shouldn't be a problem in Python 3 where Python threads cannot switch
> during shutdown.

What happens if the GIL is released during shutdown?

Also, I'm a bit worried about this code:
"""
void
PyThreadState_Clear(PyThreadState *tstate)
{
    if (Py_VerboseFlag && tstate->frame != NULL)
        fprintf(stderr,
          "PyThreadState_Clear: warning: thread still has a frame\n");

    Py_CLEAR(tstate->frame);

    Py_CLEAR(tstate->dict);
"""
The TLS dict is deallocated after having cleared the frame, which could lead to surprises, no? |
|||
msg182735 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-02-23 11:08 | |
> What happens if the GIL is released during shutdown?

In PyEval_RestoreThread(), any thread other than the main thread trying to take the GIL will immediately exit:

    take_gil(tstate);
    if (_Py_Finalizing && tstate != _Py_Finalizing) {
        drop_gil(tstate);
        PyThread_exit_thread();
        assert(0);  /* unreachable */
    }

> The TLS dict is deallocated after having cleared the frame, which
> could lead to surprises, no?

I don't know. Can you think of a situation where there is a problem? |
|||
msg182745 - (view) | Author: Albert Zeyer (Albert.Zeyer) * | Date: 2013-02-23 14:22 | |
Note that in my original application where I encountered this (with sqlite), the backtrace looks slightly different. It is at shutdown, but not at interpreter shutdown - the main thread is still running. https://github.com/albertz/music-player/issues/23 I was trying to reproduce it in a similar way with this test case but in the test case, so far I could only reproduce the crash when it does the interpreter shutdown. |
|||
msg182760 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-02-23 16:47 | |
> Note that in my original application where I encountered this (with sqlite), the backtrace looks slightly different. It is at shutdown, but not at interpreter shutdown - the main thread is still running.

Could you post a traceback of this crash? |
|||
msg182761 - (view) | Author: Albert Zeyer (Albert.Zeyer) * | Date: 2013-02-23 16:55 | |
Here is one. Others are in the issue report on GitHub. In Thread 5, the PyObject_SetAttr is where some attribute containing a threading.local object is set to None. This threading.local object had a reference to a sqlite connection object (in some TLS contextes). This should also be the actual crashing thread. I use faulthandler which makes it look like Thread 0 crashed in the crash reporter. I had this crash about 5% of the time - but totally unpredictable. But it was always happening in exactly that line where the attribute was set to None. Thread 0 Crashed:: Dispatch queue: com.apple.main-thread 0 libsystem_kernel.dylib 0x00007fff8a54e0fa __psynch_cvwait + 10 1 libsystem_c.dylib 0x00007fff85daaf89 _pthread_cond_wait + 869 2 org.python.python 0x000000010006f54e PyThread_acquire_lock + 96 3 org.python.python 0x000000010001d8e3 PyEval_RestoreThread + 61 4 org.python.python 0x0000000100075bf3 0x100009000 + 445427 5 org.python.python 0x0000000100020041 PyEval_EvalFrameEx + 7548 6 org.python.python 0x000000010001e281 PyEval_EvalCodeEx + 1956 7 org.python.python 0x0000000100024661 0x100009000 + 112225 8 org.python.python 0x00000001000200d2 PyEval_EvalFrameEx + 7693 9 org.python.python 0x000000010001e281 PyEval_EvalCodeEx + 1956 10 org.python.python 0x0000000100024661 0x100009000 + 112225 11 org.python.python 0x00000001000200d2 PyEval_EvalFrameEx + 7693 12 org.python.python 0x000000010001e281 PyEval_EvalCodeEx + 1956 13 org.python.python 0x000000010005df78 0x100009000 + 348024 14 org.python.python 0x000000010001caba PyObject_Call + 97 15 _objc.so 0x0000000104615898 0x104600000 + 88216 16 libffi.dylib 0x00007fff8236e8a6 ffi_closure_unix64_inner + 508 17 libffi.dylib 0x00007fff8236df66 ffi_closure_unix64 + 70 18 com.apple.AppKit 0x00007fff84f63f3f -[NSApplication _docController:shouldTerminate:] + 75 19 com.apple.AppKit 0x00007fff84f63e4e __91-[NSDocumentController(NSInternal) _closeAllDocumentsWithDelegate:shouldTerminateSelector:]_block_invoke_0 + 159 20 
com.apple.AppKit 0x00007fff84f63cea -[NSDocumentController(NSInternal) _closeAllDocumentsWithDelegate:shouldTerminateSelector:] + 1557 21 com.apple.AppKit 0x00007fff84f636ae -[NSDocumentController(NSInternal) __closeAllDocumentsWithDelegate:shouldTerminateSelector:] + 265 22 com.apple.AppKit 0x00007fff84f6357f -[NSApplication _shouldTerminate] + 772 23 com.apple.AppKit 0x00007fff84f9134f -[NSApplication(NSAppleEventHandling) _handleAEQuit] + 403 24 com.apple.AppKit 0x00007fff84d40261 -[NSApplication(NSAppleEventHandling) _handleCoreEvent:withReplyEvent:] + 660 25 com.apple.Foundation 0x00007fff867e112b -[NSAppleEventManager dispatchRawAppleEvent:withRawReply:handlerRefCon:] + 308 26 com.apple.Foundation 0x00007fff867e0f8d _NSAppleEventManagerGenericHandler + 106 27 com.apple.AE 0x00007fff832eeb48 aeDispatchAppleEvent(AEDesc const*, AEDesc*, unsigned int, unsigned char*) + 307 28 com.apple.AE 0x00007fff832ee9a9 dispatchEventAndSendReply(AEDesc const*, AEDesc*) + 37 29 com.apple.AE 0x00007fff832ee869 aeProcessAppleEvent + 318 30 com.apple.HIToolbox 0x00007fff8e19f8e9 AEProcessAppleEvent + 100 31 com.apple.AppKit 0x00007fff84d3c916 _DPSNextEvent + 1456 32 com.apple.AppKit 0x00007fff84d3bed2 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 128 33 com.apple.AppKit 0x00007fff84d33283 -[NSApplication run] + 517 34 libffi.dylib 0x00007fff8236dde4 ffi_call_unix64 + 76 35 libffi.dylib 0x00007fff8236e619 ffi_call + 853 36 _objc.so 0x000000010461a663 PyObjCFFI_Caller + 1980 37 _objc.so 0x000000010462f43e 0x104600000 + 193598 38 org.python.python 0x000000010001caba PyObject_Call + 97 39 org.python.python 0x0000000100020225 PyEval_EvalFrameEx + 8032 40 org.python.python 0x00000001000245eb 0x100009000 + 112107 41 org.python.python 0x00000001000200d2 PyEval_EvalFrameEx + 7693 42 org.python.python 0x000000010001e281 PyEval_EvalCodeEx + 1956 43 org.python.python 0x000000010001dad7 PyEval_EvalCode + 54 44 org.python.python 0x0000000100054933 0x100009000 + 309555 45 
org.python.python 0x00000001000549ff PyRun_FileExFlags + 165 46 org.python.python 0x00000001000543e9 PyRun_SimpleFileExFlags + 410 47 albertzeyer.MusicPlayer 0x0000000100001f54 main + 682 (main.m:67) 48 albertzeyer.MusicPlayer 0x0000000100001c6d _start + 203 49 albertzeyer.MusicPlayer 0x0000000100001ba1 start + 33 Thread 1:: Dispatch queue: com.apple.libdispatch-manager 0 libsystem_kernel.dylib 0x00007fff8a54ed16 kevent + 10 1 libdispatch.dylib 0x00007fff88230dea _dispatch_mgr_invoke + 883 2 libdispatch.dylib 0x00007fff882309ee _dispatch_mgr_thread + 54 Thread 2: 0 libsystem_kernel.dylib 0x00007fff8a54e0fa __psynch_cvwait + 10 1 libsystem_c.dylib 0x00007fff85daaf89 _pthread_cond_wait + 869 2 org.python.python 0x000000010006f54e PyThread_acquire_lock + 96 3 org.python.python 0x000000010001d8e3 PyEval_RestoreThread + 61 4 _sqlite3.so 0x000000010a4041f1 pysqlite_connection_dealloc + 76 5 org.python.python 0x00000001000729f3 0x100009000 + 432627 6 org.python.python 0x00000001000729f3 0x100009000 + 432627 7 org.python.python 0x0000000100052b55 PyThreadState_Clear + 136 8 org.python.python 0x000000010007610a 0x100009000 + 446730 9 libsystem_c.dylib 0x00007fff85da6742 _pthread_start + 327 10 libsystem_c.dylib 0x00007fff85d93181 thread_start + 13 Thread 3: 0 libsystem_kernel.dylib 0x00007fff8a54e0fa __psynch_cvwait + 10 1 libsystem_c.dylib 0x00007fff85daaf89 _pthread_cond_wait + 869 2 org.python.python 0x000000010006f54e PyThread_acquire_lock + 96 3 org.python.python 0x000000010001d8e3 PyEval_RestoreThread + 61 4 _objc.so 0x00000001046234a3 0x104600000 + 144547 5 org.python.python 0x00000001000a4194 0x100009000 + 635284 6 org.python.python 0x0000000100021a49 PyEval_EvalFrameEx + 14212 7 org.python.python 0x00000001000245eb 0x100009000 + 112107 8 org.python.python 0x00000001000200d2 PyEval_EvalFrameEx + 7693 9 org.python.python 0x000000010001e281 PyEval_EvalCodeEx + 1956 10 org.python.python 0x000000010005df78 0x100009000 + 348024 11 org.python.python 0x000000010001caba 
PyObject_Call + 97 12 org.python.python 0x000000010001ec59 PyEval_EvalFrameEx + 2452 13 org.python.python 0x00000001000245eb 0x100009000 + 112107 14 org.python.python 0x00000001000200d2 PyEval_EvalFrameEx + 7693 15 org.python.python 0x00000001000245eb 0x100009000 + 112107 16 org.python.python 0x00000001000200d2 PyEval_EvalFrameEx + 7693 17 org.python.python 0x000000010001e281 PyEval_EvalCodeEx + 1956 18 org.python.python 0x000000010005df78 0x100009000 + 348024 19 org.python.python 0x000000010001caba PyObject_Call + 97 20 org.python.python 0x000000010003719a 0x100009000 + 188826 21 org.python.python 0x000000010001caba PyObject_Call + 97 22 org.python.python 0x0000000100023dfc PyEval_CallObjectWithKeywords + 177 23 org.python.python 0x0000000100076010 0x100009000 + 446480 24 libsystem_c.dylib 0x00007fff85da6742 _pthread_start + 327 25 libsystem_c.dylib 0x00007fff85d93181 thread_start + 13 Thread 4: 0 libsystem_kernel.dylib 0x00007fff8a54e0fa __psynch_cvwait + 10 1 libsystem_c.dylib 0x00007fff85daaf89 _pthread_cond_wait + 869 2 org.python.python 0x000000010006f54e PyThread_acquire_lock + 96 3 org.python.python 0x000000010001d8e3 PyEval_RestoreThread + 61 4 org.python.python 0x0000000100053351 PyGILState_Ensure + 93 5 _objc.so 0x0000000104609b6e 0x104600000 + 39790 6 libobjc.A.dylib 0x00007fff880c6230 (anonymous namespace)::AutoreleasePoolPage::pop(void*) + 464 7 com.apple.CoreFoundation 0x00007fff8ec15342 _CFAutoreleasePoolPop + 34 8 com.apple.Foundation 0x00007fff867e003d -[NSAutoreleasePool release] + 154 9 com.apple.CoreFoundation 0x00007fff8ebed85a CFRelease + 170 10 _objc.so 0x000000010462349b 0x104600000 + 144539 11 org.python.python 0x00000001000a4194 0x100009000 + 635284 12 org.python.python 0x0000000100021a49 PyEval_EvalFrameEx + 14212 13 org.python.python 0x000000010001e281 PyEval_EvalCodeEx + 1956 14 org.python.python 0x0000000100024661 0x100009000 + 112225 15 org.python.python 0x00000001000200d2 PyEval_EvalFrameEx + 7693 16 org.python.python 
0x00000001000245eb 0x100009000 + 112107 17 org.python.python 0x00000001000200d2 PyEval_EvalFrameEx + 7693 18 org.python.python 0x000000010001e281 PyEval_EvalCodeEx + 1956 19 org.python.python 0x000000010005df78 0x100009000 + 348024 20 org.python.python 0x000000010001caba PyObject_Call + 97 21 org.python.python 0x000000010001ec59 PyEval_EvalFrameEx + 2452 22 org.python.python 0x00000001000245eb 0x100009000 + 112107 23 org.python.python 0x00000001000200d2 PyEval_EvalFrameEx + 7693 24 org.python.python 0x00000001000245eb 0x100009000 + 112107 25 org.python.python 0x00000001000200d2 PyEval_EvalFrameEx + 7693 26 org.python.python 0x000000010001e281 PyEval_EvalCodeEx + 1956 27 org.python.python 0x000000010005df78 0x100009000 + 348024 28 org.python.python 0x000000010001caba PyObject_Call + 97 29 org.python.python 0x000000010003719a 0x100009000 + 188826 30 org.python.python 0x000000010001caba PyObject_Call + 97 31 org.python.python 0x0000000100023dfc PyEval_CallObjectWithKeywords + 177 32 org.python.python 0x0000000100076010 0x100009000 + 446480 33 libsystem_c.dylib 0x00007fff85da6742 _pthread_start + 327 34 libsystem_c.dylib 0x00007fff85d93181 thread_start + 13 Thread 5: 0 org.python.python 0x000000010007575e 0x100009000 + 444254 1 org.python.python 0x0000000100071cbe 0x100009000 + 429246 2 org.python.python 0x0000000100071bcd PyDict_SetItem + 145 3 org.python.python 0x0000000100079a55 PyObject_GenericSetAttr + 327 4 org.python.python 0x0000000100079538 PyObject_SetAttr + 157 5 org.python.python 0x000000010001f303 PyEval_EvalFrameEx + 4158 6 org.python.python 0x00000001000245eb 0x100009000 + 112107 7 org.python.python 0x00000001000200d2 PyEval_EvalFrameEx + 7693 8 org.python.python 0x00000001000245eb 0x100009000 + 112107 9 org.python.python 0x00000001000200d2 PyEval_EvalFrameEx + 7693 10 org.python.python 0x00000001000245eb 0x100009000 + 112107 11 org.python.python 0x00000001000200d2 PyEval_EvalFrameEx + 7693 12 org.python.python 0x000000010001e281 PyEval_EvalCodeEx + 1956 
13 org.python.python 0x000000010005df78 0x100009000 + 348024 14 org.python.python 0x000000010001caba PyObject_Call + 97 15 org.python.python 0x000000010001ec59 PyEval_EvalFrameEx + 2452 16 org.python.python 0x00000001000245eb 0x100009000 + 112107 17 org.python.python 0x00000001000200d2 PyEval_EvalFrameEx + 7693 18 org.python.python 0x00000001000245eb 0x100009000 + 112107 19 org.python.python 0x00000001000200d2 PyEval_EvalFrameEx + 7693 20 org.python.python 0x000000010001e281 PyEval_EvalCodeEx + 1956 21 org.python.python 0x000000010005df78 0x100009000 + 348024 22 org.python.python 0x000000010001caba PyObject_Call + 97 23 org.python.python 0x000000010003719a 0x100009000 + 188826 24 org.python.python 0x000000010001caba PyObject_Call + 97 25 org.python.python 0x0000000100023dfc PyEval_CallObjectWithKeywords + 177 26 org.python.python 0x0000000100076010 0x100009000 + 446480 27 libsystem_c.dylib 0x00007fff85da6742 _pthread_start + 327 28 libsystem_c.dylib 0x00007fff85d93181 thread_start + 13 Thread 6: 0 libsystem_kernel.dylib 0x00007fff8a54e386 __semwait_signal + 10 1 libsystem_c.dylib 0x00007fff85e30800 nanosleep + 163 2 libsystem_c.dylib 0x00007fff85e30717 usleep + 54 3 ffmpeg.so 0x000000010bd7609d PlayerObject::workerProc(PyMutex&, bool&) + 509 (ffmpeg_player_decoding.cpp:1087) 4 ffmpeg.so 0x000000010bd78ac2 boost::function2<void, PyMutex&, bool&>::operator()(PyMutex&, bool&) const + 28 (function_template.hpp:759) 5 ffmpeg.so 0x000000010bd78736 PyThread_thread(void*) + 25 (ffmpeg_utils.cpp:98) 6 libsystem_c.dylib 0x00007fff85da6742 _pthread_start + 327 7 libsystem_c.dylib 0x00007fff85d93181 thread_start + 13 Thread 7: 0 libsystem_kernel.dylib 0x00007fff8a54e322 __select + 10 1 time.so 0x00000001007f9d83 0x1007f9000 + 3459 2 org.python.python 0x0000000100020041 PyEval_EvalFrameEx + 7548 3 org.python.python 0x000000010001e281 PyEval_EvalCodeEx + 1956 4 org.python.python 0x000000010005df78 0x100009000 + 348024 5 org.python.python 0x000000010001caba PyObject_Call + 97 6 
org.python.python 0x000000010001ec59 PyEval_EvalFrameEx + 2452 7 org.python.python 0x00000001000245eb 0x100009000 + 112107 8 org.python.python 0x00000001000200d2 PyEval_EvalFrameEx + 7693 9 org.python.python 0x00000001000245eb 0x100009000 + 112107 10 org.python.python 0x00000001000200d2 PyEval_EvalFrameEx + 7693 11 org.python.python 0x000000010001e281 PyEval_EvalCodeEx + 1956 12 org.python.python 0x000000010005df78 0x100009000 + 348024 13 org.python.python 0x000000010001caba PyObject_Call + 97 14 org.python.python 0x000000010003719a 0x100009000 + 188826 15 org.python.python 0x000000010001caba PyObject_Call + 97 16 org.python.python 0x0000000100023dfc PyEval_CallObjectWithKeywords + 177 17 org.python.python 0x0000000100076010 0x100009000 + 446480 18 libsystem_c.dylib 0x00007fff85da6742 _pthread_start + 327 19 libsystem_c.dylib 0x00007fff85d93181 thread_start + 13 Thread 8:: com.apple.audio.IOThread.client 0 libsystem_kernel.dylib 0x00007fff8a54c686 mach_msg_trap + 10 1 libsystem_kernel.dylib 0x00007fff8a54bc42 mach_msg + 70 2 com.apple.audio.CoreAudio 0x00007fff825a117a HALB_MachPort::SendMessageWithReply(unsigned int, unsigned int, unsigned int, unsigned int, mach_msg_header_t*, bool, unsigned int) + 98 3 com.apple.audio.CoreAudio 0x00007fff825a1108 HALB_MachPort::SendSimpleMessageWithSimpleReply(unsigned int, unsigned int, int, int&, bool, unsigned int) + 42 4 com.apple.audio.CoreAudio 0x00007fff8259f8db HALC_ProxyIOContext::IOWorkLoop() + 1209 5 com.apple.audio.CoreAudio 0x00007fff8259f391 HALC_ProxyIOContext::IOThreadEntry(void*) + 83 6 com.apple.audio.CoreAudio 0x00007fff8259f24b HALB_IOThread::Entry(void*) + 75 7 libsystem_c.dylib 0x00007fff85da6742 _pthread_start + 327 8 libsystem_c.dylib 0x00007fff85d93181 thread_start + 13 |
|||
msg182765 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-02-23 17:00 | |
> Here is one. Others are in the issue report on GitHub.

Yes, I've seen it, but I'd need a backtrace with line numbers (like the one you posted above). Thread 5 is crashing, but I don't know where. |
|||
msg182771 - (view) | Author: Albert Zeyer (Albert.Zeyer) * | Date: 2013-02-23 17:11 | |
Sadly, that is quite complicated or almost impossible. It needs the MacOSX system Python, and that one lacks debugging information. I just tried with CPython from hg-2.7. But it seems the official Python doesn't have objc bindings (and I also need Cocoa bindings), so I can't easily run this right now (and another GUI is not yet implemented). |
|||
msg182800 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-02-23 19:35 | |
I have two questions:
- how do you know the crash really happens because of thread 5?
- when the thread.local object is being deleted, has another thread just started looking up its attributes? |
|||
msg182806 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-02-23 19:57 | |
Another question: are threads being started or stopped while the thread local object is being deleted? |
|||
msg182829 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-02-23 22:27 | |
> - how do you know the crash really happens because of thread 5?

All other threads are blocked on locks or condition variables, it's the only runnable thread.

> Another question: are threads being started or stopped while the thread local object is being deleted?

From the stack trace, thread 2 is being stopped.

I guess the problem is similar to above: thread 2 is in the middle of stopping, its TLS dict is deallocated, which triggers the thread local object deallocation, which releases the GIL. Thread 5 becomes running, and must somehow access thread 2 tstate.

It would be much easier with a backtrace, though. |
|||
msg182830 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-02-23 22:35 | |
> > - how do you know the crash really happens because of thread 5?
>
> All other threads are blocked on locks or condition variables, it's
> the only runnable thread.

Hm, you are right.

> > Another question: are threads being started or stopped while the thread local object is being deleted?
>
> From the stack trace, thread 2 is being stopped.
>
> I guess the problem is similar to above: thread 2 is in the middle of
> stopping, its TLS dict is deallocated, which triggers the thread local
> object deallocation, which releases the GIL. Thread 5 becomes running,
> and must somehow access thread 2 tstate.

I've read the code several times and I find it unlikely that it's the cause of the problem:
- the thread state's thread-local dict (tstate->dict) is deallocated using Py_CLEAR(), meaning it's unreachable from other threads when deallocating one of the values releases the GIL
- the thread-local object's deallocator checks that tstate->dict is non-NULL before using it; the only thing that could go wrong is if PyDict_GetItem() releases the GIL, which sounds unlikely on tstate->dict

(also, I've checked that threadmodule.c holds the GIL when inserting and removing thread states from the interpreter's thread states list; it would be more future-proof for local_dealloc to use pystate.c's HEAD_LOCK() and HEAD_UNLOCK() APIs, though)

I'm wondering if there's something else interfering here. My attempts at writing a stress-test script have failed to produce any crash. |
|||
msg182898 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-02-24 21:59 | |
I don't know how the OS X crash reporter works, but it seems to have at least some debug info available, since some symbols are resolved in the backtrace. You might be able to get more info with gdb, with something like:
"""
gdb /path/to/python
(gdb) info line *<crash address>
(gdb) disassemble <crash address>
"""
Otherwise, is there a way to run your code on Linux? |
|||
msg182935 - (view) | Author: Albert Zeyer (Albert.Zeyer) * | Date: 2013-02-25 13:37 | |
The symbols are there because it is a library which exports all the symbols. Other debugging information is not there, and I don't know of any place where I can get it.

It currently cannot work on Linux in the same way because the GUI is Cocoa only right now. I'm trying to get it to run with another Python on Mac, though.

Note that in threadmodule.c, in local_clear, we are iterating through all threads:

    /* Remove all strong references to dummies from the thread states */
    if (self->key
        && (tstate = PyThreadState_Get())
        && tstate->interp) {
        for(tstate = PyInterpreterState_ThreadHead(tstate->interp);
            tstate;
            tstate = PyThreadState_Next(tstate))
            if (tstate->dict &&
                PyDict_GetItem(tstate->dict, self->key))
                PyDict_DelItem(tstate->dict, self->key);
    }

In PyDict_DelItem, if the GIL is released and meanwhile the list of threadstates is altered, is that a problem for this loop? So maybe tstate becomes invalid there.

I also noticed this part in another backtrace of the same crash:

Thread 2:
0 libsystem_kernel.dylib 0x00007fff8a54e0fa __psynch_cvwait + 10
1 libsystem_c.dylib 0x00007fff85daaf89 _pthread_cond_wait + 869
2 org.python.python 0x000000010006f54e PyThread_acquire_lock + 96
3 org.python.python 0x000000010001d8e3 PyEval_RestoreThread + 61
4 org.python.python 0x0000000100053351 PyGILState_Ensure + 93
5 _objc.so 0x0000000103b89b6e 0x103b80000 + 39790
6 libobjc.A.dylib 0x00007fff880c6230 (anonymous namespace)::AutoreleasePoolPage::pop(void*) + 464
7 libobjc.A.dylib 0x00007fff880c85a2 (anonymous namespace)::AutoreleasePoolPage::tls_dealloc(void*) + 42
8 libsystem_c.dylib 0x00007fff85dad4fe _pthread_tsd_cleanup + 240
9 libsystem_c.dylib 0x00007fff85da69a2 _pthread_exit + 146
10 libsystem_c.dylib 0x00007fff85da674d _pthread_start + 338
11 libsystem_c.dylib 0x00007fff85d93181 thread_start + 13

This seems to be a non-Python thread, so PyGILState_Ensure would have created a new threadstate and this would have altered the list. |
|||
msg183022 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-02-26 06:57 | |
> Note that in threadmodule.c, in local_clear, we are iterating through all threads:
>
> In PyDict_DelItem, if the GIL is released and meanwhile, the list of threadstates is altered, is that a problem for this loop? So maybe tstate becomes invalid there.

Yes. If PyDict_DelItem() releases the GIL and tstate is deleted, PyThreadState_Next(tstate) is undefined behavior (it accesses tstate->next).

Changing your reproducer to create/wait for termination of threads in a loop in a background thread. |
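The hazard can be modelled in pure Python. This is only an analogy, assuming a singly linked list whose nodes may be unlinked by a callback during traversal; `Node`, `build`, `unlink_b` and both walk functions are hypothetical and are not CPython code. Reading `node.next` after the callback has unlinked the node corresponds to calling PyThreadState_Next() on a deleted tstate: in C that is undefined behavior, whereas here it merely ends the walk early.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.next = None


def build(names):
    """Build a singly linked list and return its head."""
    head = None
    for name in reversed(names):
        node = Node(name)
        node.next = head
        head = node
    return head


def walk_live(head, callback):
    """Follow .next after each callback, like the loop in local_clear."""
    visited = []
    node = head
    while node is not None:
        visited.append(node.name)
        callback(node)
        node = node.next  # stale if the callback unlinked this node
    return visited


def walk_snapshot(head, callback):
    """Collect the nodes first, then run callbacks on the snapshot."""
    nodes = []
    node = head
    while node is not None:
        nodes.append(node)
        node = node.next
    visited = []
    for node in nodes:
        visited.append(node.name)
        callback(node)
    return visited


def unlink_b(node):
    # Simulates a thread state being deleted while the "GIL" is released.
    if node.name == "b":
        node.next = None


live = walk_live(build(["a", "b", "c"]), unlink_b)
snap = walk_snapshot(build(["a", "b", "c"]), unlink_b)
print(live, snap)
```

Note that a snapshot alone would not be sufficient in C: the snapshotted thread states would also have to be kept alive (or the list locked) for the duration of the loop.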
|||
msg183033 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-02-26 08:49 | |
And here's a patch. |
|||
msg183044 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-02-26 11:08 | |
> And here's a patch.

Wouldn't it be better to expose and re-use the HEAD_LOCK and HEAD_UNLOCK macros from pystate.c?

That said, I doubt this is the issue here. We are removing a string key pointing to a localdummy object. Both are small atomic types not handled by the GC, so I don't see how deallocating these objects could release the GIL. |
|||
msg183049 - (view) | Author: Albert Zeyer (Albert.Zeyer) * | Date: 2013-02-26 12:53 | |
Btw., while we are at this issue: I have seen many more loops over the threads (via PyThreadState_Next). I have a bad feeling that many of these loops have similar issues.

In this case, I am also no longer sure that it really was a problem. I originally thought that this loop would delete the local-dicts (which contained my Test-object/sqlite connection object). But it does not; it only deallocates a string and the dummy object there. The local-dicts had already been freed at Py_CLEAR(dummies).

I still tried to reproduce the crash in the testcase even when the interpreter is not shutting down (as appears to be the case in my musicplayer app), but without success. I also haven't yet been able to get more debugging info about the musicplayer app crash.

Note that in the musicplayer app, I now have the same workaround as demonstrated in the testcase, and there haven't been any crashes since (so far; they were rare anyway). |
|||
msg183050 - (view) | Author: Albert Zeyer (Albert.Zeyer) * | Date: 2013-02-26 13:01 | |
> Wouldn't it be better to expose and re-use the HEAD_LOCK and HEAD_UNLOCK macros from pystate.c?

The macro names HEAD_LOCK/HEAD_UNLOCK irritate me a bit. Protecting only the head would not be enough; any tstate object could be invalidated. But as far as I can see, the lock actually protects every modification of the list (both in tstate_delete_common and in new_threadstate).

But yes, it would be a good thing to export this locking functionality so other code can use it. |
|||
msg183052 - (view) | Author: Charles-François Natali (neologix) * | Date: 2013-02-26 13:14 | |
> Wouldn't it be better to expose and re-use the HEAD_LOCK and HEAD_UNLOCK
> macros from pystate.c?

I don't like holding locks before calling "alien" code; it's a recipe for deadlocks. For example, if another thread-local object was deallocated as part of the deallocation chain, we would call back into local_clear(), and deadlock.

> That said, I doubt this is the issue here. We are removing a string key pointing
> to a localdummy object. Both are small atomic types not handled by the GC, so
> I don't see how deallocating these objects could release the GIL.

Yes, it shouldn't happen; the thread local dict is deallocated right before (I initially thought the thread local dict was deallocated here).

Without a proper backtrace, it's going to be hard to debug... |
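The re-entrancy concern above can be illustrated with a small Python sketch. It is only a model under stated assumptions: `head_lock` stands in for a plain non-recursive mutex, and `local_clear` and `nested_dealloc` are hypothetical stand-ins, not the functions in threadmodule.c. A timeout on the inner acquire is used so that the example reports the would-be deadlock instead of hanging.

```python
import threading

# Non-reentrant lock, standing in for a plain mutex around the thread list.
head_lock = threading.Lock()


def local_clear(on_dealloc=None, timeout=0.1):
    """Hypothetical stand-in: take the lock, then run arbitrary dealloc code."""
    if not head_lock.acquire(timeout=timeout):
        # With a blocking acquire this call would never return.
        return "would deadlock"
    try:
        if on_dealloc is not None:
            return on_dealloc()  # "alien" code runs while the lock is held
        return "ok"
    finally:
        head_lock.release()


def nested_dealloc():
    """A deallocation chain that frees another thread-local and re-enters."""
    return local_clear()
```

Calling `local_clear()` alone succeeds, while `local_clear(nested_dealloc)` re-enters with the lock already held by the same thread and reports the would-be deadlock.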
|||
msg183056 - (view) | Author: Albert Zeyer (Albert.Zeyer) * | Date: 2013-02-26 13:40 | |
> > Wouldn't it be better to expose and re-use the HEAD_LOCK and HEAD_UNLOCK
> > macros from pystate.c?
> I don't like holding locks before calling "alien" code, it's a recipe
> for deadlocks: for example, if another thread-local object was
> deallocated as part of the deallocation chain, we would call back into
> local_clear(), and deadlock.

Ah, yes. Right now, the head-lock is acquired while the GIL is held. So while the head-lock is held, we must not unlock the GIL. So this wouldn't work.

Btw., I think this already happens. While playing around with this test case, I sometimes encountered a deadlock at quit. I was thinking that it was the result of some badly written memory. But I just saw this code (PyInterpreterState_Clear):

    HEAD_LOCK();
    for (p = interp->tstate_head; p != NULL; p = p->next)
        PyThreadState_Clear(p);
    HEAD_UNLOCK();

So, if something inside PyThreadState_Clear unlocks the GIL and some other thread acquires the GIL and then tries to HEAD_LOCK (for example, at thread exit), you have a classic deadlock.

A solution would be: only acquire the head-mutex while the GIL is not held. Then, once you hold the head-mutex, also acquire the GIL. |
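The lock-ordering problem and the proposed ordering (head-mutex first, then the GIL) can be sketched with two ordinary Python locks. This is an illustration only: `gil` and `head` are plain `threading.Lock` objects, not the real GIL or HEAD_LOCK, and the function names are made up. The inverted case uses a barrier and timeouts so that it reports the deadlock deterministically instead of hanging.

```python
import threading


def inverted_order(timeout=0.2):
    """One thread holds head and wants gil; the other holds gil and wants head."""
    gil, head = threading.Lock(), threading.Lock()
    barrier = threading.Barrier(2)
    got = {}

    def worker(name, first, second):
        with first:
            barrier.wait()  # both threads now hold their first lock
            got[name] = second.acquire(timeout=timeout)
            barrier.wait()  # keep the first lock until both attempts are over
            if got[name]:
                second.release()

    threads = [
        threading.Thread(target=worker, args=("clearing", head, gil)),
        threading.Thread(target=worker, args=("exiting", gil, head)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got


def consistent_order(rounds=1000):
    """Both threads take head first and gil second, so neither can block the other forever."""
    gil, head = threading.Lock(), threading.Lock()
    count = [0]

    def worker():
        for _ in range(rounds):
            with head:
                with gil:
                    count[0] += 1

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return count[0]
```

In the inverted case neither thread obtains its second lock, which is the situation that would hang with blocking acquires. With a single agreed order, both threads finish all their rounds.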
|||
msg183064 - (view) | Author: Albert Zeyer (Albert.Zeyer) * | Date: 2013-02-26 15:58 | |
Btw., this turns out to be at least 4 kinds of separate bugs:

1. The crash from the testcase, when the interpreter shuts down.
2. Maybe the crash from my musicplayer app, if that is a different one. But it is closely related to the first one.
3. Many loops over the thread states could have code inside which might release the GIL. All these loops can crash because the thread state could be invalidated in the meantime.
4. Possible deadlock with HEAD_LOCK usage.

Should we make separate issue reports for each? |
|||
msg381236 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2020-11-17 14:15 | |
Is this a python 2-only issue? |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:42 | admin | set | github: 61465 |
2020-11-17 15:04:15 | neologix | set | nosy: - neologix |
2020-11-17 14:15:30 | iritkatriel | set | nosy: + iritkatriel; messages: + msg381236 |
2013-04-03 21:15:07 | DragonFireCK | set | nosy: + DragonFireCK |
2013-02-26 15:58:46 | Albert.Zeyer | set | messages: + msg183064 |
2013-02-26 13:40:17 | Albert.Zeyer | set | messages: + msg183056 |
2013-02-26 13:14:53 | neologix | set | messages: + msg183052 |
2013-02-26 13:01:20 | Albert.Zeyer | set | messages: + msg183050 |
2013-02-26 12:53:28 | Albert.Zeyer | set | messages: + msg183049 |
2013-02-26 11:08:15 | pitrou | set | messages: + msg183044 |
2013-02-26 08:49:42 | neologix | set | files: + thread_local_concurrent.diff; keywords: + patch; messages: + msg183033 |
2013-02-26 06:57:32 | neologix | set | messages: + msg183022 |
2013-02-25 13:37:22 | Albert.Zeyer | set | messages: + msg182935 |
2013-02-24 21:59:37 | neologix | set | messages: + msg182898 |
2013-02-23 22:35:38 | pitrou | set | messages: + msg182830 |
2013-02-23 22:27:34 | neologix | set | messages: + msg182829 |
2013-02-23 19:57:42 | pitrou | set | messages: + msg182806 |
2013-02-23 19:35:42 | pitrou | set | messages: + msg182800 |
2013-02-23 17:11:26 | Albert.Zeyer | set | messages: + msg182771 |
2013-02-23 17:00:03 | neologix | set | messages: + msg182765 |
2013-02-23 16:56:00 | Albert.Zeyer | set | messages: + msg182761 |
2013-02-23 16:47:27 | neologix | set | messages: + msg182760 |
2013-02-23 14:22:20 | Albert.Zeyer | set | messages: + msg182745 |
2013-02-23 11:08:05 | pitrou | set | messages: + msg182735 |
2013-02-23 10:55:38 | neologix | set | messages: + msg182732 |
2013-02-23 10:38:23 | pitrou | set | messages: + msg182731 |
2013-02-23 10:30:27 | neologix | set | nosy: + pitrou; messages: + msg182730 |
2013-02-23 07:35:52 | Albert.Zeyer | set | messages: + msg182721 |
2013-02-23 07:27:50 | Albert.Zeyer | set | messages: + msg182720 |
2013-02-22 07:41:18 | neologix | set | nosy: + neologix; messages: + msg182657 |
2013-02-21 03:49:40 | r.david.murray | set | nosy: + r.david.murray |
2013-02-21 02:37:51 | Albert.Zeyer | create |