classification
Title: crash when tp_dealloc allows other threads
Type: crash Stage:
Components: Interpreter Core Versions: Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Albert.Zeyer, DragonFireCK, iritkatriel, pitrou, r.david.murray
Priority: normal Keywords: patch

Created on 2013-02-21 02:37 by Albert.Zeyer, last changed 2020-11-17 15:04 by neologix.

Files
File name Uploaded Description Edit
thread_local_concurrent.diff neologix, 2013-02-26 08:49
Messages (28)
msg182577 - (view) Author: Albert Zeyer (Albert.Zeyer) Date: 2013-02-21 02:37
If a tp_dealloc contains a Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS section and you keep such objects in thread-local storage, you might get crashes, depending on which thread tries to clean up such an object and when.

I haven't fully figured out the details but I have a somewhat reduced testcase. Note that I encountered this in practice because the sqlite connection object does that (while it disconnects, the GIL is released).

This is the C code, with a dummy type whose tp_dealloc just sleeps for a few seconds while the GIL is released: https://github.com/albertz/playground/blob/master/testcrash_python_threadlocal.c

This is the Python code: https://github.com/albertz/playground/blob/master/testcrash_python_threadlocal_py.py

The Python code also contains some code path with a workaround which I'm using currently to avoid such crashes in my application.
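
For readers who cannot open the linked files, the general pattern can be sketched in pure Python. This is only an illustration, not the linked test case: `SlowResource`, `worker`, `tls` and `events` are hypothetical names, and `time.sleep()` inside `close()` merely stands in for a tp_dealloc that releases the GIL. The explicit cleanup before the thread exits is one possible workaround; it is not necessarily the one used in the linked script.

```python
import threading
import time

tls = threading.local()
events = []                      # records which resources were closed
events_lock = threading.Lock()


class SlowResource(object):
    """Hypothetical stand-in for an object whose cleanup releases the GIL."""

    def __init__(self, name):
        self.name = name

    def close(self):
        time.sleep(0.01)         # sleeping releases the GIL, like the dummy tp_dealloc
        with events_lock:
            events.append(("closed", self.name))


def worker(name):
    tls.resource = SlowResource(name)
    try:
        pass                     # ... real work using tls.resource would go here ...
    finally:
        # Drop the thread-local reference while the thread is still fully
        # alive, instead of leaving it to the thread state teardown.
        resource = tls.resource
        del tls.resource
        resource.close()


threads = [threading.Thread(target=worker, args=("t%d" % i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With this arrangement the slow cleanup happens in ordinary Python code in the owning thread, so nothing that releases the GIL is left in the thread-local dict by the time the thread state is cleared.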
msg182657 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-02-22 07:41
Could you try with a recent checkout of Python 2.7?
I wonder if this could be an occurrence of issue #13992 fixed by Antoine a couple months ago.
msg182720 - (view) Author: Albert Zeyer (Albert.Zeyer) Date: 2013-02-23 07:27
The latest 2.7 hg still crashes.
msg182721 - (view) Author: Albert Zeyer (Albert.Zeyer) Date: 2013-02-23 07:35
The backtrace:

Thread 0:: Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	0x00007fff8a54e386 __semwait_signal + 10
1   libsystem_c.dylib             	0x00007fff85e30800 nanosleep + 163
2   libsystem_c.dylib             	0x00007fff85e30717 usleep + 54
3   testcrash_python_threadlocal.so	0x00000001002ddd40 test_dealloc + 48
4   python.exe                    	0x00000001000400a9 dict_dealloc + 153 (dictobject.c:1010)
5   python.exe                    	0x00000001000432d3 PyDict_DelItem + 259 (dictobject.c:855)
6   python.exe                    	0x00000001000d7f27 _localdummy_destroyed + 71 (threadmodule.c:585)
7   python.exe                    	0x0000000100006c61 PyObject_Call + 97 (abstract.c:2529)
8   python.exe                    	0x0000000100006e42 PyObject_CallFunctionObjArgs + 370 (abstract.c:2761)
9   python.exe                    	0x000000010006b2e6 PyObject_ClearWeakRefs + 534 (weakrefobject.c:892)
10  python.exe                    	0x00000001000d746b localdummy_dealloc + 27 (threadmodule.c:231)
11  python.exe                    	0x00000001000400a9 dict_dealloc + 153 (dictobject.c:1010)
12  python.exe                    	0x00000001000c003b PyThreadState_Clear + 139 (pystate.c:240)
13  python.exe                    	0x00000001000c02c8 PyInterpreterState_Clear + 56 (pystate.c:104)
14  python.exe                    	0x00000001000c1c68 Py_Finalize + 344 (pythonrun.c:504)
15  python.exe                    	0x00000001000d5891 Py_Main + 3041 (main.c:665)
16  python.exe                    	0x0000000100000a74 start + 52

Thread 1:
0   libsystem_kernel.dylib        	0x00007fff8a54e386 __semwait_signal + 10
1   libsystem_c.dylib             	0x00007fff85e30800 nanosleep + 163
2   libsystem_c.dylib             	0x00007fff85e30717 usleep + 54
3   testcrash_python_threadlocal.so	0x00000001002ddd40 test_dealloc + 48
4   python.exe                    	0x00000001000400a9 dict_dealloc + 153 (dictobject.c:1010)
5   python.exe                    	0x00000001000432d3 PyDict_DelItem + 259 (dictobject.c:855)
6   python.exe                    	0x00000001000d7f27 _localdummy_destroyed + 71 (threadmodule.c:585)
7   python.exe                    	0x0000000100006c61 PyObject_Call + 97 (abstract.c:2529)
8   python.exe                    	0x0000000100006e42 PyObject_CallFunctionObjArgs + 370 (abstract.c:2761)
9   python.exe                    	0x000000010006b2e6 PyObject_ClearWeakRefs + 534 (weakrefobject.c:892)
10  python.exe                    	0x00000001000d746b localdummy_dealloc + 27 (threadmodule.c:231)
11  python.exe                    	0x00000001000400a9 dict_dealloc + 153 (dictobject.c:1010)
12  python.exe                    	0x00000001000c003b PyThreadState_Clear + 139 (pystate.c:240)
13  python.exe                    	0x00000001000d7ec4 t_bootstrap + 372 (threadmodule.c:643)
14  libsystem_c.dylib             	0x00007fff85da6742 _pthread_start + 327
15  libsystem_c.dylib             	0x00007fff85d93181 thread_start + 13

Thread 2:
0   libsystem_kernel.dylib        	0x00007fff8a54e322 __select + 10
1   time.so                       	0x00000001002fb01b time_sleep + 139 (timemodule.c:948)
2   python.exe                    	0x000000010009fcfb PyEval_EvalFrameEx + 18011 (ceval.c:4021)
3   python.exe                    	0x00000001000a30f3 fast_function + 179 (ceval.c:4107)
4   python.exe                    	0x000000010009fdad PyEval_EvalFrameEx + 18189 (ceval.c:4042)
5   python.exe                    	0x00000001000a2fb7 PyEval_EvalCodeEx + 2103 (ceval.c:3253)
6   python.exe                    	0x000000010002f8cb function_call + 347 (funcobject.c:526)
7   python.exe                    	0x0000000100006c61 PyObject_Call + 97 (abstract.c:2529)
8   python.exe                    	0x00000001000a066a PyEval_EvalFrameEx + 20426 (ceval.c:4334)
9   python.exe                    	0x00000001000a30f3 fast_function + 179 (ceval.c:4107)
10  python.exe                    	0x000000010009fdad PyEval_EvalFrameEx + 18189 (ceval.c:4042)
11  python.exe                    	0x00000001000a30f3 fast_function + 179 (ceval.c:4107)
12  python.exe                    	0x000000010009fdad PyEval_EvalFrameEx + 18189 (ceval.c:4042)
13  python.exe                    	0x00000001000a2fb7 PyEval_EvalCodeEx + 2103 (ceval.c:3253)
14  python.exe                    	0x000000010002f8cb function_call + 347 (funcobject.c:526)
15  python.exe                    	0x0000000100006c61 PyObject_Call + 97 (abstract.c:2529)
16  python.exe                    	0x0000000100018b07 instancemethod_call + 439 (classobject.c:2603)
17  python.exe                    	0x0000000100006c61 PyObject_Call + 97 (abstract.c:2529)
18  python.exe                    	0x000000010009aaa4 PyEval_CallObjectWithKeywords + 180 (ceval.c:3891)
19  python.exe                    	0x00000001000d7d96 t_bootstrap + 70 (threadmodule.c:614)
20  libsystem_c.dylib             	0x00007fff85da6742 _pthread_start + 327
21  libsystem_c.dylib             	0x00007fff85d93181 thread_start + 13

Thread 3 Crashed:
0   python.exe                    	0x00000001000a2329 PyEval_EvalFrameEx + 27785 (ceval.c:2995)
1   python.exe                    	0x00000001000a2fb7 PyEval_EvalCodeEx + 2103 (ceval.c:3253)
2   python.exe                    	0x000000010002f8cb function_call + 347 (funcobject.c:526)
3   python.exe                    	0x0000000100006c61 PyObject_Call + 97 (abstract.c:2529)
4   python.exe                    	0x00000001000a066a PyEval_EvalFrameEx + 20426 (ceval.c:4334)
5   python.exe                    	0x00000001000a30f3 fast_function + 179 (ceval.c:4107)
6   python.exe                    	0x000000010009fdad PyEval_EvalFrameEx + 18189 (ceval.c:4042)
7   python.exe                    	0x00000001000a30f3 fast_function + 179 (ceval.c:4107)
8   python.exe                    	0x000000010009fdad PyEval_EvalFrameEx + 18189 (ceval.c:4042)
9   python.exe                    	0x00000001000a2fb7 PyEval_EvalCodeEx + 2103 (ceval.c:3253)
10  python.exe                    	0x000000010002f8cb function_call + 347 (funcobject.c:526)
11  python.exe                    	0x0000000100006c61 PyObject_Call + 97 (abstract.c:2529)
12  python.exe                    	0x0000000100018b07 instancemethod_call + 439 (classobject.c:2603)
13  python.exe                    	0x0000000100006c61 PyObject_Call + 97 (abstract.c:2529)
14  python.exe                    	0x000000010009aaa4 PyEval_CallObjectWithKeywords + 180 (ceval.c:3891)
15  python.exe                    	0x00000001000d7d96 t_bootstrap + 70 (threadmodule.c:614)
16  libsystem_c.dylib             	0x00007fff85da6742 _pthread_start + 327
17  libsystem_c.dylib             	0x00007fff85d93181 thread_start + 13
msg182730 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-02-23 10:30
Alright, here's what's going on.
When the main thread exits, it triggers the interpreter shutdown, which clears all the tstates in PyInterpreterState_Clear():
"""
void
PyInterpreterState_Clear(PyInterpreterState *interp)
{
    PyThreadState *p;
    HEAD_LOCK();
    for (p = interp->tstate_head; p != NULL; p = p->next)
        PyThreadState_Clear(p);
"""

PyThreadState_Clear() clears the TLS dict:
"""
void
PyThreadState_Clear(PyThreadState *tstate)
{
    if (Py_VerboseFlag && tstate->frame != NULL)
        fprintf(stderr,
          "PyThreadState_Clear: warning: thread still has a frame\n");

    Py_CLEAR(tstate->frame);

    Py_CLEAR(tstate->dict);
"""

This deallocates the objects stored in the TLS dict. But when a TLS object is deallocated, if it releases the GIL, this can make other threads runnable while the interpreter is shutting down (and the tstates are in an unusable state), so all bets are off. Note that this can only happen if there are daemon threads, which is the case in your testcase.

Basically, the problem is that arbitrary code can be run while the interpreter is shutting down because of the TLS deallocation.

I'm not sure about how to handle it, but one possibility to limit such problems would be to not deallocate the tstate if a thread is currently still active:

"""
diff --git a/Python/pystate.c b/Python/pystate.c
--- a/Python/pystate.c
+++ b/Python/pystate.c
@@ -230,9 +230,12 @@
 void
 PyThreadState_Clear(PyThreadState *tstate)
 {
-    if (Py_VerboseFlag && tstate->frame != NULL)
-        fprintf(stderr,
-          "PyThreadState_Clear: warning: thread still has a frame\n");
+    if (tstate->frame != NULL) {
+        if (Py_VerboseFlag)
+            fprintf(stderr,
+                    "PyThreadState_Clear: warning: thread still has a frame\n");
+        return;
+    }
 
     Py_CLEAR(tstate->frame);
 
"""

But this would lead to memory leaks in some cases...
msg182731 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-02-23 10:38
Albert, this happens because daemon threads continue running during interpreter shutdown. I suppose the problem goes away if you make the thread non-daemonic?

This shouldn't be a problem in Python 3 where Python threads cannot switch during shutdown.
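
A minimal sketch of the non-daemonic variant, assuming the worker can be asked to stop (`stop_requested`, `finished` and `worker` are hypothetical names, not taken from the test case). The interpreter waits for non-daemon threads before shutting down, so the worker's thread state should already be cleaned up when finalization starts:

```python
import threading

stop_requested = threading.Event()
finished = []


def worker():
    # Block until the main thread asks us to stop, then exit normally.
    stop_requested.wait()
    finished.append(threading.current_thread().name)


t = threading.Thread(target=worker, name="worker-1")
t.daemon = False     # already the default here; spelled out for clarity
t.start()

# At shutdown time: signal the worker and wait for it to finish.
stop_requested.set()
t.join()
```

The explicit `join()` is not strictly required for a non-daemon thread, but it makes the ordering visible: the thread is gone before the main thread proceeds to exit.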
msg182732 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-02-23 10:55
> This shouldn't be a problem in Python 3 where Python threads cannot switch
> during shutdown.

What happens if the GIL is released during shutdown?

Also, I'm a bit worried about this code:
"""
void
PyThreadState_Clear(PyThreadState *tstate)
{
    if (Py_VerboseFlag && tstate->frame != NULL)
        fprintf(stderr,
          "PyThreadState_Clear: warning: thread still has a frame\n");

    Py_CLEAR(tstate->frame);

    Py_CLEAR(tstate->dict);
"""

The TLS dict is deallocated after having cleared the frame, which
could lead to surprises, no?
msg182735 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-02-23 11:08
> What happens if the GIL is released during shutdown?

In PyEval_RestoreThread(), any thread other than the main thread trying to take the GIL will immediately exit:

        take_gil(tstate);
        if (_Py_Finalizing && tstate != _Py_Finalizing) {
            drop_gil(tstate);
            PyThread_exit_thread();
            assert(0);  /* unreachable */
        }

> The TLS dict is deallocated after having cleared the frame, which
> could lead to surprises, no?

I don't know. Can you think of a situation where there is a problem?
msg182745 - (view) Author: Albert Zeyer (Albert.Zeyer) Date: 2013-02-23 14:22
Note that in my original application where I encountered this (with sqlite), the backtrace looks slightly different. It is at shutdown, but not at interpreter shutdown - the main thread is still running.

https://github.com/albertz/music-player/issues/23

I tried to reproduce it in a similar way with this test case, but so far I could only reproduce the crash during interpreter shutdown.
msg182760 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-02-23 16:47
> Note that in my original application where I encountered this (with sqlite), the backtrace looks slightly different. It is at shutdown, but not at interpreter shutdown - the main thread is still running.

Could you post a traceback of this crash?
msg182761 - (view) Author: Albert Zeyer (Albert.Zeyer) Date: 2013-02-23 16:55
Here is one. Others are in the issue report on GitHub.

In Thread 5, the PyObject_SetAttr is where some attribute containing a threading.local object is set to None. This threading.local object had a reference to a sqlite connection object (in some TLS contexts). This should also be the actual crashing thread. I use faulthandler, which makes it look like Thread 0 crashed in the crash reporter.

I had this crash about 5% of the time, totally unpredictably. But it always happened in exactly that line where the attribute was set to None.


Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib          0x00007fff8a54e0fa __psynch_cvwait + 10
1   libsystem_c.dylib               0x00007fff85daaf89 _pthread_cond_wait + 869
2   org.python.python               0x000000010006f54e PyThread_acquire_lock + 96
3   org.python.python               0x000000010001d8e3 PyEval_RestoreThread + 61
4   org.python.python               0x0000000100075bf3 0x100009000 + 445427
5   org.python.python               0x0000000100020041 PyEval_EvalFrameEx + 7548
6   org.python.python               0x000000010001e281 PyEval_EvalCodeEx + 1956
7   org.python.python               0x0000000100024661 0x100009000 + 112225
8   org.python.python               0x00000001000200d2 PyEval_EvalFrameEx + 7693
9   org.python.python               0x000000010001e281 PyEval_EvalCodeEx + 1956
10  org.python.python               0x0000000100024661 0x100009000 + 112225
11  org.python.python               0x00000001000200d2 PyEval_EvalFrameEx + 7693
12  org.python.python               0x000000010001e281 PyEval_EvalCodeEx + 1956
13  org.python.python               0x000000010005df78 0x100009000 + 348024
14  org.python.python               0x000000010001caba PyObject_Call + 97
15  _objc.so                        0x0000000104615898 0x104600000 + 88216
16  libffi.dylib                    0x00007fff8236e8a6 ffi_closure_unix64_inner + 508
17  libffi.dylib                    0x00007fff8236df66 ffi_closure_unix64 + 70
18  com.apple.AppKit                0x00007fff84f63f3f -[NSApplication _docController:shouldTerminate:] + 75
19  com.apple.AppKit                0x00007fff84f63e4e __91-[NSDocumentController(NSInternal) _closeAllDocumentsWithDelegate:shouldTerminateSelector:]_block_invoke_0 + 159
20  com.apple.AppKit                0x00007fff84f63cea -[NSDocumentController(NSInternal) _closeAllDocumentsWithDelegate:shouldTerminateSelector:] + 1557
21  com.apple.AppKit                0x00007fff84f636ae -[NSDocumentController(NSInternal) __closeAllDocumentsWithDelegate:shouldTerminateSelector:] + 265
22  com.apple.AppKit                0x00007fff84f6357f -[NSApplication _shouldTerminate] + 772
23  com.apple.AppKit                0x00007fff84f9134f -[NSApplication(NSAppleEventHandling) _handleAEQuit] + 403
24  com.apple.AppKit                0x00007fff84d40261 -[NSApplication(NSAppleEventHandling) _handleCoreEvent:withReplyEvent:] + 660
25  com.apple.Foundation            0x00007fff867e112b -[NSAppleEventManager dispatchRawAppleEvent:withRawReply:handlerRefCon:] + 308
26  com.apple.Foundation            0x00007fff867e0f8d _NSAppleEventManagerGenericHandler + 106
27  com.apple.AE                    0x00007fff832eeb48 aeDispatchAppleEvent(AEDesc const*, AEDesc*, unsigned int, unsigned char*) + 307
28  com.apple.AE                    0x00007fff832ee9a9 dispatchEventAndSendReply(AEDesc const*, AEDesc*) + 37
29  com.apple.AE                    0x00007fff832ee869 aeProcessAppleEvent + 318
30  com.apple.HIToolbox             0x00007fff8e19f8e9 AEProcessAppleEvent + 100
31  com.apple.AppKit                0x00007fff84d3c916 _DPSNextEvent + 1456
32  com.apple.AppKit                0x00007fff84d3bed2 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 128
33  com.apple.AppKit                0x00007fff84d33283 -[NSApplication run] + 517
34  libffi.dylib                    0x00007fff8236dde4 ffi_call_unix64 + 76
35  libffi.dylib                    0x00007fff8236e619 ffi_call + 853
36  _objc.so                        0x000000010461a663 PyObjCFFI_Caller + 1980
37  _objc.so                        0x000000010462f43e 0x104600000 + 193598
38  org.python.python               0x000000010001caba PyObject_Call + 97
39  org.python.python               0x0000000100020225 PyEval_EvalFrameEx + 8032
40  org.python.python               0x00000001000245eb 0x100009000 + 112107
41  org.python.python               0x00000001000200d2 PyEval_EvalFrameEx + 7693
42  org.python.python               0x000000010001e281 PyEval_EvalCodeEx + 1956
43  org.python.python               0x000000010001dad7 PyEval_EvalCode + 54
44  org.python.python               0x0000000100054933 0x100009000 + 309555
45  org.python.python               0x00000001000549ff PyRun_FileExFlags + 165
46  org.python.python               0x00000001000543e9 PyRun_SimpleFileExFlags + 410
47  albertzeyer.MusicPlayer         0x0000000100001f54 main + 682 (main.m:67)
48  albertzeyer.MusicPlayer         0x0000000100001c6d _start + 203
49  albertzeyer.MusicPlayer         0x0000000100001ba1 start + 33


Thread 1:: Dispatch queue: com.apple.libdispatch-manager
0   libsystem_kernel.dylib          0x00007fff8a54ed16 kevent + 10
1   libdispatch.dylib               0x00007fff88230dea _dispatch_mgr_invoke + 883
2   libdispatch.dylib               0x00007fff882309ee _dispatch_mgr_thread + 54

Thread 2:
0   libsystem_kernel.dylib          0x00007fff8a54e0fa __psynch_cvwait + 10
1   libsystem_c.dylib               0x00007fff85daaf89 _pthread_cond_wait + 869
2   org.python.python               0x000000010006f54e PyThread_acquire_lock + 96
3   org.python.python               0x000000010001d8e3 PyEval_RestoreThread + 61
4   _sqlite3.so                     0x000000010a4041f1 pysqlite_connection_dealloc + 76
5   org.python.python               0x00000001000729f3 0x100009000 + 432627
6   org.python.python               0x00000001000729f3 0x100009000 + 432627
7   org.python.python               0x0000000100052b55 PyThreadState_Clear + 136
8   org.python.python               0x000000010007610a 0x100009000 + 446730
9   libsystem_c.dylib               0x00007fff85da6742 _pthread_start + 327
10  libsystem_c.dylib               0x00007fff85d93181 thread_start + 13

Thread 3:
0   libsystem_kernel.dylib          0x00007fff8a54e0fa __psynch_cvwait + 10
1   libsystem_c.dylib               0x00007fff85daaf89 _pthread_cond_wait + 869
2   org.python.python               0x000000010006f54e PyThread_acquire_lock + 96
3   org.python.python               0x000000010001d8e3 PyEval_RestoreThread + 61
4   _objc.so                        0x00000001046234a3 0x104600000 + 144547
5   org.python.python               0x00000001000a4194 0x100009000 + 635284
6   org.python.python               0x0000000100021a49 PyEval_EvalFrameEx + 14212
7   org.python.python               0x00000001000245eb 0x100009000 + 112107
8   org.python.python               0x00000001000200d2 PyEval_EvalFrameEx + 7693
9   org.python.python               0x000000010001e281 PyEval_EvalCodeEx + 1956
10  org.python.python               0x000000010005df78 0x100009000 + 348024
11  org.python.python               0x000000010001caba PyObject_Call + 97
12  org.python.python               0x000000010001ec59 PyEval_EvalFrameEx + 2452
13  org.python.python               0x00000001000245eb 0x100009000 + 112107
14  org.python.python               0x00000001000200d2 PyEval_EvalFrameEx + 7693
15  org.python.python               0x00000001000245eb 0x100009000 + 112107
16  org.python.python               0x00000001000200d2 PyEval_EvalFrameEx + 7693
17  org.python.python               0x000000010001e281 PyEval_EvalCodeEx + 1956
18  org.python.python               0x000000010005df78 0x100009000 + 348024
19  org.python.python               0x000000010001caba PyObject_Call + 97
20  org.python.python               0x000000010003719a 0x100009000 + 188826
21  org.python.python               0x000000010001caba PyObject_Call + 97
22  org.python.python               0x0000000100023dfc PyEval_CallObjectWithKeywords + 177
23  org.python.python               0x0000000100076010 0x100009000 + 446480
24  libsystem_c.dylib               0x00007fff85da6742 _pthread_start + 327
25  libsystem_c.dylib               0x00007fff85d93181 thread_start + 13

Thread 4:
0   libsystem_kernel.dylib          0x00007fff8a54e0fa __psynch_cvwait + 10
1   libsystem_c.dylib               0x00007fff85daaf89 _pthread_cond_wait + 869
2   org.python.python               0x000000010006f54e PyThread_acquire_lock + 96
3   org.python.python               0x000000010001d8e3 PyEval_RestoreThread + 61
4   org.python.python               0x0000000100053351 PyGILState_Ensure + 93
5   _objc.so                        0x0000000104609b6e 0x104600000 + 39790
6   libobjc.A.dylib                 0x00007fff880c6230 (anonymous namespace)::AutoreleasePoolPage::pop(void*) + 464
7   com.apple.CoreFoundation        0x00007fff8ec15342 _CFAutoreleasePoolPop + 34
8   com.apple.Foundation            0x00007fff867e003d -[NSAutoreleasePool release] + 154
9   com.apple.CoreFoundation        0x00007fff8ebed85a CFRelease + 170
10  _objc.so                        0x000000010462349b 0x104600000 + 144539
11  org.python.python               0x00000001000a4194 0x100009000 + 635284
12  org.python.python               0x0000000100021a49 PyEval_EvalFrameEx + 14212
13  org.python.python               0x000000010001e281 PyEval_EvalCodeEx + 1956
14  org.python.python               0x0000000100024661 0x100009000 + 112225
15  org.python.python               0x00000001000200d2 PyEval_EvalFrameEx + 7693
16  org.python.python               0x00000001000245eb 0x100009000 + 112107
17  org.python.python               0x00000001000200d2 PyEval_EvalFrameEx + 7693
18  org.python.python               0x000000010001e281 PyEval_EvalCodeEx + 1956
19  org.python.python               0x000000010005df78 0x100009000 + 348024
20  org.python.python               0x000000010001caba PyObject_Call + 97
21  org.python.python               0x000000010001ec59 PyEval_EvalFrameEx + 2452
22  org.python.python               0x00000001000245eb 0x100009000 + 112107
23  org.python.python               0x00000001000200d2 PyEval_EvalFrameEx + 7693
24  org.python.python               0x00000001000245eb 0x100009000 + 112107
25  org.python.python               0x00000001000200d2 PyEval_EvalFrameEx + 7693
26  org.python.python               0x000000010001e281 PyEval_EvalCodeEx + 1956
27  org.python.python               0x000000010005df78 0x100009000 + 348024
28  org.python.python               0x000000010001caba PyObject_Call + 97
29  org.python.python               0x000000010003719a 0x100009000 + 188826
30  org.python.python               0x000000010001caba PyObject_Call + 97
31  org.python.python               0x0000000100023dfc PyEval_CallObjectWithKeywords + 177
32  org.python.python               0x0000000100076010 0x100009000 + 446480
33  libsystem_c.dylib               0x00007fff85da6742 _pthread_start + 327
34  libsystem_c.dylib               0x00007fff85d93181 thread_start + 13

Thread 5:
0   org.python.python               0x000000010007575e 0x100009000 + 444254
1   org.python.python               0x0000000100071cbe 0x100009000 + 429246
2   org.python.python               0x0000000100071bcd PyDict_SetItem + 145
3   org.python.python               0x0000000100079a55 PyObject_GenericSetAttr + 327
4   org.python.python               0x0000000100079538 PyObject_SetAttr + 157
5   org.python.python               0x000000010001f303 PyEval_EvalFrameEx + 4158
6   org.python.python               0x00000001000245eb 0x100009000 + 112107
7   org.python.python               0x00000001000200d2 PyEval_EvalFrameEx + 7693
8   org.python.python               0x00000001000245eb 0x100009000 + 112107
9   org.python.python               0x00000001000200d2 PyEval_EvalFrameEx + 7693
10  org.python.python               0x00000001000245eb 0x100009000 + 112107
11  org.python.python               0x00000001000200d2 PyEval_EvalFrameEx + 7693
12  org.python.python               0x000000010001e281 PyEval_EvalCodeEx + 1956
13  org.python.python               0x000000010005df78 0x100009000 + 348024
14  org.python.python               0x000000010001caba PyObject_Call + 97
15  org.python.python               0x000000010001ec59 PyEval_EvalFrameEx + 2452
16  org.python.python               0x00000001000245eb 0x100009000 + 112107
17  org.python.python               0x00000001000200d2 PyEval_EvalFrameEx + 7693
18  org.python.python               0x00000001000245eb 0x100009000 + 112107
19  org.python.python               0x00000001000200d2 PyEval_EvalFrameEx + 7693
20  org.python.python               0x000000010001e281 PyEval_EvalCodeEx + 1956
21  org.python.python               0x000000010005df78 0x100009000 + 348024
22  org.python.python               0x000000010001caba PyObject_Call + 97
23  org.python.python               0x000000010003719a 0x100009000 + 188826
24  org.python.python               0x000000010001caba PyObject_Call + 97
25  org.python.python               0x0000000100023dfc PyEval_CallObjectWithKeywords + 177
26  org.python.python               0x0000000100076010 0x100009000 + 446480
27  libsystem_c.dylib               0x00007fff85da6742 _pthread_start + 327
28  libsystem_c.dylib               0x00007fff85d93181 thread_start + 13

Thread 6:
0   libsystem_kernel.dylib          0x00007fff8a54e386 __semwait_signal + 10
1   libsystem_c.dylib               0x00007fff85e30800 nanosleep + 163
2   libsystem_c.dylib               0x00007fff85e30717 usleep + 54
3   ffmpeg.so                       0x000000010bd7609d PlayerObject::workerProc(PyMutex&, bool&) + 509 (ffmpeg_player_decoding.cpp:1087)
4   ffmpeg.so                       0x000000010bd78ac2 boost::function2<void, PyMutex&, bool&>::operator()(PyMutex&, bool&) const + 28 (function_template.hpp:759)
5   ffmpeg.so                       0x000000010bd78736 PyThread_thread(void*) + 25 (ffmpeg_utils.cpp:98)
6   libsystem_c.dylib               0x00007fff85da6742 _pthread_start + 327
7   libsystem_c.dylib               0x00007fff85d93181 thread_start + 13

Thread 7:
0   libsystem_kernel.dylib          0x00007fff8a54e322 __select + 10
1   time.so                         0x00000001007f9d83 0x1007f9000 + 3459
2   org.python.python               0x0000000100020041 PyEval_EvalFrameEx + 7548
3   org.python.python               0x000000010001e281 PyEval_EvalCodeEx + 1956
4   org.python.python               0x000000010005df78 0x100009000 + 348024
5   org.python.python               0x000000010001caba PyObject_Call + 97
6   org.python.python               0x000000010001ec59 PyEval_EvalFrameEx + 2452
7   org.python.python               0x00000001000245eb 0x100009000 + 112107
8   org.python.python               0x00000001000200d2 PyEval_EvalFrameEx + 7693
9   org.python.python               0x00000001000245eb 0x100009000 + 112107
10  org.python.python               0x00000001000200d2 PyEval_EvalFrameEx + 7693
11  org.python.python               0x000000010001e281 PyEval_EvalCodeEx + 1956
12  org.python.python               0x000000010005df78 0x100009000 + 348024
13  org.python.python               0x000000010001caba PyObject_Call + 97
14  org.python.python               0x000000010003719a 0x100009000 + 188826
15  org.python.python               0x000000010001caba PyObject_Call + 97
16  org.python.python               0x0000000100023dfc PyEval_CallObjectWithKeywords + 177
17  org.python.python               0x0000000100076010 0x100009000 + 446480
18  libsystem_c.dylib               0x00007fff85da6742 _pthread_start + 327
19  libsystem_c.dylib               0x00007fff85d93181 thread_start + 13

Thread 8:: com.apple.audio.IOThread.client
0   libsystem_kernel.dylib          0x00007fff8a54c686 mach_msg_trap + 10
1   libsystem_kernel.dylib          0x00007fff8a54bc42 mach_msg + 70
2   com.apple.audio.CoreAudio       0x00007fff825a117a HALB_MachPort::SendMessageWithReply(unsigned int, unsigned int, unsigned int, unsigned int, mach_msg_header_t*, bool, unsigned int) + 98
3   com.apple.audio.CoreAudio       0x00007fff825a1108 HALB_MachPort::SendSimpleMessageWithSimpleReply(unsigned int, unsigned int, int, int&, bool, unsigned int) + 42
4   com.apple.audio.CoreAudio       0x00007fff8259f8db HALC_ProxyIOContext::IOWorkLoop() + 1209
5   com.apple.audio.CoreAudio       0x00007fff8259f391 HALC_ProxyIOContext::IOThreadEntry(void*) + 83
6   com.apple.audio.CoreAudio       0x00007fff8259f24b HALB_IOThread::Entry(void*) + 75
7   libsystem_c.dylib               0x00007fff85da6742 _pthread_start + 327
8   libsystem_c.dylib               0x00007fff85d93181 thread_start + 13
msg182765 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-02-23 17:00
> Here is one. Others are in the issue report on GitHub.

Yes, I've seen it, but I'd need a backtrace with line numbers (like
the one you posted above).
Thread 5 is crashing, but I don't know where.
msg182771 - (view) Author: Albert Zeyer (Albert.Zeyer) Date: 2013-02-23 17:11
Sadly, that is quite complicated or almost impossible. It needs the MacOSX system Python and that one lacks debugging information.

I just tried with the CPython from hg-2.7. But it seems the official Python doesn't have objc bindings (and I also need Cocoa bindings), so I can't easily run this right now (and another GUI is not yet implemented).
msg182800 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-02-23 19:35
I have two questions:
- how do you know the crash really happens because of thread 5?
- when the thread.local object is being deleted, has another thread just started looking up its attributes?
msg182806 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-02-23 19:57
Another question: are threads being started or stopped while the thread local object is being deleted?
msg182829 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-02-23 22:27
> - how do you know the crash really happens because of thread 5?

All other threads are blocked on locks or condition variables, it's
the only runnable thread.

> Another question: are threads being started or stopped while the thread local object is being deleted?

From the stack trace, thread 2 is being stopped.

I guess the problem is similar to above: thread 2 is in the middle of
stopping, its TLS dict is deallocated, which triggers the thread local
object deallocation, which releases the GIL. Thread 5 becomes running,
and must somehow access thread 2 tstate.
It would be much easier with a backtrace, though.
msg182830 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-02-23 22:35
> > - how do you know the crash really happens because of thread 5?
> 
> All other threads are blocked on locks or condition variables, it's
> the only runnable thread.

Hm, you are right.

> > Another question: are threads being started or stopped while the thread local object is being deleted?
> 
> From the stack trace, thread 2 is being stopped.
> 
> I guess the problem is similar to above: thread 2 is in the middle of
> stopping, its TLS dict is deallocated, which triggers the thread local
> object deallocation, which releases the GIL. Thread 5 becomes running,
> and must somehow access thread 2 tstate.

I've read the code several times and I find it unlikely that it's the
cause of the problem:
- the thread state's thread-local dict (tstate->dict) is deallocated
using Py_CLEAR(), meaning it's unreachable from other threads when
deallocating one of the values releases the GIL
- the thread-local object's deallocator checks that tstate->dict is
non-NULL before using it; the only thing that could go wrong is if
PyDict_GetItem() releases the GIL, which sounds unlikely on tstate->dict

(also, I've checked that threadmodule.c holds the GIL when inserting and
removing thread states from the interpreter's thread states list; it
would be more future-proof for local_dealloc to use pystate.c's
HEAD_LOCK() and HEAD_UNLOCK() APIs, though)
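
To illustrate the first point, here is a minimal Python model of the Py_CLEAR() ordering (null the slot first, then drop the reference). It is only a sketch, not CPython code: FakeThreadState, Value and py_clear are made-up names, and it relies on CPython's reference counting to run __del__ immediately.

```python
class FakeThreadState:
    """Hypothetical stand-in for a tstate; only models the `dict` slot."""

    def __init__(self):
        self.dict = {}


class Value:
    """Object whose finalizer checks what is still reachable via the owner."""

    def __init__(self, owner, log):
        self.owner = owner
        self.log = log

    def __del__(self):
        # Runs while the old dict is being torn down.
        self.log.append(self.owner.dict is None)


def py_clear(owner, attr):
    """Model of Py_CLEAR(): null the slot first, then drop the reference."""
    tmp = getattr(owner, attr)
    setattr(owner, attr, None)
    del tmp  # last reference on CPython: the dict and its values die here


def demo():
    log = []
    ts = FakeThreadState()
    ts.dict["key"] = Value(ts, log)
    py_clear(ts, "dict")
    return log
```

In this model the finalizer always sees the slot already emptied, which is why a deallocator that releases the GIL at that point cannot expose the old dict through the thread state.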

I'm wondering if there's something else interfering here. My attempts at
writing a stress-test script have failed to produce any crash.
msg182898 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-02-24 21:59
I don't know how OS X crash reports work, but they seem to have at
least some debug info available, since some symbols are resolved in the
backtrace.
You might be able to get more info with gdb, with something like:
"""
gdb /path/to/python
(gdb) info line *<crash address>
(gdb) disassemble <crash address>
"""

Otherwise, is there a way to run your code on Linux?
msg182935 - (view) Author: Albert Zeyer (Albert.Zeyer) Date: 2013-02-25 13:37
The symbols are there because it is a library which exports all the symbols. Other debugging information is not there, and I don't know of any place where I can get it.

It currently cannot work on Linux in the same way because the GUI is Cocoa only right now. I'm trying to get it to run with another Python on Mac, though.

Note that in threadmodule.c, in local_clear, we are iterating through all threads:

    /* Remove all strong references to dummies from the thread states */
    if (self->key
        && (tstate = PyThreadState_Get())
        && tstate->interp) {
        for(tstate = PyInterpreterState_ThreadHead(tstate->interp);
            tstate;
            tstate = PyThreadState_Next(tstate))
            if (tstate->dict &&
                PyDict_GetItem(tstate->dict, self->key))
                PyDict_DelItem(tstate->dict, self->key);
    }

In PyDict_DelItem, if the GIL is released and meanwhile, the list of threadstates is altered, is that a problem for this loop? So maybe tstate becomes invalid there.

I also noticed this part in another backtrace of the same crash:

Thread 2:
0   libsystem_kernel.dylib          0x00007fff8a54e0fa __psynch_cvwait + 10
1   libsystem_c.dylib               0x00007fff85daaf89 _pthread_cond_wait + 869
2   org.python.python               0x000000010006f54e PyThread_acquire_lock + 96
3   org.python.python               0x000000010001d8e3 PyEval_RestoreThread + 61
4   org.python.python               0x0000000100053351 PyGILState_Ensure + 93
5   _objc.so                        0x0000000103b89b6e 0x103b80000 + 39790
6   libobjc.A.dylib                 0x00007fff880c6230 (anonymous namespace)::AutoreleasePoolPage::pop(void*) + 464
7   libobjc.A.dylib                 0x00007fff880c85a2 (anonymous namespace)::AutoreleasePoolPage::tls_dealloc(void*) + 42
8   libsystem_c.dylib               0x00007fff85dad4fe _pthread_tsd_cleanup + 240
9   libsystem_c.dylib               0x00007fff85da69a2 _pthread_exit + 146
10  libsystem_c.dylib               0x00007fff85da674d _pthread_start + 338
11  libsystem_c.dylib               0x00007fff85d93181 thread_start + 13


This seems to be a non-Python thread, so PyGILState_Ensure would have created a new threadstate and this would have altered the list.
msg183022 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-02-26 06:57
> Note that in threadmodule.c, in local_clear, we are iterating through all threads:
>
> In PyDict_DelItem, if the GIL is released and meanwhile, the list of threadstates is altered, is that a problem for this loop? So maybe tstate becomes invalid there.

Yes.
If PyDict_DelItem() releases the GIL and tstate is deleted,
PyThreadState_Next(tstate) is undefined behavior (it accesses
tstate->next).
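
For illustration only, a small Python model of this hazard (not CPython code; TState, Interp and the walk functions are made-up names). A `freed` flag stands in for freed memory, since Python keeps the node alive where C would not. The snapshot walk is just one possible mitigation within this model; in C the snapshotted thread states would additionally have to be kept alive somehow.

```python
class TState:
    """Hypothetical stand-in for a thread state in a singly linked list."""

    def __init__(self, name):
        self.name = name
        self.next = None
        self.freed = False


class Interp:
    def __init__(self, names):
        self.head = None
        for name in reversed(names):
            node = TState(name)
            node.next = self.head
            self.head = node

    def delete(self, target):
        """Unlink and 'free' a node, as another thread might do on exit."""
        prev, node = None, self.head
        while node is not None and node is not target:
            prev, node = node, node.next
        if node is None:
            return
        if prev is None:
            self.head = node.next
        else:
            prev.next = node.next
        node.next = None
        node.freed = True


def unsafe_walk(interp, callback):
    """Same shape as the loop above: `next` is read after the callback ran."""
    visited = []
    node = interp.head
    while node is not None:
        callback(interp, node)
        if node.freed:
            # In C, reading node->next here would be undefined behavior.
            raise RuntimeError("use after free: " + node.name)
        visited.append(node.name)
        node = node.next
    return visited


def snapshot_walk(interp, callback):
    """Collect the nodes first, then run callbacks on the snapshot."""
    nodes = []
    node = interp.head
    while node is not None:
        nodes.append(node)
        node = node.next
    visited = []
    for node in nodes:
        if node.freed:
            continue  # deleted by an earlier callback
        callback(interp, node)
        visited.append(node.name)
    return visited


def exit_current_thread(interp, node):
    """Callback modelling 'GIL released, and thread t2's state is deleted'."""
    if node.name == "t2":
        interp.delete(node)
```

With three thread states t1, t2, t3 and the exit_current_thread callback, unsafe_walk raises at t2, while snapshot_walk finishes and leaves t1 linked directly to t3.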

Changing your reproducer to create/wait for termination of threads in
a loop in a background thread.
msg183033 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-02-26 08:49
And here's a patch.
msg183044 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-02-26 11:08
> And here's a patch.

Wouldn't it be better to expose and re-use the HEAD_LOCK and HEAD_UNLOCK
macros from pystate.c?
That said, I doubt this is the issue here. We are removing a string key pointing
to a localdummy object. Both are small atomic types not handled by the GC, so
I don't see how deallocating these objects could release the GIL.
msg183049 - (view) Author: Albert Zeyer (Albert.Zeyer) Date: 2013-02-26 12:53
Btw., while we are on this issue: I have seen many more loops over the threads (via PyThreadState_Next). I have a bad feeling that many of these loops have similar issues.

In this case, I am also not sure anymore that it really was a problem. I originally thought that in this loop, it would delete the local-dicts (which contained my Test-object/sqlite connection object). But it does not, it only deallocates a string and the dummy object there. The local-dicts had already been freed at Py_CLEAR(dummies).

I still tried to reproduce the crash in the testcase even when the interpreter is not shutting down (as it appears to happen in my musicplayer app), but without success. I also haven't yet been able to get more debugging info about the musicplayer app crash.

Note that in the musicplayer app, I now have the same workaround as demonstrated in the testcase and there haven't been any crashes since (so far; they were rare anyway).
msg183050 - (view) Author: Albert Zeyer (Albert.Zeyer) Date: 2013-02-26 13:01
> Wouldn't it be better to expose and re-use the HEAD_LOCK and HEAD_UNLOCK macros from pystate.c?

The macro names HEAD_LOCK/HEAD_UNLOCK irritate me a bit. Protecting only the head would not be enough. Any tstate object could be invalidated. But actually, it protects any modification on the list (both in tstate_delete_common and in new_threadstate), as far as I see it.

But yes, it would be a good thing to export this locking functionality so other code can use it.
msg183052 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-02-26 13:14
> Wouldn't it be better to expose and re-use the HEAD_LOCK and HEAD_UNLOCK
> macros from pystate.c?

I don't like holding locks before calling "alien" code, it's a recipe
for deadlocks: for example, if another thread-local object was
deallocated as part of the deallocation chain, we would call back into
local_clear(), and deadlock.
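
A rough Python model of that re-entrancy problem, under the assumption that the list lock is a plain non-recursive mutex. The names local_clear_model and nested_dealloc are hypothetical, and the timeout only stands in for what would be an indefinite hang.

```python
import threading

# Non-reentrant, like a plain mutex guarding the thread state list.
head_lock = threading.Lock()


def local_clear_model(on_dealloc=None):
    """Hold the list lock while running 'alien' deallocation code."""
    if not head_lock.acquire(timeout=0.1):
        # A real mutex would block forever here.
        return "would deadlock"
    try:
        if on_dealloc is not None:
            return on_dealloc()
        return "ok"
    finally:
        head_lock.release()


def nested_dealloc():
    """Deallocation chain that destroys another thread-local object."""
    return local_clear_model()
```

Calling local_clear_model() on its own succeeds, while local_clear_model(nested_dealloc) reports the would-be deadlock because the inner call tries to take a lock the same thread already holds.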

> That said, I doubt this is the issue here. We are removing a string key pointing
> to a localdummy object. Both are small atomic types not handled by the GC, so
> I don't see how deallocating these objects could release the GIL.

Yes, it shouldn't happen, the thread local dict is deallocated right
before (I initially thought the thread local dict was deallocated
here).

Without a proper backtrace, it's going to be hard to debug...
msg183056 - (view) Author: Albert Zeyer (Albert.Zeyer) Date: 2013-02-26 13:40
> > Wouldn't it be better to expose and re-use the HEAD_LOCK and HEAD_UNLOCK
> > macros from pystate.c?

> I don't like holding locks before calling "alien" code, it's a recipe
> for deadlocks: for example, if another thread-local object was
> deallocated as part of the deallocation chain, we would call back into
> local_clear(), and deadlock.

Ah, yes. Right now, the head-lock is acquired while the GIL is held. So while the head-lock is held, we must not unlock the GIL. So this wouldn't work.

Btw., I think this already happens. While playing around with this test case, I sometimes encountered a deadlock at quit. I had assumed it was the result of some memory corruption.

But I just saw this code (PyInterpreterState_Clear):

    HEAD_LOCK();
    for (p = interp->tstate_head; p != NULL; p = p->next)
        PyThreadState_Clear(p);
    HEAD_UNLOCK();

So, if something inside PyThreadState_Clear unlocks the GIL and some other thread acquires the GIL and then tries to HEAD_LOCK (for example, at thread exit), you have a classic deadlock.

A solution would be: Only acquire the head-mutex while the GIL is not held. Then, once you hold the head-mutex, also acquire the GIL.
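
As a sketch of both the deadlock and the proposed ordering, here is a Python model with two ordinary locks standing in for the GIL and the head-mutex. This is not CPython code: simulate and the thread functions are made-up names, and timeouts are used so the "deadlock" shows up as a failed acquire instead of a hang.

```python
import threading


def simulate(head_before_gil):
    """Return (a_regained_gil, b_got_head) for the chosen lock order in B."""
    gil = threading.Lock()    # stand-in for the GIL
    head = threading.Lock()   # stand-in for the head mutex
    a_released_gil = threading.Event()
    b_ready = threading.Event()
    b_done = threading.Event()
    result = {}

    def clearing_thread():
        # Thread A: holds both locks, like the loop in PyInterpreterState_Clear.
        head.acquire()
        gil.acquire()
        gil.release()                  # a tp_dealloc allows other threads
        a_released_gil.set()
        b_ready.wait()
        regained = gil.acquire(timeout=0.1)   # timeout models a hang
        result["a"] = regained
        if regained:
            gil.release()
        else:
            b_done.wait()              # keep holding head, as a hung thread would
        head.release()

    def exiting_thread():
        # Thread B: another thread that needs the head mutex, e.g. at exit.
        a_released_gil.wait()
        if head_before_gil:
            b_ready.set()
            got_head = head.acquire(timeout=2.0)  # waits without holding the GIL
            gil.acquire()
        else:
            gil.acquire()
            b_ready.set()
            got_head = head.acquire(timeout=0.3)  # waits while holding the GIL
        result["b"] = got_head
        if got_head:
            head.release()
        gil.release()
        b_done.set()

    threads = [threading.Thread(target=clearing_thread),
               threading.Thread(target=exiting_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["a"], result["b"]
```

In this model, simulate(False) leaves both threads unable to get the lock they are waiting for, whereas simulate(True), where thread B takes the head-mutex before the GIL, lets both make progress.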
msg183064 - (view) Author: Albert Zeyer (Albert.Zeyer) Date: 2013-02-26 15:58
Btw., this turns out to be at least 4 separate kinds of bugs:

1. The crash from the testcase - when the interpreter shuts down.

2. Maybe the crash from my musicplayer app - if that is a different one. But very related to the first one.

3. Many loops over the thread states could contain code which might release the GIL. All these loops can crash because the thread state could be invalidated in the meantime.

4. Possible deadlock with HEAD_LOCK usage.

Should we make separate issue reports for each?
msg381236 - (view) Author: Irit Katriel (iritkatriel) * (Python triager) Date: 2020-11-17 14:15
Is this a Python 2-only issue?
History
Date User Action Args
2020-11-17 15:04:15 neologix set nosy: - neologix
2020-11-17 14:15:30 iritkatriel set nosy: + iritkatriel; messages: + msg381236
2013-04-03 21:15:07 DragonFireCK set nosy: + DragonFireCK
2013-02-26 15:58:46 Albert.Zeyer set messages: + msg183064
2013-02-26 13:40:17 Albert.Zeyer set messages: + msg183056
2013-02-26 13:14:53 neologix set messages: + msg183052
2013-02-26 13:01:20 Albert.Zeyer set messages: + msg183050
2013-02-26 12:53:28 Albert.Zeyer set messages: + msg183049
2013-02-26 11:08:15 pitrou set messages: + msg183044
2013-02-26 08:49:42 neologix set files: + thread_local_concurrent.diff; keywords: + patch; messages: + msg183033
2013-02-26 06:57:32 neologix set messages: + msg183022
2013-02-25 13:37:22 Albert.Zeyer set messages: + msg182935
2013-02-24 21:59:37 neologix set messages: + msg182898
2013-02-23 22:35:38 pitrou set messages: + msg182830
2013-02-23 22:27:34 neologix set messages: + msg182829
2013-02-23 19:57:42 pitrou set messages: + msg182806
2013-02-23 19:35:42 pitrou set messages: + msg182800
2013-02-23 17:11:26 Albert.Zeyer set messages: + msg182771
2013-02-23 17:00:03 neologix set messages: + msg182765
2013-02-23 16:56:00 Albert.Zeyer set messages: + msg182761
2013-02-23 16:47:27 neologix set messages: + msg182760
2013-02-23 14:22:20 Albert.Zeyer set messages: + msg182745
2013-02-23 11:08:05 pitrou set messages: + msg182735
2013-02-23 10:55:38 neologix set messages: + msg182732
2013-02-23 10:38:23 pitrou set messages: + msg182731
2013-02-23 10:30:27 neologix set nosy: + pitrou; messages: + msg182730
2013-02-23 07:35:52 Albert.Zeyer set messages: + msg182721
2013-02-23 07:27:50 Albert.Zeyer set messages: + msg182720
2013-02-22 07:41:18 neologix set nosy: + neologix; messages: + msg182657
2013-02-21 03:49:40 r.david.murray set nosy: + r.david.murray
2013-02-21 02:37:51 Albert.Zeyer create