
Author mocramis
Recipients mocramis
Date 2019-03-29.10:45:18
Message-id <1553856319.46.0.350569249761.issue36469@roundup.psfhosted.org>
In-reply-to
Content
I have a script (sadly, I can't publish it) that spawns multiple threads and, on rare occasions, fails to exit properly and gets stuck forever.

More precisely, this seems to happen during interpreter exit: the atexit callbacks are called successfully, and then multiple threads are all attempting to get the GIL while none seems to own it (_PyThreadState_Current is always '{_value = 0}' while gil_locked is '{_value = 1}').

The main thread stack looks like this:

#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x000055d00997ce6a in PyCOND_TIMEDWAIT (cond=0x55d009e8ddc0 <gil_cond>, mut=0x55d009e8dd80 <gil_mutex>, us=<optimized out>) at ../Python/condvar.h:103
#2  take_gil () at ../Python/ceval_gil.h:224
#3  0x000055d00998580b in PyEval_EvalFrameEx () at ../Python/ceval.c:1273
#4  0x000055d00996f16f in _PyEval_EvalCodeWithName.lto_priv.1929 (qualname=0x0, name=<optimized out>, closure=0x0, kwdefs=0x0, defcount=0, defs=0x0, kwcount=0, kws=<optimized out>, argcount=<optimized out>, 
    args=<optimized out>, locals=<optimized out>, globals=<optimized out>, _co=<optimized out>) at ../Python/ceval.c:4033
#5  PyEval_EvalCodeEx () at ../Python/ceval.c:4054
#6  0x000055d0099b90e3 in function_call.lto_priv () at ../Objects/funcobject.c:627
#7  0x000055d009a02e17 in PyObject_Call () at ../Objects/abstract.c:2166
#8  0x000055d00992034e in method_call.lto_priv () at ../Objects/classobject.c:330
#9  0x000055d009a02e17 in PyObject_Call () at ../Objects/abstract.c:2166
#10 0x000055d00996df7d in PyEval_CallObjectWithKeywords () at ../Python/ceval.c:4595
#11 0x000055d009a5d05d in slot_tp_repr () at ../Objects/typeobject.c:5992
#12 0x000055d0099c9685 in PyObject_Repr () at ../Objects/object.c:482
#13 0x000055d0099aa6be in unicode_fromformat_arg (vargs=0x7ffc2ca81110, f=0x55d009a8a837 "R", writer=0x7ffc2ca810b0) at ../Objects/unicodeobject.c:2645
#14 PyUnicode_FromFormatV () at ../Objects/unicodeobject.c:2710
#15 0x000055d009a572bc in PyErr_WarnFormat () at ../Python/_warnings.c:895
#16 0x000055d0098840bb in sock_dealloc (s=0x7f43000fc528) at ../Modules/socketmodule.c:4177
#17 0x000055d0099d031d in subtype_dealloc.lto_priv () at ../Objects/typeobject.c:1209
#18 0x000055d0099b68f7 in frame_dealloc.lto_priv () at ../Objects/frameobject.c:431
#19 0x000055d0098ab7b1 in PyThreadState_Clear (tstate=0x55d00bee8a70) at ../Python/pystate.c:386
#20 0x000055d009a4d08a in PyInterpreterState_Clear () at ../Python/pystate.c:118
#21 0x000055d009a4e1d2 in Py_Finalize () at ../Python/pylifecycle.c:633
#22 0x000055d009a4e2a8 in Py_Exit (sts=sts@entry=0) at ../Python/pylifecycle.c:1465
#23 0x000055d009a4e38e in handle_system_exit () at ../Python/pythonrun.c:602
#24 0x000055d009a4e3f6 in PyErr_PrintEx () at ../Python/pythonrun.c:612
#25 0x000055d009a4f667 in PyErr_Print () at ../Python/pythonrun.c:508
#26 PyRun_SimpleFileExFlags () at ../Python/pythonrun.c:401
#27 0x000055d009a7c2e7 in run_file (p_cf=0x7ffc2ca814fc, filename=0x55d00bb01140 L"...", fp=0x55d00bb62e60) at ../Modules/main.c:318
#28 Py_Main () at ../Modules/main.c:768
#29 0x000055d00990bd71 in main () at ../Programs/python.c:65
#30 0x00007f430b7cd2e1 in __libc_start_main (main=0x55d00990bc90 <main>, argc=11, argv=0x7ffc2ca81708, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc2ca816f8)
    at ../csu/libc-start.c:291
#31 0x000055d009a12a7a in _start ()

We can see that it is trying to get the GIL while finalizing (it is emitting a warning while destroying a socket). However, this prevents any other thread from being deleted, since the main thread holds the head_lock. For instance, thread 18 is trying to get the head lock:

Thread 18 (Thread 0x7f4302ffd700 (LWP 21117)):
#0  0x00007f430c6aa536 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x55d00bb014c0) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x55d00bb014c0, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f430c6aa5e4 in __new_sem_wait_slow (sem=0x55d00bb014c0, abstime=0x0) at sem_waitcommon.c:181
#3  0x000055d00994d2d5 in PyThread_acquire_lock_timed () at ../Python/thread_pthread.h:352
#4  0x000055d009a4dcec in tstate_delete_common () at ../Python/pystate.c:418
#5  0x000055d009a4dd88 in PyThreadState_DeleteCurrent () at ../Python/pystate.c:457
#6  0x000055d009a482a4 in t_bootstrap () at ../Modules/_threadmodule.c:1027
#7  0x00007f430c6a2494 in start_thread (arg=0x7f4302ffd700) at pthread_create.c:333
#8  0x00007f430b895acf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
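
To make the suspected lock interaction easier to follow, here is a toy model in Python. It is only a sketch of the pattern described above, not CPython's actual implementation: `simulate_shutdown_hang`, `gil` and `head_lock` are hypothetical stand-ins built from `threading.Lock`, and the timeouts exist only so that the example terminates instead of hanging like the real process.

```python
import threading


def simulate_shutdown_hang(timeout=0.2):
    """Toy model of the hang; returns (main_got_gil, worker_got_head_lock)."""
    gil = threading.Lock()        # hypothetical stand-in for the GIL
    head_lock = threading.Lock()  # hypothetical stand-in for head_lock

    # Model "gil_locked is 1 but no thread state is current": the lock is
    # taken and nobody will ever release it.
    gil.acquire()

    result = {}

    def worker():
        # Like thread 18 in tstate_delete_common(): it needs head_lock
        # before it can remove its own thread state.
        got = head_lock.acquire(timeout=timeout * 2)
        result["worker_got_head_lock"] = got
        if got:
            head_lock.release()

    # Main thread: holds head_lock while clearing thread states ...
    with head_lock:
        t = threading.Thread(target=worker)
        t.start()
        # ... and a destructor on the way needs the GIL. This is a single
        # timed wait; the stuck process keeps waiting indefinitely.
        got_gil = gil.acquire(timeout=timeout)
        if got_gil:
            gil.release()
        t.join()

    return got_gil, result["worker_got_head_lock"]


if __name__ == "__main__":
    print(simulate_shutdown_hang())
```

In this model neither side makes progress: the main thread cannot take `gil`, and the worker cannot take `head_lock` while the main thread is still waiting.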


I have attached the full stack traces of all 18 threads.

I am not sure whether we simply shouldn't try to take the GIL while finalizing, or whether I somehow just happened to run into a thread acquiring the GIL without releasing it.
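
The warning in the main-thread trace is emitted from sock_dealloc, which reports a socket that was still open when it was destroyed. One possible mitigation, untested against the stuck process and only a sketch, is to close every socket explicitly and join all worker threads before the main thread exits, so that no open socket is left for finalization to clean up. The names `worker` and `run_workers` are hypothetical, and `socket.socketpair()` stands in for whatever connections the real script opens.

```python
import socket
import threading


def worker(results, index):
    # Hypothetical worker. The context managers close both sockets
    # deterministically, so nothing is left for sock_dealloc to warn about.
    a, b = socket.socketpair()
    with a, b:
        a.sendall(b"ping")
        results[index] = b.recv(4)


def run_workers(count=4):
    results = [None] * count
    threads = [
        threading.Thread(target=worker, args=(results, i))
        for i in range(count)
    ]
    for t in threads:
        t.start()
    # Join every thread before returning, so no worker frame (and no
    # socket it references) is still alive when the interpreter finalizes.
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_workers())
```

This does not explain why the GIL appears locked with no owner; it only avoids the socket warning path seen in the trace.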

The Python version is 3.5.3.

I kept the problematic process running and can extract any information you may want from it.