Issue 44449: Segfault in _PyTrash_begin when faulthandler tries to dump thread stacks

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/88615

classification

Title:	Segfault in _PyTrash_begin when faulthandler tries to dump thread stacks
Type:	crash	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.10, Python 3.9

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	dgrisby, miss-islington, vstinner
Priority:	normal	Keywords:	patch

Created on 2021-06-18 10:27 by dgrisby, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked	Edit
PR 27850	merged	vstinner, 2021-08-20 09:33
PR 28066	merged	miss-islington, 2021-08-30 13:24

Messages (5)
msg396042 - (view)	Author: Duncan Grisby (dgrisby)	Date: 2021-06-18 10:27
I am using Python 3.9.4 on CentOS 7. faulthandler is registered with SIGUSR1: faulthandler.register(signal.SIGUSR1) Sending SIGUSR1 normally correctly dumps the thread stacks, but occasionally it segfaults from the main thread instead: Thread 1 (Thread 0x7efe15e69740 (LWP 15201)): #0 _PyTrash_begin (tstate=tstate@entry=0x0, op=op@entry=0x757ece0) at Objects/object.c:2125 #1 0x00007efe156f05e5 in frame_dealloc (f=0x757ece0) at Objects/frameobject.c:578 #2 0x00007efe15898f88 in _Py_DECREF (op=0x757ece0) at Include/object.h:430 #3 dump_traceback (write_header=0, tstate=0x757e1a0, fd=2) at Python/traceback.c:821 #4 _Py_DumpTracebackThreads (fd=fd@entry=2, interp=<optimized out>, interp@entry=0x0, current_tstate=0xbe6a70) at Python/traceback.c:921 #5 0x00007efe1590be7d in faulthandler_dump_traceback (interp=<optimized out>, all_threads=1, fd=2) at Modules/faulthandler.c:243 #6 faulthandler_user (signum=10) at Modules/faulthandler.c:839 #7 <signal handler called> #8 0x00007efe15243d2f in do_futex_wait () from /lib64/libpthread.so.0 #9 0x00007efe15243e07 in __new_sem_wait_slow () from /lib64/libpthread.so.0 #10 0x00007efe15243ea5 in sem_timedwait () from /lib64/libpthread.so.0 #11 0x00007efe15896d11 in PyThread_acquire_lock_timed (lock=lock@entry=0x7ea7080, microseconds=microseconds@entry=5000000, intr_flag=intr_flag@entry=1) at Python/thread_pthread.h :457 #12 0x00007efe158f35a4 in acquire_timed (timeout=5000000000, lock=0x7ea7080) at Modules/_threadmodule.c:63 #13 lock_PyThread_acquire_lock (self=0x7efdf4518750, args=<optimized out>, kwds=<optimized out>) at Modules/_threadmodule.c:146 #14 0x00007efe15749916 in method_vectorcall_VARARGS_KEYWORDS (func=0x7efe15e1b310, args=0x186d208, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/descrobject.c:346 ... It has failed because tstate is null. tstate came from Py_TRASHCAN_BEGIN_CONDITION that calls PyThreadState_GET(), assuming it returns a valid pointer, but the comment on the _PyThreadState_GET macro says: Efficient macro reading directly the 'gilstate.tstate_current' atomic variable. The macro is unsafe: it does not check for error and it can return NULL. The only place I can see that tstate_current would be set to NULL is in _PyThreadState_DeleteCurrent(). I suspect that there has been a race with a thread exit. I'm not sure quite what to do about this. Perhaps faulthandler should check if tstate_current is NULL and set it suitably if so?
msg399828 - (view)	Author: STINNER Victor (vstinner) *	Date: 2021-08-18 09:45
_Py_DumpTracebackThreads() should not use Py_DECREF(). It's a bug. It must only read memory, not modify memory since it's called from a signal handler. It's a regression in dump_traceback(). Python 3.9 and 3.10 use: frame = PyThreadState_GetFrame(tstate); ... Py_DECREF(frame); The main branch (future 3.11) uses: frame = tstate->frame; Without Py_DECREF(): it's a borrowed reference. It was changed by commit ae0a2b756255629140efcbe57fc2e714f0267aa3. Python 3.9 and 3.10 should be fixed to use a borrowed reference.
msg400599 - (view)	Author: STINNER Victor (vstinner) *	Date: 2021-08-30 13:24
New changeset fe997e1a67835a929705c8c305d41c4d7dd326e3 by Victor Stinner in branch '3.10': bpo-44449: faulthandler don't modify frame refcnt (GH-27850) https://github.com/python/cpython/commit/fe997e1a67835a929705c8c305d41c4d7dd326e3
msg400606 - (view)	Author: miss-islington (miss-islington)	Date: 2021-08-30 13:56
New changeset 720aef48b558e68c07937f0cc8d62a60f23dcb3d by Miss Islington (bot) in branch '3.9': bpo-44449: faulthandler don't modify frame refcnt (GH-27850) https://github.com/python/cpython/commit/720aef48b558e68c07937f0cc8d62a60f23dcb3d
msg400607 - (view)	Author: STINNER Victor (vstinner) *	Date: 2021-08-30 14:04
The bug should now be fixed in 3.9 and 3.10 branches. The main branch should not be impacted, it has a different implementation. Thanks for the bug report Duncan Grisby.

History
Date	User	Action	Args
2022-04-11 14:59:46	admin	set	github: 88615
2021-08-30 14:04:32	vstinner	set	status: open -> closed versions: + Python 3.10 messages: + msg400607 resolution: fixed stage: patch review -> resolved
2021-08-30 13:56:10	miss-islington	set	messages: + msg400606
2021-08-30 13:24:47	miss-islington	set	nosy: + miss-islington pull_requests: + pull_request26510
2021-08-30 13:24:46	vstinner	set	messages: + msg400599
2021-08-20 09:33:48	vstinner	set	keywords: + patch stage: patch review pull_requests: + pull_request26310
2021-08-18 09:45:24	vstinner	set	nosy: + vstinner messages: + msg399828
2021-06-18 10:27:10	dgrisby	create