Issue 1531859: Tracing and profiling functions can cause hangs in threads



This issue has been migrated to GitHub: https://github.com/python/cpython/issues/43753

classification

Title:	Tracing and profiling functions can cause hangs in threads
Type:	behavior	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 2.6

process

Status:	closed	Resolution:	out of date
Dependencies:		Superseder:
Assigned To:		Nosy List:	ajaksu2, rocky, rockyb, splitscreen
Priority:	normal	Keywords:	patch

Created on 2006-07-31 16:48 by splitscreen, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
deadlock.diff	splitscreen, 2006-08-01 00:30	version 2, better comments

Messages (9)
msg50795	Author: Matt Fleming (splitscreen)	Date: 2006-07-31 16:48
Attached is a patch (with test case) that fixes a hang that occurs when a thread's tracing or profiling function references a thread ID. Matt
msg50796	Author: Matt Fleming (splitscreen)	Date: 2006-08-01 00:30
Actually, the problem is a little different from what I first realised. I've updated the comment block above the code in threading.py's __delete method to explain the situation more clearly.
msg50797	Author: Rocky Bernstein (rockyb)	Date: 2006-08-01 13:28
I would like to clarify the problem a little and suggest some possible approaches to a solution. While this patch solves a particular threading.settrace() problem (and possibly a potential threading.setprofile() problem), the more I think about it, the less sure I am that it solves all of them or is necessary in all cases.
To reiterate the problem: having tracing (threading.settrace) or profiling turned on while inside threading.py can cause a thread to hang when _active_limbo_lock.acquire() is called recursively: once when code uses a method in threading.py such as _delete(), and again when the tracing or profiling routine is invoked via settrace from within a threading method and that routine calls one of the threading functions, such as enumerate(), to get information for its own purposes. (The patch addresses this for _delete(), but I'm not sure it would address it if the first call were, say, enumerate().)
One possibility, and clearly the most reliable one because it relies least on the code using threading, would be for threading.py to check for this kind of recursive invocation (at the module level, not the method level), which might be done by scanning the call stack. More on this later.
Another possibility might be to document this behavior and put the burden on the profiler, debugger, tracer, or anything else that might cause some set of threading routines to be called recursively. To address the problem outside of the threading code, one could call _active_limbo_lock.acquire(blocking=0) before calling a threading routine like enumerate(), and use the routine only if the lock is acquired. This will work, but it may get the "cannot acquire lock" status too often, namely in situations where there isn't a recursive call (for example, when another thread merely holds the lock for a moment). Better than this would again be to somehow indicate that "a call to a threading routine which does locking" is in progress.
A simple and reliable way to do this would be to share the responsibility: the threading methods would set a boolean variable to indicate this condition, and code using threading could test it before making calls that would cause recursive invocation.
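A minimal sketch of the non-blocking acquire idea from the comment above. It does not touch threading's private _active_limbo_lock; `registry_lock`, `registry`, `delete_entry` and `list_threads_from_hook` are hypothetical stand-ins chosen for illustration, and the hook is called directly rather than through settrace to keep the example small.

```python
import threading

# Hypothetical stand-in for threading._active_limbo_lock: a plain,
# non-reentrant lock guarding a small registry of thread names.
registry_lock = threading.Lock()
registry = ["MainThread", "worker"]


def list_threads_from_hook():
    """What a trace/profile hook might call; returns None if the lock is busy."""
    # Non-blocking attempt: returns False at once instead of hanging.
    if not registry_lock.acquire(False):
        return None
    try:
        return list(registry)
    finally:
        registry_lock.release()


def delete_entry(name, hook):
    """Mimics a threading routine that triggers a hook while holding the lock."""
    with registry_lock:
        seen = hook()  # the hook re-enters while the lock is already held
        registry.remove(name)
    return seen


inside = delete_entry("worker", list_threads_from_hook)  # lock busy, hook backs off
outside = list_threads_from_hook()                       # lock free, hook gets a list
```

As the comment notes, the hook also backs off when a different thread happens to hold the lock, so a None result here does not prove that a recursive call was in progress.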
msg50798	Author: Rocky Bernstein (rockyb)	Date: 2006-08-01 14:26
One change to my previous comment. I now don't think the "share the responsibility" approach mentioned there will work any differently from the approach where the user of threading adds _active_limbo_lock.acquire(blocking=0) calls. The most reliable approach, then, is scanning the call stack, but this requires knowledge of the internals of threading.py. That requirement could be eliminated, though. On entry to a locking routine, a local variable could be set, and instead of scanning the call stack for method names and file names (threading.py), a scan could be done for that local variable. Going further, threading could provide a routine to do the stack scan.
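The stack scan for a marker local described above might look roughly like this. It is only a sketch: the marker name `_holds_limbo_lock`, the helper `lock_held_up_stack` and the stand-in lock are all hypothetical, and threading.py provides no such routine.

```python
import sys
import threading

_lock = threading.Lock()      # hypothetical stand-in for _active_limbo_lock
MARKER = "_holds_limbo_lock"  # hypothetical name of the marker local


def lock_held_up_stack():
    """Return True if any calling frame in this thread set the marker local."""
    frame = sys._getframe(1)
    while frame is not None:
        if frame.f_locals.get(MARKER) is True:
            return True
        frame = frame.f_back
    return False


def locked_routine(hook):
    """Mimics a threading routine that takes the lock and triggers a hook."""
    _holds_limbo_lock = True  # marker local that the stack scan looks for
    with _lock:
        return hook()


def hook():
    # A tracer would skip calls such as enumerate() when this returns True.
    return lock_held_up_stack()
```

Because the scan walks only the current thread's frames, it reports a recursive call in this thread and ignores locks held by other threads, which is the distinction the non-blocking acquire cannot make.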
msg50799	Author: Matt Fleming (splitscreen)	Date: 2006-08-01 15:53
Regarding your first solution, where the threading module in the standard library would check for any sort of recursion problem before trying to acquire _active_limbo_lock: if it _does_ try to acquire _active_limbo_lock while it's already locked, what is the solution? It's probably not a good idea for threading.enumerate() to not return anything. How about unsetting/resetting the _trace_hook/_profile_hook around locked sections of code? This is pretty much the same as the developer of a tracing function using your idea of asking for _active_limbo_lock in their function, except that it would be transparent to them. This might be a viable solution, as long as it's clearly documented somewhere that this is what happens. It might just be a better solution to put something in the threading docs such as "Don't call these functions from within tracing functions in threaded code", along with a list of the problematic functions, leaving the responsibility solely with the developer. I have no idea. But yes, thanks for pulling me up on the fact that this patch is incomplete and doesn't fix all cases. Matt
msg50800	Author: Rocky Bernstein (rockyb)	Date: 2006-08-01 16:01
threading.enumerate() could raise an error. I'm curious, though, to learn what others think of the various aspects of the problem.
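Raising an error on a recursive call could be sketched as follows, assuming the lock records which thread owns it. `OwnerCheckedLock`, `ReentryError` and `enumerate_names` are hypothetical names used only for illustration; this is not how threading.enumerate() behaves.

```python
import threading


class ReentryError(RuntimeError):
    """Raised when the owning thread tries to take the lock again."""


class OwnerCheckedLock:
    """Non-reentrant lock that raises instead of deadlocking on re-entry."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None

    def __enter__(self):
        me = threading.get_ident()
        # Only this thread can have stored its own id, so the check is safe
        # to make without holding the lock.
        if self._owner == me:
            raise ReentryError("lock already held by this thread")
        self._lock.acquire()
        self._owner = me
        return self

    def __exit__(self, exc_type, exc, tb):
        self._owner = None
        self._lock.release()
        return False


guard = OwnerCheckedLock()


def enumerate_names():
    """Stand-in for a routine like enumerate() that needs the lock."""
    with guard:
        return ["MainThread"]


with guard:
    try:
        enumerate_names()  # recursive use from the same thread
        raised = False
    except ReentryError:
        raised = True
```

A tracing function could then catch the error and skip the call, while a call from any other thread still blocks normally until the lock is free.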
msg84493	Author: Daniel Diniz (ajaksu2) *	Date: 2009-03-30 04:22
The supplied test case passes for me (Linux, trunk).
msg84530	Author: rocky bernstein (rocky)	Date: 2009-03-30 09:11
Well, in the more than 3 years since this was last looked at, I wouldn't be surprised if someone else noticed the problem and it has since been fixed. Was version 2.6 released back in January '06? Python news seems to indicate that October 2008 is when "Python 2.6 Final" was released. My recollection is that this was reported against version 2.5, but I really don't remember. Thanks, though, for getting on this.
msg85602	Author: Daniel Diniz (ajaksu2) *	Date: 2009-04-06 01:25
Rocky, no, 2.6 was released in 2008 and the bug was indeed filed against 2.5. However, 2.5 will not receive general bugfixes anymore, so I've changed the target version to 2.6 (as that's where it can be fixed if still present). Setting it to pending so that other people interested in this have an incentive to verify whether it's fixed or not, so we can close or reopen. Thanks for your detailed analysis of the bug!

History
Date	User	Action	Args
2022-04-11 14:56:19	admin	set	github: 43753
2009-04-25 21:38:29	ajaksu2	set	status: pending -> closed resolution: out of date stage: test needed -> resolved
2009-04-06 01:25:10	ajaksu2	set	status: open -> pending messages: + msg85602
2009-03-30 09:11:37	rocky	set	nosy: + rocky messages: + msg84530
2009-03-30 04:22:43	ajaksu2	set	versions: + Python 2.6 nosy: + ajaksu2 messages: + msg84493 type: behavior stage: test needed
2006-07-31 16:48:00	splitscreen	create