classification
Title: Tracing and profiling functions can cause hangs in threads
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 2.6
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: ajaksu2, rocky, rockyb, splitscreen
Priority: normal Keywords: patch

Created on 2006-07-31 16:48 by splitscreen, last changed 2009-04-25 21:38 by ajaksu2. This issue is now closed.

Files
File name      Uploaded                       Description
deadlock.diff  splitscreen, 2006-08-01 00:30  version 2, better comments
Messages (9)
msg50795 - Author: Matt Fleming (splitscreen) Date: 2006-07-31 16:48
Attached is a patch (with test case) that fixes a
problem when a tracing function or a profiling function
for a thread references a thread ID, causing it to hang.

Matt
msg50796 - Author: Matt Fleming (splitscreen) Date: 2006-08-01 00:30

Actually the problem is a little different than I first
realised. I've updated the comment block above the code in
threading.py's __delete method to explain the situation more
clearly.
msg50797 - Author: Rocky Bernstein (rockyb) Date: 2006-08-01 13:28

I would like to try to clarify the problem a little and
suggest some possible solution approaches. 

While this patch solves a particular threading.settrace()
problem (and possibly a potential threading.setprofile()
problem), the more I think about it, the less sure I am that
it solves all such problems or is necessary in all cases.

To reiterate the problem: 

It was noticed that having tracing (threading.settrace()) or
profiling turned on while inside threading.py can cause a
thread to hang when _active_limbo_lock.acquire() is called
recursively: once while code uses a method in threading.py
like __delete(), and again when the tracing or profiling
routine installed by settrace is called from within that
threading method and the tracing/profiling code calls one of
the threading functions, like enumerate(), to get information
for its own purposes. (The patch addresses this for __delete(),
but I'm not sure it would address it if the first call were,
say, enumerate().)

One possibility, and clearly the most reliable one because it
relies least on code using threading, would be for
threading.py to check for this kind of recursive invocation
(at the module level, not the method level), which might be
done by scanning the call stack. More on this later.
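A stack scan of this kind might look like the Python 3 sketch below. It matches frames by (filename, function name), so it depends on knowing which routines hold the lock; `on_stack()`, `locked_routine()` and `LOCKING` are hypothetical names, and `sys._getframe()` is a CPython implementation detail. Only the current thread's stack is scanned, which is the thread that would deadlock on itself.

```python
import sys


def on_stack(targets, start_depth=1):
    """Return True if the caller, or any frame above it, runs a target.

    `targets` is a set of (filename, function name) pairs.
    """
    frame = sys._getframe(start_depth)  # start_depth=1 skips on_stack itself
    while frame is not None:
        code = frame.f_code
        if (code.co_filename, code.co_name) in targets:
            return True
        frame = frame.f_back
    return False


def locked_routine(callback):
    """Hypothetical stand-in for a lock-holding threading.py method."""
    return callback()


LOCKING = {(locked_routine.__code__.co_filename,
            locked_routine.__code__.co_name)}

inside = locked_routine(lambda: on_stack(LOCKING))
outside = on_stack(LOCKING)
print(inside, outside)  # True False
```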

Another possibility might be to document this behavior and
put the burden on the profiler, debugger, tracer, or anything
else that might cause some set of threading routines to be
called recursively. To address the problem outside of the
threading code, one could call
_active_limbo_lock.acquire(blocking=0) before calling a
threading routine like enumerate(), and use the threading
routine only if the lock is acquired.
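A tracer-side version of this check is sketched below in Python 3 (`call_if_unlocked()`, `registry_lock` and `enumerate_threads()` are hypothetical stand-ins). It assumes the lock is released right after a successful non-blocking probe, since a routine like enumerate() acquires the same non-reentrant lock itself and would otherwise block on the tracer's own acquisition.

```python
import threading


def call_if_unlocked(lock, func, *args):
    """Probe `lock` without blocking; call func(*args) only if it was free.

    Returns (True, result) if func ran, or (False, None) if it was skipped.
    """
    if not lock.acquire(blocking=False):
        # Held by this thread (a recursive call) or by another thread:
        # the probe cannot tell which, hence the false positives.
        return False, None
    lock.release()  # func takes the lock itself, so drop it first
    return True, func(*args)


# Hypothetical stand-ins for _active_limbo_lock and enumerate().
registry_lock = threading.Lock()


def enumerate_threads():
    with registry_lock:
        return ["MainThread"]


free = call_if_unlocked(registry_lock, enumerate_threads)
with registry_lock:  # simulate a trace event inside a locked section
    held = call_if_unlocked(registry_lock, enumerate_threads)
print(free, held)  # (True, ['MainThread']) (False, None)
```

If another thread grabs the lock between the probe and the call, func just waits for it; that is a delay, not a self-deadlock.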

This will work, but it may report "cannot acquire lock" too
often, namely in situations where there isn't a recursive
call (for example, when another thread simply holds the
lock). Better still would be to somehow indicate that "a call
to a threading routine that does locking" is in progress.

A simple and reliable way to do this would be to share the
responsibility: the threading methods would set a boolean
variable to indicate this condition, and code using
threading could test it before making calls that would
cause recursive invocation.
msg50798 - Author: Rocky Bernstein (rockyb) Date: 2006-08-01 14:26

One change to my previous comment: I now don't think the
"share the responsibility" approach will work any
differently than the approach where the user of threading
adds _active_limbo_lock.acquire(blocking=0) calls.

The most reliable approach, then, is scanning the call stack,
but this requires knowledge of the internals of threading.py.
That knowledge could be eliminated, though: on entry to a
locking routine, a local variable could be set, and instead
of scanning the call stack for method names and file names
(threading.py), the scan could look for that local variable.
Going further, threading could provide a routine to do the
stack scan.
msg50799 - Author: Matt Fleming (splitscreen) Date: 2006-08-01 15:53

About your first solution, where the threading module in the
standard library would check for any sort of recursion
problem before trying to acquire _active_limbo_lock: say it
_does_ find that _active_limbo_lock is already held, what
should happen then? It's probably not a good idea for
threading.enumerate() to simply return nothing.

How about unsetting and then resetting the
_trace_hook/_profile_hook around locked sections of code?
This is pretty much the same as the developer of a tracing
function using your idea of asking for _active_limbo_lock in
their function, except that it would be transparent to them.
This might be a viable solution, as long as it's clearly
documented somewhere that this is what happens.
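threading.settrace() and threading.setprofile() store their functions in the private `_trace_hook`/`_profile_hook`, which are installed into each thread when it starts, so suspending hooks for a thread that is already running means touching that thread's own hooks. The Python 3 sketch below does that with `sys.gettrace()`/`sys.settrace()` and `sys.getprofile()`/`sys.setprofile()`, which apply to the calling thread only; `hooks_suspended()` and `demo_tracer()` are hypothetical names.

```python
import sys
from contextlib import contextmanager


@contextmanager
def hooks_suspended():
    """Disable the calling thread's trace and profile functions, then restore them."""
    old_trace = sys.gettrace()
    old_profile = sys.getprofile()
    sys.settrace(None)
    sys.setprofile(None)
    try:
        yield
    finally:
        sys.setprofile(old_profile)
        sys.settrace(old_trace)


def demo_tracer(frame, event, arg):
    return None  # global trace function that installs no local tracer


previous = sys.gettrace()
sys.settrace(demo_tracer)
try:
    with hooks_suspended():
        # A locked section of threading.py would run here, untraced.
        inside = sys.gettrace()
    restored = sys.gettrace()
finally:
    sys.settrace(previous)
print(inside is None, restored is demo_tracer)  # True True
```

As noted above, this only stays transparent if it is documented: a tracer silently misses every event inside the suspended region.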

It might just be a better solution to paste something in the
threading docs such as "Don't call these functions from
within tracing functions in threaded code" and a list of
functions that are problematic, leaving the responsibility
of this solely on the developer. I have no idea.

But yes, thanks for pulling me up on the fact that this
patch is incomplete and doesn't fix all cases.

Matt
msg50800 - Author: Rocky Bernstein (rockyb) Date: 2006-08-01 16:01

threading.enumerate() could raise an error. I'm curious
though to learn what others think of various aspects of the
problem.
msg84493 - Author: Daniel Diniz (ajaksu2) (Python triager) Date: 2009-03-30 04:22
The supplied test case passes for me (Linux, trunk).
msg84530 - Author: rocky bernstein (rocky) Date: 2009-03-30 09:11
Well, in the over 3 years since this has last been looked at, I wouldn't
be surprised if someone else noticed the problem and therefore it has
since been fixed.

Was version 2.6 released back in January '06? Python news seems to
indicate that October 2008 is when "Python 2.6 Final" was released.

My recollection is that this was reported against version 2.5. But I
really don't remember. 

Thanks, though, for getting on this.
msg85602 - Author: Daniel Diniz (ajaksu2) (Python triager) Date: 2009-04-06 01:25
Rocky,
No, 2.6 was released in 2008 and the bug was indeed filed against 2.5.
However, 2.5 will not receive general bugfixes anymore, so I've changed
the target version to 2.6 (as it's where it can be fixed if still present).

Setting it to pending so other people interested in this will have an
incentive to verify whether it's fixed or not, so we can close or reopen.

Thanks for your detailed analysis of the bug!
History
Date                 User         Action  Args
2009-04-25 21:38:29  ajaksu2      set     status: pending -> closed
                                          resolution: out of date
                                          stage: test needed -> resolved
2009-04-06 01:25:10  ajaksu2      set     status: open -> pending
                                          messages: + msg85602
2009-03-30 09:11:37  rocky        set     nosy: + rocky
                                          messages: + msg84530
2009-03-30 04:22:43  ajaksu2      set     versions: + Python 2.6
                                          nosy: + ajaksu2
                                          messages: + msg84493
                                          type: behavior
                                          stage: test needed
2006-07-31 16:48:00  splitscreen  create