classification
Title: pthreads need signal protection
Components: Interpreter Core

process
Status: closed
Resolution: fixed
Assigned To: gvanrossum
Nosy List: gvanrossum, jasonlowe
Priority: normal

Created on 2001-09-27 14:36 by anonymous, last changed 2022-04-10 16:04 by admin. This issue is now closed.

Messages (12)
msg6718 - Author: Nobody/Anonymous (nobody) Date: 2001-09-27 14:36
I've been playing around with Python and threads, and
I've noticed some odd and often unstable behavior.  In
particular, on my Solaris 8 box I can get Python 1.5.2,
1.6, 2.0, or 2.1 to core dump every time with the
following sequence.  I've also seen this happen on
Solaris 6 (all UltraSPARC based):

1. Enter the following code into the interactive
interpreter:
--
import threading

def loopingfunc():
    # Busy loop: keeps a second thread running while the
    # interpreter waits for input at the prompt.
    while 1:
        pass

threading.Thread(target=loopingfunc).start()
--

2. Send a SIGINT signal (usually Ctrl-C, though your terminal
settings may vary).  "KeyboardInterrupt" is displayed
and so far everything looks fine.

3. Now simply press the <Enter> key to enter a blank
line in the interpreter.  For my Solaris 8 box with the
GNU readline 2.2 module present, this always ends up in
a core dump.  It may take a while, since at this point
the readline signal handler is being re-entered
recursively until the stack overflows.

I've described this problem in the past on Usenet, but
didn't get much response.  For a more complete
discussion of the problem and a possible solution, see

http://groups.google.com/groups?hl=en&threadm=98osml%24sul%241%40newshost.mot.com&rnum=1&prev=/groups%3Fas_ugroup%3Dcomp.lang.python%26as_uauthors%3DJason%2520Lowe

(If the URL doesn't work, search groups.google.com for
posts by "Jason Lowe" in comp.lang.python and view the
entire thread of the result.)

Upon investigation, it looks like the problem is caused
by an interaction between pthreads and signals.  The
SIGINT signal is delivered to the thread that is
performing the spin loop, NOT the thread that is in the
readline module.  Because the readline module uses
setjmp()/longjmp() for its signal handling, the
longjmp() ends up being executed by the wrong thread
with dire results.

Pthreads and signals don't mix very well, so one has to
be very careful to make sure everything works
properly.  A typical solution is to ensure signals are
only delivered to one thread by masking all signals in
all other threads.  I believe this will be the same
root cause of bug #219772 (Interactive InterPreter+
Thread -> core dump at exit).

I was able to solve the problem by modifying
Python/thread_pthread.h's PyThread_start_new_thread()
to block all signals with pthread_sigmask() after the
new thread was started.  This causes all threads
created by Python except the initial thread to have all
signals masked.  This forces signals to be delivered to
the main thread.  I don't believe anyone is depending
on the current behavior that signals will be delivered
to an indeterminate thread, so this change seems safe.
However, I haven't run many other Python applications
that deal with threads and signals.

I propose that on platforms that implement Python
threads with pthreads, the code mask all signals in
all threads except the initial, main thread.  This will
resolve the problem of signals being delivered to
threads indeterminately.  I think I can dig up my
initial code deltas if desired, or I can always
recreate them.  It's just a few lines to mask signals
in the creating thread before thread creation, then
restore them afterwards.  (The new thread inherits the
blocked mask, so only the main thread has signals
preserved.)
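
The mask-then-restore idea can be sketched at the Python level.  This
is only an illustration, not the patch itself (the patch works in C
inside Python/thread_pthread.h).  It assumes a Unix platform and a
much later Python than the versions discussed here, since
signal.pthread_sigmask() and signal.valid_signals() did not exist in
2001.  The helper name start_thread_with_signals_blocked and the
observed dictionary are made up for this example.

```python
import signal
import threading


def start_thread_with_signals_blocked(target):
    """Start a thread that inherits a fully blocked signal mask.

    Hypothetical helper: block every signal in the calling thread,
    create the new thread, then restore the caller's original mask.
    """
    # SIG_BLOCK returns the mask that was in effect before the call.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK,
                                      signal.valid_signals())
    try:
        thread = threading.Thread(target=target)
        thread.start()  # the new thread inherits the blocked mask
    finally:
        # Only the creating thread gets its signals back.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return thread


observed = {}


def worker():
    # Blocking an empty set changes nothing and returns the current
    # mask, which lets the thread report what it inherited.
    observed["mask"] = signal.pthread_sigmask(signal.SIG_BLOCK, [])


t = start_thread_with_signals_blocked(worker)
t.join()
```

After the join, observed["mask"] holds the mask the worker thread
inherited, while the creating thread is back to its original mask.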

A side question from this is whether the thread module
(or posix module?) should expose the pthread_sigmask()
functionality to Python threads on a platform that uses
pthreads.  This would allow developers to manipulate
the signal masks of the Python threads so that a
particular signal can be routed to a particular
thread.  (They would mask this signal in all other
threads except the desired thread.)
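
Later Python versions do expose this on Unix as
signal.pthread_sigmask().  A rough sketch of routing one signal to one
thread follows.  Note that it uses a variant of the scheme described
above: the signal is blocked in every thread, and the chosen thread
collects it synchronously with signal.sigwait() rather than leaving it
unblocked.  The choice of SIGUSR1 and the names signal_waiter,
received, and ready are assumptions made for the example.

```python
import os
import signal
import threading

received = []


def signal_waiter(ready):
    ready.set()
    # sigwait() suspends this thread until SIGUSR1 is pending, then
    # removes it from the pending set and returns its number.
    received.append(signal.sigwait({signal.SIGUSR1}))


# Block SIGUSR1 in the main thread before any other thread exists, so
# every thread started afterwards inherits the blocked mask.  The mask
# is left in place for the rest of this sketch.
signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})

ready = threading.Event()
waiter = threading.Thread(target=signal_waiter, args=(ready,))
waiter.start()
ready.wait()

# A process-directed signal: with SIGUSR1 blocked everywhere it stays
# pending until the waiter thread picks it up.
os.kill(os.getpid(), signal.SIGUSR1)
waiter.join(timeout=5)
```

Because the signal is blocked in all threads, it does not matter
whether os.kill() runs before or after the waiter reaches sigwait();
the signal simply stays pending until it is collected.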

msg6719 - Author: Jason Lowe (jasonlowe) Date: 2001-09-27 14:40
Ack.  SourceForge wants to log me out every few minutes, so
I wasn't logged in when I submitted this. Sorry 'bout that.
msg6720 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2001-09-28 16:08
I don't have Solaris access, and I can't get this to break
on Linux. But I agree with your suggestion that posix
threads should block signals.

Are you capable of coming up with a patch that does that, in
a way that is independent of the specific platform (as long
as it has PTHREADS)? You may have to open a new issue in the
patch manager, since SF doesn't allow after-the-fact
attachments to anonymous entries. (Maybe SF logs you out
whenever you quit your browser? That's what it does for me.
:-)
msg6721 - Author: Jason Lowe (jasonlowe) Date: 2001-10-03 14:32
I'm working on a patch now.  Unfortunately, I only have
access to Solaris and Linux right now, but I'll test the
patch on those.  I might be able to scrounge up an HPUX
machine as well.  I'll post more info as I get it.

Unfortunately, it appears I have to poll this issue for
updates, so I might not respond right away to comments.  The
'monitor' feature doesn't seem to work for me, among many
other SourceForge things.  If I wait about 3 minutes,
SourceForge wants me to log back in if I click anything and
I never seem to get any email notifications (but my email
address listed for my account is correct).  Weird.
msg6722 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2001-10-03 15:58
To get email (to your @users.sf.net account), click on the 
Monitor button that appears at the top of the bug entry 
when you're logged in to SF.
msg6723 - Author: Jason Lowe (jasonlowe) Date: 2001-10-05 15:38
I've checked the Monitor button while logged in, but I still
do not receive email notification of updates.  When I
clicked it again, it said I was no longer monitoring, so I
clicked it yet again back into monitoring mode.  Apparently
SF knows I'm monitoring it, but it still doesn't send me
email.  Mail to my @users.sf.net account does work, so I'm
at a loss to explain why

a) my login cookie doesn't stick around very long at all
and
b) I never get monitor email from SF.

Re: the patch, I have something that works well on Solaris. 
I'll try it on Linux today, but I don't have access to an
HP-UX system.  I'm a little concerned about the impact on
HP-UX (pre 11.0 and post 11.0) and AIX, and I don't have
access to those machines to check out those concerns.
Hopefully I'll have the patch posted by today.
msg6724 - Author: Jason Lowe (jasonlowe) Date: 2001-10-05 17:09
I've submitted the patch for pthread signal masking.  My
biggest concerns are the guesses I made for DCE threads and
whether they will work for AIX, which might need to use
sigthreadmask().

Regarding reproducing this on Linux, I was able to get Linux
to crash if I held down Ctrl-C (with fairly fast key
repeat).  After starting the spinning thread, Python would
crash on Linux under a storm of SIGINTs within 30 seconds or
so.  Without the spinning thread, I couldn't get it to
crash.  With the patch applied, the spinning thread running
during the storm of SIGINTs wouldn't crash it.  So that
implies the signal masking is doing something good even in
the Linux case.

Re: my SF problems, I submitted a few support requests. 
Hopefully something gets fixed.
msg6725 - Author: Jason Lowe (jasonlowe) Date: 2001-10-09 18:40
Patch is #468347 [mask signals for non-main pthreads]
msg6726 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2001-10-12 21:52
Since I've now applied your patch, I presume this is fixed,
and I'm closing the bug report. Let me know if there are
still problems.
msg6727 - Author: Jason Lowe (jasonlowe) Date: 2001-10-16 16:59
I'll grab the 2.2b1 release when it is available and test it
on the Solaris and Linux configurations we have.
msg6728 - Author: Jason Lowe (jasonlowe) Date: 2001-10-24 14:41
I've verified Python 2.2b1 fixes the thread-signal
interaction on Solaris 6, Solaris 8, and RedHat Linux 7.1. 
Thanks for the quick patch application!
msg6729 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2001-10-24 14:46
Thanks for the followup!
History
Date                 User       Action  Args
2022-04-10 16:04:28  admin      set     github: 35243
2001-09-27 14:36:14  anonymous  create