This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author morngnstar
Recipients
Date 2003-06-18.23:28:33
SpamBayes Score
Marked as misclassified
Message-id
In-reply-to
Content
When a segmentation fault happens on Linux in any 
thread but the main thread, the program exits, but 
zombie threads remain behind.

Steps to reproduce:
1. Download attached tar and extract files zombie.py 
and zombieCmodule.c.
2. Compile and link zombieCmodule.c as a shared library 
(or whatever other method you prefer for making a 
Python extension module).
3. Put the output from step 2 (zombieC.so) in your 
lib/python directory.
4. Run python2.2 zombie.py.
5. After the program exits, run ps.

zombie.py launches several threads that just loop 
forever, and one that calls a C function in zombieC. The 
latter prints "NULL!" then segfaults intentionally, 
printing "Segmentation fault". Then the program exits, 
returning control back to the shell.

Expected, and Python 2.1 behavior:
No Python threads appear in the output of ps.

Actual Python 2.2 behavior:
5 Python threads appear in the output of ps. To kill 
them, you have to apply kill -9 to each one individually.

  Not only does this bug leave around messy zombie 
threads, but the threads left behind hold on to program 
resources. For example, if the program binds a socket, 
that port cannot be bound again until someone kills the 
threads. Of course programs should not generate 
segfaults, but if they do they should fail gracefully.

  I have identified the cause of this bug. The old Python 
2.1 behavior can be restored by removing these lines of 
Python/thread_pthread.h:

sigfillset(&newmask);
SET_THREAD_SIGMASK(SIG_BLOCK, &newmask, 
&oldmask);

... and ...

SET_THREAD_SIGMASK(SIG_SETMASK, &oldmask, NULL);

  I guess even SIGSEGV gets blocked by this code, and 
somehow that prevents the default behavior of segfaults 
from working correctly.

  I'm not suggesting that removing this code is a good 
way to fix this bug. This is just an example to show that 
it seems to be the blocking of signals that causes this 
bug.
History
Date User Action Args
2007-08-23 14:14:04adminlinkissue756924 messages
2007-08-23 14:14:04admincreate