classification
Title: shutdown (exit) can hang or segfault with daemon threads running
Type: crash Stage: resolved
Components: Interpreter Core Versions: Python 3.3, Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Netto, Rhamphoryncus, Trundle, amaury.forgeotdarc, flub, forest, georg.brandl, ggenellina, gregory.p.smith, guettli, haypo, jamescooper, kbk, larsimmisch, pitrou, python-dev, terry.reedy
Priority: high Keywords: patch

Created on 2008-01-17 02:01 by gregory.p.smith, last changed 2013-08-07 09:21 by guettli. This issue is now closed.

Files
File name Uploaded Description Edit
thread_exit.py gregory.p.smith, 2008-01-17 02:01 run this several times for several different behaviors
python-trunk-issue1856-threadexit.diff Rhamphoryncus, 2008-01-17 22:35 Immediately exit non-main threads if the interpreter is exiting
thread_noswap.patch amaury.forgeotdarc, 2008-01-17 22:54
python-thread_noswap-2.diff Rhamphoryncus, 2008-05-07 18:11
finalizing.patch pitrou, 2011-05-02 20:49 review
finalizing2.patch pitrou, 2011-05-03 17:25
Messages (27)
msg60014 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-01-17 02:01
This probably applies to 3.0 as well but i have not tested it there.

Here are some sample failures:

=========== A ==============
Exception in thread Thread-0000000001 (most likely raised during
interpreter shutdown):Exception in thread Thread-0000000003 (most likely
raised during interpreter shutdown):

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xe4cf1bb0 (LWP 22281)]
0x080c0824 in PyEval_EvalFrameEx (f=0x820d59c, throwflag=0)
    at Python/ceval.c:2633
2633            if (tstate->frame->f_exc_type != NULL)
============================



=========== B ==============
Exception in thread Thread-0000000001 (most likely raised during
interpreter shutdown):
Traceback (most recent call last):
  File "/home/gps/oss/python/trunk/Lib/threading.py", line 486, in
__bootstrap_inner
  File "thread_exit.py", line 24, in run
<type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'add'

Program received signal SIGINT, Interrupt.
[Switching to Thread 0xf7fe08e0 (LWP 21703)]
0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x4dac1af0 in sem_wait@GLIBC_2.0 () from
/lib/tls/i686/cmov/libpthread.so.0
#2  0xf7fe08c0 in ?? ()
#3  0x080ec9d5 in PyThread_acquire_lock (lock=0x0, waitflag=1)
    at Python/thread_pthread.h:349
#4  0x080be6d5 in PyEval_RestoreThread (tstate=0x8168050) at
Python/ceval.c:314
#5  0x0806caf5 in file_dealloc (f=0xf7f29d58) at Objects/fileobject.c:396
#6  0x0810be36 in frame_dealloc (f=0x8201f04) at Objects/frameobject.c:416
#7  0x080e28d2 in PyThreadState_Clear (tstate=0x8184068)
    at Python/pystate.c:217
#8  0x080e2994 in PyInterpreterState_Clear (interp=0x8168008)
    at Python/pystate.c:105
#9  0x080e473b in Py_Finalize () at Python/pythonrun.c:444
#10 0x080e428e in handle_system_exit () at Python/pythonrun.c:1641
#11 0x080e447a in PyErr_PrintEx (set_sys_last_vars=1)
    at Python/pythonrun.c:1087
#12 0x080e510d in PyRun_SimpleFileExFlags (fp=0x817bd98,
    filename=0xffffd73c "thread_exit.py", closeit=1, flags=0xffffd518)
    at Python/pythonrun.c:996
#13 0x0805753b in Py_Main (argc=1, argv=0xffffd5b4) at Modules/main.c:574
#14 0x4d95cea2 in __libc_start_main () from /lib/tls/i686/cmov/libc.so.6
#15 0x080566c1 in _start () at ../sysdeps/i386/elf/start.S:119
(gdb) bt
#0  0x080c0824 in PyEval_EvalFrameEx (f=0x820d59c, throwflag=0)
    at Python/ceval.c:2633
#1  0x080c451c in PyEval_EvalFrameEx (f=0x820d40c, throwflag=0)
    at Python/ceval.c:3690
#2  0x080c451c in PyEval_EvalFrameEx (f=0x820d13c, throwflag=0)
    at Python/ceval.c:3690
#3  0x080c5fd4 in PyEval_EvalCodeEx (co=0xf7f22d58, globals=0xf7f962d4,
    locals=0x0, args=0xf7f46e18, argcount=1, kws=0x0, kwcount=0, defs=0x0,
    defcount=0, closure=0x0) at Python/ceval.c:2876
#4  0x0810d4d6 in function_call (func=0xf7f41454, arg=0xf7f46e0c, kw=0x0)
    at Objects/funcobject.c:524
#5  0x0805b0b4 in PyObject_Call (func=0xf7f41454, arg=0x0, kw=0x0)
    at Objects/abstract.c:1872
#6  0x08061008 in instancemethod_call (func=0xf7f41454, arg=0xf7f46e0c,
kw=0x0)
    at Objects/classobject.c:2509
#7  0x0805b0b4 in PyObject_Call (func=0xf7f3fdec, arg=0x0, kw=0x0)
    at Objects/abstract.c:1872
#8  0x080be8f9 in PyEval_CallObjectWithKeywords (func=0x0, arg=0xf7fa002c,
    kw=0x0) at Python/ceval.c:3473
#9  0x080efb4d in t_bootstrap (boot_raw=0x820c910)
    at ./Modules/threadmodule.c:422
#10 0x4dabd341 in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#11 0x4da114ee in clone () from /lib/tls/i686/cmov/libc.so.6
============================

And as with all problems of this sort... sometimes the program exits
normally without any problems.

I ran python trunk:60012 under gdb above.  But these problems occur on
older versions such as 2.4.
msg60016 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-01-17 04:24
So can we definitely rule out that this could be caused by the recent
changes to threading.py (r57216)?
msg60018 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-01-17 07:06
Yes i believe it is unrelated to any recent change.

I can reproduce both behaviors on my OS X 10.4 dual core mac using
Python 2.5.1.

Python 2.3 on the mac appears to get stuck in a loop when run stand
alone but gets a memory access fault when run under gdb.

python 2.4.4 seems to hang most of the time.

(all behaviors are possible i expect, i just ran it by hand under
several versions a few times)

The systems i ran it on when reporting the bug was SMP.  As with many
threading bugs it might be easier to reproduce on SMP systems but i
haven't checked.
msg60056 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-01-17 20:44
I uploaded a script for a similar issue:
http://bugs.python.org/issue1596321 (thread_crash.py)
which sometimes segfaults on Windows XP (trunk version, debug build),
with the same call stacks as printed by Gregory, on both threads.

I explain it this way:
On interpreter shutdown, the main thread clears the other's thread
TreadState. There you find the instruction: 
      Py_CLEAR(tstate->frame);
But this can call arbitrary python code on deallocation of locals, and
release the GIL (file.close() in our case).
The other thread can then continue to run. If it is given enough
processor time before the whole process shutdowns, it will reach this
line in ceval.c (line 2633)
	if (tstate->frame->f_exc_type != NULL)
since tstate->frame has been cleared by the main thread ==> boom.


There can be several ways to solve this problem:
- While clearing a thread state, prevent the "owner" thread to gain
focus. Only the main thread can "Py_END_ALLOW_THREADS". The daemon
thread is blocked forever, and will die there.
- Be smarter when clearing tstate->frame: first call frame_clear(),
which will free local variables, but let the frame in a valid state.
Only after that, call Py_CLEAR(tstate->frame), when we are sure that no
more thread switch can occur. Of course there are many other fields in a
frame; we have to inspect them carefully. The first solution seems a
more robust approach.
msg60059 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008-01-17 21:00
I think non-main threads should kill themselves off if they grab the
interpreter lock and the interpreter is tearing down.  They're about to
get killed off anyway, when the process exits.

PyGILState_Ensure would still be broken.  It touches various things that
get torn down (autoInterpreterState, autoTLSkey, and the HEAD_LOCK
macros) before it grabs the GIL.  Reordering should be possible of course.
msg60063 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-01-17 21:32
agreed, during shutdown the other threads should be stopped.  anything
to do this complicates acquiring and releasing the GIL by adding another
check to see if we're shutting down.

brainstorm: I haven't looked at the existing BEGIN_ALLOW_THREADS and
END_ALLOW_THREADS implementations but would it be possible to modify
them on the fly from the thread doing the shutdown (main or not) while
it holds the GIL such that all future calls to BEGIN_ALLOW_THREADS do
not actually release the GIL and END_ALLOW_THREADS always blocks.  That
should bring other threads to a halt pretty quickly and prevent
destructors from releasing the GIL (file.close, etc).
msg60069 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008-01-17 22:35
I'm not sure I understand you, Gregory.  Are arguing in favour of adding
extra logic to the GIL code, or against it?

I'm attaching a patch that has non-main thread exit, and it seems to fix
the test case.  It doesn't fix the PyGILState_Ensure problems though.

Also note that PyThread_exit_thread() was completely broken, becoming a
no-op once threading was initialized, and even before then it would exit
the process rather than just the thread.  I've fixed it for pthreads,
but not for any of the other platforms.
msg60070 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-01-17 22:39
> PyGILState_Ensure would still be broken.  
> It touches various things that get torn down (autoInterpreterState,
> autoTLSkey, and the HEAD_LOCK macros) before it grabs the GIL.
> Reordering should be possible of course.

Adam, did you notice the change on revision 59231 ? the
PyGILState_Ensure stuff should now remain valid during the
PyInterpreterState_Clear() call.
msg60071 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-01-17 22:54
Adam, your patch cover one case of the thread releasing the GIL
(Py_ALLOW_THREADS), but there are more calls to PyThread_acquire_lock
(the checkinterval for example).

I have a competing patch: it makes the main thread never release the GIL
after some point. Other threads stay blocked and don't clean themselves:
the OS will take care of them.

Both approaches correct the initial problem, though.
A remaining question is where exactly in Py_Finalize() we should ban
other threads. I think that this point should be as late as possible, to
allow some object/resources to be cleaned properly.
msg60072 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008-01-17 23:04
> Adam, did you notice the change on revision 59231 ? the
> PyGILState_Ensure stuff should now remain valid during the
> PyInterpreterState_Clear() call.

That doesn't matter.  PyGILState_Ensure needs to remain valid *forever*.
 Only once the process is completely gone can we be sure it won't be called.

Note that PyGILState_Ensure has two behaviours: it can be called when
your thread is already running python code and has a tstate, or it can
be called when you have none.  revision 59231 fixed the former, but only
when Py_Finalize hasn't finished.  It doesn't fix it for any daemon
threads, or for dummy threads (created outside of python's control).

...

You're right, I did forget the 3 other places that acquire the
interpreter_lock.  The more I think about it the more I think our two
approaches are equivalent, but conditionalizing the release means we
don't need to touch the broken PyThread_exit_thread functions.

I think the banning should be as early as possible, right after
call_sys_exitfunc() has finished.  You can't do anything sane once the
interpreter is half-gone (despite the precedent of trying anyway.)
msg60073 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-01-17 23:23
> That doesn't matter.  PyGILState_Ensure needs to remain valid
> *forever*. Only once the process is completely gone can we be sure 
> it won't be called.

We could apply the same idea: when exiting, PyGILState_Ensure() blocks
forever, except for the main thread of course.

Note that all this state must be restartable: after Py_Finalize(), it
should be possible to call Py_Initialize() again. This seems to raise
the score of the "exit_thread" approach.
I don't know if multiple interpreters are well supported, though.
Is there somewhere a list of use cases, or a test script that can
exercise this?
msg60074 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008-01-17 23:34
PyGILState_Ensure WOULD block forever if it acquired the GIL before
doing anything else.

The only way to make Py_Initialize callable after Py_Finalize is to make
various bits of the finalization into no-ops.  For instance, it's
currently impossible to unload C extension modules, so they must be left
around permanently.  I'm not convinced it works currently, or that
there's use cases for it.

Note that unloading python.so between Py_Finalize and Py_Initialize
would definitely break things, as you'd lose all the global variables
maintaining things.  That eliminates the only "it's more elegant" argument.
msg60075 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-01-17 23:51
> PyGILState_Ensure WOULD block forever if it acquired the GIL 
> before doing anything else.
Is it possible at all? PyThread_acquire_lock is not reentrant.
msg60077 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008-01-18 00:12
Hrm.  It seems you're right.  Python needs thread-local data to
determine if the GIL is held by the current thread.  Thus, autoTLSkey
and all that need to never be torn down.  (The check could be done much
more directly than the current PyThreadState_IsCurrent machinations, but
I digress.)
msg66363 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008-05-07 18:11
Cleaned up version of Amaury's patch.  I stop releasing the GIL after
sys.exitfunc is called, which protects threads from the ensuing teardown.

I also grab the import lock (and never release it).  This should prevent
the nasty issue with daemon threads doing imports mentioned in issue
1720705, so that part of the documentation change can be removed.  I
didn't do any significant testing though.

Importing raises a potential issue with this approach.  The standard
meme of "release GIL; grab lock; acquire GIL" becomes invalid.  A child
may grab the lock, then block on the GIL, while the main thread (which
never releases the GIL) will block on acquiring the lock.  I worked
around it with the import lock due to it's recursive behaviour, but if
it exists anywhere else in CPython it may cause a problem if used during
shutdown.
msg66742 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-05-12 17:11
Closed #2077 as a duplicate.
msg73720 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-09-24 15:47
The threads don't have to be daemons: sys.exit() calls Py_Finalize()
without waiting for non-daemons threads.
msg81485 - (view) Author: Gabriel Genellina (ggenellina) Date: 2009-02-09 20:30
I think applying Rhamphoryncus' patch in #1722344 fixes this too (that 
is, move WaitForThreadShutdown from the end of PyMain into Py_Finalize, 
so it is always called)
But it should be tested on several platforms.
msg112641 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-08-03 18:09
Is this still an issue in 2.7 or 3.x?
Or should it be closed?
msg115330 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-01 19:53
It is certainly still a problem with 3.x, but I don't find a way to exhibit it here.
msg116928 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-20 11:55
See #9901 for a variation on the same global issue (running threads can access interpreter structures - the GIL - while the main thread is trying to destroy them).
msg135006 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-05-02 20:36
With Python 3.3, thread_exit.py still crashes more or less randomly:

$ ./python thread_exit.py 
[49205 refs]
python: Python/_warnings.c:501: setup_context: Assertion `((((((PyObject*)(*filename))->ob_type))->tp_flags & ((1L<<28))) != 0)' failed.
Abandon
msg135007 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-05-02 20:49
A patch that seems to work under Linux and Windows (for 3.2/3.3).
msg135051 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-05-03 17:25
Victor pointed out that Py_Finalize() is not necessarily called in the main Python thread. This new patch records the thread state of the finalizing thread, and also includes a test case.
msg135144 - (view) Author: Roundup Robot (python-dev) Date: 2011-05-04 18:05
New changeset 2a19d09b08f8 by Antoine Pitrou in branch '3.2':
Issue #1856: Avoid crashes and lockups when daemon threads run while the
http://hg.python.org/cpython/rev/2a19d09b08f8

New changeset c892b0321d23 by Antoine Pitrou in branch 'default':
Issue #1856: Avoid crashes and lockups when daemon threads run while the
http://hg.python.org/cpython/rev/c892b0321d23
msg135146 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-05-04 18:06
Should be fixed in 3.2 and 3.3 now. I don't really want to bother with 2.7 and 3.1 (the GIL implementation is different), but someone can backport the patch if they want to :)
msg194599 - (view) Author: Thomas Guettler (guettli) Date: 2013-08-07 09:21
There are some examples to work around this for Python2: http://stackoverflow.com/questions/18098475/detect-interpreter-shut-down-in-daemon-thread
History
Date User Action Args
2013-08-07 09:21:03guettlisetnosy: + guettli
messages: + msg194599
2011-05-04 18:06:56pitrousetstatus: open -> closed
versions: - Python 3.1, Python 2.7
messages: + msg135146

resolution: fixed
stage: patch review -> resolved
2011-05-04 18:05:05python-devsetnosy: + python-dev
messages: + msg135144
2011-05-03 17:25:36pitrousetfiles: + finalizing2.patch

messages: + msg135051
2011-05-02 22:09:58Trundlesetnosy: + Trundle
2011-05-02 20:49:53pitrousetfiles: + finalizing.patch

nosy: + haypo
messages: + msg135007

stage: patch review
2011-05-02 20:36:31pitrousetmessages: + msg135006
versions: + Python 3.3
2010-09-20 11:55:36pitrousetmessages: + msg116928
2010-09-01 21:21:16gvanrossumsetnosy: - gvanrossum
2010-09-01 19:53:25pitrousetnosy: + pitrou

messages: + msg115330
versions: + Python 3.1, Python 3.2
2010-08-03 18:09:04terry.reedysetnosy: + terry.reedy

messages: + msg112641
versions: + Python 2.7, - Python 2.6, Python 2.5
2010-07-01 09:28:57Nettosetnosy: + Netto
2010-06-09 21:25:35terry.reedysetversions: - Python 2.4
2010-04-13 17:24:58jamescoopersetnosy: + jamescooper
2009-06-03 14:38:27flubsetnosy: + flub
2009-02-09 20:30:28ggenellinasetnosy: + ggenellina
messages: + msg81485
2009-02-08 14:58:03ocean-citylinkissue5160 superseder
2008-12-12 02:03:53forestsetnosy: + forest
2008-09-24 15:47:56amaury.forgeotdarcsetmessages: + msg73720
2008-05-12 17:11:21georg.brandllinkissue2077 superseder
2008-05-12 17:11:14georg.brandlsetnosy: + georg.brandl
messages: + msg66742
2008-05-07 18:11:54Rhamphoryncussetfiles: + python-thread_noswap-2.diff
keywords: + patch
messages: + msg66363
2008-04-29 23:52:23larsimmischsetnosy: + larsimmisch
2008-02-18 21:17:16kbksetnosy: + kbk
2008-01-20 19:57:56christian.heimessetpriority: high
2008-01-18 00:12:27Rhamphoryncussetmessages: + msg60077
2008-01-17 23:51:13amaury.forgeotdarcsetmessages: + msg60075
2008-01-17 23:34:31Rhamphoryncussetmessages: + msg60074
2008-01-17 23:23:38amaury.forgeotdarcsetmessages: + msg60073
2008-01-17 23:04:41Rhamphoryncussetmessages: + msg60072
2008-01-17 22:54:39amaury.forgeotdarcsetfiles: + thread_noswap.patch
messages: + msg60071
2008-01-17 22:39:20amaury.forgeotdarcsetmessages: + msg60070
2008-01-17 22:35:35Rhamphoryncussetfiles: + python-trunk-issue1856-threadexit.diff
messages: + msg60069
2008-01-17 21:32:19gregory.p.smithsetmessages: + msg60063
2008-01-17 21:00:48Rhamphoryncussetnosy: + Rhamphoryncus
messages: + msg60059
2008-01-17 20:44:14amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg60056
2008-01-17 07:06:03gregory.p.smithsetmessages: + msg60018
2008-01-17 04:24:12gvanrossumsetnosy: + gvanrossum
messages: + msg60016
2008-01-17 02:01:14gregory.p.smithcreate