This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: returning after forking a child thread doesn't call Py_Finalize
Type: behavior Stage: patch review
Components: Interpreter Core, Library (Lib), Subinterpreters Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open Resolution:
Dependencies: 6643 Superseder:
Assigned To: rnk Nosy List: christian.heimes, gregory.p.smith, neologix, rnk, sbt, vstinner
Priority: normal Keywords: patch

Created on 2009-08-04 18:21 by rnk, last changed 2022-04-11 14:56 by admin.

Files
File name Uploaded Description Edit
thread_fork_hang.py rnk, 2009-08-04 18:21 Failing fork/threads test case.
finalize-patch.diff rnk, 2009-08-05 01:06 Straw man patch to do finalization when returning from a forked child thread. review
Messages (5)
msg91263 - (view) Author: Reid Kleckner (rnk) (Python committer) Date: 2009-08-04 18:21
I attached a test case to reproduce.

Here is what it does:
- The main thread in the parent process starts a new thread and waits
for it.
- The child thread forks.
- The child process creates a daemon thread, and returns.
- The parent process (in the thread that forked) calls os.waitpid(childpid).

What should happen is that the forked child process should terminate
because it shouldn't wait for the daemon thread, and
os.waitpid(childpid) should return after that, and then the main thread
should return from thread.join().

What happens is that because it was a child thread that forked, the C
stack starts inside of the pthread wrapper (or equivalent) instead of
main.  So when child process returns, it doesn't know that it is now the
main thread, and it doesn't execute Py_Finalize.  Furthermore, pthreads
doesn't call sys_exit_group because it thinks that it is a lone thread
returning, and it doesn't want to terminate the entire process group. 
When you look at it with 'ps f', this is what it looks like:

24325 pts/3    Ss     0:01 bash
 4453 pts/3    Sl     0:00  \_ ./python thread_fork_hang.py
 4459 pts/3    Zl     0:07  |   \_ [python] <defunct>
 4467 pts/3    R+     0:00  \_ ps f

Here's the stack traces from the parent process:
(gdb) bt 
#0  0x00007ffff7bd0991 in sem_wait () from /lib/libpthread.so.0
#1  0x0000000000587abd in PyThread_acquire_lock (lock=0x12bb680,
waitflag=1) at ../../unladen2/Python/thread_pthread.h:349
#2  0x00000000005b1660 in lock_PyThread_acquire_lock
(self=0x7ffff7f37150, args=<value optimized out>)
    at ../../unladen2/Modules/threadmodule.c:46
#3  0x000000000055b89d in _PyEval_CallFunction (stack_pointer=0x128ff20,
na=<value optimized out>, nk=0) at ../../unladen2/Python/eval.cc:4046
#4  0x000000000055644c in PyEval_EvalFrame (f=0x128fd60) at
../../unladen2/Python/eval.cc:2518
#5  0x000000000055b225 in PyEval_EvalCodeEx (co=0x7ffff7ef9670,
globals=0x1, locals=0x2, args=0x123bbd8, argcount=1, kws=0x123bbe0,
kwcount=0, 
    defs=0x7ffff7e540e8, defcount=1, closure=0x0) at
../../unladen2/Python/eval.cc:3093
#6  0x000000000055b7b0 in _PyEval_CallFunction (stack_pointer=0x123bbe0,
na=1, nk=0) at ../../unladen2/Python/eval.cc:4188
#7  0x000000000055644c in PyEval_EvalFrame (f=0x123ba40) at
../../unladen2/Python/eval.cc:2518
#8  0x000000000055b225 in PyEval_EvalCodeEx (co=0x7ffff7efea30,
globals=0x1, locals=0x2, args=0x12038c8, argcount=1, kws=0x12038d0,
kwcount=0, 
    defs=0x7ffff7e54368, defcount=1, closure=0x0) at
../../unladen2/Python/eval.cc:3093
#9  0x000000000055b7b0 in _PyEval_CallFunction (stack_pointer=0x12038d0,
na=1, nk=0) at ../../unladen2/Python/eval.cc:4188
#10 0x000000000055644c in PyEval_EvalFrame (f=0x1203750) at
../../unladen2/Python/eval.cc:2518
#11 0x000000000055b225 in PyEval_EvalCodeEx (co=0x7ffff7f55d50,
globals=0x0, locals=0x0, args=0x0, argcount=0, kws=0x0, kwcount=0,
defs=0x0, 
    defcount=0, closure=0x0) at ../../unladen2/Python/eval.cc:3093
#12 0x000000000055bc02 in PyEval_EvalCode (co=0x12bb680, globals=0x80,
locals=0x0) at ../../unladen2/Python/eval.cc:552
#13 0x000000000057deb1 in PyRun_FileExFlags (fp=0x1121260,
filename=0x7fffffffe6be "thread_fork_hang.py", start=<value optimized out>, 
    globals=0x10fa010, locals=0x10fa010, closeit=1,
flags=0x7fffffffe290) at ../../unladen2/Python/pythonrun.c:1359
#14 0x000000000057e167 in PyRun_SimpleFileExFlags (fp=0x1121260,
filename=0x7fffffffe6be "thread_fork_hang.py", closeit=1,
flags=0x7fffffffe290)
    at ../../unladen2/Python/pythonrun.c:955
#15 0x00000000004d8954 in Py_Main (argc=-134459232, argv=<value
optimized out>) at ../../unladen2/Modules/main.c:695
#16 0x00007ffff6cdf1c4 in __libc_start_main () from /lib/libc.so.6
#17 0x00000000004d7ae9 in _start ()
(gdb) thread 2
[Switching to thread 2 (Thread 0x40800950 (LWP 4458))]#0 
0x00007ffff7bd234f in waitpid () from /lib/libpthread.so.0
(gdb) bt
#0  0x00007ffff7bd234f in waitpid () from /lib/libpthread.so.0
#1  0x00000000005b6adf in posix_waitpid (self=<value optimized out>,
args=<value optimized out>) at ../../unladen2/Modules/posixmodule.c:5797
#2  0x000000000055b89d in _PyEval_CallFunction (stack_pointer=0x129cff8,
na=<value optimized out>, nk=0) at ../../unladen2/Python/eval.cc:4046
#3  0x000000000055644c in PyEval_EvalFrame (f=0x129ce60) at
../../unladen2/Python/eval.cc:2518
#4  0x000000000055b225 in PyEval_EvalCodeEx (co=0x7ffff7f558f0,
globals=0x0, locals=0x0, args=0x7ffff7f98068, argcount=0, kws=0x1239f70, 
    kwcount=0, defs=0x0, defcount=0, closure=0x0) at
../../unladen2/Python/eval.cc:3093
#5  0x00000000005d98fc in function_call (func=0x7ffff7eefc80,
arg=0x7ffff7f98050, kw=0x1286d20) at ../../unladen2/Objects/funcobject.c:524
#6  0x00000000004dc68d in PyObject_Call (func=0x7ffff7eefc80,
arg=0x7ffff7f98050, kw=0x1286d20) at ../../unladen2/Objects/abstract.c:2487
#7  0x00000000005549d0 in _PyEval_CallFunctionVarKw
(stack_pointer=0x129ce08, num_posargs=<value optimized out>, num_kwargs=0, 
    flags=<value optimized out>) at ../../unladen2/Python/eval.cc:45
#8  0x0000000000559de2 in PyEval_EvalFrame (f=0x129cc80) at
../../unladen2/Python/eval.cc:2560
#9  0x000000000055b225 in PyEval_EvalCodeEx (co=0x7ffff7efe710,
globals=0x1, locals=0x1, args=0x11f1478, argcount=1, kws=0x11f1480,
kwcount=0, 
    defs=0x0, defcount=0, closure=0x0) at ../../unladen2/Python/eval.cc:3093
#10 0x000000000055b7b0 in _PyEval_CallFunction (stack_pointer=0x11f1480,
na=1, nk=0) at ../../unladen2/Python/eval.cc:4188
#11 0x000000000055644c in PyEval_EvalFrame (f=0x11f12e0) at
../../unladen2/Python/eval.cc:2518
#12 0x000000000055b225 in PyEval_EvalCodeEx (co=0x7ffff7efe850,
globals=0x1, locals=0x1, args=0x11f1290, argcount=1, kws=0x11f1298,
kwcount=0, 
    defs=0x0, defcount=0, closure=0x0) at ../../unladen2/Python/eval.cc:3093
#13 0x000000000055b7b0 in _PyEval_CallFunction (stack_pointer=0x11f1298,
na=1, nk=0) at ../../unladen2/Python/eval.cc:4188
#14 0x000000000055644c in PyEval_EvalFrame (f=0x11f1110) at
../../unladen2/Python/eval.cc:2518
#15 0x000000000055b225 in PyEval_EvalCodeEx (co=0x7ffff7efe7b0,
globals=0x1, locals=0x1, args=0x7ffff7ee9fe8, argcount=1, kws=0x0,
kwcount=0, 
    defs=0x0, defcount=0, closure=0x0) at ../../unladen2/Python/eval.cc:3093
#16 0x00000000005d97fd in function_call (func=0x7ffff7e55ed8,
arg=0x7ffff7ee9fd0, kw=0x0) at ../../unladen2/Objects/funcobject.c:524
#17 0x00000000004dc68d in PyObject_Call (func=0x7ffff7e55ed8,
arg=0x7ffff7ee9fd0, kw=0x0) at ../../unladen2/Objects/abstract.c:2487
#18 0x00000000004e3bff in instancemethod_call (func=0x7ffff7e55ed8,
arg=0x7ffff7ee9fd0, kw=0x0) at ../../unladen2/Objects/classobject.c:2579
#19 0x00000000004dc68d in PyObject_Call (func=0x7ffff7f3c640,
arg=0x7ffff7f98050, kw=0x0) at ../../unladen2/Objects/abstract.c:2487
#20 0x0000000000553530 in PyEval_CallObjectWithKeywords
(func=0x7ffff7f3c640, arg=0x7ffff7f98050, kw=0x0) at
../../unladen2/Python/eval.cc:45
#21 0x00000000005b1a6d in t_bootstrap (boot_raw=0x125b300) at
../../unladen2/Modules/threadmodule.c:425
#22 0x00007ffff7bca3f7 in start_thread () from /lib/libpthread.so.0
#23 0x00007ffff6d98b3d in clone () from /lib/libc.so.6
#24 0x0000000000000000 in ?? ()

The child process is defunct, but you can still trace the left over
thread with and attach to that with GDB if you guess the process id.  I
couldn't figure out how to get ps to tell it to me, though.  Here's that
trace:

(gdb) bt
#0  0x0000000000558412 in PyEval_EvalFrame (f=0x1245190) at
../../unladen2/Python/eval.cc:2336
#1  0x000000000055b225 in PyEval_EvalCodeEx (co=0x7ffff7fce7b0,
globals=0x0, locals=0x0, args=0x7ffff7f98068, argcount=0, kws=0x10eb600, 
    kwcount=0, defs=0x0, defcount=0, closure=0x0) at
../../unladen2/Python/eval.cc:3093
#2  0x00000000005d98fc in function_call (func=0x7ffff7e56d70,
arg=0x7ffff7f98050, kw=0x123ba90) at ../../unladen2/Objects/funcobject.c:524
#3  0x00000000004dc68d in PyObject_Call (func=0x7ffff7e56d70,
arg=0x7ffff7f98050, kw=0x123ba90) at ../../unladen2/Objects/abstract.c:2487
#4  0x00000000005549d0 in _PyEval_CallFunctionVarKw
(stack_pointer=0x1245138, num_posargs=<value optimized out>, num_kwargs=0, 
    flags=<value optimized out>) at ../../unladen2/Python/eval.cc:45
#5  0x0000000000559de2 in PyEval_EvalFrame (f=0x1244fb0) at
../../unladen2/Python/eval.cc:2560
#6  0x000000000055b225 in PyEval_EvalCodeEx (co=0x7ffff7efe710,
globals=0x1, locals=0x1, args=0x1244ee8, argcount=1, kws=0x1244ef0,
kwcount=0, 
    defs=0x0, defcount=0, closure=0x0) at ../../unladen2/Python/eval.cc:3093
#7  0x000000000055b7b0 in _PyEval_CallFunction (stack_pointer=0x1244ef0,
na=1, nk=0) at ../../unladen2/Python/eval.cc:4188
#8  0x000000000055644c in PyEval_EvalFrame (f=0x1244d50) at
../../unladen2/Python/eval.cc:2518
#9  0x000000000055b225 in PyEval_EvalCodeEx (co=0x7ffff7efe850,
globals=0x1, locals=0x1, args=0x1247c00, argcount=1, kws=0x1247c08,
kwcount=0, 
    defs=0x0, defcount=0, closure=0x0) at ../../unladen2/Python/eval.cc:3093
#10 0x000000000055b7b0 in _PyEval_CallFunction (stack_pointer=0x1247c08,
na=1, nk=0) at ../../unladen2/Python/eval.cc:4188
#11 0x000000000055644c in PyEval_EvalFrame (f=0x1247a80) at
../../unladen2/Python/eval.cc:2518
#12 0x000000000055b225 in PyEval_EvalCodeEx (co=0x7ffff7efe7b0,
globals=0x1, locals=0x1, args=0x7ffff7ef0528, argcount=1, kws=0x0,
kwcount=0, 
    defs=0x0, defcount=0, closure=0x0) at ../../unladen2/Python/eval.cc:3093
#13 0x00000000005d97fd in function_call (func=0x7ffff7e55ed8,
arg=0x7ffff7ef0510, kw=0x0) at ../../unladen2/Objects/funcobject.c:524
#14 0x00000000004dc68d in PyObject_Call (func=0x7ffff7e55ed8,
arg=0x7ffff7ef0510, kw=0x0) at ../../unladen2/Objects/abstract.c:2487
#15 0x00000000004e3bff in instancemethod_call (func=0x7ffff7e55ed8,
arg=0x7ffff7ef0510, kw=0x0) at ../../unladen2/Objects/classobject.c:2579
#16 0x00000000004dc68d in PyObject_Call (func=0x7ffff7f53230,
arg=0x7ffff7f98050, kw=0x0) at ../../unladen2/Objects/abstract.c:2487
#17 0x0000000000553530 in PyEval_CallObjectWithKeywords
(func=0x7ffff7f53230, arg=0x7ffff7f98050, kw=0x0) at
../../unladen2/Python/eval.cc:45
#18 0x00000000005b1a6d in t_bootstrap (boot_raw=0x1273e60) at
../../unladen2/Modules/threadmodule.c:425
#19 0x00007ffff7bca3f7 in start_thread () from /lib/libpthread.so.0
#20 0x00007ffff6d98b3d in clone () from /lib/libc.so.6
#21 0x0000000000000000 in ?? ()
(gdb) pystack
thread_fork_hang.py (15): daemon
Current language:  auto; currently c
Current language:  auto; currently c++
/home/rnk/unladen2/Lib/threading.py (481): run
/home/rnk/unladen2/Lib/threading.py (523): __bootstrap_inner
/home/rnk/unladen2/Lib/threading.py (498): __bootstrap
Current language:  auto; currently c
Current language:  auto; currently c++
Current language:  auto; currently c

You can see that it's stuck in the daemon's busy loop from the test case.

So what's the right way to fix this?  Should Py_Finalize be called for
the child?  The problem is that if the parent process registers cleanup
code via the atexit module or something, it might be run twice, once in
the parent and once in the child.  However, it's closer to the semantics
of fork, and I'd say that's what you get for using it.  Most people
still using fork are probably just turning around and exec'ing anyway.

I'm willing to write the fix if we can agree on a solution.
msg91290 - (view) Author: Reid Kleckner (rnk) (Python committer) Date: 2009-08-05 01:06
Here's a patch against 2.6 for one way to fix it.  I imagine it has
problems, but I wanted to throw it out there as a straw man.

This patch builds on the patch for http://bugs.python.org/issue6643
since some of the test cases will occasionally deadlock without it.
msg192367 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-07-05 22:29
Python 3.x is affected, too.
msg192370 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-07-05 22:56
Shouldn't the child process be terminating using os._exit()?
msg396681 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-06-28 23:56
I add the "Subinterpreters" component even if the issue is about the main interpreter, this kind of issue impacts subinterpreters.
History
Date User Action Args
2022-04-11 14:56:51adminsetgithub: 50891
2021-12-12 19:10:33iritkatrielsetversions: + Python 3.9, Python 3.10, Python 3.11, - Python 2.7, Python 3.3, Python 3.4
2021-06-28 23:56:53vstinnersetnosy: + vstinner
messages: + msg396681
components: + Subinterpreters
2013-07-06 08:36:18pitrousetnosy: + neologix
2013-07-05 22:56:37sbtsetnosy: + sbt
messages: + msg192370
2013-07-05 22:30:10christian.heimessettype: behavior
components: + Interpreter Core
stage: patch review
2013-07-05 22:29:33christian.heimessetnosy: + christian.heimes

messages: + msg192367
versions: + Python 2.7, Python 3.3, Python 3.4, - Python 2.6
2011-01-04 02:06:01pitrousetnosy: + gregory.p.smith
2010-07-18 14:50:49rnksetassignee: rnk
dependencies: + Throw away more radioactive locks that could be held across a fork in threading.py
2009-08-05 01:06:27rnksetfiles: + finalize-patch.diff
keywords: + patch
messages: + msg91290
2009-08-04 20:12:55rnksetversions: + Python 2.6, - Python 3.2
2009-08-04 18:21:49rnkcreate