Author rnk
Recipients rnk
Date 2009-08-04.18:21:44
SpamBayes Score 3.83027e-15
Marked as misclassified No
Message-id <1249410110.52.0.861898541816.issue6642@psf.upfronthosting.co.za>
In-reply-to
Content
I attached a test case to reproduce.

Here is what it does:
- The main thread in the parent process starts a new thread and waits
for it.
- The child thread forks.
- The child process creates a daemon thread, and returns.
- The parent process (in the thread that forked) calls os.waitpid(childpid).

What should happen is that the forked child process should terminate
because it shouldn't wait for the daemon thread, and
os.waitpid(childpid) should return after that, and then the main thread
should return from thread.join().

What happens is that because it was a child thread that forked, the C
stack starts inside of the pthread wrapper (or equivalent) instead of
main.  So when child process returns, it doesn't know that it is now the
main thread, and it doesn't execute Py_Finalize.  Furthermore, pthreads
doesn't call sys_exit_group because it thinks that it is a lone thread
returning, and it doesn't want to terminate the entire process group. 
When you look at it with 'ps f', this is what it looks like:

24325 pts/3    Ss     0:01 bash
 4453 pts/3    Sl     0:00  \_ ./python thread_fork_hang.py
 4459 pts/3    Zl     0:07  |   \_ [python] <defunct>
 4467 pts/3    R+     0:00  \_ ps f

Here's the stack traces from the parent process:
(gdb) bt 
#0  0x00007ffff7bd0991 in sem_wait () from /lib/libpthread.so.0
#1  0x0000000000587abd in PyThread_acquire_lock (lock=0x12bb680,
waitflag=1) at ../../unladen2/Python/thread_pthread.h:349
#2  0x00000000005b1660 in lock_PyThread_acquire_lock
(self=0x7ffff7f37150, args=<value optimized out>)
    at ../../unladen2/Modules/threadmodule.c:46
#3  0x000000000055b89d in _PyEval_CallFunction (stack_pointer=0x128ff20,
na=<value optimized out>, nk=0) at ../../unladen2/Python/eval.cc:4046
#4  0x000000000055644c in PyEval_EvalFrame (f=0x128fd60) at
../../unladen2/Python/eval.cc:2518
#5  0x000000000055b225 in PyEval_EvalCodeEx (co=0x7ffff7ef9670,
globals=0x1, locals=0x2, args=0x123bbd8, argcount=1, kws=0x123bbe0,
kwcount=0, 
    defs=0x7ffff7e540e8, defcount=1, closure=0x0) at
../../unladen2/Python/eval.cc:3093
#6  0x000000000055b7b0 in _PyEval_CallFunction (stack_pointer=0x123bbe0,
na=1, nk=0) at ../../unladen2/Python/eval.cc:4188
#7  0x000000000055644c in PyEval_EvalFrame (f=0x123ba40) at
../../unladen2/Python/eval.cc:2518
#8  0x000000000055b225 in PyEval_EvalCodeEx (co=0x7ffff7efea30,
globals=0x1, locals=0x2, args=0x12038c8, argcount=1, kws=0x12038d0,
kwcount=0, 
    defs=0x7ffff7e54368, defcount=1, closure=0x0) at
../../unladen2/Python/eval.cc:3093
#9  0x000000000055b7b0 in _PyEval_CallFunction (stack_pointer=0x12038d0,
na=1, nk=0) at ../../unladen2/Python/eval.cc:4188
#10 0x000000000055644c in PyEval_EvalFrame (f=0x1203750) at
../../unladen2/Python/eval.cc:2518
#11 0x000000000055b225 in PyEval_EvalCodeEx (co=0x7ffff7f55d50,
globals=0x0, locals=0x0, args=0x0, argcount=0, kws=0x0, kwcount=0,
defs=0x0, 
    defcount=0, closure=0x0) at ../../unladen2/Python/eval.cc:3093
#12 0x000000000055bc02 in PyEval_EvalCode (co=0x12bb680, globals=0x80,
locals=0x0) at ../../unladen2/Python/eval.cc:552
#13 0x000000000057deb1 in PyRun_FileExFlags (fp=0x1121260,
filename=0x7fffffffe6be "thread_fork_hang.py", start=<value optimized out>, 
    globals=0x10fa010, locals=0x10fa010, closeit=1,
flags=0x7fffffffe290) at ../../unladen2/Python/pythonrun.c:1359
#14 0x000000000057e167 in PyRun_SimpleFileExFlags (fp=0x1121260,
filename=0x7fffffffe6be "thread_fork_hang.py", closeit=1,
flags=0x7fffffffe290)
    at ../../unladen2/Python/pythonrun.c:955
#15 0x00000000004d8954 in Py_Main (argc=-134459232, argv=<value
optimized out>) at ../../unladen2/Modules/main.c:695
#16 0x00007ffff6cdf1c4 in __libc_start_main () from /lib/libc.so.6
#17 0x00000000004d7ae9 in _start ()
(gdb) thread 2
[Switching to thread 2 (Thread 0x40800950 (LWP 4458))]#0 
0x00007ffff7bd234f in waitpid () from /lib/libpthread.so.0
(gdb) bt
#0  0x00007ffff7bd234f in waitpid () from /lib/libpthread.so.0
#1  0x00000000005b6adf in posix_waitpid (self=<value optimized out>,
args=<value optimized out>) at ../../unladen2/Modules/posixmodule.c:5797
#2  0x000000000055b89d in _PyEval_CallFunction (stack_pointer=0x129cff8,
na=<value optimized out>, nk=0) at ../../unladen2/Python/eval.cc:4046
#3  0x000000000055644c in PyEval_EvalFrame (f=0x129ce60) at
../../unladen2/Python/eval.cc:2518
#4  0x000000000055b225 in PyEval_EvalCodeEx (co=0x7ffff7f558f0,
globals=0x0, locals=0x0, args=0x7ffff7f98068, argcount=0, kws=0x1239f70, 
    kwcount=0, defs=0x0, defcount=0, closure=0x0) at
../../unladen2/Python/eval.cc:3093
#5  0x00000000005d98fc in function_call (func=0x7ffff7eefc80,
arg=0x7ffff7f98050, kw=0x1286d20) at ../../unladen2/Objects/funcobject.c:524
#6  0x00000000004dc68d in PyObject_Call (func=0x7ffff7eefc80,
arg=0x7ffff7f98050, kw=0x1286d20) at ../../unladen2/Objects/abstract.c:2487
#7  0x00000000005549d0 in _PyEval_CallFunctionVarKw
(stack_pointer=0x129ce08, num_posargs=<value optimized out>, num_kwargs=0, 
    flags=<value optimized out>) at ../../unladen2/Python/eval.cc:45
#8  0x0000000000559de2 in PyEval_EvalFrame (f=0x129cc80) at
../../unladen2/Python/eval.cc:2560
#9  0x000000000055b225 in PyEval_EvalCodeEx (co=0x7ffff7efe710,
globals=0x1, locals=0x1, args=0x11f1478, argcount=1, kws=0x11f1480,
kwcount=0, 
    defs=0x0, defcount=0, closure=0x0) at ../../unladen2/Python/eval.cc:3093
#10 0x000000000055b7b0 in _PyEval_CallFunction (stack_pointer=0x11f1480,
na=1, nk=0) at ../../unladen2/Python/eval.cc:4188
#11 0x000000000055644c in PyEval_EvalFrame (f=0x11f12e0) at
../../unladen2/Python/eval.cc:2518
#12 0x000000000055b225 in PyEval_EvalCodeEx (co=0x7ffff7efe850,
globals=0x1, locals=0x1, args=0x11f1290, argcount=1, kws=0x11f1298,
kwcount=0, 
    defs=0x0, defcount=0, closure=0x0) at ../../unladen2/Python/eval.cc:3093
#13 0x000000000055b7b0 in _PyEval_CallFunction (stack_pointer=0x11f1298,
na=1, nk=0) at ../../unladen2/Python/eval.cc:4188
#14 0x000000000055644c in PyEval_EvalFrame (f=0x11f1110) at
../../unladen2/Python/eval.cc:2518
#15 0x000000000055b225 in PyEval_EvalCodeEx (co=0x7ffff7efe7b0,
globals=0x1, locals=0x1, args=0x7ffff7ee9fe8, argcount=1, kws=0x0,
kwcount=0, 
    defs=0x0, defcount=0, closure=0x0) at ../../unladen2/Python/eval.cc:3093
#16 0x00000000005d97fd in function_call (func=0x7ffff7e55ed8,
arg=0x7ffff7ee9fd0, kw=0x0) at ../../unladen2/Objects/funcobject.c:524
#17 0x00000000004dc68d in PyObject_Call (func=0x7ffff7e55ed8,
arg=0x7ffff7ee9fd0, kw=0x0) at ../../unladen2/Objects/abstract.c:2487
#18 0x00000000004e3bff in instancemethod_call (func=0x7ffff7e55ed8,
arg=0x7ffff7ee9fd0, kw=0x0) at ../../unladen2/Objects/classobject.c:2579
#19 0x00000000004dc68d in PyObject_Call (func=0x7ffff7f3c640,
arg=0x7ffff7f98050, kw=0x0) at ../../unladen2/Objects/abstract.c:2487
#20 0x0000000000553530 in PyEval_CallObjectWithKeywords
(func=0x7ffff7f3c640, arg=0x7ffff7f98050, kw=0x0) at
../../unladen2/Python/eval.cc:45
#21 0x00000000005b1a6d in t_bootstrap (boot_raw=0x125b300) at
../../unladen2/Modules/threadmodule.c:425
#22 0x00007ffff7bca3f7 in start_thread () from /lib/libpthread.so.0
#23 0x00007ffff6d98b3d in clone () from /lib/libc.so.6
#24 0x0000000000000000 in ?? ()

The child process is defunct, but you can still trace the left over
thread with and attach to that with GDB if you guess the process id.  I
couldn't figure out how to get ps to tell it to me, though.  Here's that
trace:

(gdb) bt
#0  0x0000000000558412 in PyEval_EvalFrame (f=0x1245190) at
../../unladen2/Python/eval.cc:2336
#1  0x000000000055b225 in PyEval_EvalCodeEx (co=0x7ffff7fce7b0,
globals=0x0, locals=0x0, args=0x7ffff7f98068, argcount=0, kws=0x10eb600, 
    kwcount=0, defs=0x0, defcount=0, closure=0x0) at
../../unladen2/Python/eval.cc:3093
#2  0x00000000005d98fc in function_call (func=0x7ffff7e56d70,
arg=0x7ffff7f98050, kw=0x123ba90) at ../../unladen2/Objects/funcobject.c:524
#3  0x00000000004dc68d in PyObject_Call (func=0x7ffff7e56d70,
arg=0x7ffff7f98050, kw=0x123ba90) at ../../unladen2/Objects/abstract.c:2487
#4  0x00000000005549d0 in _PyEval_CallFunctionVarKw
(stack_pointer=0x1245138, num_posargs=<value optimized out>, num_kwargs=0, 
    flags=<value optimized out>) at ../../unladen2/Python/eval.cc:45
#5  0x0000000000559de2 in PyEval_EvalFrame (f=0x1244fb0) at
../../unladen2/Python/eval.cc:2560
#6  0x000000000055b225 in PyEval_EvalCodeEx (co=0x7ffff7efe710,
globals=0x1, locals=0x1, args=0x1244ee8, argcount=1, kws=0x1244ef0,
kwcount=0, 
    defs=0x0, defcount=0, closure=0x0) at ../../unladen2/Python/eval.cc:3093
#7  0x000000000055b7b0 in _PyEval_CallFunction (stack_pointer=0x1244ef0,
na=1, nk=0) at ../../unladen2/Python/eval.cc:4188
#8  0x000000000055644c in PyEval_EvalFrame (f=0x1244d50) at
../../unladen2/Python/eval.cc:2518
#9  0x000000000055b225 in PyEval_EvalCodeEx (co=0x7ffff7efe850,
globals=0x1, locals=0x1, args=0x1247c00, argcount=1, kws=0x1247c08,
kwcount=0, 
    defs=0x0, defcount=0, closure=0x0) at ../../unladen2/Python/eval.cc:3093
#10 0x000000000055b7b0 in _PyEval_CallFunction (stack_pointer=0x1247c08,
na=1, nk=0) at ../../unladen2/Python/eval.cc:4188
#11 0x000000000055644c in PyEval_EvalFrame (f=0x1247a80) at
../../unladen2/Python/eval.cc:2518
#12 0x000000000055b225 in PyEval_EvalCodeEx (co=0x7ffff7efe7b0,
globals=0x1, locals=0x1, args=0x7ffff7ef0528, argcount=1, kws=0x0,
kwcount=0, 
    defs=0x0, defcount=0, closure=0x0) at ../../unladen2/Python/eval.cc:3093
#13 0x00000000005d97fd in function_call (func=0x7ffff7e55ed8,
arg=0x7ffff7ef0510, kw=0x0) at ../../unladen2/Objects/funcobject.c:524
#14 0x00000000004dc68d in PyObject_Call (func=0x7ffff7e55ed8,
arg=0x7ffff7ef0510, kw=0x0) at ../../unladen2/Objects/abstract.c:2487
#15 0x00000000004e3bff in instancemethod_call (func=0x7ffff7e55ed8,
arg=0x7ffff7ef0510, kw=0x0) at ../../unladen2/Objects/classobject.c:2579
#16 0x00000000004dc68d in PyObject_Call (func=0x7ffff7f53230,
arg=0x7ffff7f98050, kw=0x0) at ../../unladen2/Objects/abstract.c:2487
#17 0x0000000000553530 in PyEval_CallObjectWithKeywords
(func=0x7ffff7f53230, arg=0x7ffff7f98050, kw=0x0) at
../../unladen2/Python/eval.cc:45
#18 0x00000000005b1a6d in t_bootstrap (boot_raw=0x1273e60) at
../../unladen2/Modules/threadmodule.c:425
#19 0x00007ffff7bca3f7 in start_thread () from /lib/libpthread.so.0
#20 0x00007ffff6d98b3d in clone () from /lib/libc.so.6
#21 0x0000000000000000 in ?? ()
(gdb) pystack
thread_fork_hang.py (15): daemon
Current language:  auto; currently c
Current language:  auto; currently c++
/home/rnk/unladen2/Lib/threading.py (481): run
/home/rnk/unladen2/Lib/threading.py (523): __bootstrap_inner
/home/rnk/unladen2/Lib/threading.py (498): __bootstrap
Current language:  auto; currently c
Current language:  auto; currently c++
Current language:  auto; currently c

You can see that it's stuck in the daemon's busy loop from the test case.

So what's the right way to fix this?  Should Py_Finalize be called for
the child?  The problem is that if the parent process registers cleanup
code via the atexit module or something, it might be run twice, once in
the parent and once in the child.  However, it's closer to the semantics
of fork, and I'd say that's what you get for using it.  Most people
still using fork are probably just turning around and exec'ing anyway.

I'm willing to write the fix if we can agree on a solution.
History
Date User Action Args
2009-08-04 18:21:51rnksetrecipients: + rnk
2009-08-04 18:21:50rnksetmessageid: <1249410110.52.0.861898541816.issue6642@psf.upfronthosting.co.za>
2009-08-04 18:21:49rnklinkissue6642 messages
2009-08-04 18:21:45rnkcreate