Title: Forking from background thread
Type: behavior
Stage: resolved
Components: Build
Versions: Python 3.7, Python 3.6, Python 3.4, Python 3.5, Python 2.7
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: christian.heimes, pablogsal, pitrou, rhettinger, vtsozik
Priority: normal
Keywords:

Created on 2019-02-05 22:17 by vtsozik, last changed 2019-02-06 14:42 by pitrou. This issue is now closed.

Files: one attachment, uploaded by vtsozik on 2019-02-05 22:17
Messages (6)
msg334886 - (view) Author: Vadim Tsozik (vtsozik) Date: 2019-02-05 22:17
Attached is a code sample that forks a child process either from the main thread or from a background thread. The child starts and joins all of its threads except a sleeping daemon. If the parent forks the child from a background thread, the program does not exit once the child's threads are joined: waitpid is not unblocked by SIGCHLD until the daemon has finished sleeping. However, if the parent forks from the main thread, everything works correctly and the process exits immediately, without waiting for the daemon to sleep for 3600 seconds. I'm wondering what the difference is between the main thread and a background thread in the parent. Only one thread survives forking and becomes the main thread in the child, so there should be no difference in behavior.

Thank you in advance for your help,
msg334887 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2019-02-05 22:31
In general threads and forks don't mix well. If you fork from any thread but the main thread, you can run into undefined behavior. Daemon threads are a special property of Python's threading model. You need to have one non-daemon thread running to keep the process active. A Python process exits when all remaining threads are daemon threads.
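The daemon-thread rule described above can be checked with a small sketch. It launches a separate interpreter whose main thread returns while a daemon thread is still sleeping, and measures how long the process takes to exit. The names `CHILD_SOURCE` and `run_with_sleeping_daemon` are hypothetical and exist only for this illustration.

```python
import subprocess
import sys
import time

# Script run in a fresh interpreter: the only other thread is a daemon
# that would sleep for an hour if the process waited for it.
CHILD_SOURCE = """
import threading, time
threading.Thread(target=time.sleep, args=(3600,), daemon=True).start()
print("main thread done")
"""


def run_with_sleeping_daemon(timeout=30):
    """Run CHILD_SOURCE and return (exit code, stdout, elapsed seconds)."""
    start = time.monotonic()
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout.strip(), time.monotonic() - start


if __name__ == "__main__":
    code, out, elapsed = run_with_sleeping_daemon()
    print(code, out, f"{elapsed:.2f}s")
```

In this case no fork is involved and the main thread leaves through the normal interpreter shutdown, so the sleeping daemon does not delay the exit.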
msg334892 - (view) Author: Vadim Tsozik (vtsozik) Date: 2019-02-05 23:13
Thank you for your reply. I understand that forking and threads do not mix well if the developer is not careful and the child doesn't clear or reset the synchronization variables inherited from the parent. However, this is not the case in the provided code sample. The answer to my question is probably related to the fact that only the main thread handles signals by default in POSIX. But in the provided example, the parent's background thread becomes the main thread in the child. In the parent, it doesn't matter whether waitpid blocks on the main thread or on a background thread. The only thing that really matters is whether the code forks from the main thread or from a background thread.
msg334904 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-02-06 06:41
Unless it's clear that there is a buggy behavior or a useful feature request, it would be better to move this to StackOverflow which is a more appropriate forum for general questions on how Python works internally.
msg334939 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2019-02-06 14:40
It is actually quite an intricate problem.  What happens is that the child process's *main thread* ends, but not its background sleeping thread (the `lambda: time.sleep(3600)`).

To diagnose it, you can display the process tree:
$ ps fu
antoine  12634  0.0  0.0  28308  9208 pts/0    Ss   15:21   0:00 bash
antoine   2520  0.0  0.0 179072 10684 pts/0    Sl+  15:29   0:00  \_ ./python
antoine   2522  0.0  0.0      0     0 pts/0    Zl+  15:29   0:00      \_ [python] <defunct>

Then you can display all threads for the child process (here with pid 2522):
$ ps -T -p 2522
 2522  2522 pts/0    00:00:00 python <defunct>
 2522  2525 pts/0    00:00:00 python
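On Linux, the same per-thread view is also available from /proc, which may be handy when `ps` is not at hand. The following is a rough sketch, assuming a Linux /proc filesystem; `thread_states` is a hypothetical helper name, not part of any library.

```python
import os


def thread_states(pid):
    """Map each thread ID of process `pid` to its one-letter kernel state.

    Reads /proc/<pid>/task/<tid>/stat (Linux only). A state of "Z" marks a
    zombie thread, "S" a sleeping one, "R" a running one.
    """
    states = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                stat = f.read()
        except OSError:
            continue  # the thread exited while we were iterating
        # Line format is "tid (comm) state ..."; comm may itself contain
        # spaces or ")", so split after the last closing parenthesis.
        states[int(tid)] = stat.rpartition(")")[2].split()[0]
    return states


if __name__ == "__main__":
    print(thread_states(os.getpid()))
```

For the child shown above, one would expect the entry for thread 2522 to read "Z" while 2525 is still sleeping.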

The main thread is marked zombie ("defunct") but thread 2525 is still running... What is it doing?  Let's attach gdb:
$ gdb ./python --pid 2525

And display the call stack:
(gdb) bt
#0  0x00007f1fb3ca503f in __GI___select (nfds=nfds@entry=0, readfds=readfds@entry=0x0, writefds=writefds@entry=0x0, 
    exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7f1fb23553c0) at ../sysdeps/unix/sysv/linux/select.c:41
#1  0x000055e6fc4fcf7e in pysleep (secs=<optimized out>) at ./Modules/timemodule.c:1864
#2  0x000055e6fc4fd022 in time_sleep (self=self@entry=<module at remote 0x7f1fb4a03398>, obj=<optimized out>)
    at ./Modules/timemodule.c:366
#3  0x000055e6fc3a02e7 in _PyMethodDef_RawFastCallKeywords (method=0x55e6fc887ee0 <time_methods+288>, 
    self=<module at remote 0x7f1fb4a03398>, args=args@entry=0x7f1fb336a8f8, nargs=nargs@entry=1, kwnames=0x0) at Objects/call.c:646
#4  0x000055e6fc3a04c7 in _PyCFunction_FastCallKeywords (
    func=func@entry=<built-in method sleep of module object at remote 0x7f1fb4a03398>, args=args@entry=0x7f1fb336a8f8, 
    nargs=nargs@entry=1, kwnames=kwnames@entry=0x0) at Objects/call.c:732
#5  0x000055e6fc4506e9 in call_function (pp_stack=pp_stack@entry=0x7f1fb2355570, oparg=oparg@entry=1, kwnames=kwnames@entry=0x0)
    at Python/ceval.c:4607
#6  0x000055e6fc45c678 in _PyEval_EvalFrameDefault (f=Frame 0x7f1fb336a770, for file, line 36, in <lambda> (), 
    throwflag=<optimized out>) at Python/ceval.c:3195
#7  0x000055e6fc451110 in PyEval_EvalFrameEx (f=f@entry=Frame 0x7f1fb336a770, for file, line 36, in <lambda> (), 
    throwflag=throwflag@entry=0) at Python/ceval.c:581
#8  0x000055e6fc451d21 in _PyEval_EvalCodeWithName (_co=_co@entry=<code at remote 0x7f1fb4989700>, 
    globals=globals@entry={'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <SourceFileLoader(name='__main__', path='') at remote 0x7f1fb49d4710>, '__spec__': None, '__annotations__': {}, '__builtins__': <module at remote 0x7f1fb4adf8c0>, '__file__': '', '__cached__': None, 'threading': <module at remote 0x7f1fb36ca668>, 'time': <module at remote 0x7f1fb4a03398>, 'os': <module at remote 0x7f1fb49e5050>, 'atexit': <module at remote 0x7f1fb36d3aa0>, 'signal': <module at remote 0x7f1fb36cc500>, 'run': <function at remote 0x7f1fb4a93e10>, 'start': <function at remote 0x7f1fb33699f0>, 'join': <function at remote 0x7f1fb3369aa0>, 'runFork': <function at remote 0x7f1fb3369b50>, 'handleExit': <function at remote 0x7f1fb3369c00>, 'handleChildExit': <function at remote 0x7f1fb3369cb0>, 'main': <function at remote 0x7f1fb3369d60>}, locals=locals@entry=0x0, 
    args=args@entry=0x7f1fb4aec078, argcount=argcount@entry=0, kwnames=kwnames@entry=0x0, kwargs=0x0, kwcount=0, kwstep=2, 
    defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name='<lambda>', qualname='runFork.<locals>.<lambda>') at Python/ceval.c:3969


So basically the sleep() call wasn't woken up by the main thread's death... even though we might have expected it to.  This is indeed a case of weird interaction between threads and processes.  The only reference I could find is a single comment in a StackOverflow question:
Be aware that infinite waits on semaphores, handles etc can cause your process to become a zombie in both Windows and Linux.

The reason I'm posting this detailed explanation is that I hit the exact same issue when trying to debug the PEP 556 implementation, and it took me quite some time (and Pablo's help) to finally understand and work around the issue.

In the end, I would recommend you don't use fork() but use multiprocessing with the "forkserver" start method, which will eliminate such problems:
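A minimal sketch of that recommendation, assuming a POSIX platform where the "forkserver" start method is available. The helper name `pid_of_forkserver_child` is hypothetical. With this start method, workers are forked from a dedicated single-threaded server process rather than from whichever thread of the parent happens to ask for them.

```python
import multiprocessing
import os


def pid_of_forkserver_child():
    """Run os.getpid() in a worker started by the forkserver; return its pid."""
    ctx = multiprocessing.get_context("forkserver")
    with ctx.Pool(processes=1) as pool:
        # os.getpid lives in the standard library, so the worker can locate
        # it without importing anything from this script.
        return pool.apply(os.getpid)


if __name__ == "__main__":
    # The guard keeps this block from running again if a worker imports
    # the script while starting up.
    print("parent pid:", os.getpid())
    print("worker pid:", pid_of_forkserver_child())
```

Because the context is requested with `get_context()`, the choice of start method stays local to this helper and does not change the process-wide default.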
msg334940 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2019-02-06 14:42
By the way, one likely explanation why this happens only when fork() is called from a non-main thread is that the non-main thread (which becomes the main thread in the child process) ends with pthread_exit() while the main thread would end with exit().
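If fork() has to be kept, one common way to sidestep the difference described above is to have the child leave through os._exit() explicitly, so the whole process terminates regardless of which thread called fork(). The sketch below is an illustration under that assumption, not the reporter's code; `fork_and_wait`, `child_work` and `fork_from_background_thread` are hypothetical names.

```python
import os
import threading
import time


def child_work():
    # Stand-in for the reporter's child: leave a daemon thread sleeping
    # for an hour, then return as if the real work were finished.
    threading.Thread(target=lambda: time.sleep(3600), daemon=True).start()


def fork_and_wait(work):
    """Fork, run work() in the child, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: only the thread that called fork() exists here.
        status = 1
        try:
            work()
            status = 0
        finally:
            # os._exit() ends the process at once: no atexit handlers, no
            # stdio flushing, and no waiting for leftover threads.
            os._exit(status)
    # Parent: block until the child has been reaped.
    _, wait_status = os.waitpid(pid, 0)
    if os.WIFEXITED(wait_status):
        return os.WEXITSTATUS(wait_status)
    return -os.WTERMSIG(wait_status)


def fork_from_background_thread(work):
    """Call fork_and_wait() from a non-main thread and return its result."""
    result = []
    t = threading.Thread(target=lambda: result.append(fork_and_wait(work)))
    t.start()
    t.join()
    return result[0]


if __name__ == "__main__":
    print("exit code:", fork_from_background_thread(child_work))
```

The trade-off is that os._exit() skips normal interpreter cleanup, so any output the child needs to keep should be flushed before it is called.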
Date                 User              Action  Args
2019-02-06 14:42:31  pitrou            set     status: open -> closed; resolution: not a bug; messages: + msg334940; stage: resolved
2019-02-06 14:40:29  pitrou            set     nosy: + pitrou; messages: + msg334939
2019-02-06 14:31:44  pitrou            set     nosy: + pablogsal
2019-02-06 06:41:15  rhettinger        set     nosy: + rhettinger; messages: + msg334904
2019-02-05 23:13:32  vtsozik           set     messages: + msg334892
2019-02-05 22:31:27  christian.heimes  set     nosy: + christian.heimes; messages: + msg334887
2019-02-05 22:17:51  vtsozik           create