
Author pitrou
Recipients christian.heimes, pablogsal, pitrou, rhettinger, vtsozik
Date 2019-02-06.14:40:28
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <>
It is actually quite an intricate problem.  What happens is that the child process's *main thread* ends, but not its background sleeping thread (the `lambda: time.sleep(3600)`).
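The situation can be reproduced with a minimal sketch (this is my own hypothetical reduction, not the original script; the 3600-second sleep matches the lambda above):

```python
import os
import signal
import sys
import threading
import time

pid = os.fork()
if pid == 0:
    # Child: the main thread returns right away, but the background
    # thread keeps sleeping, so the interpreter waits for it at exit
    # and the child process lingers.
    threading.Thread(target=lambda: time.sleep(3600)).start()
    sys.exit(0)  # does not terminate the sleeping thread

# Parent: after a moment, the child still cannot be reaped.
time.sleep(0.5)
reaped, _ = os.waitpid(pid, os.WNOHANG)
print("child still running:", reaped == 0)  # prints: child still running: True

# Clean up the stuck child so it does not outlive this script.
os.kill(pid, signal.SIGKILL)
os.waitpid(pid, 0)
```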

To diagnose it, you can display the process tree:
$ ps fu
antoine  12634  0.0  0.0  28308  9208 pts/0    Ss   15:21   0:00 bash
antoine   2520  0.0  0.0 179072 10684 pts/0    Sl+  15:29   0:00  \_ ./python
antoine   2522  0.0  0.0      0     0 pts/0    Zl+  15:29   0:00      \_ [python] <defunct>

Then you can display all threads for the child process (here with pid 2522):
$ ps -T -p 2522
 2522  2522 pts/0    00:00:00 python <defunct>
 2522  2525 pts/0    00:00:00 python

The main thread is marked zombie ("defunct") but thread 2525 is still running... What is it doing?  Let's attach gdb:
$ gdb ./python --pid 2525

And display the call stack:
(gdb) bt
#0  0x00007f1fb3ca503f in __GI___select (nfds=nfds@entry=0, readfds=readfds@entry=0x0, writefds=writefds@entry=0x0, 
    exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7f1fb23553c0) at ../sysdeps/unix/sysv/linux/select.c:41
#1  0x000055e6fc4fcf7e in pysleep (secs=<optimized out>) at ./Modules/timemodule.c:1864
#2  0x000055e6fc4fd022 in time_sleep (self=self@entry=<module at remote 0x7f1fb4a03398>, obj=<optimized out>)
    at ./Modules/timemodule.c:366
#3  0x000055e6fc3a02e7 in _PyMethodDef_RawFastCallKeywords (method=0x55e6fc887ee0 <time_methods+288>, 
    self=<module at remote 0x7f1fb4a03398>, args=args@entry=0x7f1fb336a8f8, nargs=nargs@entry=1, kwnames=0x0) at Objects/call.c:646
#4  0x000055e6fc3a04c7 in _PyCFunction_FastCallKeywords (
    func=func@entry=<built-in method sleep of module object at remote 0x7f1fb4a03398>, args=args@entry=0x7f1fb336a8f8, 
    nargs=nargs@entry=1, kwnames=kwnames@entry=0x0) at Objects/call.c:732
#5  0x000055e6fc4506e9 in call_function (pp_stack=pp_stack@entry=0x7f1fb2355570, oparg=oparg@entry=1, kwnames=kwnames@entry=0x0)
    at Python/ceval.c:4607
#6  0x000055e6fc45c678 in _PyEval_EvalFrameDefault (f=Frame 0x7f1fb336a770, for file, line 36, in <lambda> (), 
    throwflag=<optimized out>) at Python/ceval.c:3195
#7  0x000055e6fc451110 in PyEval_EvalFrameEx (f=f@entry=Frame 0x7f1fb336a770, for file, line 36, in <lambda> (), 
    throwflag=throwflag@entry=0) at Python/ceval.c:581
#8  0x000055e6fc451d21 in _PyEval_EvalCodeWithName (_co=_co@entry=<code at remote 0x7f1fb4989700>, 
    globals=globals@entry={'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <SourceFileLoader(name='__main__', path='') at remote 0x7f1fb49d4710>, '__spec__': None, '__annotations__': {}, '__builtins__': <module at remote 0x7f1fb4adf8c0>, '__file__': '', '__cached__': None, 'threading': <module at remote 0x7f1fb36ca668>, 'time': <module at remote 0x7f1fb4a03398>, 'os': <module at remote 0x7f1fb49e5050>, 'atexit': <module at remote 0x7f1fb36d3aa0>, 'signal': <module at remote 0x7f1fb36cc500>, 'run': <function at remote 0x7f1fb4a93e10>, 'start': <function at remote 0x7f1fb33699f0>, 'join': <function at remote 0x7f1fb3369aa0>, 'runFork': <function at remote 0x7f1fb3369b50>, 'handleExit': <function at remote 0x7f1fb3369c00>, 'handleChildExit': <function at remote 0x7f1fb3369cb0>, 'main': <function at remote 0x7f1fb3369d60>}, locals=locals@entry=0x0, 
    args=args@entry=0x7f1fb4aec078, argcount=argcount@entry=0, kwnames=kwnames@entry=0x0, kwargs=0x0, kwcount=0, kwstep=2, 
    defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name='<lambda>', qualname='runFork.<locals>.<lambda>') at Python/ceval.c:3969


So basically the sleep() call was not interrupted by the main thread's death, even though we might have expected it to be.  This is indeed a case of weird interaction between threads and processes.  The only reference I could find is a single comment on a StackOverflow question:
Be aware that infinite waits on semaphores, handles etc can cause your process to become a zombie in both Windows and Linux.

The reason I'm posting this detailed explanation is that I hit the exact same issue when trying to debug the PEP 556 implementation, and it took me quite some time (and Pablo's help) to finally understand and work around it.
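For what it's worth, one general way to avoid this kind of hang in a forked child (a common technique, not necessarily the exact fix used for PEP 556) is to have the child exit through os._exit(), which terminates the whole process immediately without joining background threads:

```python
import os
import threading
import time

pid = os.fork()
if pid == 0:
    threading.Thread(target=lambda: time.sleep(3600)).start()
    # os._exit() tears down the whole child process at once,
    # without waiting for the background sleeping thread.
    os._exit(0)

# Parent: the child can now be reaped promptly.
reaped_pid, status = os.waitpid(pid, 0)
print("reaped:", reaped_pid == pid, "status:", os.WEXITSTATUS(status))
```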

In the end, I would recommend not calling fork() directly, but using multiprocessing with the "forkserver" start method, which eliminates such problems.
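A minimal sketch of that recommendation (hypothetical worker function; the "forkserver" start method is only available on Unix):

```python
import multiprocessing as mp
import time

def worker():
    # A forkserver child does not inherit the parent's threads,
    # so it exits cleanly when this function returns.
    time.sleep(0.1)

if __name__ == "__main__":
    ctx = mp.get_context("forkserver")
    p = ctx.Process(target=worker)
    p.start()
    p.join()
    print("exitcode:", p.exitcode)  # 0 on clean exit
```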