classification
Title: Python hangs on fork when a logger is in use in a background thread
Type: Stage:
Components: Library (Lib) Versions: Python 3.7, Python 3.6, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: ryan.petrello, vinay.sajip, xtreak
Priority: normal Keywords:

Created on 2019-06-27 12:39 by ryan.petrello, last changed 2019-06-27 13:32 by ryan.petrello.

Files
File name Uploaded Description Edit
hang.py ryan.petrello, 2019-06-27 12:39
Messages (6)
msg346732 - (view) Author: Ryan Petrello (ryan.petrello) Date: 2019-06-27 12:39
It's possible to cause Python child processes to encountered a deadlock and hang on fork by:

1) Spawning one or more threads in a parent process
2) Have one of those threads emit log lines
3) Have the main thread use os.fork()
4) In the child, log immediately

The attached Python script can be used to reproduce this issue in Python 2.7.15 and Python 3.6.5
msg346733 - (view) Author: Ryan Petrello (ryan.petrello) Date: 2019-06-27 12:41
This issue seems _very_ similar to https://bugs.python.org/issue6721, but I wasn't able to encounter the exact stack I've been seeing on my end.  Specifically, when I run the reproducer script in this issue (hang.py) for a few minutes, hung child processes start to build up.
msg346734 - (view) Author: Ryan Petrello (ryan.petrello) Date: 2019-06-27 12:47
Here's the stack of one the hung child processes under py2.7:

(gdb) py-bt
#4 Waiting for a lock (e.g. GIL)
#5 Waiting for a lock (e.g. GIL)
#7 Frame 0x7f346a925430, for file /usr/lib64/python2.7/Queue.py, line 118, in put (self=<Queue(unfinished_tasks=147, queue=<collections.deque at remote 0x7f346a908ec0>, maxsize=0, all_tasks_done=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x7f346aa74230>, acquire=<built-in method acquire of thread.lock object at remote 0x7f346aa74230>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x7f346aa74230>) at remote 0x7f3467df4150>, mutex=<thread.lock at remote 0x7f346aa74230>, not_full=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x7f346aa74230>, acquire=<built-in method acquire of thread.lock object at remote 0x7f346aa74230>, _Condition__waiters=[], release=<built-in method release of thread.lock object at remote 0x7f346aa74230>) at remote 0x7f3467df4110>, not_empty=<_Condition(_Verbose__verbose=False, _Condition__lock=<thread.lock at remote 0x7f346aa74230>, acquire=<built-in method acquire of thread.lock object ...(truncated)
    self.not_full.acquire()
#11 Frame 0x1ae6ae0, for file /tmp/foo/lib/python2.7/site-packages/concurrent/futures/thread.py, line 129, in submit (self=<ThreadPoolExecutor(_shutdown_lock=<thread.lock at remote 0x7f346aa74270>, _shutdown=False, _max_workers=2, _thread_name_prefix='Thread---Type <return> to continue, or q <return> to quit---
PoolExecutor-0', _threads=set([<...>, <...>]), _work_queue=<...>) at remote 0x7f3467df0f90>, fn=<instancemethod at remote 0x7f3467e01e10>, args=(<LogRecord(threadName='MainThread', name='example', thread=139863104821056, created=<float at remote 0x1a7d580>, process=1155, processName='Process-74', args=(), module='hang', filename='hang.py', levelno=20, exc_text=None, pathname='/tmp/hang.py', lineno=25, msg='exiting', exc_info=None, funcName='exit', relativeCreated=<float at remote 0x1a7d568>, levelname='INFO', msecs=<float at remote 0x1a7d550>) at remote 0x7f346a972490>,), kwargs={}, f=<Future(_exception=None, _result=None, _condition=<_Condition(_Condition__lock=<_RLock(_Verbose__verbose=False, _RLock__owner=None, _RLock__block=<thread.lock at remote 0x7f346aa...(truncated)
    self._work_queue.put(w)
#15 Frame 0x7f3467df15a8, for file /tmp/hang.py, line 16, in handle (self=<ThreadPoolLogger(executor=<ThreadPoolExecutor(_shutdown_lock=<thread.lock at remote 0x7f346aa74270>, _shutdown=False, _max_workers=2, _thread_name_prefix='ThreadPoolExecutor-0', _threads=set([<...>, <...>]), _work_queue=<...>) at remote 0x7f3467df0f90>) at remote 0x7f3467df0f50>, record=<LogRecord(threadName='MainThread', name='example', thread=139863104821056, created=<float at remote 0x1a7d580>, process=1155, processName='Process-74', args=(), module='hang', filename='hang.py', levelno=20, exc_text=None, pathname='/tmp/hang.py', lineno=25, msg='exiting', exc_info=None, funcName='exit', relativeCreated=<f---Type <return> to continue, or q <return> to quit---
loat at remote 0x1a7d568>, levelname='INFO', msecs=<float at remote 0x1a7d550>) at remote 0x7f346a972490>)
    self.executor.submit(self.send, record)
#19 Frame 0x7f346a925240, for file /usr/lib64/python2.7/logging/__init__.py, line 1318, in callHandlers (self=<Logger(name='example', parent=None, handlers=[<ThreadPoolLogger(executor=<ThreadPoolExecutor(_shutdown_lock=<thread.lock at remote 0x7f346aa74270>, _shutdown=False, _max_workers=2, _thread_name_prefix='ThreadPoolExecutor-0', _threads=set([<...>, <...>]), _work_queue=<...>) at remote 0x7f3467df0f90>) at remote 0x7f3467df0f50>], level=10, disabled=0, propagate=1, filters=[]) at remote 0x7f3467de1a90>, record=<LogRecord(threadName='MainThread', name='example', thread=139863104821056, created=<float at remote 0x1a7d580>, process=1155, processName='Process-74', args=(), module='hang', filename='hang.py', levelno=20, exc_text=None, pathname='/tmp/hang.py', lineno=25, msg='exiting', exc_info=None, funcName='exit', relativeCreated=<float at remote 0x1a7d568>, levelname='INFO', msecs=<float at remote 0x1a7d550>) at remote 0x7f346a972490>, c=<...>, found=1, hdlr=<...>)
    hdlr.handle(record)
#22 Frame 0x7f3467df13e0, for file /usr/lib64/python2.7/logging/__init__.py, line 1278, in handle (self=<Logger(name='example', parent=None, handlers=[<ThreadPoolLogger(executor=<ThreadPoolExecutor(_shutdown_lock=<thread.lock at remote 0x7f346aa74270>, _shut---Type <return> to continue, or q <return> to quit---
down=False, _max_workers=2, _thread_name_prefix='ThreadPoolExecutor-0', _threads=set([<...>, <...>]), _work_queue=<...>) at remote 0x7f3467df0f90>) at remote 0x7f3467df0f50>], level=10, disabled=0, propagate=1, filters=[]) at remote 0x7f3467de1a90>, record=<LogRecord(threadName='MainThread', name='example', thread=139863104821056, created=<float at remote 0x1a7d580>, process=1155, processName='Process-74', args=(), module='hang', filename='hang.py', levelno=20, exc_text=None, pathname='/tmp/hang.py', lineno=25, msg='exiting', exc_info=None, funcName='exit', relativeCreated=<float at remote 0x1a7d568>, levelname='INFO', msecs=<float at remote 0x1a7d550>) at remote 0x7f346a972490>)
    self.callHandlers(record)
#25 Frame 0x1ae60e0, for file /usr/lib64/python2.7/logging/__init__.py, line 1268, in _log (self=<Logger(name='example', parent=None, handlers=[<ThreadPoolLogger(executor=<ThreadPoolExecutor(_shutdown_lock=<thread.lock at remote 0x7f346aa74270>, _shutdown=False, _max_workers=2, _thread_name_prefix='ThreadPoolExecutor-0', _threads=set([<...>, <...>]), _work_queue=<...>) at remote 0x7f3467df0f90>) at remote 0x7f3467df0f50>], level=10, disabled=0, propagate=1, filters=[]) at remote 0x7f3467de1a90>, level=20, msg='exiting', args=(), exc_info=None, extra=None, fn='/tmp/hang.py', lno=25, func='exit', record=<LogRecord(threadName='MainThread', name='example', thread=139863104821056, created=<float at remote 0x1a7d580>, process=1155, processName='Process-74', args=(...), ---Type <return> to continue, or q <return> to quit---

Here's the similar stack under py3.6:

(gdb) bt
#0  0x00007f208166cadb in futex_abstimed_wait (cancel=true, private=<optimized out>,
    abstime=0x0, expected=0, futex=0x88bbf0)
    at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43
#1  do_futex_wait (sem=sem@entry=0x88bbf0, abstime=0x0)
    at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:223
#2  0x00007f208166cb6f in __new_sem_wait_slow (sem=0x88bbf0, abstime=0x0)
    at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:292
#3  0x00007f208166cc0b in __new_sem_wait (sem=<optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:28
#4  0x00007f2081a05a80 in PyThread_acquire_lock_timed (lock=0x88bbf0,
    microseconds=microseconds@entry=-1, intr_flag=intr_flag@entry=0)
    at /usr/src/debug/Python-3.6.8/Python/thread_pthread.h:386
#5  0x00007f2081a05c34 in PyThread_acquire_lock (lock=<optimized out>,
    waitflag=waitflag@entry=1)
    at /usr/src/debug/Python-3.6.8/Python/thread_pthread.h:595
#6  0x00007f2081918797 in _enter_buffered_busy (self=self@entry=0x7f2081e29b48)
    at /usr/src/debug/Python-3.6.8/Modules/_io/bufferedio.c:292
#7  0x00007f2081a1f131 in buffered_flush (self=0x7f2081e29b48, args=<optimized out>)
    at /usr/src/debug/Python-3.6.8/Modules/_io/bufferedio.c:852
---Type <return> to continue, or q <return> to quit---
#8  0x00007f2081962dd0 in _PyCFunction_FastCallDict (
    func_obj=func_obj@entry=0x7f207eb1cb88, args=args@entry=0x0, nargs=0,
    kwargs=kwargs@entry=0x0)
    at /usr/src/debug/Python-3.6.8/Objects/methodobject.c:209
#9  0x00007f2081924c46 in _PyObject_FastCallDict (func=func@entry=0x7f207eb1cb88,
    args=args@entry=0x0, nargs=nargs@entry=0, kwargs=kwargs@entry=0x0)
    at /usr/src/debug/Python-3.6.8/Objects/abstract.c:2313
#10 0x00007f2081925c66 in callmethod (is_size_t=1, va=0x7ffee1ce4ef0, format=0x0,
    func=0x7f207eb1cb88) at /usr/src/debug/Python-3.6.8/Objects/abstract.c:2583
msg346736 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2019-06-27 13:14
See also https://bugs.python.org/issue36533
msg346737 - (view) Author: Ryan Petrello (ryan.petrello) Date: 2019-06-27 13:27
Also, for what it's worth, I'm *not* able to reproduce this hang in Python 3.7.0.
msg346738 - (view) Author: Ryan Petrello (ryan.petrello) Date: 2019-06-27 13:32
Actually, I think I *am* seeing this on 3.7:

(gdb) bt
#0  0x00007f130d530adb in futex_abstimed_wait (cancel=true, private=<optimized out>, abstime=0x0, expected=0,
    futex=0x142c6d0) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43
#1  do_futex_wait (sem=sem@entry=0x142c6d0, abstime=0x0) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:223
#2  0x00007f130d530b6f in __new_sem_wait_slow (sem=0x142c6d0, abstime=0x0)
    at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:292
#3  0x00007f130d530c0b in __new_sem_wait (sem=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:28
#4  0x0000000000535baa in PyThread_acquire_lock_timed (lock=0x142c6d0, microseconds=microseconds@entry=-1,
    intr_flag=intr_flag@entry=0) at Python/thread_pthread.h:340
#5  0x0000000000535c24 in PyThread_acquire_lock (lock=<optimized out>, waitflag=waitflag@entry=1)
    at Python/thread_pthread.h:576
#6  0x000000000058949a in _enter_buffered_busy (self=0x7f130ca0fbf8) at ./Modules/_io/bufferedio.c:282
#7  buffered_flush (self=0x7f130ca0fbf8, args=<optimized out>) at ./Modules/_io/bufferedio.c:835
#8  0x000000000043aae8 in _PyMethodDef_RawFastCallDict (kwargs=<optimized out>, nargs=0, args=<optimized out>,
    self=0x7f130ca0fbf8, method=0x8eeda0 <bufferedwriter_methods+320>) at Objects/call.c:482
#9  _PyCFunction_FastCallDict (func=0x7f130aeb3438, args=<optimized out>, nargs=0, kwargs=<optimized out>)
    at Objects/call.c:582
History
Date User Action Args
2019-06-27 13:32:31ryan.petrellosetmessages: + msg346738
versions: + Python 3.7
2019-06-27 13:27:37ryan.petrellosetmessages: + msg346737
2019-06-27 13:14:54xtreaksetnosy: + vinay.sajip, xtreak
messages: + msg346736
2019-06-27 12:47:54ryan.petrellosetmessages: + msg346734
2019-06-27 12:41:01ryan.petrellosetmessages: + msg346733
2019-06-27 12:39:22ryan.petrellocreate