Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python hangs on fork when a logger is in use in a background thread #81610

Open
ryanpetrello mannequin opened this issue Jun 27, 2019 · 7 comments
Open

Python hangs on fork when a logger is in use in a background thread #81610

ryanpetrello mannequin opened this issue Jun 27, 2019 · 7 comments
Labels
3.7 (EOL) end of life stdlib Python modules in the Lib dir

Comments

@ryanpetrello
Copy link
Mannequin

ryanpetrello mannequin commented Jun 27, 2019

BPO 37429
Nosy @vsajip, @tirkarthi
Files
  • hang.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-06-27.12:39:22.242>
    labels = ['3.7', 'library']
    title = 'Python hangs on fork when a logger is in use in a background thread'
    updated_at = <Date 2019-06-27.13:32:31.963>
    user = 'https://bugs.python.org/ryanpetrello'

    bugs.python.org fields:

    activity = <Date 2019-06-27.13:32:31.963>
    actor = 'ryan.petrello'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2019-06-27.12:39:22.242>
    creator = 'ryan.petrello'
    dependencies = []
    files = ['48441']
    hgrepos = []
    issue_num = 37429
    keywords = []
    message_count = 6.0
    messages = ['346732', '346733', '346734', '346736', '346737', '346738']
    nosy_count = 3.0
    nosy_names = ['vinay.sajip', 'ryan.petrello', 'xtreak']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue37429'
    versions = ['Python 2.7', 'Python 3.6', 'Python 3.7']

    @ryanpetrello
    Copy link
    Mannequin Author

    ryanpetrello mannequin commented Jun 27, 2019

    It's possible to cause Python child processes to encountered a deadlock and hang on fork by:

    1. Spawning one or more threads in a parent process
    2. Have one of those threads emit log lines
    3. Have the main thread use os.fork()
    4. In the child, log immediately

    The attached Python script can be used to reproduce this issue in Python 2.7.15 and Python 3.6.5

    @ryanpetrello ryanpetrello mannequin added the stdlib Python modules in the Lib dir label Jun 27, 2019
    @ryanpetrello
    Copy link
    Mannequin Author

    ryanpetrello mannequin commented Jun 27, 2019

    This issue seems _very_ similar to https://bugs.python.org/issue6721, but I wasn't able to encounter the exact stack I've been seeing on my end. Specifically, when I run the reproducer script in this issue (hang.py) for a few minutes, hung child processes start to build up.

    @ryanpetrello
    Copy link
    Mannequin Author

    ryanpetrello mannequin commented Jun 27, 2019

    Here's the stack of one the hung child processes under py2.7:

    (gdb) py-bt
    #4 Waiting for a lock (e.g. GIL)
    #5 Waiting for a lock (e.g. GIL)
    #7 Frame 0x7f346a925430, for file /usr/lib64/python2.7/Queue.py, line 118, in put (self=\<Queue(unfinished_tasks=147, queue=\<collections.deque at remote 0x7f346a908ec0\>, maxsize=0, all_tasks_done=\<_Condition(_Verbose__verbose=False, \_Condition__lock=\<thread.lock at remote 0x7f346aa74230\>, acquire=\<built-in method acquire of thread.lock object at remote 0x7f346aa74230\>, \_Condition__waiters=[], release=\<built-in method release of thread.lock object at remote 0x7f346aa74230\>) at remote 0x7f3467df4150\>, mutex=\<thread.lock at remote 0x7f346aa74230\>, not_full=\<_Condition(_Verbose__verbose=False, \_Condition__lock=\<thread.lock at remote 0x7f346aa74230\>, acquire=\<built-in method acquire of thread.lock object at remote 0x7f346aa74230\>, \_Condition__waiters=[], release=\<built-in method release of thread.lock object at remote 0x7f346aa74230\>) at remote 0x7f3467df4110\>, not_empty=\<_Condition(_Verbose__verbose=False, \_Condition__lock=\<thread.lock at remote 0x7f346aa74230\>, acquire=\<built-in method acquire of thread.lock object ...(truncated)
        self.not_full.acquire()
    #11 Frame 0x1ae6ae0, for file /tmp/foo/lib/python2.7/site-packages/concurrent/futures/thread.py, line 129, in submit (self=\<ThreadPoolExecutor(_shutdown_lock=\<thread.lock at remote 0x7f346aa74270\>, \_shutdown=False, \_max_workers=2, \_thread_name_prefix='Thread---Type \<return\> to continue, or q \<return\> to quit\---
    PoolExecutor-0', \_threads=set([\<...\>, \<...\>]), \_work_queue=\<...\>) at remote 0x7f3467df0f90\>, fn=\<instancemethod at remote 0x7f3467e01e10\>, args=(\<LogRecord(threadName='MainThread', name='example', thread=139863104821056, created=\<float at remote 0x1a7d580\>, process=1155, processName='Process-74', args=(), module='hang', filename='hang.py', levelno=20, exc_text=None, pathname='/tmp/hang.py', lineno=25, msg='exiting', exc_info=None, funcName='exit', relativeCreated=\<float at remote 0x1a7d568\>, levelname='INFO', msecs=\<float at remote 0x1a7d550\>) at remote 0x7f346a972490\>,), kwargs={}, f=\<Future(_exception=None, \_result=None, \_condition=\<_Condition(_Condition__lock=\<_RLock(_Verbose__verbose=False, \_RLock__owner=None, \_RLock__block=\<thread.lock at remote 0x7f346aa...(truncated)
        self.\_work_queue.put(w)
    #15 Frame 0x7f3467df15a8, for file /tmp/hang.py, line 16, in handle (self=\<ThreadPoolLogger(executor=\<ThreadPoolExecutor(_shutdown_lock=\<thread.lock at remote 0x7f346aa74270\>, \_shutdown=False, \_max_workers=2, \_thread_name_prefix='ThreadPoolExecutor-0', \_threads=set([\<...\>, \<...\>]), \_work_queue=\<...\>) at remote 0x7f3467df0f90\>) at remote 0x7f3467df0f50\>, record=\<LogRecord(threadName='MainThread', name='example', thread=139863104821056, created=\<float at remote 0x1a7d580\>, process=1155, processName='Process-74', args=(), module='hang', filename='hang.py', levelno=20, exc_text=None, pathname='/tmp/hang.py', lineno=25, msg='exiting', exc_info=None, funcName='exit', relativeCreated=\<f---Type \<return\> to continue, or q \<return\> to quit\---
    loat at remote 0x1a7d568\>, levelname='INFO', msecs=\<float at remote 0x1a7d550\>) at remote 0x7f346a972490\>)
        self.executor.submit(self.send, record)
    #19 Frame 0x7f346a925240, for file /usr/lib64/python2.7/logging/__init__.py, line 1318, in callHandlers (self=\<Logger(name='example', parent=None, handlers=[\<ThreadPoolLogger(executor=\<ThreadPoolExecutor(_shutdown_lock=\<thread.lock at remote 0x7f346aa74270\>, \_shutdown=False, \_max_workers=2, \_thread_name_prefix='ThreadPoolExecutor-0', \_threads=set([\<...\>, \<...\>]), \_work_queue=\<...\>) at remote 0x7f3467df0f90\>) at remote 0x7f3467df0f50\>], level=10, disabled=0, propagate=1, filters=[]) at remote 0x7f3467de1a90\>, record=\<LogRecord(threadName='MainThread', name='example', thread=139863104821056, created=\<float at remote 0x1a7d580\>, process=1155, processName='Process-74', args=(), module='hang', filename='hang.py', levelno=20, exc_text=None, pathname='/tmp/hang.py', lineno=25, msg='exiting', exc_info=None, funcName='exit', relativeCreated=\<float at remote 0x1a7d568\>, levelname='INFO', msecs=\<float at remote 0x1a7d550\>) at remote 0x7f346a972490\>, c=\<...\>, found=1, hdlr=\<...\>)
        hdlr.handle(record)
    #22 Frame 0x7f3467df13e0, for file /usr/lib64/python2.7/logging/__init__.py, line 1278, in handle (self=\<Logger(name='example', parent=None, handlers=[\<ThreadPoolLogger(executor=\<ThreadPoolExecutor(_shutdown_lock=\<thread.lock at remote 0x7f346aa74270\>, \_shut---Type \<return\> to continue, or q \<return\> to quit\---
    down=False, _max_workers=2, _thread_name_prefix='ThreadPoolExecutor-0', _threads=set([<...>, <...>]), _work_queue=<...>) at remote 0x7f3467df0f90>) at remote 0x7f3467df0f50>], level=10, disabled=0, propagate=1, filters=[]) at remote 0x7f3467de1a90>, record=<LogRecord(threadName='MainThread', name='example', thread=139863104821056, created=<float at remote 0x1a7d580>, process=1155, processName='Process-74', args=(), module='hang', filename='hang.py', levelno=20, exc_text=None, pathname='/tmp/hang.py', lineno=25, msg='exiting', exc_info=None, funcName='exit', relativeCreated=<float at remote 0x1a7d568>, levelname='INFO', msecs=<float at remote 0x1a7d550>) at remote 0x7f346a972490>)
        self.callHandlers(record)
    #25 Frame 0x1ae60e0, for file /usr/lib64/python2.7/logging/__init__.py, line 1268, in _log (self=<Logger(name='example', parent=None, handlers=[<ThreadPoolLogger(executor=<ThreadPoolExecutor(_shutdown_lock=<thread.lock at remote 0x7f346aa74270>, _shutdown=False, _max_workers=2, _thread_name_prefix='ThreadPoolExecutor-0', _threads=set([<...>, <...>]), _work_queue=<...>) at remote 0x7f3467df0f90>) at remote 0x7f3467df0f50>], level=10, disabled=0, propagate=1, filters=[]) at remote 0x7f3467de1a90>, level=20, msg='exiting', args=(), exc_info=None, extra=None, fn='/tmp/hang.py', lno=25, func='exit', record=<LogRecord(threadName='MainThread', name='example', thread=139863104821056, created=<float at remote 0x1a7d580>, process=1155, processName='Process-74', args=(...), ---Type <return> to continue, or q <return> to quit
    

    Here's the similar stack under py3.6:

    (gdb) bt
    #0  0x00007f208166cadb in futex_abstimed_wait (cancel=true, private=\<optimized out\>,
        abstime=0x0, expected=0, futex=0x88bbf0)
        at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43
    #1  do_futex_wait (sem=sem@entry=0x88bbf0, abstime=0x0)
        at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:223
    #2  0x00007f208166cb6f in \_\_new_sem_wait_slow (sem=0x88bbf0, abstime=0x0)
        at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:292
    #3  0x00007f208166cc0b in \_\_new_sem_wait (sem=\<optimized out\>)
        at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:28
    #4  0x00007f2081a05a80 in PyThread_acquire_lock_timed (lock=0x88bbf0,
        microseconds=microseconds@entry=-1, intr_flag=intr_flag@entry=0)
        at /usr/src/debug/Python-3.6.8/Python/thread_pthread.h:386
    #5  0x00007f2081a05c34 in PyThread_acquire_lock (lock=\<optimized out\>,
        waitflag=waitflag@entry=1)
        at /usr/src/debug/Python-3.6.8/Python/thread_pthread.h:595
    #6  0x00007f2081918797 in \_enter_buffered_busy (self=self@entry=0x7f2081e29b48)
        at /usr/src/debug/Python-3.6.8/Modules/_io/bufferedio.c:292
    #7  0x00007f2081a1f131 in buffered_flush (self=0x7f2081e29b48, args=\<optimized out\>)
        at /usr/src/debug/Python-3.6.8/Modules/_io/bufferedio.c:852
    ---Type \<return\> to continue, or q \<return\> to quit\---
    #8  0x00007f2081962dd0 in _PyCFunction_FastCallDict (
        func_obj=func_obj@entry=0x7f207eb1cb88, args=args@entry=0x0, nargs=0,
        kwargs=kwargs@entry=0x0)
        at /usr/src/debug/Python-3.6.8/Objects/methodobject.c:209
    #9  0x00007f2081924c46 in _PyObject_FastCallDict (func=func@entry=0x7f207eb1cb88,
        args=args@entry=0x0, nargs=nargs@entry=0, kwargs=kwargs@entry=0x0)
        at /usr/src/debug/Python-3.6.8/Objects/abstract.c:2313
    #10 0x00007f2081925c66 in callmethod (is_size_t=1, va=0x7ffee1ce4ef0, format=0x0,
        func=0x7f207eb1cb88) at /usr/src/debug/Python-3.6.8/Objects/abstract.c:2583
    

    @tirkarthi
    Copy link
    Member

    See also https://bugs.python.org/issue36533

    @ryanpetrello
    Copy link
    Mannequin Author

    ryanpetrello mannequin commented Jun 27, 2019

    Also, for what it's worth, I'm *not* able to reproduce this hang in Python 3.7.0.

    @ryanpetrello
    Copy link
    Mannequin Author

    ryanpetrello mannequin commented Jun 27, 2019

    Actually, I think I *am* seeing this on 3.7:

    (gdb) bt
    #0  0x00007f130d530adb in futex_abstimed_wait (cancel=true, private=\<optimized out\>, abstime=0x0, expected=0,
        futex=0x142c6d0) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:43
    #1  do_futex_wait (sem=sem@entry=0x142c6d0, abstime=0x0) at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:223
    #2  0x00007f130d530b6f in \_\_new_sem_wait_slow (sem=0x142c6d0, abstime=0x0)
        at ../nptl/sysdeps/unix/sysv/linux/sem_waitcommon.c:292
    #3  0x00007f130d530c0b in \_\_new_sem_wait (sem=\<optimized out\>) at ../nptl/sysdeps/unix/sysv/linux/sem_wait.c:28
    #4  0x0000000000535baa in PyThread_acquire_lock_timed (lock=0x142c6d0, microseconds=microseconds@entry=-1,
        intr_flag=intr_flag@entry=0) at [Python/thread_pthread.h:340](https://github.com/python/cpython/blob/main/Python/thread_pthread.h#L340)
    #5  0x0000000000535c24 in PyThread_acquire_lock (lock=\<optimized out\>, waitflag=waitflag@entry=1)
        at [Python/thread_pthread.h:576](https://github.com/python/cpython/blob/main/Python/thread_pthread.h#L576)
    #6  0x000000000058949a in \_enter_buffered_busy (self=0x7f130ca0fbf8) at ./Modules/_io/bufferedio.c:282
    #7  buffered_flush (self=0x7f130ca0fbf8, args=\<optimized out\>) at ./Modules/_io/bufferedio.c:835
    #8  0x000000000043aae8 in \_PyMethodDef_RawFastCallDict (kwargs=\<optimized out\>, nargs=0, args=\<optimized out\>,
        self=0x7f130ca0fbf8, method=0x8eeda0 \<bufferedwriter_methods+320\>) at [Objects/call.c:482](https://github.com/python/cpython/blob/main/Objects/call.c#L482)
    #9  \_PyCFunction_FastCallDict (func=0x7f130aeb3438, args=\<optimized out\>, nargs=0, kwargs=\<optimized out\>)
        at [Objects/call.c:582](https://github.com/python/cpython/blob/main/Objects/call.c#L582)
    

    @ryanpetrello ryanpetrello mannequin added the 3.7 (EOL) end of life label Jun 27, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @jermenkoo
    Copy link

    jermenkoo commented Jun 5, 2023

    Has anyone been able to reproduce this in 3.8+ or above? We believe we're seeing an instance of this in Python 3.8.6 but hang.py works fine. :|

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life stdlib Python modules in the Lib dir
    Projects
    Status: No status
    Status: No status
    Development

    No branches or pull requests

    2 participants