classification
Title: Setting a signal handler gets multiprocessing.Pool stuck
Type:
Stage:
Components: Library (Lib)
Versions: Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: ionelmc, mapozyan, pitrou, wumpus
Priority: normal
Keywords: patch

Created on 2019-09-19 21:41 by ionelmc, last changed 2020-07-19 17:38 by wumpus.

Files
File name Uploaded Description
mp-bug-python2.8.py ionelmc, 2019-09-19 21:41
mp-signal-bug-python3.8.py mapozyan, 2020-05-06 10:47
pool.py.patch mapozyan, 2020-05-06 11:46
Messages (6)
msg352815 - Author: Ionel Cristian Mărieș (ionelmc) Date: 2019-09-19 21:41
Running `python3.8 mp-bug-python2.8.py` usually gets stuck after a dozen iterations or so.

It appears that if I stop setting that signal handler it doesn't get stuck. Unfortunately I need it to perform critical cleanup.

This is what I got from gdb:

(gdb) py-bt
Traceback (most recent call first):
  File "/usr/lib/python3.8/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 355, in get
    with self._rlock:
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 370, in worker
    `func` and (a, b) becomes func(a, b).
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/process.py", line 569, in _bootstrap
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 75, in _launch
    code = process_obj._bootstrap(parent_sentinel=child_r)
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.8/multiprocessing/context.py", line 276, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.8/multiprocessing/process.py", line 633, in start
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 838, in _repopulate_pool_static
    self._length = None
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 559, in _repopulate_pool
    outqueue.put(None)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 212, in __init__
    self._repopulate_pool()
  File "/usr/lib/python3.8/multiprocessing/context.py", line 375, in Pool
  File "mp-bug-python2.8.py", line 21, in <module>

And the native backtrace, without the Python gdb macros:

#0  0x00007f51c79fb6d6 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7f51c7e29000) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x7f51c7e29000, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f51c79fb7c8 in __new_sem_wait_slow (sem=0x7f51c7e29000, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f51c79fb839 in __new_sem_wait (sem=<optimized out>) at sem_wait.c:42
#4  0x00007f51c58d51f7 in semlock_acquire (self=0x7f51c58a60b0, args=<optimized out>, kwds=<optimized out>) at ./Modules/_multiprocessing/semaphore.c:319
#5  0x000000000067d929 in method_vectorcall_VARARGS_KEYWORDS (func=<method_descriptor at remote 0x7f51c6349c70>, args=0x7f51c58a4eb8, nargsf=<optimized out>, kwnames=<optimized out>)
    at ../Objects/descrobject.c:332
#6  0x000000000042c5dc in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>) at ../Include/cpython/abstract.h:127
#7  call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x22bd160) at ../Python/ceval.c:4987
#8  _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3486
#9  0x0000000000425ed7 in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at ../Objects/call.c:283
#10 0x00000000006761e4 in _PyObject_Vectorcall (kwnames=0x0, nargsf=1, args=0x7ffe20bcc908, callable=<function at remote 0x7f51c6356820>) at ../Include/cpython/abstract.h:127
#11 method_vectorcall (method=<optimized out>, args=<optimized out>, nargsf=<optimized out>, kwnames=0x0) at ../Objects/classobject.c:67
#12 0x000000000042857e in _PyObject_Vectorcall (kwnames=0x0, nargsf=0, args=0x0, callable=<method at remote 0x7f51c665efc0>) at ../Include/cpython/abstract.h:127
#13 _PyObject_CallNoArg (func=<optimized out>) at ../Include/cpython/abstract.h:153
#14 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3287
#15 0x0000000000425ed7 in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at ../Objects/call.c:283
#16 0x0000000000676245 in _PyObject_Vectorcall (kwnames=0x0, nargsf=1, args=0x23cea18, callable=<function at remote 0x7f51c63563a0>) at ../Include/cpython/abstract.h:127
#17 method_vectorcall (method=<optimized out>, args=0x23cea20, nargsf=<optimized out>, kwnames=0x0) at ../Objects/classobject.c:60
#18 0x000000000042cf2f in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>) at ../Include/cpython/abstract.h:127
#19 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x22bd160) at ../Python/ceval.c:4987
#20 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3500
#21 0x00000000004f9f44 in PyEval_EvalFrameEx (throwflag=0,
    f=Frame 0x23ce820, for file /usr/lib/python3.8/multiprocessing/pool.py, line 370, in worker (inqueue=<SimpleQueue(_reader=<Connection(_handle=3, _readable=True, _writable=False) at remote 0x7f51c589dcd0>, _writer=<Connection(_handle=None, _readable=False, _writable=True) at remote 0x7f51c589d940>, _rlock=<Lock(_semlock=<_multiprocessing.SemLock at remote 0x7f51c58a60b0>, acquire=<built-in method acquire of _multiprocessing.SemLock object at remote 0x7f51c58a60b0>, release=<built-in method release of _multiprocessing.SemLock object at remote 0x7f51c58a60b0>) at remote 0x7f51c589d0d0>, _poll=<method at remote 0x7f51c58ae2c0>, _wlock=<Lock(_semlock=<_multiprocessing.SemLock at remote 0x7f51c58a1030>, acquire=<built-in method acquire of _multiprocessing.SemLock object at remote 0x7f51c58a1030>, release=<built-in method release of _multiprocessing.SemLock object at remote 0x7f51c58a1030>) at remote 0x7f51c589deb0>) at remote 0x7f51c589d3a0>, outqueue=<SimpleQueue(_reader=<Connection(_handle=None, _readable=True, _wr...(truncated))
    at ../Python/ceval.c:741
#22 _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=6, kwnames=0x0, kwargs=0x7f51c636e628, kwcount=0, kwstep=1,
    defs=0x7f51c65a4828, defcount=4, kwdefs=0x0, closure=0x0, name='worker', qualname='worker') at ../Python/ceval.c:4298
#23 0x000000000043e7a2 in _PyFunction_Vectorcall (func=func@entry=<function at remote 0x7f51c6352d30>, stack=<optimized out>, nargsf=nargsf@entry=6, kwnames=<optimized out>) at ../Objects/call.c:435
#24 0x000000000044143c in PyVectorcall_Call (callable=<function at remote 0x7f51c6352d30>, tuple=<optimized out>, kwargs=<optimized out>) at ../Objects/call.c:199
#25 0x0000000000428aec in do_call_core (kwdict={},
    callargs=(<SimpleQueue(_reader=<Connection(_handle=3, _readable=True, _writable=False) at remote 0x7f51c589dcd0>, _writer=<Connection(_handle=None, _readable=False, _writable=True) at remote 0x7f51c589d940>, _rlock=<Lock(_semlock=<_multiprocessing.SemLock at remote 0x7f51c58a60b0>, acquire=<built-in method acquire of _multiprocessing.SemLock object at remote 0x7f51c58a60b0>, release=<built-in method release of _multiprocessing.SemLock object at remote 0x7f51c58a60b0>) at remote 0x7f51c589d0d0>, _poll=<method at remote 0x7f51c58ae2c0>, _wlock=<Lock(_semlock=<_multiprocessing.SemLock at remote 0x7f51c58a1030>, acquire=<built-in method acquire of _multiprocessing.SemLock object at remote 0x7f51c58a1030>, release=<built-in method release of _multiprocessing.SemLock object at remote 0x7f51c58a1030>) at remote 0x7f51c589deb0>) at remote 0x7f51c589d3a0>, <SimpleQueue(_reader=<Connection(_handle=None, _readable=True, _writable=False) at remote 0x7f51c58a7670>, _writer=<Connection(_handle=6, _readable=False, _writable=True) at...(truncated),
    func=<function at remote 0x7f51c6352d30>, tstate=0x22bd160) at ../Python/ceval.c:5034
#26 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3559
#27 0x0000000000425ed7 in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at ../Objects/call.c:283
#28 0x000000000042c5dc in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>) at ../Include/cpython/abstract.h:127
#29 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x22bd160) at ../Python/ceval.c:4987
#30 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3486
#31 0x00000000004f9f44 in PyEval_EvalFrameEx (throwflag=0,
    f=Frame 0x23e7f40, for file /usr/lib/python3.8/multiprocessing/process.py, line 569, in _bootstrap (self=<ForkProcess(_identity=(98,), _config={'authkey': <AuthenticationString at remote 0x7f51c66754c0>, 'semprefix': '/mp', 'daemon': True}, _parent_pid=191540, _parent_name='MainProcess', _popen=None, _closed=False, _target=<function at remote 0x7f51c6352d30>, _args=(<SimpleQueue(_reader=<Connection(_handle=3, _readable=True, _writable=False) at remote 0x7f51c589dcd0>, _writer=<Connection(_handle=None, _readable=False, _writable=True) at remote 0x7f51c589d940>, _rlock=<Lock(_semlock=<_multiprocessing.SemLock at remote 0x7f51c58a60b0>, acquire=<built-in method acquire of _multiprocessing.SemLock object at remote 0x7f51c58a60b0>, release=<built-in method release of _multiprocessing.SemLock object at remote 0x7f51c58a60b0>) at remote 0x7f51c589d0d0>, _poll=<method at remote 0x7f51c58ae2c0>, _wlock=<Lock(_semlock=<_multiprocessing.SemLock at remote 0x7f51c58a1030>, acquire=<built-in method acquire of _multiprocessin...(truncated))
    at ../Python/ceval.c:741
#32 _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=1, kwnames=0x7f51c589d5c8, kwargs=0x7f51c66fcde8, kwcount=1, kwstep=1,
    defs=0x7f51c6672808, defcount=1, kwdefs=0x0, closure=0x0, name='_bootstrap', qualname='BaseProcess._bootstrap') at ../Python/ceval.c:4298
#33 0x000000000043e7a2 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:435
#34 0x0000000000676245 in _PyObject_Vectorcall (kwnames=('parent_sentinel',), nargsf=1, args=0x7f51c66fcde0, callable=<function at remote 0x7f51c6681160>) at ../Include/cpython/abstract.h:127
#35 method_vectorcall (method=<optimized out>, args=0x7f51c66fcde8, nargsf=<optimized out>, kwnames=('parent_sentinel',)) at ../Objects/classobject.c:60


This issue doesn't reproduce on Python 3.7 or earlier.
msg353938 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2019-10-04 12:34
Several things here:

- you can perform critical cleanup with the atexit module; using a signal handler for that is extremely low-level and error-prone

- you can also try to switch to the "forkserver" method of multiprocessing, perhaps that will fix your issue
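
Both suggestions can be sketched together. This is a minimal illustration, not code from the attached scripts: `cleanup`, `square`, and `run_demo` are hypothetical names, and whether "forkserver" actually avoids the hang has not been verified here.

```python
import atexit
import multiprocessing as mp


def cleanup():
    # Hypothetical stand-in for the "critical cleanup" mentioned in the report.
    print("cleanup done")


def square(x):
    return x * x


def run_demo(method="forkserver"):
    # get_context() returns a context bound to one start method, so the
    # process-wide default start method is left untouched.
    ctx = mp.get_context(method)
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, range(5))


# atexit runs cleanup() on normal interpreter shutdown only; it is skipped
# if the process is killed by a signal that Python does not handle.
atexit.register(cleanup)
```

To try it, call `run_demo()` from a script file under an `if __name__ == "__main__":` guard, since the "forkserver" start method needs to import the main module in the worker processes.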
msg353947 - Author: Ionel Cristian Mărieș (ionelmc) Date: 2019-10-04 14:53
atexit has proved time and time again to be unreliable, so it is not really an option. Not all programs shut down nicely enough for atexit.

Should I tell users "use forkserver on 3.8 because broken stuff"?
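
The limitation mentioned here matches documented behaviour: functions registered with `atexit` are not called when the process is killed by a signal that Python does not handle. One common pattern, sketched below, is a SIGTERM handler that turns the signal into an ordinary `SystemExit` so that `atexit` callbacks and `finally` blocks still run. The names `cleanup`, `handle_sigterm`, and `install_cleanup` are hypothetical. Note that this is exactly the kind of handler setup the issue is about, so it illustrates the use case rather than a way around the hang.

```python
import atexit
import signal
import sys

_cleaned_up = False


def cleanup():
    # Hypothetical cleanup; the flag makes repeated calls harmless.
    global _cleaned_up
    if not _cleaned_up:
        _cleaned_up = True
        print("cleanup done", file=sys.stderr)


def handle_sigterm(signum, frame):
    # Raise SystemExit so the interpreter unwinds normally, which lets
    # atexit callbacks and finally blocks run. 128 + signum follows the
    # usual shell convention for "terminated by signal".
    sys.exit(128 + signum)


def install_cleanup():
    # signal.signal() may only be called from the main thread.
    # Returns the previous SIGTERM handler so the caller can restore it.
    atexit.register(cleanup)
    return signal.signal(signal.SIGTERM, handle_sigterm)
```

Calling `install_cleanup()` once at program start is enough; keeping the returned handler makes it possible to restore the previous behaviour later.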
msg368233 - Author: Michael (mapozyan) Date: 2020-05-06 10:38
Looks like a duplicate of my previous issue:

https://bugs.python.org/issue29759

Unfortunately, some frameworks, such as Gunicorn, use signal handlers extensively for their internal purposes.
msg368234 - Author: Michael (mapozyan) Date: 2020-05-06 10:47
The attached test reproduces the issue (Python 3.8.2 on Ubuntu 16.04).
msg368237 - Author: Michael (mapozyan) Date: 2020-05-06 11:46
Attached a working patch.
Tested with a signal handler set in Lib/test/_test_multiprocessing.py:

2329a2330,2331
> def signal_handler(signum, frame):
>     pass
2335a2338
>         cls.old_handler = signal.signal(signal.SIGTERM, signal_handler)
2342a2346
>         signal.signal(signal.SIGTERM, cls.old_handler)

All tests pass.
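
For readers without the test file at hand, the diff above appears to amount to the pattern below: install a no-op SIGTERM handler before the tests run and restore the previous handler afterwards. This is a standalone sketch that assumes the inserted lines land in a `setUpClass`/`tearDownClass` pair, which the `cls.` prefix suggests. The class name `HandlerInstalledTest` and its test method are hypothetical and are not part of `_test_multiprocessing.py`.

```python
import signal
import unittest


def signal_handler(signum, frame):
    # No-op handler, as in the diff: its mere presence is what matters.
    pass


class HandlerInstalledTest(unittest.TestCase):  # hypothetical test class
    @classmethod
    def setUpClass(cls):
        # signal.signal() returns the previous handler, saved for restoring.
        cls.old_handler = signal.signal(signal.SIGTERM, signal_handler)

    @classmethod
    def tearDownClass(cls):
        signal.signal(signal.SIGTERM, cls.old_handler)

    def test_handler_is_installed(self):
        self.assertIs(signal.getsignal(signal.SIGTERM), signal_handler)
```

Saving and restoring the old handler keeps the change local to this test class, so other tests in the same run see the handler they started with.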
History
Date User Action Args
2020-07-19 17:38:12 wumpus set nosy: + wumpus
2020-05-06 11:46:38 mapozyan set files: + pool.py.patch; keywords: + patch; messages: + msg368237
2020-05-06 10:47:21 mapozyan set files: + mp-signal-bug-python3.8.py; messages: + msg368234
2020-05-06 10:38:43 mapozyan set nosy: + mapozyan; messages: + msg368233
2019-10-04 14:53:50 ionelmc set messages: + msg353947
2019-10-04 12:34:51 pitrou set nosy: + pitrou; messages: + msg353938
2019-10-01 22:27:41 ionelmc set components: + Library (Lib)
2019-09-19 21:41:35 ionelmc create