Issue11768
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2011-04-05 00:19 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
signal_reentrant.patch | vstinner, 2011-04-18 10:40 | review | ||
signal_versionadded.patch | vstinner, 2011-04-18 10:57 |
Messages (17) | |||
---|---|---|---|
msg132993 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2011-04-05 00:19 | |
test_threadsignals hangs on "x86 Tiger 3.x" and "PPC Tiger 3.x": --------------------- [279/354] test_threadsignals Thread 0xa000d000: File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_threadsignals.py", line 46 in test_signals File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 387 in _executeTestPart File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 442 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/case.py", line 494 in __call__ File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__ File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 105 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/unittest/suite.py", line 67 in __call__ File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1078 in run File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1166 in _run_suite File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/support.py", line 1192 in run_unittest File "/Users/db3l/buildarea/3.x.bolen-tiger/build/Lib/test/test_threadsignals.py", line 210 in test_main File "./Lib/test/regrtest.py", line 1032 in runtest_inner File "./Lib/test/regrtest.py", line 826 in runtest File "./Lib/test/regrtest.py", line 650 in main File "./Lib/test/regrtest.py", line 1607 in <module> make: *** [buildbottest] Error 1 program finished with exit code 2 elapsedTime=3365.320090 --------------------- http://www.python.org/dev/buildbot/all/builders/x86%20Tiger%203.x/builds/2289/steps/test/logs/stdio http://www.python.org/dev/buildbot/all/builders/PPC%20Tiger%203.x/builds/1700/steps/test/logs/stdio test_threadsignals hangs at: ---------------- class ThreadSignals(unittest.TestCase): def test_signals(self): signalled_all.acquire() self.spawnSignallingThread() signalled_all.acquire() <~~~~ here ---------------- self.spawnSignallingThread() calls: ---------------- def send_signals(): os.kill(process_pid, signal.SIGUSR1) os.kill(process_pid, signal.SIGUSR2) signalled_all.release() ---------------- The hang may be related to regrtest timeout: faulthandler had a bug, but the bug is supposed to be fixed (#11753, #11755). See also issue #11223. |
|||
msg132995 - (view) | Author: Roundup Robot (python-dev) ![]() |
Date: 2011-04-05 00:30 | |
New changeset d14eac872a46 by Victor Stinner in branch 'default': Issue #11768: add debug messages in test_threadsignals.test_signals http://hg.python.org/cpython/rev/d14eac872a46 |
|||
msg133257 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2011-04-07 20:59 | |
The failure occurs also on Leopard: ---------------------- [159/354] test_threadsignals Thread 0x00007fff7080f700: File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/test/test_threadsignals.py", line 53 in test_signals File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/unittest/case.py", line 387 in _executeTestPart File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/unittest/case.py", line 442 in run File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/unittest/case.py", line 494 in __call__ File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/unittest/suite.py", line 105 in run File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/unittest/suite.py", line 67 in __call__ File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/unittest/suite.py", line 105 in run File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/unittest/suite.py", line 67 in __call__ File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/test/support.py", line 1078 in run File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/test/support.py", line 1166 in _run_suite File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/test/support.py", line 1192 in run_unittest File "/Users/pythonbuildbot/buildarea/3.x.hansen-osx-x86-2/build/Lib/test/test_threadsignals.py", line 218 in test_main File "./Lib/test/regrtest.py", line 1032 in runtest_inner File "./Lib/test/regrtest.py", line 826 in runtest File "./Lib/test/regrtest.py", line 650 in main File "./Lib/test/regrtest.py", line 1607 in <module> make: *** [buildbottest] Error 1 program finished with exit code 2 elapsedTime=3089.949146 ---------------------- http://www.python.org/dev/buildbot/all/builders/AMD64%20Leopard%203.x/builds/1132/steps/test/logs/stdio Even with d14eac872a46, this trace doesn't give more information :-/ Should I flush manually sys.stderr? |
|||
msg133267 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2011-04-07 22:14 | |
jseutter reproduced it after 112 and 49 runs on Snow Leopard using "regrtest.py -F -v --timeout=60 test_threadsignals" command: ----------- [ 49] test_threadsignals test_interrupted_timed_acquire (test.test_threadsignals.ThreadSignals) ... ok test_lock_acquire_interruption (test.test_threadsignals.ThreadSignals) ... ok test_lock_acquire_retries_on_intr (test.test_threadsignals.ThreadSignals) ... ok test_rlock_acquire_interruption (test.test_threadsignals.ThreadSignals) ... ok test_rlock_acquire_retries_on_intr (test.test_threadsignals.ThreadSignals) ... ok test_signals (test.test_threadsignals.ThreadSignals) ... test_signals: acquire lock (thread 140735082572960) test_signals: wait lock (thread 140735082572960) Thread 0x00007fff709aaca0: File "/Users/jseutter/code/python/cpython/Lib/test/test_threadsignals.py", line 55 in test_signals File "/Users/jseutter/code/python/cpython/Lib/unittest/case.py", line 387 in _executeTestPart File "/Users/jseutter/code/python/cpython/Lib/unittest/case.py", line 442 in run File "/Users/jseutter/code/python/cpython/Lib/unittest/case.py", line 494 in __call__ File "/Users/jseutter/code/python/cpython/Lib/unittest/suite.py", line 105 in run File "/Users/jseutter/code/python/cpython/Lib/unittest/suite.py", line 67 in __call__ File "/Users/jseutter/code/python/cpython/Lib/unittest/suite.py", line 105 in run File "/Users/jseutter/code/python/cpython/Lib/unittest/suite.py", line 67 in __call__ File "/Users/jseutter/code/python/cpython/Lib/unittest/runner.py", line 168 in run File "/Users/jseutter/code/python/cpython/Lib/test/support.py", line 1166 in _run_suite File "/Users/jseutter/code/python/cpython/Lib/test/support.py", line 1192 in run_unittest File "/Users/jseutter/code/python/cpython/Lib/test/test_threadsignals.py", line 221 in test_main File "Lib/test/regrtest.py", line 1032 in runtest_inner File "Lib/test/regrtest.py", line 826 in runtest File "Lib/test/regrtest.py", line 650 in main File "Lib/test/regrtest.py", line 1607 in <module> ----------- So for an unknown reason, send_signals() thread was not created or doesn't start. Example of the output of a success: ---------- test_signals: acquire lock (thread 140735082572960) test_signals: wait lock (thread 140735082572960) send_signals: enter (thread 4326952960) send_signals: raise SIGUSR1 send_signals: raise SIGUSR2 send_signals: release signalled_all send_signals: exit (thread 4326952960) test_signals: lock acquired ---------- |
|||
msg133268 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2011-04-07 22:19 | |
Oh, the traceback only contains one thread, so send_signals() thread was not created or failed very quickly (before its first print instruction). |
|||
msg133941 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2011-04-17 23:12 | |
Oh, I reproduced the bug. ------------------------ [3021] test_threadsignals test_signals (test.test_threadsignals.ThreadSignals) ... test_signals: acquire lock (thread -1610559488) test_signals: wait lock (thread -1610559488) send_signals: enter (thread 27159552) send_signals: raise SIGUSR1 send_signals: raise SIGUSR2 send_signals: release signalled_all send_signals: exit (thread 27159552) ^C ------------------------ There are 2 threads: faulthandler thread and the mainthread. Trace of the main thread: ------- (gdb) where #0 0xffff027a in ___spin_lock () at /System/Library/Frameworks/System.framework/PrivateHeaders/i386/cpu_capabilities.h:216 #1 0x90001433 in pthread_mutex_lock () #2 0x0011b34c in PyThread_acquire_lock_timed (lock=0x1124f10, microseconds=0, intr_flag=0) at Python/thread_pthread.h:450 #3 0x0011b73e in PyThread_acquire_lock (lock=0x1124f10, waitflag=0) at Python/thread_pthread.h:531 #4 0x000775b2 in Py_AddPendingCall (func=0x713dd <checksignals_witharg>, arg=0x0) at Python/ceval.c:509 #5 0x00071445 in signal_handler (sig_num=31) at ./Modules/signalmodule.c:189 #6 <signal handler called> #7 0x9010d494 in spin_unlock () #8 0x90001620 in pthread_mutex_lock () #9 0x0011b34c in PyThread_acquire_lock_timed (lock=0x1124f10, microseconds=0, intr_flag=0) at Python/thread_pthread.h:450 #10 0x0011b73e in PyThread_acquire_lock (lock=0x1124f10, waitflag=0) at Python/thread_pthread.h:531 #11 0x000775b2 in Py_AddPendingCall (func=0x713dd <checksignals_witharg>, arg=0x0) at Python/ceval.c:509 #12 0x00071445 in signal_handler (sig_num=30) at ./Modules/signalmodule.c:189 #13 <signal handler called> #14 0x900248c7 in semaphore_wait_signal_trap () #15 0x900288b4 in pthread_cond_wait () #16 0x0011b52c in PyThread_acquire_lock_timed (lock=0x1171180, microseconds=-1, intr_flag=1) at Python/thread_pthread.h:476 #17 0x00180746 in acquire_timed (lock=0x1171180, microseconds=-1) at ./Modules/_threadmodule.c:66 #18 0x001809aa in lock_PyThread_acquire_lock (self=0x14ff820, args=0x512038, kwds=0x0) at ./Modules/_threadmodule.c:133 #19 0x0005f58e in PyCFunction_Call (func=0x166c5f8, arg=0x512038, kw=0x0) at Objects/methodobject.c:84 #20 0x0008c1cd in call_function (pp_stack=0xbfff5f64, oparg=0) at Python/ceval.c:3859 #21 0x000869dd in PyEval_EvalFrameEx (f=0x1193798, throwflag=0) at Python/ceval.c:2657 ... ------- The main thread was waiting test_signals() lock (signalled_all) while it is was interrupted by a signal. The signal handler calls Py_AddPendingCall() which blocks on acquiring "pending_lock". |
|||
msg133942 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2011-04-17 23:27 | |
> The main thread was waiting test_signals() lock (signalled_all) > while it is was interrupted by a signal. The signal handler calls > Py_AddPendingCall() which blocks on acquiring "pending_lock". Oh, the main thread receives SIGUSR1: the signal handler calls Py_AddPendingCall() to process it later. But while calling Py_AddPendingCall(), it receives a SIGUSR2: the same signal handler is called, and it calls Py_AddPendingCall(). And we have a deadlock because we try to acquire the lock twice, or try to acquire the lock before the lock was released. |
|||
msg133943 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2011-04-17 23:59 | |
> The signal handler calls Py_AddPendingCall() which blocks on acquiring > "pending_lock". It blocks in taking the mutex, not on waiting for the condition variable. Otherwise it wouldn't block (microseconds = 0). I think the solution is to protect signal_handler() against reentrancy: diff --git a/Modules/signalmodule.c b/Modules/signalmodule.c --- a/Modules/signalmodule.c +++ b/Modules/signalmodule.c @@ -185,10 +185,12 @@ signal_handler(int sig_num) Handlers[sig_num].tripped = 1; /* Set is_tripped after setting .tripped, as it gets cleared in PyErr_CheckSignals() before .tripped. */ - is_tripped = 1; - Py_AddPendingCall(checksignals_witharg, NULL); - if (wakeup_fd != -1) - write(wakeup_fd, "\0", 1); + if (!is_tripped) { + is_tripped = 1; + Py_AddPendingCall(checksignals_witharg, NULL); + if (wakeup_fd != -1) + write(wakeup_fd, "\0", 1); + } } #ifndef HAVE_SIGACTION |
|||
msg133957 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2011-04-18 10:02 | |
It looks like pthread_mutex_lock() and pthread_mutex_unlock() are not reentrant on some platforms (in some implementations of the pthread API). Antoine: if I understand correctly your patch, if we have a pending signal, all next signals will be simply ignored. I think that this issue is not specific to Mac OS X: signal_handler() can be called twice at the same time in two different threads or in the same thread (reentrant call). pthread_sigmask() can be used to avoid reentrant call, but it has no effect on the second case: "signal_handler() called twice at the same time in two different threads". Note: SA_NODEFER flag of sigaction() has no effect on this issue because signal_handler() is called twice to handle two different signals (SIGUSR1 and SIGUSR2). |
|||
msg133958 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2011-04-18 10:17 | |
> Antoine: if I understand correctly your patch, if we have a pending > signal, all next signals will be simply ignored. I don't think so, please re-read. |
|||
msg133959 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2011-04-18 10:19 | |
> pthread_sigmask() can be used to avoid reentrant call, > but it has no effect on the second case: "signal_handler() > called twice at the same time in two different threads". Same problem if we use sa_mask field of sigaction() (e.g. use sigfillset(&context.sa_mask); in PyOS_setsig()). |
|||
msg133960 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2011-04-18 10:40 | |
> I don't think so, please re-read. Oh, I thought that Py_AddPendingCall() was used to store the pending signal. But no, it asks Python main loop to check which signal has been trigerred, and we can only do it once for all signals. Attached patch should fix this issue: - signal_handler() and PyErr_SetInterrupt() only call Py_AddPendingCall() on the first signal (if is_tripped is zero): it should fix the deadlock and it is a micro optimization (win-win !) - PyErr_SetInterrupt() uses the wake up API (write "\0" into wakeup_fd, API introduced by issue #1583) - Remove debug code in test_threadsignals Python 2.7 (and 2.6?) should also be patched: signal.set_wakeup_fd() and PySignal_SetWakeupFd() were introduced in 2.6, but there is no versionadded tag => signal_versionadded.patch |
|||
msg133961 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2011-04-18 10:52 | |
Le lundi 18 avril 2011 à 10:40 +0000, STINNER Victor a écrit : > Attached patch should fix this issue: > > - signal_handler() and PyErr_SetInterrupt() only call Py_AddPendingCall() on the first signal (if is_tripped is zero): it should fix the deadlock and it is a micro optimization (win-win !) Tested on OSX: yes, it does fix this issue. Should I apply the fix on 2.7, 3.1, 3.2 and 3.3? (or only 3.3) |
|||
msg133964 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2011-04-18 12:13 | |
Well, since it is a bugfix, you should apply it to all versions. |
|||
msg133970 - (view) | Author: Roundup Robot (python-dev) ![]() |
Date: 2011-04-18 14:30 | |
New changeset 319f7af9ee5e by Victor Stinner in branch '3.1': Issue #11768: The signal handler of the signal module only calls http://hg.python.org/cpython/rev/319f7af9ee5e New changeset 28ab8c6ad8f9 by Victor Stinner in branch '3.2': (Merge 3.1) Issue #11768: The signal handler of the signal module only calls http://hg.python.org/cpython/rev/28ab8c6ad8f9 New changeset fdcbc8453304 by Victor Stinner in branch 'default': (Merge 3.2) Issue #11768: The signal handler of the signal module only calls http://hg.python.org/cpython/rev/fdcbc8453304 |
|||
msg133971 - (view) | Author: Roundup Robot (python-dev) ![]() |
Date: 2011-04-18 14:34 | |
New changeset 7d355c2ff3d4 by Victor Stinner in branch '2.7': (Merge 3.1) Issue #11768: The signal handler of the signal module only calls http://hg.python.org/cpython/rev/7d355c2ff3d4 New changeset ebd080593351 by Victor Stinner in branch '2.7': Issue #11768: signal.set_wakeup_fd() and PySignal_SetWakeupFd() added in 2.6 http://hg.python.org/cpython/rev/ebd080593351 |
|||
msg133973 - (view) | Author: STINNER Victor (vstinner) * ![]() |
Date: 2011-04-18 14:48 | |
The deadlock is fixed. Reopen the issue if you still see test_threadsignals hang on a buildbot. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:15 | admin | set | github: 55977 |
2011-04-18 14:48:12 | vstinner | set | status: open -> closed resolution: fixed messages: + msg133973 |
2011-04-18 14:34:34 | python-dev | set | messages: + msg133971 |
2011-04-18 14:30:25 | python-dev | set | messages: + msg133970 |
2011-04-18 12:13:06 | pitrou | set | messages: + msg133964 |
2011-04-18 10:59:13 | vstinner | set | title: test_signals() of test_threadsignals failure on Mac OS X -> signal_handler() is not reentrant: deadlock in Py_AddPendingCall() |
2011-04-18 10:57:11 | vstinner | set | files: + signal_versionadded.patch |
2011-04-18 10:52:48 | vstinner | set | messages:
+ msg133961 title: signal_handler() is not reentrant: deadlock in Py_AddPendingCall() -> test_signals() of test_threadsignals failure on Mac OS X |
2011-04-18 10:42:02 | vstinner | set | title: test_signals() of test_threadsignals failure on Mac OS X -> signal_handler() is not reentrant: deadlock in Py_AddPendingCall() |
2011-04-18 10:40:30 | vstinner | set | files:
+ signal_reentrant.patch keywords: + patch messages: + msg133960 |
2011-04-18 10:19:00 | vstinner | set | messages: + msg133959 |
2011-04-18 10:17:14 | pitrou | set | messages: + msg133958 |
2011-04-18 10:02:10 | vstinner | set | messages: + msg133957 |
2011-04-17 23:59:23 | pitrou | set | nosy:
+ pitrou messages: + msg133943 |
2011-04-17 23:27:36 | vstinner | set | messages: + msg133942 |
2011-04-17 23:12:23 | vstinner | set | messages: + msg133941 |
2011-04-07 22:19:30 | vstinner | set | messages: + msg133268 |
2011-04-07 22:14:42 | vstinner | set | messages: + msg133267 |
2011-04-07 20:59:28 | vstinner | set | messages:
+ msg133257 title: test_signals() of test_threadsignals failure on Mac OS X Tiger -> test_signals() of test_threadsignals failure on Mac OS X |
2011-04-05 00:30:08 | python-dev | set | nosy:
+ python-dev messages: + msg132995 |
2011-04-05 00:19:17 | vstinner | create |