This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: test_faulthandler failures on FreeBSD 6
Type: Stage:
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: loewis, neologix, python-dev, vstinner
Priority: normal Keywords:

Created on 2011-07-01 19:28 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (21)
msg139595 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-01 19:28
test_faulthandler fails on the FreeBSD 6 buildbot since my commit 024827a9db64990865d29f9d525694f51197e770:

Issue #12392: fix thread initialization on FreeBSD 6

On FreeBSD6, pthread_kill() doesn't work on the main thread before the creation
of the first thread. Therefore, create a dummy (no-op) thread at startup to
initialize the pthread library.

Also add a test for this use case, written by Charles-François Natali.
msg139598 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-01 19:48
Debug session with gdb:
-----------------------------------------------
[vstinner@buildbot-freebsd ~/cpython]$ gdb -args ./python x.py 
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...run

(gdb) run
Starting program: /usr/home/vstinner/cpython/python x.py
warning: Unable to get location for thread creation breakpoint: generic error
[New LWP 100106]
[New Thread 0x81f8000 (LWP 100089)]

Program received signal SIGBUS, Bus error.
[Switching to Thread 0x81f8000 (LWP 100083)]
stack_overflow (min_sp=0xb97fe7f0, max_sp=0xc5ffe7f0, depth=0xbfbfe7f0)
    at ./Modules/faulthandler.c:870
870	    buffer[0] = 1;
(gdb) signal SIGBUS
Continuing with signal SIGBUS.
[New LWP 100083]

Program received signal SIGBUS, Bus error.
[Switching to LWP 100083]
0x2824df2a in signalcontext () from /lib/libc.so.6
(gdb) handle SIGBUS nostop
Signal        Stop	Print	Pass to program	Description
SIGBUS        No	Yes	Yes		Bus error
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/home/vstinner/cpython/python x.py
warning: Unable to get location for thread creation breakpoint: generic error
[New LWP 100096]
[New Thread 0x81f8000 (LWP 100096)]

Program received signal SIGBUS, Bus error.
[New LWP 100096]

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
-----------------------------------------------

If I revert the commit, faulthandler is able to dump the traceback on a stack overflow. But if I create a thread (and wait until it exits), the faulthandler signal handler (for SIGBUS) is no longer called.
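The experiment above can be turned into a repeatable check. This is a minimal sketch, not the original x.py. A fresh child interpreter enables faulthandler, optionally starts and joins a no-op thread, then crashes on purpose, and the parent reports whether the fatal-error traceback reached stderr. To keep it short, the sketch triggers a SIGSEGV with the private CPython test helper faulthandler._sigsegv() instead of the SIGBUS stack overflow from x.py. CHILD and run_crash_child() are hypothetical names.

```python
import subprocess
import sys

# Child script (hypothetical): enable faulthandler, optionally create and
# join a no-op thread first, then crash on purpose.
CHILD = """
import faulthandler, resource, threading
resource.setrlimit(resource.RLIMIT_CORE, (0, 0))  # don't leave a core file
faulthandler.enable()  # dumps the traceback to stderr on a fatal signal
if {create_thread}:
    t = threading.Thread(target=lambda: None)
    t.start()
    t.join()
faulthandler._sigsegv()  # private CPython test helper
"""

def run_crash_child(create_thread):
    """Run the child in a fresh interpreter; return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD.format(create_thread=create_thread)],
        capture_output=True,
        text=True,
        timeout=60,
    )
    return proc.returncode, proc.stderr

for create_thread in (False, True):
    code, err = run_crash_child(create_thread)
    dumped = "Fatal Python error" in err
    print(f"thread created first: {create_thread}; traceback dumped: {dumped}")
```

On a platform without this bug, both runs should report a dumped traceback. The interesting case on the FreeBSD 6 buildbot is the run that creates a thread first.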
msg139600 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-01 19:58
For reference, here are the failures:

Re-running test 'test_faulthandler' in verbose mode
test_disable (test.test_faulthandler.FaultHandlerTests) ... ok
test_dump_traceback (test.test_faulthandler.FaultHandlerTests) ... ok
test_dump_traceback_file (test.test_faulthandler.FaultHandlerTests) ... ok
test_dump_traceback_threads (test.test_faulthandler.FaultHandlerTests) ... ok
test_dump_traceback_threads_file (test.test_faulthandler.FaultHandlerTests) ... ok
test_dump_tracebacks_later (test.test_faulthandler.FaultHandlerTests) ... ok
test_dump_tracebacks_later_cancel (test.test_faulthandler.FaultHandlerTests) ... ok
test_dump_tracebacks_later_file (test.test_faulthandler.FaultHandlerTests) ... ok
test_dump_tracebacks_later_repeat (test.test_faulthandler.FaultHandlerTests) ... ok
test_dump_tracebacks_later_twice (test.test_faulthandler.FaultHandlerTests) ... ok
test_enable_file (test.test_faulthandler.FaultHandlerTests) ... ok
test_enable_single_thread (test.test_faulthandler.FaultHandlerTests) ... ok
test_fatal_error (test.test_faulthandler.FaultHandlerTests) ... ok
test_gil_released (test.test_faulthandler.FaultHandlerTests) ... ok
test_is_enabled (test.test_faulthandler.FaultHandlerTests) ... ok
test_read_null (test.test_faulthandler.FaultHandlerTests) ... ok
test_register (test.test_faulthandler.FaultHandlerTests) ... ok
test_register_file (test.test_faulthandler.FaultHandlerTests) ... FAIL
test_register_threads (test.test_faulthandler.FaultHandlerTests) ... FAIL
test_sigabrt (test.test_faulthandler.FaultHandlerTests) ... ok
test_sigbus (test.test_faulthandler.FaultHandlerTests) ... ok
test_sigfpe (test.test_faulthandler.FaultHandlerTests) ... ok
test_sigill (test.test_faulthandler.FaultHandlerTests) ... ok
test_sigsegv (test.test_faulthandler.FaultHandlerTests) ... ok
test_stack_overflow (test.test_faulthandler.FaultHandlerTests) ... FAIL
test_unregister (test.test_faulthandler.FaultHandlerTests) ... ok

======================================================================
FAIL: test_register (test.test_faulthandler.FaultHandlerTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd/build/Lib/test/test_faulthandler.py", line 498, in test_register
    self.check_register()
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd/build/Lib/test/test_faulthandler.py", line 489, in check_register
    self.assertRegex(trace, regex)
AssertionError: Regex didn't match: '^Traceback \\(most recent call first\\):\n  File "<string>", line 6 in func\n  File "<string>", line 17 in <module>$' not found in 'Traceback (most recent call first):'

======================================================================
FAIL: test_register_file (test.test_faulthandler.FaultHandlerTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd/build/Lib/test/test_faulthandler.py", line 505, in test_register_file
    self.check_register(filename=filename)
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd/build/Lib/test/test_faulthandler.py", line 489, in check_register
    self.assertRegex(trace, regex)
AssertionError: Regex didn't match: '^Traceback \\(most recent call first\\):\n  File "<string>", line 6 in func\n  File "<string>", line 17 in <module>$' not found in 'Traceback (most recent call first):\n  File "<string>", line 19 in <module>'

======================================================================
FAIL: test_register_threads (test.test_faulthandler.FaultHandlerTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd/build/Lib/test/test_faulthandler.py", line 508, in test_register_threads
    self.check_register(all_threads=True)
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd/build/Lib/test/test_faulthandler.py", line 489, in check_register
    self.assertRegex(trace, regex)
AssertionError: Regex didn't match: '^Current thread XXX:\n  File "<string>", line 6 in func\n  File "<string>", line 17 in <module>$' not found in 'Current thread XXX:'

======================================================================
FAIL: test_stack_overflow (test.test_faulthandler.FaultHandlerTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd/build/Lib/test/test_faulthandler.py", line 185, in test_stack_overflow
    other_regex='unable to raise a stack overflow')
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd/build/Lib/test/test_faulthandler.py", line 104, in check_fatal_error
    self.assertRegex(output, regex)
AssertionError: Regex didn't match: '^Fatal Python error: (?:Segmentation fault|Bus error)\n\nCurrent\\ thread\\ XXX:\n  File "<string>", line 3 in <module>$|unable to raise a stack overflow' not found in ''

----------------------------------------------------------------------
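The register tests that fail above come down to roughly the following. This is a sketch under assumptions, not the test's actual code: dump_on_self_signal() is a hypothetical helper, and SIGUSR1 is an assumed choice of signal. The helper registers a faulthandler dump for a user signal, sends that signal to the process with os.kill() from inside func(), and returns the dump. If the signal is delivered before kill() returns, the dump includes the func frame. The truncated outputs above are consistent with the handler running only after func() had returned.

```python
import faulthandler
import os
import signal
import tempfile

def dump_on_self_signal(signum=signal.SIGUSR1):
    """Hypothetical helper: return the faulthandler dump written when the
    process sends signum to itself from inside func()."""
    def func():
        # If the signal is delivered before os.kill() returns, func() is
        # still on the stack when faulthandler writes the dump.
        os.kill(os.getpid(), signum)

    with tempfile.TemporaryFile() as f:
        # faulthandler writes to the file descriptor directly.
        faulthandler.register(signum, file=f, all_threads=False)
        try:
            func()
        finally:
            faulthandler.unregister(signum)
        f.seek(0)
        return f.read().decode("ascii", "backslashreplace")

print(dump_on_self_signal())
```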
msg139618 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-07-01 23:10
What happens if you create the thread after having registered the SIGBUS handler?
msg139620 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-01 23:49
On FreeBSD 6, os.kill(os.getpid(), signum) immediately calls the signal handler if no thread has been created yet (which was the default before my commit 024827a9db64), whereas the signal handler is called "later" (when exactly?) once the first thread has been created (the default after my commit).

The traceback is sometimes "truncated" because tstate->frame is NULL. I suspect that the signal handler is called after the last instruction has executed, once PyEval_EvalFrameEx() has set tstate->frame to f->f_back (which is NULL for the outermost frame).
msg139621 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-02 00:15
> What happens if you create the thread after having registered
> the SIGBUS handler?

I reverted my commit and created the thread after the call to faulthandler.register(). It changes nothing: the signal handler is still sometimes called "later".

> On FreeBSD 6, os.kill(os.getpid(), signum) calls immediately
> the signal handler before the creation of the first thread (...),
> whereas the signal handler is called "later" (when exactly?) after
> the creation of the first thread (default after my commit).

It looks like a kernel/libc bug; at least, it doesn't conform to POSIX.1-2001. Here is an extract from the Linux manual page of the kill() system call:

"POSIX.1-2001 requires that if a process sends a signal to itself, and the sending thread does not have the signal blocked, and no other thread has it unblocked or is waiting for it in sigwait(3), at least one unblocked signal must be delivered to the sending thread before the kill() returns."

I see two options:

 - revert my commit and fix #12392 (test_signal) differently
 - skip test_register, test_register_file, test_register_threads and test_stack_overflow on freebsd6

I prefer to revert my commit because it introduced unexpected signal-handling behaviour: the signal handler is called later when the process sends a signal to itself, even if the application doesn't use threads.
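The POSIX.1-2001 requirement quoted above can be observed from Python. This is a sketch that assumes it runs in the main thread and that SIGUSR1 is free to use. delivered_before_kill_returns() is a hypothetical helper. With signal.set_wakeup_fd(), CPython's C-level signal handler writes the signal number to a pipe, so a non-blocking read right after os.kill() shows whether the signal had already been delivered.

```python
import os
import signal
import time

def delivered_before_kill_returns(signum=signal.SIGUSR1, timeout=5.0):
    """Hypothetical helper: True if the C-level handler for signum had
    already run when the wakeup pipe is read right after os.kill()."""
    received = []
    r, w = os.pipe()
    os.set_blocking(r, False)
    os.set_blocking(w, False)  # set_wakeup_fd() requires a non-blocking fd
    old_handler = signal.signal(signum, lambda s, frame: received.append(s))
    old_fd = signal.set_wakeup_fd(w)
    try:
        os.kill(os.getpid(), signum)
        # The C handler writes the signal number to w on delivery.
        try:
            data = os.read(r, 64)
        except BlockingIOError:
            data = b""
        # Give the Python-level handler a chance to run before restoring it.
        deadline = time.monotonic() + timeout
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        signal.set_wakeup_fd(old_fd)
        signal.signal(signum, old_handler)
        os.close(r)
        os.close(w)
    return int(signum) in data

print("delivered right after kill():", delivered_before_kill_returns())
```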

The new fix for #12392 would be to ensure that at least one thread has been created. For example, we could use the following code at the beginning of test_signal:

import sys
if sys.platform in ('freebsd5', 'freebsd6'):
  # On FreeBSD6, pthread_kill() doesn't work on the main thread
  # before the creation of the first thread
  import threading
  t = threading.Thread(target=lambda: None)
  t.start()
  t.join()

Then test_signal.test_pthread_kill_main_thread() should be skipped or patched for freebsd6.
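One way to write the skip is sketched below. The class name is hypothetical, and the body only approximates what a test_pthread_kill_main_thread() could check: a handler for SIGUSR1 runs when the main thread signals itself with signal.pthread_kill().

```python
import signal
import sys
import threading
import unittest

@unittest.skipUnless(hasattr(signal, "pthread_kill"), "requires signal.pthread_kill()")
class PthreadKillMainThreadTests(unittest.TestCase):
    # Hypothetical test class; only the skipIf() condition mirrors the plan.

    @unittest.skipIf(sys.platform == "freebsd6",
                     "issue #12392: on FreeBSD 6, pthread_kill() does not work "
                     "on the main thread before the first thread is created")
    def test_pthread_kill_main_thread(self):
        received = []

        def handler(signum, frame):
            received.append(signum)

        old_handler = signal.signal(signal.SIGUSR1, handler)
        self.addCleanup(signal.signal, signal.SIGUSR1, old_handler)
        # CPython checks for pending signals right after pthread_kill(), so
        # the handler has normally run by the time the call returns.
        signal.pthread_kill(threading.get_ident(), signal.SIGUSR1)
        self.assertEqual(received, [signal.SIGUSR1])
```

Run it with `python -m unittest`. Where sys.platform is 'freebsd6', the test is reported as skipped instead of failing.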
msg139626 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-07-02 08:31
>> On FreeBSD 6, os.kill(os.getpid(), signum) calls immediately
>> the signal handler before the creation of the first thread (...),
>> whereas the signal handler is called "later" (when exactly?) after
>> the creation of the first thread (default after my commit).
>
> It looks like a kernel/libc bug. At least, it doesn't conform to POSIX.1-2001. Extract of the Linux manual page of the kill function (syscall):
>

Yes, that's definitely a kernel/libc bug, like #12392.

> I see two options:
>
>  - revert my commit and fix #12392 (test_signal) differently
>  - skip test_register, test_register_file, test_register_threads and test_stack_overflow on freebsd6
>
> I prefer to revert my commit because it introduced an unexpected behaviour on signal handling. It calls the signal handler later when the process sends a signal to itself, even if the application don't use threads.
>

I'm also in favor of reverting this commit.

> The new fix for #12392 is to ensure that at least one thread was created. We can for example use the following code at the beginning of test_signal:
>
> if sys.platform in ('freebsd5', 'freebsd6'):
>  # On FreeBSD6, pthread_kill() doesn't work on the main thread
>  # before the creation of the first thread
>  import threading
>  t = threading.Thread(target=lambda: None)
>  t.start()
>  t.join()
>
> Then test_signal.test_pthread_kill_main_thread() should be skipped or patched for freebsd6.
>

Yes.

By the way, this also explains the test.test_signal.WakeupSignalTests
failures we had on FreeBSD 6.4.
Here's what I wrote in http://bugs.python.org/issue8407#msg137382:

"""
"""
======================================================================
FAIL: test_signum (test.test_signal.WakeupSignalTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd/build/Lib/test/test_signal.py", line 272, in test_signum
    self.check_signum(signal.SIGUSR1, signal.SIGALRM)
  File "/usr/home/db3l/buildarea/3.x.bolen-freebsd/build/Lib/test/test_signal.py", line 238, in check_signum
    self.assertEqual(raised, signals)
AssertionError: Tuples differ: (14, 30) != (30, 14)

First differing element 0:
14
30

- (14, 30)
+ (30, 14)
"""

This means that the signals are not delivered in order.
Normally, pending signals are checked upon return to user space, so
trip_signal should be called when the kill syscall returns, and signal
numbers should therefore be written to the wakeup FD in order (here it
looks like the lowest-numbered signal is delivered first).
You could try adding a short sleep before the second kill (or just
pass unordered=True to check_signum, but then we no longer check
the ordering).
"""

Since signals are not delivered synchronously when the kill() syscall
returns, but later, their order is not preserved.

The patch above (creating a dummy thread at the beginning of
test_signal) should fix this, so you might be able to revert
http://hg.python.org/cpython/rev/29e08a98281d .
msg139633 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-02 10:24
> I'm also in favor of reverting this commit.

Hmm, the problem is that the Python test suite creates a lot of threads, so reverting the patch doesn't change anything for the test suite. All tests relying on signal delivery should (must) run in a fresh process, especially if the test expects the signal to be received immediately (as POSIX requires). If we don't use a subprocess, the tests will sometimes fail whenever at least one thread has already been created.

I will try to write a patch that implements all the requirements we listed in this issue. I just fear that it is a bit overkill only to support an "old" (?) OS. But fixing a test for FreeBSD 6 usually improves its reliability on other OSes too, especially when we replace fork() with subprocess.
msg139634 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-07-02 10:32
> Revert the patch doesn't change anything for the test suite.

I know, but at least it doesn't change the default (albeit broken) behaviour on FreeBSD 6.

> I just fear that it is a little bit overkill just to support an "old" (?) OS.

Yes. I mean, we can't expect Python's signal machinery to work when the underlying OS is broken.
I personally think we should just skip all those failing tests on FreeBSD6.
msg139779 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-07-04 15:49
New changeset e07b331bf489 by Victor Stinner in branch '3.2':
Issue #12469: Run "wakeup" signal tests in subprocess to run the test in a
http://hg.python.org/cpython/rev/e07b331bf489

New changeset b9de5e55f798 by Victor Stinner in branch 'default':
(merge 3.2) Issue #12469: Run wakeup and pending signal tests in a subprocess
http://hg.python.org/cpython/rev/b9de5e55f798
msg139780 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-04 16:06
Commits e07b331bf489 and b9de5e55f798 run the wakeup and pending signal tests in a subprocess to avoid side effects from threads. This should make these tests more reliable, and not only on FreeBSD 6.
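The idea behind those commits is sketched below. This is not the committed code. The signal check runs in a fresh interpreter, where no other thread has been created. The sketch then compares the signal numbers that the C handler wrote to the wakeup fd with the order in which they were sent, as check_signum(signal.SIGUSR1, signal.SIGALRM) does. CHILD and raised_signals() are hypothetical names.

```python
import signal
import subprocess
import sys
import textwrap

# Child script (hypothetical): record signal numbers through the wakeup fd
# in a fresh process and print them in the order they were written.
CHILD = textwrap.dedent("""
    import os, signal, time
    r, w = os.pipe()
    os.set_blocking(r, False)
    os.set_blocking(w, False)
    for signum in (signal.SIGUSR1, signal.SIGALRM):
        signal.signal(signum, lambda s, f: None)
    signal.set_wakeup_fd(w)
    os.kill(os.getpid(), signal.SIGUSR1)
    os.kill(os.getpid(), signal.SIGALRM)
    data = b""
    deadline = time.monotonic() + 5
    while len(data) < 2 and time.monotonic() < deadline:
        try:
            data += os.read(r, 64)
        except BlockingIOError:
            time.sleep(0.01)
    print(*data)
""")

def raised_signals():
    """Run CHILD in a new interpreter; return the signal numbers it saw."""
    proc = subprocess.run([sys.executable, "-c", CHILD], capture_output=True,
                          text=True, check=True, timeout=60)
    return [int(n) for n in proc.stdout.split()]

raised = raised_signals()
expected = [int(signal.SIGUSR1), int(signal.SIGALRM)]
print("received in sending order:", raised == expected)
```

Comparing lists rather than sets is what makes the check order-sensitive. The FreeBSD 6 log above shows the reversed order, (14, 30).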

PendingSignalsTests now uses os.kill() instead of signal.pthread_kill() in most tests (except test_pthread_kill and test_pthread_kill_main_thread). We don't need pthread_kill() here anymore because we know that we have exactly one thread. I prefer to use the simple and common os.kill(), and to use pthread_kill() only in the pthread_kill tests.

I'm not proud of it, but I added a workaround for the kernel bug (create a dummy thread, just to initialize the pthread library) in test_pthread_kill(). I don't know how to write a simple and reliable test on FreeBSD 6 without this workaround, and I prefer using a workaround to skipping the test.

I don't think it would be relevant to use the workaround in test_pthread_kill_main_thread(). I chose to skip that test on FreeBSD 6, even though it was written for this OS (issue #12392).
msg139781 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-07-04 16:06
New changeset 7eef821ab20d by Victor Stinner in branch 'default':
Issue #12469: replace assertions by explicit if+raise
http://hg.python.org/cpython/rev/7eef821ab20d
msg139782 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-04 16:14
TODO:
 - check_signum() of WakeupSignalsTests.check_wakeup(): remove set() to check the order of the received signals (revert 29e08a98281d)
 - run test_main(), test_itimer_virtual() and test_itimer_prof() in subprocesses
 - fix/skip test_faulthandler on FreeBSD 6
 - revert 024827a9db64 (thread initialization)
msg139783 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-04 16:17
> TODO: run test_main(), test_itimer_virtual() and test_itimer_prof()
> in subprocesses

Another TODO: check if test_sigtimedwait_poll() still fails after reverting 024827a9db64 (thread initialization).
msg139805 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-07-04 20:55
New changeset 34061f0d35ba by Victor Stinner in branch 'default':
Issue #12469: partial revert of 024827a9db64, freebsd6 thread initialization
http://hg.python.org/cpython/rev/34061f0d35ba
msg139814 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-04 22:34
> run test_main() ... in a subprocesses

I created a new issue for this task: issue #12495. I think the test case has to be rewritten.
msg139816 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-07-04 23:15
New changeset aad86a719fc6 by Victor Stinner in branch 'default':
Issue #12469: test_signal checks wakeup signals order, except on freebsd6
http://hg.python.org/cpython/rev/aad86a719fc6
msg139817 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-07-04 23:34
New changeset f12b8548b4aa by Victor Stinner in branch 'default':
Issue #12469: fix signal order check of test_signal
http://hg.python.org/cpython/rev/f12b8548b4aa
msg139831 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-07-05 06:59
> When signals are unblocked, pending signals are delivered in the reverse order
> of their number (also on Linux, not only on FreeBSD 6).

I don't like this.
POSIX doesn't make any guarantee about signal delivery order, except
for real-time signals.
It might work on FreeBSD and Linux, but that's definitely not
documented, and might break with new kernel releases, or other
kernels.
msg139832 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-05 08:05
> > When signals are unblocked, pending signals are delivered in the reverse order
> > of their number (also on Linux, not only on FreeBSD 6).
> 
> I don't like this.
> POSIX doesn't make any guarantee about signal delivery order, except
> for real-time signals.
> It might work on FreeBSD and Linux, but that's definitely not
> documented, and might break with new kernel releases, or other
> kernels.

It seems to work this way on most OSes (Linux, Mac OS X, Solaris,
FreeBSD): I don't see any test_signal failures on the 3.x buildbots. If
a failure shows up, we can use set() again, but only for test_pending:
the signal order should be reliable when signals are not blocked.
msg139842 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-05 09:56
I'm closing this issue because test_signal passes on the FreeBSD 6 buildbots (3.2 and 3.x). I will reopen it, or maybe open new issues, if test_faulthandler fails or if test_signal fails again.
History
Date                 User        Action  Args
2022-04-11 14:57:19  admin       set     github: 56678
2011-07-05 09:56:12  vstinner    set     status: open -> closed
                                         resolution: fixed
                                         messages: + msg139842
2011-07-05 08:05:14  vstinner    set     messages: + msg139832
2011-07-05 06:59:53  neologix    set     messages: + msg139831
2011-07-04 23:34:35  python-dev  set     messages: + msg139817
2011-07-04 23:15:36  python-dev  set     messages: + msg139816
2011-07-04 22:34:39  vstinner    set     messages: + msg139814
2011-07-04 20:55:48  python-dev  set     messages: + msg139805
2011-07-04 16:17:25  vstinner    set     messages: + msg139783
2011-07-04 16:14:25  vstinner    set     messages: + msg139782
2011-07-04 16:06:53  python-dev  set     messages: + msg139781
2011-07-04 16:06:40  vstinner    set     messages: + msg139780
2011-07-04 15:49:50  python-dev  set     nosy: + python-dev
                                         messages: + msg139779
2011-07-02 10:32:04  neologix    set     messages: + msg139634
2011-07-02 10:24:37  vstinner    set     nosy: + loewis
                                         messages: + msg139633
2011-07-02 08:31:22  neologix    set     messages: + msg139626
2011-07-02 00:15:58  vstinner    set     messages: + msg139621
2011-07-01 23:49:52  vstinner    set     messages: + msg139620
2011-07-01 23:10:22  neologix    set     messages: + msg139618
2011-07-01 19:58:40  vstinner    set     messages: + msg139600
2011-07-01 19:48:47  vstinner    set     messages: + msg139598
2011-07-01 19:29:16  vstinner    set     nosy: + neologix
2011-07-01 19:28:15  vstinner    create