classification
Title: Display Python backtrace on SIGSEGV, SIGFPE and fatal error
Type: enhancement Stage: patch review
Components: Interpreter Core Versions: Python 3.3, Python 3.2
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, belopolsky, davidfraser, dmalcolm, eric.araujo, haypo, joshbressers, pitrou, scott.dial, sjt, skrah
Priority: normal Keywords: patch

Created on 2010-05-31 19:01 by haypo, last changed 2011-03-07 15:41 by eric.araujo. This issue is now closed.

Files
File name Uploaded Description Edit
segfault_handler-11.patch haypo, 2010-12-23 01:52
Messages (54)
msg106801 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-05-31 19:01
The attached patch implements a handler for the SIGSEGV signal. It uses its own stack so that it can still allocate memory on the stack (e.g. call a function), even on a stack overflow.

The patch requires the sigaction() and sigaltstack() functions, but I didn't patch the configure.in script. These functions are available on Linux, and should be available on other UNIX OSes.

The segfault() signal handler assumes that the thread state is consistent (the interp->frame linked list). It indirectly calls PyUnicode_EncodeUTF8(), which calls PyBytes_FromStringAndSize(), which allocates memory on the heap. It clears the PyUnicode "defenc" attribute (the result of PyUnicode_EncodeUTF8()) to free that memory directly.

To test it, try some scripts in Lib/test/crashers/.

One example:
--------------------
$ ./python Lib/test/crashers/recursive_call.py
Fatal Python error: segmentation fault

Traceback (most recent call first):
  File "Lib/test/crashers/recursive_call.py", line 12, depth 15715
  File "Lib/test/crashers/recursive_call.py", line 12, depth 15714
  File "Lib/test/crashers/recursive_call.py", line 12, depth 15713
...
  File "Lib/test/crashers/recursive_call.py", line 12, depth 3
  File "Lib/test/crashers/recursive_call.py", line 12, depth 2
  File "Lib/test/crashers/recursive_call.py", line 9, depth 1
Segmentation fault
--------------------
msg106803 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-05-31 19:10
See also issue #3999: a similar patch that raised an exception on segfault. That patch was rejected because the Python internal state may be corrupted, and we cannot guarantee that subsequent instructions will be executed correctly.

This patch is safer because it only tries to display the backtrace and then exits.
msg108156 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-06-19 00:25
New version of the patch:
 - also catch SIGFPE
 - add segfault.o to Makefile.pre.in
 - use abort() instead of _exit(1) to quit
 - call DebugBreak() on Windows before the abort(), as done by Py_FatalError()

TODO: Patch configure to only enable the segfault handler if sigaltstack() is available. The alternate stack may not be needed for the SIGFPE handler.
msg114289 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-08-18 22:39
dmalcolm asked if it would be possible to display the Python backtrace on Py_FatalError(). I don't know: Py_FatalError() is usually called when Python internals are broken. But then, segfaults also usually occur when something is broken :-)
msg118486 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-12 23:41
New version of the patch:
 - use more standard function names (_Py_initsegfault => _Py_InitSegfault)
 - use "#ifdef HAVE_SIGACTION" to support systems without sigaction(): fall back to signal()
 - usage of the alternative stack is now optional (#ifdef HAVE_SIGALTSTACK)
 - don't call _PyUnicode_AsString() (which encodes unicode to utf-8) because it allocates memory and performs complex operations that may fail: implement a new, simple function that writes a unicode string to a FILE* with ascii+backslashreplace
 - display the function name in the backtrace (but don't display function depth)
 - _Py_FiniSegfault(): restore the old signal handlers before freeing the alternative stack memory (and not the opposite)

I only tested the patch on Linux with a narrow build. It should be tested at least on a wide build and on Windows.

I tested my patch on all Lib/test/crashers/*.py: all segfaults are replaced by nice Python tracebacks.

If you debug a Python program in gdb, gdb stops at the first SIGSEGV (before calling the signal handler).

I didn't test how the signal handler behaves if it raises a new fault (SIGFPE or SIGSEGV). It is supposed to stop immediately.

In the worst case, my patch introduces an infinite loop and the program eats all CPU time (and maybe consumes a lot of memory?) instead of exiting immediately. In particular, segfault_handler() doesn't ensure that there is no loop in the frame linked list. A possible solution is to set a limit on the maximum depth (e.g. only display the first 100 frames and finish with "...").
msg118487 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-12 23:44
Updated example:
----------------------------------------------
$ ./python Lib/test/crashers/recursive_call.py
Fatal Python error: segmentation fault

Traceback (most recent call first):
  File "Lib/test/crashers/recursive_call.py", line 12 in <lambda>
  File "Lib/test/crashers/recursive_call.py", line 12 in <lambda>
  File "Lib/test/crashers/recursive_call.py", line 12 in <lambda>
...
  File "Lib/test/crashers/recursive_call.py", line 12 in <lambda>
  File "Lib/test/crashers/recursive_call.py", line 12 in <lambda>
  File "Lib/test/crashers/recursive_call.py", line 15 in <module>
Segmentation fault
----------------------------------------------

SIGSEGV caught in gdb:
----------------------------------------------
$ gdb -args ./python Lib/test/crashers/recursive_call.py 
...
(gdb) run
Starting program: /home/SHARE/SVN/py3k/python Lib/test/crashers/recursive_call.py

Program received signal SIGSEGV, Segmentation fault.
0x080614e1 in _PyObject_DebugMalloc (nbytes=24) at Objects/obmalloc.c:1398
1398        return _PyObject_DebugMallocApi(_PYMALLOC_OBJ_ID, nbytes);
(gdb)
----------------------------------------------
msg118489 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-12 23:55
> It should be tested at least on a wide build and ...

Done: it works correctly for non-BMP characters in narrow and wide builds.

E.g. in a wide build, with a U+10FFFF character in the path:
---------------------
$ ./python bla.py 
Fatal Python error: segmentation fault

Traceback (most recent call first):
  File "/home/SHARE/SVN/py3k\U0010ffff/Lib/ctypes/__init__.py", line 486 in string_at
  File "bla.py", line 2 in <module>
Segmentation fault
Abandon
---------------------
msg118491 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-13 00:10
Patch version 4:
 - Add segfault.c to pythoncore.vcproj
 - Remove #error "nope" (I used it for debug purpose)
 - Don't include <unistd.h> on Windows

This version works on Windows.
msg118511 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-13 11:10
> dmalcolm asked if it would be possible to display the 
> Python backtrace on Py_FatalError()

It works :-) I fixed a bug in ceval.c (r85411) which was not directly related.

Patch version 5:
 - Display the Python backtrace on Py_FatalError() (if no error occurred)
 - Use _PyThreadState_GetFrame(tstate) instead of tstate->frame
 - Create _Py_DumpBacktrace() function
 - _Py_DumpBacktrace() doesn't display anything if there is no frame (tstate->frame == NULL), it's the case during Python initialization
 - replace fprintf() with calls to fputs()/fputc() in Py_FatalError(): fprintf() might raise a new error; I prefer simple functions (safer and faster)
msg118526 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-13 14:41
I posted the patch on Rietveld for a review (as asked by Antoine): http://codereview.appspot.com/2477041
msg118527 - (view) Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2010-10-13 14:59
It looks like this doesn't yet have any test cases.

You probably should invoke a child python process that crashes and examine the output (perhaps running some/all of the examples in Lib/test/crashers ?); you may want to "steal" some of the wrapper code from Lib/test/test_gdb.py to do this.

Test ideas:
  - generate a segfault, verify that the output is sane
  - generate a FPE (likewise)
  - perhaps run all of the crashers, and ensure that something sane happens (e.g. stack overflow when the limit is set to something that's beyond what the OS/CPU can cope with).

Also, please test the interaction of this with the debugger (with gdb, at any rate): as I see it, this ought to gracefully get out of the way if you're running python under a debugger.  See Lib/test/test_gdb.py  for more examples of how to detect gdb, and invoke it in batch mode from a test suite.
msg118528 - (view) Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2010-10-13 15:05
One other concern: many OSes (e.g. Linux distributions) implement some kind of system-wide crash-catching utility; for example in Fedora we have ABRT ( https://fedorahosted.org/abrt/wiki ).

I'm not sure yet exactly how these are implemented, but we'd want to avoid breaking them: segfaults of a system-provided /usr/bin/python (or /usr/bin/python3 ) should continue to be detected by such tools.
msg118529 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-13 15:25
By the way, don't you want to handle SIGILL and SIGBUS too?
msg118538 - (view) Author: Josh Bressers (joshbressers) Date: 2010-10-13 16:32
You would be wise to avoid using heap storage once you're in the crash handler. From a security standpoint, if something has managed to damage the heap (which is not uncommon in a crash), you should not attempt to allocate or free heap memory. On modern glibc systems, this isn't much of a concern as there are various memory protection mechanisms that make heap exploitation very very hard (you're just going to end up crashing the crash handler). I'm not sure about other operating systems that python supports though.
msg118539 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-13 16:35
> You would be wise to avoid using heap storage once you're in the crash
> handler. From a security standpoint, if something has managed to
> damage the heap (which is not uncommon in a crash), you should not
> attempt to allocate or free heap memory.

As far as I can tell, the signal handler in the patch doesn't call
malloc() or free(), neither directly nor indirectly.
msg118541 - (view) Author: Josh Bressers (joshbressers) Date: 2010-10-13 16:40
I am then confused by this in the initial comment:

> It calls indirectly PyUnicode_EncodeUTF8() and so call
> PyBytes_FromStringAndSize() which allocates memory on the heap.

I've not studied the patch though, so this may have changed.
msg118543 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-13 16:45
> I am then confused by this in the initial comment:
> 
> > It calls indirectly PyUnicode_EncodeUTF8() and so call
> > PyBytes_FromStringAndSize() which allocates memory on the heap.
> 
> I've not studied the patch though, so this may have changed.

The patch certainly has changed, yes. In the latest patch, printing of
a unicode message is done one code point at a time, without allocating
any intermediate buffer.
msg118586 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-13 21:43
Version 6:
 - don't use fputc(), fputs(), fprintf() or fflush() on stderr: use write() on file descriptor 2 (should be stderr)
 - write tests: add sigsegv(), sigfpe() and fatal_error() functions to the _testcapi module

I was too lazy to reimplement functions to convert an integer to a string in bases 10 and 16, so I used snprintf() on a small buffer allocated on the stack.
msg118587 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-13 21:50
TODO (maybe): Call the original signal handler to make tools like apport or ABRT able to catch segmentation faults.

> By the way, don't you want to handle SIGILL and SIGBUS too?

Maybe. SIGILL is a very rare exception. To test it, should I send a SIGILL signal to the process? Or should I try to execute arbitrary CPU instructions?

About SIGBUS: I don't know this signal. Is it used on Linux? If not, on which OS is it used?
msg118588 - (view) Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2010-10-13 21:53
> About SIGBUS: I don't know this signal. Is it used on Linux? If not, on 
> which OS is it used?
Yes, IIRC you'll typically only see it on RISC CPUs that require instructions to be word-aligned; you typically see it if the program counter jumps to a broken address: the SIGBUS happens when the CPU tries to fetch the code.
msg119174 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-19 23:04
Patch version 7:
 - don't use snprintf() because it is not signal safe
 - also catch SIGBUS and SIGILL if available
 - Py_FatalError() uses fileno(stderr) instead of directly 2
 - Factorize code in segfault.c and test_segfault.py
msg119178 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-19 23:28
Patch version 8:
 - fix compiler warning on Windows
 - skip test_sigfpe() on Windows: it looks like the signal handler is not executed; the program exits directly
 - fix tests for Windows end of line (\r\n vs \n)
msg119179 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-19 23:48
> One other concern: many OSes (e.g. Linux distributions) implement
> some kind of system-wide crash-catching utility; ...

Even if it were possible to call the original signal handler, I think it would be better to just disable this feature if a (better?) program also watches for segfaults.

We should provide different ways to enable and/or disable this feature:
 - command line option? (eg. see issue #10089)
 - environment variable?
 - add a function to the sys module (or another module) to enable or disable this feature? (eg. sys.getsegfaultenabled(), sys.setsegfaultenabled(False))

I don't think that a command line option or an environment variable is practical for an OS distributor.

With a sys function, the OS distributor can call it in the site module.
msg119193 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-20 08:08
> I don't think that a command line option and an environment variable
> is pratical for an OS distributor.

Environment variables are probably the most practical for OS vendors,
since they can simply set them in /etc/profile.d (Mandriva does that
with PYTHONDONTWRITEBYTECODE :-/).

If it's an env variable, though, it should be clear that it's
CPython-specific, so I'm not sure how to call it.
CPYTHONNOSEGFAULTHANDLER?
msg123037 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-02 02:15
> If it's an env variable, though, it should be clear that it's
> CPython-specific, so I'm not sure how to call it.

Why do you think that only this variable is CPython-specific? CPython has an option called PYTHONDONTWRITEBYTECODE, but PyPy doesn't create .pyc files (and I suppose that neither IronPython nor Jython creates such files).

I propose to call the variable PYTHONNOFAULTHANDLER. I removed "SEG" because it catches faults other than segmentation faults.
msg124266 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-18 00:54
Version 9 of my patch:
 - Create PYTHONNOHANDLER environment variable: don't install the signal handler if the variable is set
 - Rename "Segfault" to "FaultHandler"
 - reverse_string() does nothing if len==0, even though that cannot occur when it is called by dump_decimal() and dump_hexadecimal()
 - _Py_DumpBacktrace() only writes the first 100 frames (an arbitrary limit) to avoid unlimited loops

I don't know whether 100 frames is a good limit. Is it enough?

Summary of the patch:
 - Add a fault handler for SIGSEGV, SIGFPE, SIGBUS and SIGILL signals: display the Python backtrace and abort the process (call the debugger on Windows)
 - Add PYTHONNOHANDLER environment variable to disable the fault handler
 - The fault handler is implemented using only very simple operations: it calls only signal-safe functions and it doesn't allocate memory on the heap (only a few bytes on the stack)
 - The fault handler uses an alternate stack to be able to display the backtrace (to be able to allocate memory on the stack and call functions)

The fault handler no longer has the infinite loop issue (because the backtrace is truncated on a frame loop).
msg124267 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-18 01:11
Oh, I'm tired...

> Summary of the patch:
> - ...abort the process (call the debugger on Windows)

(the debugger is only called if Python is compiled in debug mode)

> - Add PYTHONNOHANDLER environment variable ...

Oops, the correct name is PYTHONNOFAULTHANDLER (no fault handler).
msg124268 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-12-18 01:20
Why was sys.setsegfaultenabled() omitted?
It may be useful to disable the handler from a script
msg124364 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-19 21:59
> Why was sys.setsegfaultenabled() omitted?

Just because I forgot your message, sorry.
msg124373 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-20 00:12
Version 10 of my patch:
 - the fault handler restores the previous signal handler instead of calling (DebugBreak() and) abort(): the previous signal handler will be called later, to keep the original behaviour
 - _testcapi.sigill() and _testcapi.sigbus() send SIGILL/SIGBUS signal in an unlimited loop instead of sending the signal once. So the signal is sent again after calling the Python signal handler, and the previous signal handler is called
 - (minor change) use 2 arrays (fault_handlers and fault_signals) for all signals, instead of 2 variables (xxx_enabled, xxx_handler) per signal

With this patch, the original signal handler is called and so the Python fault handler is compatible with OS fault handlers like grsecurity and Apport.
msg124381 - (view) Author: Scott Dial (scott.dial) Date: 2010-12-20 06:55
FYI, in v10,

+#define NFAULT_SIGNALS (sizeof(fault_signals) / sizeof(fault_signals[0]))
+static fault_handler_t fault_handlers[4];

, should use "NFAULT_SIGNALS" instead of "4".

However, this bit of code bothers me a lot:

+    const int fd = 2; /* should be fileno(stderr) */

To assume that fd=2 is the right place to be writing bytes is assuming quite a bit about the state of the application. It is not unusual at all to close 0, 1 and 2 when writing daemons, which frees them up to be assigned to *anything*. For all you know, fd=2 is currently a network socket that you will be throwing gibberish at, or worse, it could be a block device that you are writing gibberish on.

The closest discussion I could find on this subject was on the libstdc++ mailing-list with regard to their verbose termination code:

http://gcc.gnu.org/ml/gcc-patches/2004-02/msg02388.html

AFAICT, their conclusion was that the only reasonable solution was to write to the stderr FILE since it was the only thing that is guaranteed to make sense always (even though it may fail). Their situation is different in that they are handling a C++ exception, so they don't have to stick to async-safe functions, but absent that extra difficulty, I believe the reasoning is the same.

The analogous situation exists in Python, in that sys.stderr(*) represents where the application programmer expects "stderr" writing to go. I think you need to arrange to know the fd number for that object, or this patch is unacceptable, in the "wrote garbage to my hard drive and destroyed my data" vein.

I'm not sure there is a safe way to know what the fileno for "sys.stderr" is because it can be anything, including an object whose fileno changes over time. However, I think it would be fair to support only built-in io types that are obviously safe, since you could cache the fileno() value at assignment to use in your fault handler.

(*) Or perhaps __stderr__ if stderr is None or an unsupported type?
msg124385 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-20 13:30
On Monday, December 20, 2010 at 07:55:08, you wrote:
> +#define NFAULT_SIGNALS (sizeof(fault_signals) / sizeof(fault_signals[0]))
> +static fault_handler_t fault_handlers[4];
> 
> , should use "NFAULT_SIGNALS" instead of "4".

Ah yes, NFAULT_SIGNALS is a better choice than the maximum of possible signals 
(4).

> However, this bit of code bothers me a lot:
> 
> +    const int fd = 2; /* should be fileno(stderr) */
> 
> To assume that fd=2 is the write place to be writing bytes is assuming
> quite a bit about the state of the application. It is not unusual at all
> to close 0,1,2 when writing daemons, which frees them up to be assigned to
> *anything*.

Writing to a closed file descriptor just does nothing. Closed file
descriptors are not a problem.

> For all you know, fd=2 currently is a network socket that you
> will be throwing gibberish at, or worse it could be a block device that
> you are writing gibberish on.

The GNU libc also has a fault handler (source code: debug/segfault.c). It
uses file descriptor 2, unless the SEGFAULT_OUTPUT_NAME environment
variable is set (its value is stored in the "fname" variable), in which
case it opens the specified file:

  /* This is the name of the file we are writing to.  If none is given
     or we cannot write to this file write to stderr.  */
  fd = 2;
  if (fname != NULL)
    {
      fd = open (fname, O_TRUNC | O_WRONLY | O_CREAT, 0666);
      if (fd == -1)
	fd = 2;
    }

The GNU libc installs handlers for the SIGSEGV, SIGILL, SIGBUS, SIGSTKFLT,
SIGABRT and SIGFPE signals. The handler restores the default handler and
re-raises the signal:

  /* Pass on the signal (so that a core file is produced).  */
  sa.sa_handler = SIG_DFL;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = 0;
  sigaction (signal, &sa, NULL);
  raise (signal);

Where "raise(signal);" is something like kill(getpid(), signal).

It only uses an alternate stack if SEGFAULT_USE_ALTSTACK environment variable 
is set.

The handler can display registers, the backtrace and the memory mappings, 
depending on the compilation options and the operating system.

> The closest discussion I could find on this subject was on the libstdc++
> mailing-list with regard to their verbose termination code:
> 
> http://gcc.gnu.org/ml/gcc-patches/2004-02/msg02388.html
> 
> AFAICT, their conclusion was that the only reasonable solution was to write
> to the stderr FILE since it was the only thing that is guaranteed to make
> sense always (even though it may fail). Their situation is different in
> that they are handling a C++ exception, so they don't have to stick to
> async-safe functions, but absent that extra difficulty, I believe the
> reasoning is the same.

The FILE* type cannot be used because fprintf(), fputs(), ... are not signal-
safe.

> I'm not sure there is a safe way to know what the fileno for "sys.stderr"
> is because it can be anything, including an object whose fileno changes
> over time

The signal handler cannot access the Python object. E.g. if sys.stderr is
a StringIO() (which has no file descriptor), it cannot be used.

> However, I think it would be fair to support only built-in io
> types that are obviously safe, since you could cache the fileno() value at
> assignment to use in your fault handler.

The problem is detecting that the stderr file descriptor has changed (e.g.
been closed, duplicated, reopened, etc.). I don't think it's possible to
detect such changes (with a portable function).
msg124388 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-20 14:25
The fault handler is unable to retrieve the thread state if the GIL is released. I will try to fix that.
msg124397 - (view) Author: Scott Dial (scott.dial) Date: 2010-12-20 18:31
On 12/20/2010 8:30 AM, STINNER Victor wrote:
> Write into a closed file descriptor just does nothing. Closed file descriptors 
> are not a problem.

My issue is not with a closed file descriptor; it is with an open file
descriptor that is not what you think it is.

>> For all you know, fd=2 currently is a network socket that you
>> will be throwing gibberish at, or worse it could be a block device that
>> you are writing gibberish on.
> 
> The GNU libc has also a fault handler (source code: debug/segfault.c). It uses 
> the file descriptor 2, except if SEGFAULT_OUTPUT_NAME environment variable is 
> set (value stored in "fname" variable): open the specified file.

The GNU libc segfault handler is *opt-in* via the SEGFAULT_SIGNALS
environment variable. I do not know of a system where SEGFAULT_SIGNALS
is a part of the default environment. I suspect the only time anyone
uses that variable and code is if they are using the "catchsegv" tool,
which comes with glibc. In any case, that developer should be aware of
the implication of closing fd 2.

> The FILE* type cannot be used because fprintf(), fputs(), ... are not signal-
> safe.

My point was that in C++, they have an object that an application
developer more readily associates with "stderr" than fd 2. That writing
to "stderr" leads to a write to fd 2 is incidental, because it's
possible that "stderr" does *not* lead to a write to fd 2 and that
writing to fd 2 would be a bad thing to do blindly. For instance, you
can call freopen(path, mode, stderr) and fd 2 will end up closed and
another fd will be the target of stderr. I don't believe POSIX
guarantees that open() will not re-use fd 2.

> The problem is to detect that stderr file descriptor changes (eg. closed, 
> duplicated, reopened, etc.). I don't think that it's possible to detect such 
> changes (with a portable function).

When I said that, I hadn't fully investigated the intricacies of the io
types. I was unaware that you could assign to "sys.stderr.buffer.raw"
and change out the target fd. I assumed a BufferedWriter could not have
the target stream changed out from beneath it. So, I don't have a
solution to the problem of knowing the correct (if any) file descriptor
to write to.

If the argument is made that fd 2 is the right place for most Python
applications, then there needs to be a programmatic way to disable this
feature and/or tell it where to write, so that programs that daemonize
can do the right thing.
msg124399 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-12-20 18:40
> > The problem is to detect that stderr file descriptor changes (eg. closed, 
> > duplicated, reopened, etc.). I don't think that it's possible to detect such 
> > changes (with a portable function).
> 
> When I said that, I hadn't fully investigated the intricacies of the io
> types. I was unaware that you could assign to "sys.stderr.buffer.raw"
> and change out the target fd. I assumed a BufferedWriter could not have
> the target stream changed out from beneath it.

AFAICT, this is not deliberate (not documented, and not tested for). It
should probably be fixed, actually, because there's no code that I know
of that ensures it does something meaningful.
msg124527 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-23 01:52
Version 11 of my patch:
 - Disable the fault handler (and displaying the backtrace on a fatal error) by default
 - The fault handler can be enabled by setting the PYTHONFAULTHANDLER environment variable or by using the "-X faulthandler" command line option (instead of being disabled by setting the PYTHONNOFAULTHANDLER env var)
 - Use PyGILState_GetThisThreadState() instead of _Py_atomic_load_relaxed(&_PyThreadState_Current) to get the state of the current thread, especially if the current thread doesn't hold the GIL
 - Add a protection against recursive calls to _Py_DisplayBacktrace() (a recursive call does nothing), even though it is very unlikely
 - Don't repeat the exception message after the backtrace
 - Add a test on a segfault in a thread not holding the GIL

Disabling the fault handler by default solves many issues reported on the python-dev mailing list:

 - Issues with embedded Python
 - Python 3.2 beta 2 is released
 - The fault handler assumes that file descriptor 2 is the standard error output, but it may be a critical file or a network socket

Amaury asked for a sys.setsegfaultenabled() option: I think that the command line option and the environment variable are enough. The question now is how to enable the feature for a single run (to reproduce a crash and try to get more information), not how to enable/disable it system-wide (because most developers agree that it should be disabled by default).

Should it be backported to Python 2.7 and 3.1?
msg124528 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-23 02:04
Stephen,

I wonder if you would have comments on this. As far as I know, Emacs installs SEGV handlers similar to the ones proposed here. How well do they work? Do they really help users produce meaningful bug reports?
msg124529 - (view) Author: Scott Dial (scott.dial) Date: 2010-12-23 02:27
On 12/22/2010 8:52 PM, STINNER Victor wrote:
> Amaury asked for a sys.setsegfaultenabled() option: I think that the command line option and the environment variable are enough.

I really think you should think of it as a choice the developer of an
application makes instead of a choice an application user makes. A
setsegfaultenabled() could just be another step of initializing the
application akin to setting up the logging module, for instance. In that
situation, a function call has a much lower barrier to use than a CLI
option or environment variable, where you'd have to create a wrapper script.
msg124531 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-23 02:45
On Wed, Dec 22, 2010 at 9:27 PM, Scott Dial <report@bugs.python.org> wrote:
>
> Scott Dial <scott@scottdial.com> added the comment:
>
> On 12/22/2010 8:52 PM, STINNER Victor wrote:
> > Amaury asked for a sys.setsegfaultenabled() option: I think that the command line option and the environment variable are enough.
>
> I really think you should think of it as a choice the developer of an
> application makes instead of a choice an application user makes. A
> setsegfaultenabled() could just be another step of initializing the
> application akin to setting up the logging module, for instance. In that
> situation, a function call has a much lower barrier for use than a CLI
> option or environment variable where you'd have to create a wrapper script.

+1

I would actually prefer just sys.setsegfaultenabled() without a
controlling environment variable.  If necessary, the environment
variable can be checked in site.py and sys.setsegfaultenabled()
called.

As I suggested on python-dev, I also think this belongs in a separate
module rather than core or sys.  The relevant code is already
segregated in a file, so turning it into a module should not be
difficult.  The only function that probably must stay in core is
_Py_DumpBacktrace().  With say "segvhandler" module, site.py can
include something like this:

if os.getenv('PYTHONSEGVHANDLER'):
     import segvhandler
     segvhandler.enable()

Does the latest patch address the GIL/multithreading issues?
msg124533 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-23 03:30
> Does the latest patch address the GIL/multithreading issues?

Yes.
msg124534 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-23 03:32
On Thursday, December 23, 2010 at 02:27 +0000, Scott Dial wrote:
> Scott Dial <scott@scottdial.com> added the comment:
> 
> On 12/22/2010 8:52 PM, STINNER Victor wrote:
> > Amaury asked for a sys.setsegfaultenabled() option: I think that the command line option and the environment variable are enough.
> 
> I really think you should think of it as a choice the developer of an
> application makes instead of a choice an application user makes.

Why do you think so? Can you give me an use case of
sys.setsegfaultenabled()?

Extract of my email on python-dev:
msg124535 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-23 03:35
Le jeudi 23 décembre 2010 à 02:27 +0000, Scott Dial a écrit :
> Scott Dial <scott@scottdial.com> added the comment:
> 
> On 12/22/2010 8:52 PM, STINNER Victor wrote:
> > Amaury asked for a sys.setsegfaultenabled() option: I think that the command line option and the environment variable are enough.
> 
> I really think you should think of it as a choice the developer of an
> application makes instead of a choice an application user makes.

Why do you think so? Can you give me a use case for
sys.setsegfaultenabled()?

Extract of my email on python-dev:

  Use case: when a program crashes, the user reruns the application 
  with the fault handler enabled and tries to reproduce the crash.
  He/she can send the Python backtrace to the developer, or use
  it directly (if he/she understands it).

After the discussion on python-dev, I don't think that the fault handler
should be enabled by default, but only for a single run.
msg124536 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-23 03:38
Le jeudi 23 décembre 2010 à 02:45 +0000, Alexander Belopolsky a écrit :
> As I suggested on python-dev, I also think this belongs to a separate
> module rather than core or sys.

Why do you want to move it outside the Python core? It is very
dependent on Python internals (GIL/threads, frames, etc.), so I think
it's better to keep it in the Python core.
msg124537 - (view) Author: Scott Dial (scott.dial) Date: 2010-12-23 03:45
On 12/22/2010 10:35 PM, STINNER Victor wrote:
> Why do you think so? Can you give me a use case for
> sys.setsegfaultenabled()?

To feed back your own argument on python-dev:

> How do you know that your application will crash? The idea is to give
> information to the user when an application crashes: the user can use
> the backtrace or send it to the developer. Segmentation faults are
> usually not (easily) reproducible :-( So even if you enable the
> fault handler, you may not be able to replay the crash. Or even
> worse, the fault may not occur at all when you enable the fault
> handler... (Heisenbugs!)

> After the discussion on python-dev, I don't think that the fault handler
> should be enabled by default, but only for a single run.

I agree that it should be disabled by default because of its potential
to do bad things if the application was not written with it in mind. But
an application developer would be in a much better position to decide
what the default should be for their application if they believe they
will get more useful bug reports from their users by enabling it.

I thought that was your position, but if you no longer believe that,
then I will not push for it.
msg124541 - (view) Author: Stephen J. Turnbull (sjt) Date: 2010-12-23 09:42
Re: msg124528

Yes, XEmacs installs a signal handler on what are normally fatal errors.  (I don't know about GNU Emacs but they probably do too.)

The handler has two functions: to display a Lisp backtrace and to output a message explaining how to report bugs (even including a brief introduction to the "bt" command in gdb. ;-)

I personally have never found the Lisp backtrace useful, except that it can be used as a bug signature of sorts ("oh, I think I've seen this one before...").  However, I suspect this is mostly because in Emacs Lisp very often you don't have the name of the function in the backtrace, only a compiled code object.  So in many cases it's almost no help in localizing the fault.  Victor's patch does a lot better on this than XEmacs can, I suspect.

The bug reporting message, OTOH, has been useful to us for the reasons people give for wanting the handler installed by default.  We get more and better bug reports, often including C backtraces, from people who have never participated directly in XEmacs development before.  (It also once served the function of inhibiting people from sending us core files. Fortunately, I don't think that happens much any more. :-)  Occasionally a user will be all proud of themselves because "I never used gdb before," so I'm pretty sure that message is effective.

Quite frequently we see the handler itself crash if there was memory corruption, but certainly it gives useful output well over half the time.  So I want to back up Victor on those aspects.

Finally, although our experience has been very positive, note that XEmacs is not an embeddable library, nor is there provision in the mainline versions for embedding other interpreters in XEmacs.  So we've never had to worry about the issues that come with that.

For more technical details, you could ask Ben Wing <ben@xemacs.org> who put a lot of effort into the signal handling implementation, or Hrvoje Niksic (sorry, no address offhand) who posts on python-dev occasionally.  (I don't know if Hrvoje ever worked on the signal handlers, and he hasn't worked on XEmacs for years, but he was more familiar with internals than me then, and might very well still remember more than I ever knew. :-)  I don't think either will disagree with my general statements above, though.
msg124544 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-12-23 11:10
[Alexander]

if os.getenv('PYTHONSEGVHANDLER'):
    import segvhandler
    segvhandler.enable()

+1

If this doesn't find support, I'd rename sys.setsegfaultenabled() to
sys.setsegvhandlerenabled() or sys.enable_segvhandler().
msg124545 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-23 11:29
Note: to avoid the requirement that the handler only call async-signal-safe code, another solution is to use sigsetjmp()+siglongjmp().
msg124548 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-23 12:02
I tested the patch version 11 on Windows: all tests pass. But #include <unistd.h> should be skipped on Windows (Python/fault.c): I will add #ifdef MS_WINDOWS.
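The guard mentioned above could look like this (a sketch, not the actual patch hunk):

```c
/* Sketch of the include guard for Python/fault.c: <unistd.h> does not
 * exist on Windows.  MS_WINDOWS is the macro CPython defines on
 * Windows builds. */
#ifndef MS_WINDOWS
#include <unistd.h>
#endif
```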
msg124549 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-23 12:10
> I tested the patch version 11 on Windows: all tests pass.

Oh, and I forgot to say that the Windows fault handler does catch the fault too (Windows opens a popup with a question like "Should the error be reported to Microsoft?").
msg124550 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-23 12:12
Tested on FreeBSD 8: all tests pass (all four signals are supported) and FreeBSD dumps a core file.
msg124551 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-12-23 12:31
STINNER Victor <report@bugs.python.org> wrote:
> Note: To avoid the signal-safe requirement, another solution is to use sigsetjmp()+siglongjmp().

FWIW, there is a caveat in the OpenBSD man page concerning the use of
siglongjmp():

http://www.openbsd.org/cgi-bin/man.cgi?query=sigsetjmp&apropos=0&sektion=0&manpath=OpenBSD+Current&arch=i386&format=htm

"Use of longjmp() or siglongjmp() from inside a signal handler is not as
 easy as it might seem.  Generally speaking, all possible code paths
 between the setjmp() and longjmp() must be signal race safe, as discussed
 in signal(3).  Furthermore, the code paths must not do resource
 management (such as open(2) or close(2)) without blocking the signal in
 question, or resources might be mismanaged.  Obviously this makes
 longjmp() much less useful than previously thought."
msg124552 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-23 12:33
Tested on Ubuntu 10.04: all tests pass and apport intercepts the fault. Apport ignores the faults because I am testing a Python executable compiled from SVN (py3k). Apport logs (/var/log/apport.log):
---
apport (pid 18148) Thu Dec 23 13:29:25 2010: called for pid 18147, signal 8
apport (pid 18148) Thu Dec 23 13:29:25 2010: executable: /home/haypo/prog/SVN/py3k/python (command line "./python -c import\ _testcapi;\ _testcapi.sigfpe()")
apport (pid 18148) Thu Dec 23 13:29:25 2010: executable does not belong to a package, ignoring
---
msg124583 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-24 02:17
Georg rejected this patch in Python 3.2:

"I did say I like the feature, but that was a) before beta 2
was released, now the next release is a release candidate, and b) this
thread showed that it is not at all obvious how the feature should look
like.  The fact that it isn't enabled by default also makes it seem less
attractive to me, but I understand the reasons why it shouldn't be on by
default.  Therefore, I'm not willing to make an exception here."

So I created a third-party module on the Python cheese shop:
http://pypi.python.org/pypi/faulthandler/

The module works on Linux, FreeBSD and Windows with Python 2.6 through 3.2.
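For reference, the module can also be enabled from Python code, which is roughly what sys.setsegfaultenabled() was meant to offer. A minimal sketch using the API as documented for the module (enable(), is_enabled(), disable(), dump_traceback()); the early PyPI releases may differ in detail:

```python
import faulthandler
import sys

# Install handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL;
# on a fatal signal the Python traceback is written to sys.stderr.
faulthandler.enable()
print(faulthandler.is_enabled())   # True once the handlers are installed

# A traceback of the current thread can also be dumped on demand:
faulthandler.dump_traceback(file=sys.stderr)

# Restore the default signal handlers.
faulthandler.disable()
print(faulthandler.is_enabled())   # False
```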
msg130260 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-03-07 15:41
See #11393 for discussion about integration in 3.3.