classification
Title: Display Python backtrace on SIGSEGV, SIGFPE and fatal error
Type: enhancement Stage: patch review
Components: Interpreter Core Versions: Python 3.3, Python 3.2
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, belopolsky, davidfraser, dmalcolm, eric.araujo, haypo, joshbressers, pitrou, scott.dial, sjt, skrah
Priority: normal Keywords: patch

Created on 2010-05-31 19:01 by haypo, last changed 2011-03-07 15:41 by eric.araujo. This issue is now closed.

Files
File name Uploaded Description Edit
segfault_handler-11.patch haypo, 2010-12-23 01:52
Messages (54)
msg106801 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-05-31 19:01
The attached patch implements a handler for the SIGSEGV signal. It uses its own stack so that it can still allocate memory on the stack (e.g. call a function), even on a stack overflow.

The patch requires the sigaction() and sigaltstack() functions, but I didn't patch the configure.in script. These functions are available on Linux, and should be available on other UNIX OSes.

The segfault() signal handler assumes that the thread state is consistent (the interp->frame linked list). It indirectly calls PyUnicode_EncodeUTF8(), which calls PyBytes_FromStringAndSize(), which allocates memory on the heap. It clears the PyUnicode "defenc" attribute (the result of PyUnicode_EncodeUTF8()) to free that memory directly.

To test it, try some scripts in Lib/test/crashers/.

One example:
--------------------
$ ./python Lib/test/crashers/recursive_call.py
Fatal Python error: segmentation fault

Traceback (most recent call first):
  File "Lib/test/crashers/recursive_call.py", line 12, depth 15715
  File "Lib/test/crashers/recursive_call.py", line 12, depth 15714
  File "Lib/test/crashers/recursive_call.py", line 12, depth 15713
...
  File "Lib/test/crashers/recursive_call.py", line 12, depth 3
  File "Lib/test/crashers/recursive_call.py", line 12, depth 2
  File "Lib/test/crashers/recursive_call.py", line 9, depth 1
Segmentation fault
--------------------
msg106803 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-05-31 19:10
See also issue #3999: a similar patch that raised an exception on segfault. That patch was rejected because the Python internal state may be corrupted, and we cannot guarantee that subsequent instructions will be executed correctly.

This patch is safer because it only tries to display the backtrace and then exits.
msg108156 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-06-19 00:25
New version of the patch:
 - also catch SIGFPE
 - add segfault.o to Makefile.pre.in
 - use abort() instead of _exit(1) to quit
 - call DebugBreak() on Windows before the abort(), as done by Py_FatalError()

TODO: Patch configure to only enable the segfault handler if sigaltstack() is available. The alternate stack may not be needed for the SIGFPE handler.
msg114289 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-08-18 22:39
dmalcolm asked if it would be possible to display the Python backtrace on Py_FatalError(). I don't know: Py_FatalError() is usually called when Python internals are broken. But then, segfaults also usually occur when something is broken :-)
msg118486 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-12 23:41
New version of the patch:
 - use more standard function names (_Py_initsegfault => _Py_InitSegfault)
 - use "#ifdef HAVE_SIGACTION" to support systems without sigaction(): fall back to signal()
 - usage of the alternative stack is now optional (#ifdef HAVE_SIGALTSTACK)
 - don't call _PyUnicode_AsString() (which encodes unicode to utf-8) because it allocates memory and performs complex operations that may fail: implement a new, simple function that writes a unicode string to a FILE* with ascii+backslashreplace
 - display the function name in the backtrace (but don't display function depth)
 - _Py_FiniSegfault(): restore the old signal handlers before freeing the alternative stack memory (and not the opposite)

I only tested the patch on Linux with a narrow build. It should be tested at least on a wide build and on Windows.

I tested my patch on all Lib/test/crashers/*.py: all segfaults are replaced by nice Python tracebacks.

If you debug a Python program in gdb, gdb stops at the first SIGSEGV (before calling the signal handler).

I didn't test how the signal handler behaves if it raises a new fault (SIGFPE or SIGSEGV). It is supposed to stop immediately.

In the worst case, my patch introduces an infinite loop and the program eats all CPU time (and maybe consumes a lot of memory?) instead of exiting immediately. In particular, segfault_handler() doesn't ensure that there is no loop in the frame linked list. A possible solution is to set a limit on the maximum depth (e.g. only display the first 100 frames and finish with "...").
msg118487 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-12 23:44
Updated example:
----------------------------------------------
$ ./python Lib/test/crashers/recursive_call.py
Fatal Python error: segmentation fault

Traceback (most recent call first):
  File "Lib/test/crashers/recursive_call.py", line 12 in <lambda>
  File "Lib/test/crashers/recursive_call.py", line 12 in <lambda>
  File "Lib/test/crashers/recursive_call.py", line 12 in <lambda>
...
  File "Lib/test/crashers/recursive_call.py", line 12 in <lambda>
  File "Lib/test/crashers/recursive_call.py", line 12 in <lambda>
  File "Lib/test/crashers/recursive_call.py", line 15 in <module>
Segmentation fault
----------------------------------------------

SIGSEGV caught in gdb:
----------------------------------------------
$ gdb -args ./python Lib/test/crashers/recursive_call.py 
...
(gdb) run
Starting program: /home/SHARE/SVN/py3k/python Lib/test/crashers/recursive_call.py

Program received signal SIGSEGV, Segmentation fault.
0x080614e1 in _PyObject_DebugMalloc (nbytes=24) at Objects/obmalloc.c:1398
1398        return _PyObject_DebugMallocApi(_PYMALLOC_OBJ_ID, nbytes);
(gdb)
----------------------------------------------
msg118489 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-12 23:55
> It should be tested at least on a wide build and ...

Done: it works correctly for non-BMP characters in narrow and wide builds.

E.g. in a wide build, with a U+10FFFF character in the path:
---------------------
$ ./python bla.py 
Fatal Python error: segmentation fault

Traceback (most recent call first):
  File "/home/SHARE/SVN/py3k\U0010ffff/Lib/ctypes/__init__.py", line 486 in string_at
  File "bla.py", line 2 in <module>
Segmentation fault
Abandon
---------------------
msg118491 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-13 00:10
Patch version 4:
 - Add segfault.c to pythoncore.vcproj
 - Remove #error "nope" (I used it for debug purpose)
 - Don't include <unistd.h> on Windows

This version works on Windows.
msg118511 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-13 11:10
> dmalcolm asked if it would be possible to display the 
> Python backtrace on Py_FatalError()

It works :-) I fixed a bug in ceval.c (r85411) which was not directly related.

Patch version 5:
 - Display the Python backtrace on Py_FatalError() (if no error occurred)
 - Use _PyThreadState_GetFrame(tstate) instead of tstate->frame
 - Create _Py_DumpBacktrace() function
 - _Py_DumpBacktrace() doesn't display anything if there is no frame (tstate->frame == NULL), it's the case during Python initialization
 - replace fprintf() with calls to fputs()/fputc() in Py_FatalError(): fprintf() might raise a new error; I prefer simple functions (safer and faster)
msg118526 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-13 14:41
I posted the patch on Rietveld for a review (as asked by Antoine): http://codereview.appspot.com/2477041
msg118527 - (view) Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2010-10-13 14:59
It looks like this doesn't yet have any test cases.

You probably should invoke a child python process that crashes and examine the output (perhaps running some/all of the examples in Lib/test/crashers ?); you may want to "steal" some of the wrapper code from Lib/test/test_gdb.py to do this.

Test ideas:
  - generate a segfault, verify that the output is sane
  - generate a FPE (likewise)
  - perhaps run all of the crashers, and ensure that something sane happens (e.g. stack overflow when the limit is set to something that's beyond what the OS/CPU can cope with).

Also, please test the interaction of this with the debugger (with gdb, at any rate): as I see it, this ought to gracefully get out of the way if you're running python under a debugger.  See Lib/test/test_gdb.py  for more examples of how to detect gdb, and invoke it in batch mode from a test suite.
msg118528 - (view) Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2010-10-13 15:05
One other concern: many OSes (e.g. Linux distributions) implement some kind of system-wide crash-catching utility; for example in Fedora we have ABRT ( https://fedorahosted.org/abrt/wiki ).

I'm not sure yet exactly how these are implemented, but we'd want to avoid breaking them: segfaults of a system-provided /usr/bin/python (or /usr/bin/python3 ) should continue to be detected by such tools.
msg118529 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-13 15:25
By the way, don't you want to handle SIGILL and SIGBUS too?
msg118538 - (view) Author: Josh Bressers (joshbressers) Date: 2010-10-13 16:32
You would be wise to avoid using heap storage once you're in the crash handler. From a security standpoint, if something has managed to damage the heap (which is not uncommon in a crash), you should not attempt to allocate or free heap memory. On modern glibc systems, this isn't much of a concern as there are various memory protection mechanisms that make heap exploitation very very hard (you're just going to end up crashing the crash handler). I'm not sure about other operating systems that python supports though.
msg118539 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-13 16:35
> You would be wise to avoid using heap storage once you're in the crash
> handler. From a security standpoint, if something has managed to
> damage the heap (which is not uncommon in a crash), you should not
> attempt to allocate or free heap memory.

As far as I can tell, the signal handler in the patch doesn't call
malloc() or free(), neither directly nor indirectly.
msg118541 - (view) Author: Josh Bressers (joshbressers) Date: 2010-10-13 16:40
I am then confused by this in the initial comment:

> It calls indirectly PyUnicode_EncodeUTF8() and so call
> PyBytes_FromStringAndSize() which allocates memory on the heap.

I've not studied the patch though, so this may have changed.
msg118543 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-13 16:45
> I am then confused by this in the initial comment:
> 
> > It calls indirectly PyUnicode_EncodeUTF8() and so call
> > PyBytes_FromStringAndSize() which allocates memory on the heap.
> 
> I've not studied the patch though, so this may have changed.

The patch certainly has changed, yes. In the latest patch, printing of
a unicode message is done one code point at a time, without allocating
any intermediate buffer.
msg118586 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-13 21:43
Version 6:
 - don't use fputc(), fputs(), fprintf() or fflush() on stderr: use write() on file descriptor 2 (should be stderr)
 - write tests: add sigsegv(), sigfpe() and fatal_error() functions to the _testcapi module

I was too lazy to reimplement functions to convert an integer to a string in bases 10 and 16, so I used snprintf() on a small buffer allocated on the stack.
msg118587 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-13 21:50
TODO (maybe): Call the original signal handler to make tools like apport or ABRT able to catch segmentation faults.

> By the way, don't you want to handle SIGILL and SIGBUS too?

Maybe. SIGILL is a very rare exception. To test it, should I send a SIGILL signal to the process? Or should I try to execute arbitrary CPU instructions?

About SIGBUS: I don't know this signal. Is it used on Linux? If not, on which OS is it used?
msg118588 - (view) Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2010-10-13 21:53
> About SIGBUS: I don't know this signal. Is it used on Linux? If not, on 
> which OS is it used?
Yes, IIRC you'll typically only see it on RISC CPUs that require instructions to be word-aligned; you typically see it if the program counter jumps to a broken address: the SIGBUS happens when the CPU tries to fetch the code.
msg119174 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-19 23:04
Patch version 7:
 - don't use snprintf() because it is not signal safe
 - also catch SIGBUS and SIGILL if available
 - Py_FatalError() uses fileno(stderr) instead of directly 2
 - Factorize code in segfault.c and test_segfault.py
msg119178 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-19 23:28
Patch version 8:
 - fix compiler warning on Windows
 - skip test_sigfpe() on Windows: it looks like the signal handler is not executed; the program exits directly
 - fix tests for Windows end of line (\r\n vs \n)
msg119179 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-10-19 23:48
> One other concern: many OSes (e.g. Linux distributions) implement
> some kind of system-wide crash-catching utility; ...

Even if it were possible to call the original signal handler, I think it would be better to just disable this feature if a (better?) program also watches for segfaults.

We should provide different ways to enable and/or disable this feature:
 - command line option? (eg. see issue #10089)
 - environment variable?
 - add a function to the sys module (or another module) to enable or disable this feature? (eg. sys.getsegfaultenabled(), sys.setsegfaultenabled(False))

I don't think that a command line option or an environment variable is practical for an OS distributor.

With a sys function, the OS distributor can call it in the site module.
msg119193 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-20 08:08
> I don't think that a command line option and an environment variable
> is pratical for an OS distributor.

Environment variables are probably the most practical for OS vendors,
since they can simply set them in /etc/profile.d (Mandriva does that
with PYTHONDONTWRITEBYTECODE :-/).

If it's an env variable, though, it should be clear that it's
CPython-specific, so I'm not sure how to call it.
CPYTHONNOSEGFAULTHANDLER?
msg123037 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-02 02:15
> If it's an env variable, though, it should be clear that it's
> CPython-specific, so I'm not sure how to call it.

Why do you think that only this variable is CPython-specific? CPython has an option called PYTHONDONTWRITEBYTECODE, but PyPy doesn't create .pyc files (and I suppose that neither IronPython nor Jython creates such files).

I propose to call the variable PYTHONNOFAULTHANDLER. I removed "SEG" because it catches faults other than segmentation faults.
msg124266 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-18 00:54
Version 9 of my patch:
 - Create PYTHONNOHANDLER environment variable: don't install the signal handler if the variable is set
 - Rename "Segfault" to "FaultHandler"
 - reverse_string() does nothing if len==0, even though that cannot occur when it is called by dump_decimal() and dump_hexadecimal()
 - _Py_DumpBacktrace() only writes the first 100 frames (an arbitrary limit) to avoid unlimited loops

I don't know whether 100 frames is a good limit. Is it enough?

Summary of the patch:
 - Add a fault handler for SIGSEGV, SIGFPE, SIGBUS and SIGILL signals: display the Python backtrace and abort the process (call the debugger on Windows)
 - Add PYTHONNOHANDLER environment variable to disable the fault handler
 - The fault handler is implemented using only very simple operations: it calls only signal-safe functions and it doesn't allocate memory on the heap (only a few bytes on the stack)
 - The fault handler uses an alternate stack to be able to display the backtrace (to be able to allocate memory on the stack and call functions)

The fault handler no longer has the infinite loop issue (because the backtrace is truncated on a frame loop).
msg124267 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-18 01:11
Oh, I'm tired...

> Summary of the patch:
> - ...abort the process (call the debugger on Windows)

(the debugger is only called if Python is compiled in debug mode)

> - Add PYTHONNOHANDLER environment variable ...

Oops, the correct name is PYTHONNOFAULTHANDLER (no fault handler).
msg124268 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-12-18 01:20
Why was sys.setsegfaultenabled() omitted?
It may be useful to disable the handler from a script
msg124364 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-19 21:59
> Why was sys.setsegfaultenabled() omitted?

Just because I forgot your message, sorry.
msg124373 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-20 00:12
Version 10 of my patch:
 - the fault handler restores the previous signal handler instead of calling (DebugBreak() and) abort(): the previous signal handler will be called later, to keep the original behaviour
 - _testcapi.sigill() and _testcapi.sigbus() send SIGILL/SIGBUS signal in an unlimited loop instead of sending the signal once. So the signal is sent again after calling the Python signal handler, and the previous signal handler is called
 - (minor change) use 2 arrays (fault_handlers and fault_signals) for all signals, instead of 2 variables (xxx_enabled, xxx_handler) per signal

With this patch, the original signal handler is called and so the Python fault handler is compatible with OS fault handlers like grsecurity and Apport.
msg124381 - (view) Author: Scott Dial (scott.dial) Date: 2010-12-20 06:55
FYI, in v10,

+#define NFAULT_SIGNALS (sizeof(fault_signals) / sizeof(fault_signals[0]))
+static fault_handler_t fault_handlers[4];

, should use "NFAULT_SIGNALS" instead of "4".

However, this bit of code bothers me a lot:

+    const int fd = 2; /* should be fileno(stderr) */

To assume that fd=2 is the right place to be writing bytes is assuming quite a bit about the state of the application. It is not unusual at all to close 0, 1 and 2 when writing daemons, which frees them up to be assigned to *anything*. For all you know, fd=2 is currently a network socket that you will be throwing gibberish at, or worse, it could be a block device that you are writing gibberish on.

The closest discussion I could find on this subject was on the libstdc++ mailing-list with regard to their verbose termination code:

http://gcc.gnu.org/ml/gcc-patches/2004-02/msg02388.html

AFAICT, their conclusion was that the only reasonable solution was to write to the stderr FILE since it was the only thing that is guaranteed to make sense always (even though it may fail). Their situation is different in that they are handling a C++ exception, so they don't have to stick to async-safe functions, but absent that extra difficulty, I believe the reasoning is the same.

The analogous situation exists in Python, in that sys.stderr(*) represents where the application programmer expects "stderr" writing to go. I think you need to arrange to know the fd number for that object, or this patch is unacceptable, in the "wrote garbage to my hard drive and destroyed my data" vein.

I'm not sure there is a safe way to know what the fileno for "sys.stderr" is because it can be anything, including an object whose fileno changes over time. However, I think it would be fair to support only built-in io types that are obviously safe, since you could cache the fileno() value at assignment to use in your fault handler.

(*) Or perhaps __stderr__ if stderr is None or an unsupported type?
msg124385 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-20 13:30
On Monday, December 20, 2010 at 07:55:08, you wrote:
> +#define NFAULT_SIGNALS (sizeof(fault_signals) / sizeof(fault_signals[0]))
> +static fault_handler_t fault_handlers[4];
> 
> , should use "NFAULT_SIGNALS" instead of "4".

Ah yes, NFAULT_SIGNALS is a better choice than the maximum of possible signals 
(4).

> However, this bit of code bothers me a lot:
> 
> +    const int fd = 2; /* should be fileno(stderr) */
> 
> To assume that fd=2 is the write place to be writing bytes is assuming
> quite a bit about the state of the application. It is not unusual at all
> to close 0,1,2 when writing daemons, which frees them up to be assigned to
> *anything*.

Writing to a closed file descriptor just does nothing. Closed file
descriptors are not a problem.

> For all you know, fd=2 currently is a network socket that you
> will be throwing gibberish at, or worse it could be a block device that
> you are writing gibberish on.

The GNU libc also has a fault handler (source code: debug/segfault.c). It
uses file descriptor 2, unless the SEGFAULT_OUTPUT_NAME environment
variable is set (its value is stored in the "fname" variable), in which
case it opens the specified file:

  /* This is the name of the file we are writing to.  If none is given
     or we cannot write to this file write to stderr.  */
  fd = 2;
  if (fname != NULL)
    {
      fd = open (fname, O_TRUNC | O_WRONLY | O_CREAT, 0666);
      if (fd == -1)
	fd = 2;
    }

The GNU libc installs handlers for the SIGSEGV, SIGILL, SIGBUS, SIGSTKFLT,
SIGABRT and SIGFPE signals. The handler restores the default handler and
re-raises the signal:

  /* Pass on the signal (so that a core file is produced).  */
  sa.sa_handler = SIG_DFL;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = 0;
  sigaction (signal, &sa, NULL);
  raise (signal);

Where "raise(signal);" is something like kill(getpid(), signal).

It only uses an alternate stack if SEGFAULT_USE_ALTSTACK environment variable 
is set.

The handler can display registers, the backtrace and the memory mappings, 
depending on the compilation options and the operating system.

> The closest discussion I could find on this subject was on the libstdc++
> mailing-list with regard to their verbose termination code:
> 
> http://gcc.gnu.org/ml/gcc-patches/2004-02/msg02388.html
> 
> AFAICT, their conclusion was that the only reasonable solution was to write
> to the stderr FILE since it was the only thing that is guaranteed to make
> sense always (even though it may fail). Their situation is different in
> that they are handling a C++ exception, so they don't have to stick to
> async-safe functions, but absent that extra difficulty, I believe the
> reasoning is the same.

The FILE* type cannot be used because fprintf(), fputs(), ... are not signal-
safe.

> I'm not sure there is a safe way to know what the fileno for "sys.stderr"
> is because it can be anything, including an object whose fileno changes
> over time

The signal handler cannot access the Python object. E.g. if sys.stderr is
a StringIO() (which has no file descriptor), it cannot be used.

> However, I think it would be fair to support only built-in io
> types that are obviously safe, since you could cache the fileno() value at
> assignment to use in your fault handler.

The problem is detecting that the stderr file descriptor has changed (e.g.
been closed, duplicated, reopened, etc.). I don't think it's possible to
detect such changes (with a portable function).
msg124388 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-20 14:25
The fault handler is unable to retrieve the thread state if the GIL is released. I will try to fix that.
msg124397 - (view) Author: Scott Dial (scott.dial) Date: 2010-12-20 18:31
On 12/20/2010 8:30 AM, STINNER Victor wrote:
> Write into a closed file descriptor just does nothing. Closed file descriptors 
> are not a problem.

My issue is not with a closed file descriptor; it is with an open file
descriptor that is not what you think it is.

>> For all you know, fd=2 currently is a network socket that you
>> will be throwing gibberish at, or worse it could be a block device that
>> you are writing gibberish on.
> 
> The GNU libc has also a fault handler (source code: debug/segfault.c). It uses 
> the file descriptor 2, except if SEGFAULT_OUTPUT_NAME environment variable is 
> set (value stored in "fname" variable): open the specified file.

The GNU libc segfault handler is *opt-in* via the SEGFAULT_SIGNALS
environment variable. I do not know of a system where SEGFAULT_SIGNALS
is a part of the default environment. I suspect the only time anyone
uses that variable and code is if they are using the "catchsegv" tool,
which comes with glibc. In any case, that developer should be aware of
the implication of closing fd 2.

> The FILE* type cannot be used because fprintf(), fputs(), ... are not signal-
> safe.

My point was that in C++, they have an object that an application
developer more readily associates with "stderr" than fd 2. That writing
to "stderr" leads to a write to fd 2 is incidental, because it's
possible that "stderr" does *not* lead to a write to fd 2 and that
writing to fd 2 would be a bad thing to do blindly. For instance, you
can call freopen(path, mode, stderr) and fd 2 will end up closed and
another fd will be the target of stderr. I don't believe POSIX
guarantees that open() will not re-use fd 2.

> The problem is to detect that stderr file descriptor changes (eg. closed, 
> duplicated, reopened, etc.). I don't think that it's possible to detect such 
> changes (with a portable function).

When I said that, I hadn't fully investigated the intricacies of the io
types. I was unaware that you could assign to "sys.stderr.buffer.raw"
and change out the target fd. I assumed a BufferedWriter could not have
the target stream changed out from beneath it. So, I don't have a
solution to the problem of knowing the correct (if any) file descriptor
to write to.

If the argument is made that fd 2 is the right place for most Python
applications, then there needs to be a programmatic way to disable this
feature and/or tell it where to write, so that programs that daemonize
can do the right thing.
msg124399 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-12-20 18:40
> > The problem is to detect that stderr file descriptor changes (eg. closed, 
> > duplicated, reopened, etc.). I don't think that it's possible to detect such 
> > changes (with a portable function).
> 
> When I said that, I hadn't fully investigated the intricacies of the io
> types. I was unaware that you could assign to "sys.stderr.buffer.raw"
> and change out the target fd. I assumed a BufferedWriter could not have
> the target stream changed out from beneath it.

AFAICT, this is not deliberate (not documented, and not tested for). It
should probably be fixed, actually, because there's no code that I know
of that ensures it does something meaningful.
msg124527 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-23 01:52
Version 11 of my patch:
 - Disable the fault handler (and displaying the backtrace on a fatal error) by default
 - The fault handler can be enabled by setting the PYTHONFAULTHANDLER environment variable or by using the "-X faulthandler" command line option (instead of being disabled by setting the PYTHONNOFAULTHANDLER env var)
 - Use PyGILState_GetThisThreadState() instead of _Py_atomic_load_relaxed(&_PyThreadState_Current) to get the state of the current thread, especially if the current thread doesn't hold the GIL
 - Add a protection against recursive calls to _Py_DisplayBacktrace() (a recursive call does nothing), even though it is very unlikely
 - Don't repeat the exception message after the backtrace
 - Add a test on a segfault in a thread not holding the GIL

Disabling the fault handler by default solves many issues reported on the python-dev mailing list:

 - Issues with embedded Python
 - Python 3.2 beta 2 is released
 - The fault handler assumes that file descriptor 2 is the standard error output, but it may be a critical file or a network socket

Amaury asked for a sys.setsegfaultenabled() option: I think that the command line option and the environment variable are enough. The question now is how to enable the feature for a single run (to reproduce a crash and try to get more information), not how to enable/disable it system-wide (because most developers agree that it should be disabled by default).

Should it be backported to Python 2.7 and 3.1?
msg124528 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-23 02:04
Stephen,

I wonder if you would have comments on this. As far as I know, Emacs installs SEGV handlers similar to the ones proposed here. How well do they work? Do they really help users produce meaningful bug reports?
msg124529 - (view) Author: Scott Dial (scott.dial) Date: 2010-12-23 02:27
On 12/22/2010 8:52 PM, STINNER Victor wrote:
> Amaury asked for a sys.setsegfaultenabled() option: I think that the command line option and the environment variable are enough.

I really think you should think of it as a choice the developer of an
application makes instead of a choice an application user makes. A
setsegfaultenabled() could just be another step of initializing the
application akin to setting up the logging module, for instance. In that
situation, a function call has a much lower barrier to use than a CLI
option or environment variable, where you'd have to create a wrapper script.
msg124531 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-23 02:45
On Wed, Dec 22, 2010 at 9:27 PM, Scott Dial <report@bugs.python.org> wrote:
>
> Scott Dial <scott@scottdial.com> added the comment:
>
> On 12/22/2010 8:52 PM, STINNER Victor wrote:
> > Amaury asked for a sys.setsegfaultenabled() option: I think that the command line option and the environment variable are enough.
>
> I really think you should think of it as a choice the developer of an
> application makes instead of a choice an application user makes. A
> setsegfaultenabled() could just be another step of initializing the
> application akin to setting up the logging module, for instance. In that
> situation, a function call has a much lower barrier for use than a CLI
> option or environment variable where you'd have to create a wrapper script.

+1

I would actually prefer just sys.setsegfaultenabled() without a
controlling environment variable.  If necessary, the environment
variable can be checked in site.py and sys.setsegfaultenabled()
called.

As I suggested on python-dev, I also think this belongs in a separate
module rather than core or sys.  The relevant code is already
segregated in a file, so turning it into a module should not be
difficult.  The only function that probably must stay in core is
_Py_DumpBacktrace().  With say "segvhandler" module, site.py can
include something like this:

if os.getenv('PYTHONSEGVHANDLER'):
     import segvhandler
     segvhandler.enable()

Does the latest patch address the GIL/multithreading issues?
msg124533 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-23 03:30
> Does the latest patch address the GIL/multithreading issues?

Yes.
msg124534 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-23 03:32
On Thursday, December 23, 2010 at 02:27 +0000, Scott Dial wrote:
> Scott Dial <scott@scottdial.com> added the comment:
> 
> On 12/22/2010 8:52 PM, STINNER Victor wrote:
> > Amaury asked for a sys.setsegfaultenabled() option: I think that the command line option and the environment variable are enough.
> 
> I really think you should think of it as a choice the developer of an
> application makes instead of a choice an application user makes.

Why do you think so? Can you give me an use case of
sys.setsegfaultenabled()?

Extract of my email on python-dev:
msg124535 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-23 03:35
Le jeudi 23 décembre 2010 à 02:27 +0000, Scott Dial a écrit :
> Scott Dial <scott@scottdial.com> added the comment:
> 
> On 12/22/2010 8:52 PM, STINNER Victor wrote:
> > Amaury asked for a sys.setsegfaultenabled() option: I think that the command line option and the environment variable are enough.
> 
> I really think you should think of it as a choice the developer of an
> application makes instead of a choice an application user makes.

Why do you think so? Can you give me a use case for
sys.setsegfaultenabled()?

Extract of my email on python-dev:

  Use case: when a program crashes, the user reruns the application 
  with the fault handler enabled and tries to reproduce the crash.
  He/she can send the Python backtrace to the developer, or use
  it directly (if he/she understands it).

After the discussion on python-dev, I don't think that the fault handler
should be enabled by default, but only for a single run.
msg124536 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-23 03:38
Le jeudi 23 décembre 2010 à 02:45 +0000, Alexander Belopolsky a écrit :
> As I suggested on python-dev, I also think this belongs to a separate
> module rather than core or sys.

Why do you want to move it outside the Python core? It is very
dependent on Python internals (GIL/threads, frames, etc.), so I think
it's better to keep it in the Python core.
msg124537 - (view) Author: Scott Dial (scott.dial) Date: 2010-12-23 03:45
On 12/22/2010 10:35 PM, STINNER Victor wrote:
> Why do you think so? Can you give me a use case for
> sys.setsegfaultenabled()?

To feed back your own argument on python-dev:

> How do you know that your application will crash? The idea is to give
> information to the user when an application crashes: the user can use
> the backtrace or send it to the developer. Segmentation faults are
> usually not (easily) reproducible :-( So even if you enable the
> fault handler, you may not be able to replay the crash. Or even
> worse, the fault may not occur at all when you enable the fault
> handler... (Heisenbugs!)

> After the discussion on python-dev, I don't think that the fault handler
> should be enabled by default, but only for a single run.

I agree that it should be disabled by default because of its potential
to do bad things if the application was not written with it in mind. But
an application developer would be in a much better position to decide
what the default should be for their application if they believe they
will get more useful bug reports from their users by enabling it.

I thought that was your position, but if you no longer believe that,
then I will not push for it.
msg124541 - (view) Author: Stephen J. Turnbull (sjt) Date: 2010-12-23 09:42
Re: msg124528

Yes, XEmacs installs a signal handler on what are normally fatal errors.  (I don't know about GNU Emacs but they probably do too.)

The handler has two functions: to display a Lisp backtrace and to output a message explaining how to report bugs (even including a brief introduction to the "bt" command in gdb. ;-)

I personally have never found the Lisp backtrace useful, except that it can be used as a bug signature of sorts ("oh, I think I've seen this one before...").  However, I suspect this is mostly because in Emacs Lisp very often you don't have the name of the function in the backtrace, only a compiled code object.  So in many cases it's almost no help in localizing the fault.  Victor's patch does a lot better on this than XEmacs can, I suspect.

The bug reporting message, OTOH, has been useful to us for the reasons people give for wanting the handler installed by default.  We get more and better bug reports, often including C backtraces, from people who have never participated directly in XEmacs development before.  (It also once served the function of inhibiting people from sending us core files. Fortunately, I don't think that happens much any more. :-)  Occasionally a user will be all proud of themselves because "I never used gdb before," so I'm pretty sure that message is effective.

Quite frequently we see the handler itself crash if there was memory corruption, but certainly it gives useful output well over half the time.  So I want to back up Victor on those aspects.

Finally, although our experience has been very positive, note that XEmacs is not an embeddable library, nor is there provision in the mainline versions for embedding other interpreters in XEmacs.  So we've never had to worry about the issues that come with that.

For more technical details, you could ask Ben Wing <ben@xemacs.org> who put a lot of effort into the signal handling implementation, or Hrvoje Niksic (sorry, no address offhand) who posts on python-dev occasionally.  (I don't know if Hrvoje ever worked on the signal handlers, and he hasn't worked on XEmacs for years, but he was more familiar with internals than me then, and might very well still remember more than I ever knew. :-)  I don't think either will disagree with my general statements above, though.
msg124544 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-12-23 11:10
[Alexander]

if os.getenv('PYTHONSEGVHANDLER'):
    import segvhandler
    segvhandler.enable()

+1

If this doesn't find support, I'd rename sys.setsegfaultenabled() to
sys.setsegvhandlerenabled() or sys.enable_segvhandler().
msg124545 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-23 11:29
Note: to avoid the requirement that the handler only call async-signal-safe code, another solution is to use sigsetjmp()+siglongjmp().
msg124548 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-23 12:02
I tested the patch version 11 on Windows: all tests pass. But #include <unistd.h> should be skipped on Windows (Python/fault.c): I will add #ifdef MS_WINDOWS.
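The guard mentioned above could look like this (a sketch, not the actual patch hunk):

```c
/* Sketch of the include guard for Python/fault.c: <unistd.h> does not
 * exist on Windows.  MS_WINDOWS is the macro CPython defines on
 * Windows builds. */
#ifndef MS_WINDOWS
#include <unistd.h>
#endif
```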
msg124549 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-23 12:10
> I tested the patch version 11 on Windows: all tests pass.

Oh, and I forgot to say that the Windows fault handler does catch the fault too (Windows opens a popup with a question like "Should the error be reported to Microsoft?").
msg124550 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-23 12:12
Tested on FreeBSD 8: all tests pass (all four signals are supported) and FreeBSD dumps a core file.
msg124551 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2010-12-23 12:31
STINNER Victor <report@bugs.python.org> wrote:
> Note: To avoid the signal-safe requirement, another solution is to use sigsetjmp()+siglongjmp().

FWIW, there is a caveat in the OpenBSD man page concerning the use of
siglongjmp():

http://www.openbsd.org/cgi-bin/man.cgi?query=sigsetjmp&apropos=0&sektion=0&manpath=OpenBSD+Current&arch=i386&format=htm

"Use of longjmp() or siglongjmp() from inside a signal handler is not as
 easy as it might seem.  Generally speaking, all possible code paths
 between the setjmp() and longjmp() must be signal race safe, as discussed
 in signal(3).  Furthermore, the code paths must not do resource
 management (such as open(2) or close(2)) without blocking the signal in
 question, or resources might be mismanaged.  Obviously this makes
 longjmp() much less useful than previously thought."
msg124552 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-23 12:33
Tested on Ubuntu 10.04: all tests pass and apport intercepts the fault. Apport ignores the faults because I am testing a Python executable compiled from SVN (py3k). Apport logs (/var/log/apport.log):
---
apport (pid 18148) Thu Dec 23 13:29:25 2010: called for pid 18147, signal 8
apport (pid 18148) Thu Dec 23 13:29:25 2010: executable: /home/haypo/prog/SVN/py3k/python (command line "./python -c import\ _testcapi;\ _testcapi.sigfpe()")
apport (pid 18148) Thu Dec 23 13:29:25 2010: executable does not belong to a package, ignoring
---
msg124583 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2010-12-24 02:17
Georg rejected this patch in Python 3.2:

"I did say I like the feature, but that was a) before beta 2
was released, now the next release is a release candidate, and b) this
thread showed that it is not at all obvious how the feature should look
like.  The fact that it isn't enabled by default also makes it seem less
attractive to me, but I understand the reasons why it shouldn't be on by
default.  Therefore, I'm not willing to make an exception here."

So I created a third-party module on the Python cheese shop:
http://pypi.python.org/pypi/faulthandler/

The module works on Linux, FreeBSD and Windows with Python 2.6 through 3.2.
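For reference, the module can also be enabled from Python code, which is roughly what sys.setsegfaultenabled() was meant to offer. A minimal sketch using the API as documented for the module (enable(), is_enabled(), disable(), dump_traceback()); the early PyPI releases may differ in detail:

```python
import faulthandler
import sys

# Install handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL;
# on a fatal signal the Python traceback is written to sys.stderr.
faulthandler.enable()
print(faulthandler.is_enabled())   # True once the handlers are installed

# A traceback of the current thread can also be dumped on demand:
faulthandler.dump_traceback(file=sys.stderr)

# Restore the default signal handlers.
faulthandler.disable()
print(faulthandler.is_enabled())   # False
```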
msg130260 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-03-07 15:41
See #11393 for discussion about integration in 3.3.