classification
Title: Core error in Py_EvalFrameEx 2.6.2
Type: crash
Stage:
Components: Interpreter Core
Versions: Python 2.6
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: mroach, pitrou, tsavannah
Priority: normal
Keywords:

Created on 2009-06-02 15:36 by tsavannah, last changed 2009-06-03 17:14 by tsavannah.

Messages (8)
msg88748 - Author: Tim Savannah (tsavannah) Date: 2009-06-02 15:36
I'm getting many segmentation faults (about 1 per half hour) from within
the core of python 2.6.2 on 64-bit machines.

(examples from dmesg:
pythonLaunch.py[13307]: segfault at 0000000000000058 rip
00002b845cfb3550 rsp 0000000041809930 error 4
pythonLaunch.py[27589]: segfault at 0000000000000058 rip
00002b4112287906 rsp 0000000042dab930 error 4
pythonLaunch.py[14436]: segfault at 0000000000000058 rip
00002ae0a4f68550 rsp 0000000042cd9930 error 4
pythonLaunch.py[10374]: segfault at 0000000000000058 rip
00002af43f966906 rsp 000000004214b930 error 4
pythonLaunch.py[17656]: segfault at 0000000000000058 rip
00002aed0cfe8906 rsp 00000000417f0930 error 4
)
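The records above can be checked mechanically. This is a minimal sketch: the regular expression and the name parse_segfaults are illustrative assumptions, not part of any kernel or Python tool. It pulls the fault address and instruction pointer out of each record and shows that every fault hits address 0x58.

```python
import re

# Illustrative pattern for the record format shown above (hypothetical helper).
SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+)"
    r" rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>\d+)"
)


def parse_segfaults(text):
    """Return one dict per segfault record found in dmesg-style text."""
    # The pasted records are wrapped across lines, so collapse whitespace first.
    flat = " ".join(text.split())
    return [
        {
            "pid": int(m.group("pid")),
            "addr": int(m.group("addr"), 16),
            "rip": int(m.group("rip"), 16),
            "error": int(m.group("err")),
        }
        for m in SEGFAULT_RE.finditer(flat)
    ]


sample = """
pythonLaunch.py[13307]: segfault at 0000000000000058 rip
00002b845cfb3550 rsp 0000000041809930 error 4
pythonLaunch.py[27589]: segfault at 0000000000000058 rip
00002b4112287906 rsp 0000000042dab930 error 4
pythonLaunch.py[14436]: segfault at 0000000000000058 rip
00002ae0a4f68550 rsp 0000000042cd9930 error 4
pythonLaunch.py[10374]: segfault at 0000000000000058 rip
00002af43f966906 rsp 000000004214b930 error 4
pythonLaunch.py[17656]: segfault at 0000000000000058 rip
00002aed0cfe8906 rsp 00000000417f0930 error 4
"""

records = parse_segfaults(sample)
# Every record faults at the same small address.
print(sorted({hex(r["addr"]) for r in records}))          # ['0x58']
# Low 12 bits of rip, i.e. the offset within a page (assuming 4 KiB pages).
print(sorted({hex(r["rip"] & 0xFFF) for r in records}))   # ['0x550', '0x906']
```

Because shared libraries are mapped at page-aligned addresses, the low 12 bits of rip do not depend on where the library was loaded. Two distinct values (0x550 and 0x906) suggest that the faults come from at least two different instructions, assuming all five fall inside the same library.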
pythonLaunch.py is a symbolic link to python 2.6.2 binary.
From disassembling the python binary, I've found the corresponding line
in source to be ceval.c:2717

if (tstate->frame->f_exc_type != NULL)

tstate->frame is null, and an access on f_exc_type causes a segfault
(trying to access memory 0x58, see above segfaults).
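The link between a NULL tstate->frame and a fault at 0x58 can be sanity-checked with a small sketch. The structure below is a hand-written mirror of the leading fields of PyFrameObject as they are believed to appear in Include/frameobject.h for 2.6; the field order is an assumption to verify against the headers, and FrameHead is a hypothetical name. With a NULL base pointer, the address that gets read is simply the field's offset.

```python
import ctypes


class FrameHead(ctypes.Structure):
    """Assumed layout of the start of PyFrameObject (release build)."""
    _fields_ = [
        # PyObject_VAR_HEAD
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("ob_size", ctypes.c_ssize_t),
        # Frame fields preceding f_exc_type
        ("f_back", ctypes.c_void_p),
        ("f_code", ctypes.c_void_p),
        ("f_builtins", ctypes.c_void_p),
        ("f_globals", ctypes.c_void_p),
        ("f_locals", ctypes.c_void_p),
        ("f_valuestack", ctypes.c_void_p),
        ("f_stacktop", ctypes.c_void_p),
        ("f_trace", ctypes.c_void_p),
        ("f_exc_type", ctypes.c_void_p),
    ]


# Reading frame->f_exc_type through a NULL frame touches address 0 + offset.
offset = FrameHead.f_exc_type.offset
print(hex(offset))  # 0x58 where pointers and Py_ssize_t are 8 bytes wide
```

Eleven 8-byte fields precede f_exc_type under this layout, which gives 88 (0x58) on a 64-bit build and matches the fault address in the dmesg output.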

I can't find any clear code path that could cause tstate->frame to go
null, any suggestions? This is preventing us from moving from python 2.4
32-bit to python 2.6 64-bit.
msg88749 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-06-02 16:07
Have you compiled Python yourself?
Are you using any third-party C extensions?
msg88750 - Author: Tim Savannah (tsavannah) Date: 2009-06-02 16:11
Yes I compiled python myself, using ./configure
--prefix=/usr/local/python2.6/ --with-pth --enable-shared

It is a 64-bit compile.

I've done this with both the standard config and a config that I modified
to build with -ggdb3 -O0. Both builds hit the
segfault.

We are including some external site packages, but there is no consistent
site package import or usage that causes the segfault. It just seems
that under heavy stress, with many threads running, a race has a chance to cause it.

I can send any additional info that can help debug this issue
msg88754 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-06-02 17:22
If the third-party packages include some C extensions, have they been
recompiled with the new Python build?

Also, does the segfault disappear if you disable optimizations?
Have you tried with another compiler version?

(I'm asking all this because to my knowledge it's the first time such
random crashes happen in the Python core with 2.6, which was released
quite a while ago)
msg88755 - Author: Tim Savannah (tsavannah) Date: 2009-06-02 17:32
All site-packages were compiled against python 2.6.1, and python was
upgraded later to 2.6.2 (but upon running a make install with python
2.6.2, it seemed to recompile site-packages on a byte-code level).

And no, there is still segfaults without optimizations, I've tried at
-O2 -O and -O0 ( -O0 being no optimization). Judging by the invalid read
always being on 0x58, and the line of assembly accessing 0x58 offset
from a register, tstate->frame must be being initialized to NULL (or
always being corrupted to point to other NULL data).

The compiler used is gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)

The setup we are using is 8-core xeon 64-bit servers. (We have about 14
of these, Centos based systems, all are experiencing the segfaults).
msg88778 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-06-02 23:35
> And no, there is still segfaults without optimizations, I've tried at
> -O2 -O and -O0 ( -O0 being no optimization).

Then you can try rebuilding Python after "./configure --with-pydebug".
It will add some runtime checks, perhaps it will find the cause of the
problem.
msg88817 - Author: Tim Savannah (tsavannah) Date: 2009-06-03 17:13
Recompiled with pydebug enabled, and recompiled all site-packages. Still
getting segfaults; however, they are now occurring within the python binary
and not libpython2.6.1.

pythonLaunch.py[25914]: segfault at 0000000000000068 rip
00000000004c7694 rsp 000000004181a4c0 error 4
pythonLaunch.py[1421]: segfault at 0000000000000068 rip 00000000004c7694
rsp 00000000432914c0 error 4
pythonLaunch.py[2552]: segfault at 0000000000000068 rip 00000000004c7694
rsp 0000000041f7d4c0 error 4
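One possible reading of the move from 0x58 to 0x68, offered as an assumption rather than a diagnosis: a --with-pydebug build enables Py_TRACE_REFS, which places two extra pointers (_ob_next and _ob_prev) at the front of every object header. The sketch below uses a hypothetical frame_head helper and an assumed PyFrameObject field order to compare the offset of f_exc_type with and without those pointers.

```python
import ctypes

PTR = ctypes.c_void_p
SSIZE = ctypes.c_ssize_t

# Assumed frame fields up to and including f_exc_type (check frameobject.h).
FRAME_POINTERS = (
    "f_back", "f_code", "f_builtins", "f_globals", "f_locals",
    "f_valuestack", "f_stacktop", "f_trace", "f_exc_type",
)


def frame_head(trace_refs):
    """Build a mirror of the start of PyFrameObject (hypothetical helper)."""
    fields = []
    if trace_refs:
        # Extra header pointers added when Py_TRACE_REFS is defined.
        fields += [("_ob_next", PTR), ("_ob_prev", PTR)]
    fields += [("ob_refcnt", SSIZE), ("ob_type", PTR), ("ob_size", SSIZE)]
    fields += [(name, PTR) for name in FRAME_POINTERS]

    class FrameHead(ctypes.Structure):
        _fields_ = fields

    return FrameHead


release = frame_head(trace_refs=False).f_exc_type.offset
debug = frame_head(trace_refs=True).f_exc_type.offset
# With 8-byte pointers this prints 0x58 0x68.
print(hex(release), hex(debug))
```

Under these assumptions the 16-byte shift is exactly the size of the two added pointers, so the 0x68 faults would be consistent with the same NULL tstate->frame dereference rather than a new failure.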
msg88818 - Author: Tim Savannah (tsavannah) Date: 2009-06-03 17:14
To update: no additional output was seen from pydebug.
History
Date                 User       Action  Args
2009-06-03 17:14:49  tsavannah  set     messages: + msg88818
2009-06-03 17:13:19  tsavannah  set     messages: + msg88817
2009-06-02 23:35:02  pitrou     set     messages: + msg88778
2009-06-02 19:34:22  mroach     set     nosy: + mroach
2009-06-02 17:32:30  tsavannah  set     messages: + msg88755
2009-06-02 17:22:49  pitrou     set     messages: + msg88754
2009-06-02 16:11:42  tsavannah  set     messages: + msg88750
2009-06-02 16:07:04  pitrou     set     nosy: + pitrou
                                        messages: + msg88749
2009-06-02 15:36:11  tsavannah  create