classification
Title: Core error in Py_EvalFrameEx 2.6.2
Type: crash
Stage: resolved
Components: Interpreter Core
Versions: Python 2.6
process
Status: closed
Resolution: works for me
Dependencies:
Superseder:
Assigned To:
Nosy List: DragonFireCK, mroach, pitrou, tsavannah
Priority: normal
Keywords:

Created on 2009-06-02 15:36 by tsavannah, last changed 2013-05-04 21:26 by pitrou. This issue is now closed.

Messages (11)
msg88748 - Author: Tim Savannah (tsavannah) Date: 2009-06-02 15:36
I'm getting many segmentation faults (about 1 per half hour) from within
the core of python 2.6.2 on 64-bit machines.

(examples from dmesg:
pythonLaunch.py[13307]: segfault at 0000000000000058 rip
00002b845cfb3550 rsp 0000000041809930 error 4
pythonLaunch.py[27589]: segfault at 0000000000000058 rip
00002b4112287906 rsp 0000000042dab930 error 4
pythonLaunch.py[14436]: segfault at 0000000000000058 rip
00002ae0a4f68550 rsp 0000000042cd9930 error 4
pythonLaunch.py[10374]: segfault at 0000000000000058 rip
00002af43f966906 rsp 000000004214b930 error 4
pythonLaunch.py[17656]: segfault at 0000000000000058 rip
00002aed0cfe8906 rsp 00000000417f0930 error 4
)
pythonLaunch.py is a symbolic link to python 2.6.2 binary.
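
The dmesg lines above can be tabulated mechanically. The following is a minimal Python 3 sketch, not part of the original report. The helper names `parse_segfaults` and `describe_error` are hypothetical, and the decoding of the `error` field assumes the standard x86 page-fault bit layout (bit 0: protection violation rather than page not present, bit 1: write, bit 2: user mode).

```python
import re

# The five kernel messages quoted in the report, including the line wrapping.
DMESG = """\
pythonLaunch.py[13307]: segfault at 0000000000000058 rip
00002b845cfb3550 rsp 0000000041809930 error 4
pythonLaunch.py[27589]: segfault at 0000000000000058 rip
00002b4112287906 rsp 0000000042dab930 error 4
pythonLaunch.py[14436]: segfault at 0000000000000058 rip
00002ae0a4f68550 rsp 0000000042cd9930 error 4
pythonLaunch.py[10374]: segfault at 0000000000000058 rip
00002af43f966906 rsp 000000004214b930 error 4
pythonLaunch.py[17656]: segfault at 0000000000000058 rip
00002aed0cfe8906 rsp 00000000417f0930 error 4
"""

# \s+ between tokens lets the pattern match across the wrapped lines.
SEGFAULT_RE = re.compile(
    r"(?P<name>\S+)\[(?P<pid>\d+)\]:\s+segfault\s+at\s+(?P<addr>[0-9a-f]+)"
    r"\s+rip\s+(?P<rip>[0-9a-f]+)\s+rsp\s+(?P<rsp>[0-9a-f]+)"
    r"\s+error\s+(?P<err>[0-9a-f]+)"
)


def parse_segfaults(text):
    """Return one dict per segfault message found in text."""
    records = []
    for match in SEGFAULT_RE.finditer(text):
        records.append({
            "pid": int(match.group("pid")),
            "addr": int(match.group("addr"), 16),
            "rip": int(match.group("rip"), 16),
            "rsp": int(match.group("rsp"), 16),
            "error": int(match.group("err"), 16),
        })
    return records


def describe_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    cause = "protection violation" if code & 1 else "page not present"
    access = "write" if code & 2 else "read"
    mode = "user" if code & 4 else "kernel"
    return "%s-mode %s, %s" % (mode, access, cause)


records = parse_segfaults(DMESG)
fault_addrs = {r["addr"] for r in records}
# Shared libraries are mapped page-aligned, so the low 12 bits of rip
# survive a change of load address.
page_offsets = sorted({r["rip"] & 0xFFF for r in records})

print(sorted(hex(a) for a in fault_addrs))   # ['0x58']
print([hex(o) for o in page_offsets])        # ['0x550', '0x906']
print(describe_error(records[0]["error"]))   # user-mode read, page not present
```

Read this way, every crash is a user-mode read of the unmapped address 0x58, and the instruction pointers fall on only two distinct in-page offsets, which would be consistent with two faulting instructions in the same shared library.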
From disassembling the python binary, I've found the corresponding line
in source to be ceval.c:2717

if (tstate->frame->f_exc_type != NULL)

tstate->frame is null, and an access on f_exc_type causes a segfault
(trying to access memory 0x58, see above segfaults).
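
Why a NULL `tstate->frame` shows up as a fault at 0x58 rather than at 0 can be illustrated with a small layout model. This is a hedged Python 3 sketch using `ctypes`: the field list is a transcription from memory of the start of `PyFrameObject` in Python 2.6's `frameobject.h` and should be treated as an assumption to check against the header, and the class name `FrameHead` is hypothetical. All fields are modelled as 8 bytes wide to mimic a 64-bit (LP64) build regardless of the host.

```python
import ctypes


class FrameHead(ctypes.Structure):
    """Assumed leading fields of a 2.6 frame object on a 64-bit build."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_int64),      # PyObject_VAR_HEAD
        ("ob_type", ctypes.c_uint64),
        ("ob_size", ctypes.c_int64),
        ("f_back", ctypes.c_uint64),
        ("f_code", ctypes.c_uint64),
        ("f_builtins", ctypes.c_uint64),
        ("f_globals", ctypes.c_uint64),
        ("f_locals", ctypes.c_uint64),
        ("f_valuestack", ctypes.c_uint64),
        ("f_stacktop", ctypes.c_uint64),
        ("f_trace", ctypes.c_uint64),
        ("f_exc_type", ctypes.c_uint64),
    ]


NULL = 0
# Reading frame->f_exc_type loads from (frame base + field offset).
fault_address = NULL + FrameHead.f_exc_type.offset
print(hex(fault_address))  # 0x58
```

Under that assumed layout, the offset of `f_exc_type` matches the faulting address in the dmesg output, which supports the reading that the frame pointer itself is NULL rather than pointing at garbage.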

I can't find any clear code path that could cause tstate->frame to go
null, any suggestions? This is preventing us from moving from python 2.4
32-bit to python 2.6 64-bit.
msg88749 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-06-02 16:07
Have you compiled Python yourself?
Are you using any third-party C extensions?
msg88750 - Author: Tim Savannah (tsavannah) Date: 2009-06-02 16:11
Yes I compiled python myself, using ./configure
--prefix=/usr/local/python2.6/ --with-pth --enable-shared

It is a 64-bit compile.

I've done this with both the standard config and a config that I
modified to use the options -ggdb3 -O0. Both builds produce the
segfault.

We are including some external site packages, but there is no consistent
site package import or usage that causes the segfault; it just seems
that heavy load with many threads running creates a race that can trigger it.

I can send any additional info that can help debug this issue
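
A reduced stress harness is one way to try to reproduce a thread-related crash outside the production application. The sketch below is Python 3 and purely illustrative: the reporter never described the real workload, so the raise-and-catch loop is an assumption, chosen only because the faulting line in ceval.c touches the frame's exception state. The names `worker` and `run_stress` are hypothetical.

```python
import threading


def worker(iterations, results, index):
    """Raise and handle an exception repeatedly, counting handled ones."""
    handled = 0
    for i in range(iterations):
        try:
            raise ValueError(i)
        except ValueError:
            handled += 1
    # Each thread writes only its own slot, so no lock is needed.
    results[index] = handled


def run_stress(num_threads=16, iterations=10000):
    """Run the workers concurrently and return the total handled count."""
    results = [0] * num_threads
    threads = [
        threading.Thread(target=worker, args=(iterations, results, n))
        for n in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(results)


if __name__ == "__main__":
    print(run_stress())
```

A pure-Python loop like this is not expected to crash a healthy interpreter. If it runs cleanly while the real application still segfaults, that points toward a C extension or the build environment rather than the core evaluation loop.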
msg88754 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-06-02 17:22
If the third-party packages include some C extensions, have they been
recompiled with the new Python build?

Also, does the segfault disappear if you disable optimizations?
Have you tried with another compiler version?

(I'm asking all this because to my knowledge it's the first time such
random crashes happen in the Python core with 2.6, which was released
quite a while ago)
msg88755 - Author: Tim Savannah (tsavannah) Date: 2009-06-02 17:32
All site-packages were compiled against python 2.6.1, and python was
upgraded later to 2.6.2 (but upon running a make install with python
2.6.2, it seemed to recompile site-packages on a byte-code level).
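
Byte-compiling `.py` files does not rebuild C extensions, so it can help to list exactly which compiled modules a process has loaded. The following is a small Python 3 sketch with a hypothetical helper name, `loaded_extension_modules`. It relies on `importlib.machinery.EXTENSION_SUFFIXES`, which does not exist in Python 2.6; there, `imp.get_suffixes()` would be the place to look for the equivalent suffix list.

```python
import importlib.machinery
import sys


def loaded_extension_modules():
    """Return (name, path) pairs for imported modules backed by a C extension file."""
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    found = []
    for name, module in sorted(sys.modules.items()):
        path = getattr(module, "__file__", None)
        # Built-in modules have no __file__; pure-Python ones end in .py.
        if isinstance(path, str) and path.endswith(suffixes):
            found.append((name, path))
    return found


if __name__ == "__main__":
    for name, path in loaded_extension_modules():
        print(name, path)
```

Running this inside the crashing application, after its imports have completed, gives the list of third-party binaries that would need to be rebuilt against the new interpreter.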

And no, there are still segfaults without optimizations; I've tried
-O2, -O and -O0 (-O0 being no optimization). Judging by the invalid read
always being at 0x58, and the line of assembly accessing offset 0x58
from a register, tstate->frame must be initialized to NULL (or
always being corrupted to point to other NULL data).

The compiler used is gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)

The setup we are using is 8-core Xeon 64-bit servers. (We have about 14
of these CentOS-based systems, and all are experiencing the segfaults.)
msg88778 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2009-06-02 23:35
> And no, there are still segfaults without optimizations; I've tried
> -O2, -O and -O0 (-O0 being no optimization).

Then you can try rebuilding Python after "./configure --with-pydebug".
It will add some runtime checks, perhaps it will find the cause of the
problem.
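
After rebuilding, it is worth confirming that the interpreter actually running is the debug build. A minimal Python 3 sketch follows, with a hypothetical helper name `is_debug_build`. It relies on `sys.gettotalrefcount`, which is only present in debug builds; the `sysconfig` module used for the second check is not available in Python 2.6, where `distutils.sysconfig` plays a similar role.

```python
import sys
import sysconfig


def is_debug_build():
    """True when the running interpreter was built with debug support."""
    # sys.gettotalrefcount is only compiled in for debug builds.
    return hasattr(sys, "gettotalrefcount")


if __name__ == "__main__":
    print("debug build:", is_debug_build())
    # May be None on platforms where the build variables are not recorded.
    print("Py_DEBUG:", sysconfig.get_config_var("Py_DEBUG"))
```

Because a debug build changes the object layout, C extensions have to be rebuilt against it as well; mixing release-built extensions with a debug interpreter can itself cause crashes.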
msg88817 - Author: Tim Savannah (tsavannah) Date: 2009-06-03 17:13
Recompiled with pydebug enabled, and recompiled all site-packages. Still
getting segfaults; however, they are occurring within the python binary
now and not libpython2.6.1.

pythonLaunch.py[25914]: segfault at 0000000000000068 rip
00000000004c7694 rsp 000000004181a4c0 error 4
pythonLaunch.py[1421]: segfault at 0000000000000068 rip 00000000004c7694
rsp 00000000432914c0 error 4
pythonLaunch.py[2552]: segfault at 0000000000000068 rip 00000000004c7694
rsp 0000000041f7d4c0 error 4
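
The fault address moved from 0x58 to 0x68 with the debug build. One possible reading, offered as an assumption rather than something established in this thread, is that a debug build enables reference tracing (`Py_TRACE_REFS`), which adds two extra pointers to the front of every object header. The Python 3 sketch below only checks that the arithmetic fits that reading on a 64-bit build.

```python
POINTER_SIZE = 8  # bytes; assumes a 64-bit (LP64) build

release_fault = 0x58  # fault address reported with the regular build
debug_fault = 0x68    # fault address reported with --with-pydebug

# How far the faulting field moved, expressed in pointer-sized slots.
shift = debug_fault - release_fault
extra_pointers, remainder = divmod(shift, POINTER_SIZE)

print(hex(shift), extra_pointers, remainder)  # 0x10 2 0

# All three debug-build crashes report the same instruction pointer.
debug_rips = {0x4C7694, 0x4C7694, 0x4C7694}
print(len(debug_rips))  # 1
```

If that reading is right, the debug build is hitting the same NULL frame dereference at a shifted field offset, and the identical `rip` in all three messages fits a crash inside the non-relocated python executable.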
msg88818 - Author: Tim Savannah (tsavannah) Date: 2009-06-03 17:14
To update: no additional output was seen from pydebug.
msg169996 - Author: Chris Kaynor (DragonFireCK) Date: 2012-09-07 17:24
Was any resolution found for this? I am debugging some intermittent crashes now which have the same visible callstack as Tim reported.

tstate->frame is NULL on line 2717 of ceval.c

I am using an in-house build of Python 2.6.4, compiled for x64 with Visual Studio 2008 and running on Windows 7.

The Python code that appears to be executing is calling into the pywin32 module a number of times.
msg169998 - Author: Tim Savannah (tsavannah) Date: 2012-09-07 17:31
As an update (since someone else has this problem): this issue stopped once we converted from CentOS to Arch Linux (www.archlinux.org). It may be an underlying issue with something in the CentOS environment. We used the same modules, the same configuration, and basically the same compilation for Python.
msg188409 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2013-05-04 21:26
> As an update (since someone else has this problem): this issue stopped
> once we converted from CentOS to Arch Linux (www.archlinux.org). It may
> be an underlying issue with something in the CentOS environment. We used
> the same modules, the same configuration, and basically the same
> compilation for Python.

Ok, let's close this issue then.
History
Date User Action Args
2013-05-04 21:26:00 pitrou set status: open -> closed; resolution: works for me; messages: + msg188409; stage: resolved
2012-09-07 17:31:54 tsavannah set messages: + msg169998
2012-09-07 17:24:47 DragonFireCK set nosy: + DragonFireCK; messages: + msg169996
2009-06-03 17:14:49 tsavannah set messages: + msg88818
2009-06-03 17:13:19 tsavannah set messages: + msg88817
2009-06-02 23:35:02 pitrou set messages: + msg88778
2009-06-02 19:34:22 mroach set nosy: + mroach
2009-06-02 17:32:30 tsavannah set messages: + msg88755
2009-06-02 17:22:49 pitrou set messages: + msg88754
2009-06-02 16:11:42 tsavannah set messages: + msg88750
2009-06-02 16:07:04 pitrou set nosy: + pitrou; messages: + msg88749
2009-06-02 15:36:11 tsavannah create