This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: python script segment fault at PyMarshal_ReadLastObjectFromFile in import_submodule
Type: crash Stage: resolved
Components: Interpreter Core Versions: Python 3.2, Python 2.7
process
Status: closed Resolution: works for me
Dependencies: Superseder:
Assigned To: Nosy List: Thomas.Smith, dmalcolm, doko, ezio.melotti, keescook, liang, neologix, pitrou, tim.peters, vstinner
Priority: high Keywords: patch

Created on 2009-11-16 07:50 by liang, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
traces.zip Thomas.Smith, 2010-02-16 18:03 backtraces and test script
marshal_stack.diff neologix, 2010-04-17 18:11 Only use malloc to buffer loaded files.
import_stackoverflow.sh vstinner, 2010-04-21 21:17
import_nostack_alloc.patch vstinner, 2010-04-21 23:59
Messages (28)
msg95325 - (view) Author: liang (liang) Date: 2009-11-16 07:50
In our testbed, we have seen several segmentation faults in our Python script.
The environment is:
linux=2.6.29.6-0.6.smp.gcc4.1.x86_64
python=2.4.4-41.4-1
GCC = GCC 4.1.2 20070626 (rPath Inc.) on linux2
Below is the detailed call stack:
(gdb) bt
#0  PyMarshal_ReadLastObjectFromFile (fp=0x73a550) at Python/marshal.c:748
#1  0x000000000047bbf9 in read_compiled_module (cpathname=0x7fff184ba600 "/usr/lib64/python2.4/sre_constants.pyc", fp=0x73a550) at Python/import.c:728
#2  0x000000000047da2c in load_source_module (name=0x7fff184bc740 "sre_constants", pathname=0x7fff184bb680 "/usr/lib64/python2.4/sre_constants.py", fp=0x737df0) at Python/import.c:896
#3  0x000000000047e7bd in import_submodule (mod=0x6ea570, subname=0x7fff184bc740 "sre_constants", fullname=0x7fff184bc740 "sre_constants") at Python/import.c:2276
#4  0x000000000047ec3c in load_next (mod=0x6ea570, altmod=0x6ea570, p_name=<value optimized out>, buf=0x7fff184bc740 "sre_constants", p_buflen=0x7fff184bc73c) at Python/import.c:2096
#5  0x000000000047ee47 in PyImport_ImportModuleEx (name=0x7fff18bac298 "\001", globals=0x7fff18bac2bc, locals=<value optimized out>, fromlist=0x7fff18c90990) at Python/import.c:1931
#6  0x000000000045f963 in builtin___import__ (self=<value optimized out>, args=<value optimized out>) at Python/bltinmodule.c:45
#7  0x00000000004148e0 in PyObject_Call (func=0x73a550, arg=0x73a550, kw=0x46e829e3) at Objects/abstract.c:1795
#8  0x00000000004628fd in PyEval_CallObjectWithKeywords (func=0x7fff18ca5440, arg=0x7fff18c944c8, kw=0x0) at Python/ceval.c:3435
#9  0x000000000046461a in PyEval_EvalFrame (f=0x744650) at Python/ceval.c:2020
#10 0x0000000000468ce0 in PyEval_EvalCodeEx (co=0x7fff18c95ab0, globals=<value optimized out>, locals=<value optimized out>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2741
#11 0x0000000000468d92 in PyEval_EvalCode (co=0x73a550, globals=0x73a550, locals=0x46e829e3) at Python/ceval.c:484
#12 0x000000000047d29a in PyImport_ExecCodeModuleEx (name=0x7fff184bfce0 "sre_compile", co=0x7fff18c95ab0, pathname=0x7fff184bdba0 "/usr/lib64/python2.4/sre_compile.pyc") at Python/import.c:636
#13 0x000000000047d7d0 in load_source_module (name=0x7fff184bfce0 "sre_compile", pathname=0x7fff184bdba0 "/usr/lib64/python2.4/sre_compile.pyc", fp=<value optimized out>) at Python/import.c:915
#14 0x000000000047e7bd in import_submodule (mod=0x6ea570, subname=0x7fff184bfce0 "sre_compile", fullname=0x7fff184bfce0 "sre_compile") at Python/import.c:2276
#15 0x000000000047ec3c in load_next (mod=0x6ea570, altmod=0x6ea570, p_name=<value optimized out>, buf=0x7fff184bfce0 "sre_compile", p_buflen=0x7fff184bfcdc) at Python/import.c:2096
#16 0x000000000047ee47 in PyImport_ImportModuleEx (name=0x7fff18c8fbd0 "\001", globals=0x7fff18c8fbf4, locals=<value optimized out>, fromlist=0x6ea570) at Python/import.c:1931
#17 0x000000000045f963 in builtin___import__ (self=<value optimized out>, args=<value optimized out>) at Python/bltinmodule.c:45
#18 0x00000000004148e0 in PyObject_Call (func=0x73a550, arg=0x73a550, kw=0x46e829e3) at Objects/abstract.c:1795
#19 0x00000000004628fd in PyEval_CallObjectWithKeywords (func=0x7fff18ca5440, arg=0x7fff18c94208, kw=0x0) at Python/ceval.c:3435
#20 0x000000000046461a in PyEval_EvalFrame (f=0x7b6680) at Python/ceval.c:2020
#21 0x0000000000468ce0 in PyEval_EvalCodeEx (co=0x7fff18c95500, globals=<value optimized out>, locals=<value optimized out>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2741
#22 0x0000000000468d92 in PyEval_EvalCode (co=0x73a550, globals=0x73a550, locals=0x46e829e3) at Python/ceval.c:484
#23 0x000000000047d29a in PyImport_ExecCodeModuleEx (name=0x7fff184c3280 "sre", co=0x7fff18c95500, pathname=0x7fff184c1140 "/usr/lib64/python2.4/sre.pyc") at Python/import.c:636
#24 0x000000000047d7d0 in load_source_module (name=0x7fff184c3280 "sre", pathname=0x7fff184c1140 "/usr/lib64/python2.4/sre.pyc", fp=<value optimized out>) at Python/import.c:915
#25 0x000000000047e7bd in import_submodule (mod=0x6ea570, subname=0x7fff184c3280 "sre", fullname=0x7fff184c3280 "sre") at Python/import.c:2276
#26 0x000000000047ec3c in load_next (mod=0x6ea570, altmod=0x6ea570, p_name=<value optimized out>, buf=0x7fff184c3280 "sre", p_buflen=0x7fff184c327c) at Python/import.c:2096
#27 0x000000000047ee47 in PyImport_ImportModuleEx (name=0x7fff18c8cc90 "\001", globals=0x7fff18c8ccb4, locals=<value optimized out>, fromlist=0x7fff18c90450) at Python/import.c:1931
#28 0x000000000045f963 in builtin___import__ (self=<value optimized out>, args=<value optimized out>) at Python/bltinmodule.c:45
#29 0x00000000004148e0 in PyObject_Call (func=0x73a550, arg=0x73a550, kw=0x46e829e3) at Objects/abstract.c:1795
#30 0x00000000004628fd in PyEval_CallObjectWithKeywords (func=0x7fff18ca5440, arg=0x7fff18c83788, kw=0x0) at Python/ceval.c:3435
#31 0x000000000046461a in PyEval_EvalFrame (f=0x753bb0) at Python/ceval.c:2020
#32 0x0000000000468ce0 in PyEval_EvalCodeEx (co=0x7fff18c8a7a0, globals=<value optimized out>, locals=<value optimized out>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2741
#33 0x0000000000468d92 in PyEval_EvalCode (co=0x73a550, globals=0x73a550, locals=0x46e829e3) at Python/ceval.c:484
#34 0x000000000047d29a in PyImport_ExecCodeModuleEx (name=0x7fff184c6820 "re", co=0x7fff18c8a7a0, pathname=0x7fff184c46e0 "/usr/lib64/python2.4/re.pyc") at Python/import.c:636
#35 0x000000000047d7d0 in load_source_module (name=0x7fff184c6820 "re", pathname=0x7fff184c46e0 "/usr/lib64/python2.4/re.pyc", fp=<value optimized out>) at Python/import.c:915
#36 0x000000000047e7bd in import_submodule (mod=0x6ea570, subname=0x7fff184c6820 "re", fullname=0x7fff184c6820 "re") at Python/import.c:2276
#37 0x000000000047ec3c in load_next (mod=0x6ea570, altmod=0x6ea570, p_name=<value optimized out>, buf=0x7fff184c6820 "re", p_buflen=0x7fff184c681c) at Python/import.c:2096
#38 0x000000000047ee47 in PyImport_ImportModuleEx (name=0x7fff18c8ca50 "\032", globals=0x7fff18c8ca74, locals=<value optimized out>, fromlist=0x6ea570) at Python/import.c:1931
#39 0x000000000045f963 in builtin___import__ (self=<value optimized out>, args=<value optimized out>) at Python/bltinmodule.c:45
#40 0x00000000004148e0 in PyObject_Call (func=0x73a550, arg=0x73a550, kw=0x46e829e3) at Objects/abstract.c:1795
#41 0x00000000004628fd in PyEval_CallObjectWithKeywords (func=0x7fff18ca5440, arg=0x7fff18c83680, kw=0x0) at Python/ceval.c:3435
#42 0x000000000046461a in PyEval_EvalFrame (f=0x7932d0) at Python/ceval.c:2020
#43 0x0000000000468ce0 in PyEval_EvalCodeEx (co=0x7fff18c8a730, globals=<value optimized out>, locals=<value optimized out>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2741
#44 0x0000000000468d92 in PyEval_EvalCode (co=0x73a550, globals=0x73a550, locals=0x46e829e3) at Python/ceval.c:484
#45 0x000000000047d29a in PyImport_ExecCodeModuleEx (name=0x7fff184c9dc0 "difflib", co=0x7fff18c8a730, pathname=0x7fff184c7c80 "/usr/lib64/python2.4/difflib.pyc") at Python/import.c:636
#46 0x000000000047d7d0 in load_source_module (name=0x7fff184c9dc0 "difflib", pathname=0x7fff184c7c80 "/usr/lib64/python2.4/difflib.pyc", fp=<value optimized out>) at Python/import.c:915
#47 0x000000000047e7bd in import_submodule (mod=0x6ea570, subname=0x7fff184c9dc0 "difflib", fullname=0x7fff184c9dc0 "difflib") at Python/import.c:2276
#48 0x000000000047ec3c in load_next (mod=0x6ea570, altmod=0x6ea570, p_name=<value optimized out>, buf=0x7fff184c9dc0 "difflib", p_buflen=0x7fff184c9dbc) at Python/import.c:2096
#49 0x000000000047ee47 in PyImport_ImportModuleEx (name=0x7fff18cb9300 "\001", globals=0x7fff18cb9324, locals=<value optimized out>, fromlist=0x6ea570) at Python/import.c:1931
#50 0x000000000045f963 in builtin___import__ (self=<value optimized out>, args=<value optimized out>) at Python/bltinmodule.c:45
#51 0x00000000004148e0 in PyObject_Call (func=0x73a550, arg=0x73a550, kw=0x46e829e3) at Objects/abstract.c:1795
#52 0x00000000004628fd in PyEval_CallObjectWithKeywords (func=0x7fff18ca5440, arg=0x7fff18c810a8, kw=0x0) at Python/ceval.c:3435
#53 0x000000000046461a in PyEval_EvalFrame (f=0x7921c0) at Python/ceval.c:2020
#54 0x0000000000468ce0 in PyEval_EvalCodeEx (co=0x7fff18623490, globals=<value optimized out>, locals=<value optimized out>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2741
#55 0x0000000000468d92 in PyEval_EvalCode (co=0x73a550, globals=0x73a550, locals=0x46e829e3) at Python/ceval.c:484
#56 0x00000000004853d9 in run_node (n=<value optimized out>, filename=<value optimized out>, globals=0x718650, locals=0x718650, flags=<value optimized out>) at Python/pythonrun.c:1285
#57 0x00000000004868b8 in PyRun_SimpleFileExFlags (fp=<value optimized out>, filename=0x7fff184ccbcc "/usr/local/maui/ganglia/lib/ganglia/python_modules/maui_svc.py", closeit=1, flags=0x7fff184cb350) at Python/pythonrun.c:869
#58 0x000000000041168d in Py_Main (argc=<value optimized out>, argv=0x7fff184cb478) at Modules/main.c:493
#59 0x00007fff177f48a4 in __libc_start_main () from /lib64/libc.so.6
#60 0x0000000000410a59 in _start ()
It segfaults when trying to load sre_constants.pyc.

Another stack:

#0  PyMarshal_ReadLastObjectFromFile (fp=0x7f33f0) at Python/marshal.c:748
#1  0x000000000047bbf9 in read_compiled_module (cpathname=0x7fff069fe830 "/usr/lib64/python2.4/inspect.pyc", fp=0x7f33f0) at Python/import.c:728
#2  0x000000000047da2c in load_source_module (name=0x7fff06a00970 "inspect", pathname=0x7fff069ff8b0 "/usr/lib64/python2.4/inspect.py", fp=0x7d97d0) at Python/import.c:896
#3  0x000000000047e7bd in import_submodule (mod=0x6ea570, subname=0x7fff06a00970 "inspect", fullname=0x7fff06a00970 "inspect") at Python/import.c:2276
#4  0x000000000047ec3c in load_next (mod=0x6ea570, altmod=0x6ea570, p_name=<value optimized out>, buf=0x7fff06a00970 "inspect", p_buflen=0x7fff06a0096c) at Python/import.c:2096

It segfaults when trying to load inspect.pyc.

Another core at:
(gdb) bt
#0  PyMarshal_ReadLastObjectFromFile (fp=0x7dd190) at Python/marshal.c:748
#1  0x000000000047bbf9 in read_compiled_module (cpathname=0x7fff1bc03de0 "/usr/lib64/python2.4/string.pyc", fp=0x7dd190) at Python/import.c:728
#2  0x000000000047da2c in load_source_module (name=0x7fff1bc05f20 "string", pathname=0x7fff1bc04e60 "/usr/lib64/python2.4/string.py", fp=0x7dc6f0) at Python/import.c:896
#3  0x000000000047e7bd in import_submodule (mod=0x6ea570, subname=0x7fff1bc05f20 "string", fullname=0x7fff1bc05f20 "string") at Python/import.c:2276
#4  0x000000000047ec3c in load_next (mod=0x6ea570, altmod=0x6ea570, p_name=<value optimized out>, buf=0x7fff1bc05f20 "string", p_buflen=0x7fff1bc05f1c) at Python/import.c:2096
#5  0x000000000047ee47 in PyImport_ImportModuleEx (name=0x7fff1c6694b0 "\001", globals=0x7fff1c6694d4, locals=<value optimized out>, fromlist=0x6ea570) at Python/import.c:1931
#6  0x000000000045f963 in builtin___import__ (self=<value optimized out>, args=<value optimized out>) at Python/bltinmodule.c:45
#7  0x00000000004148e0 in PyObject_Call (func=0x7dd190, arg=0x7dd190, kw=0x46e829e3) at Objects/abstract.c:1795
#8  0x00000000004628fd in PyEval_CallObjectWithKeywords (func=0x7fff1c741440, arg=0x7fff1c663890, kw=0x0) at Python/ceval.c:3435
#9  0x000000000046461a in PyEval_EvalFrame (f=0x744650) at Python/ceval.c:2020
#10 0x0000000000468ce0 in PyEval_EvalCodeEx (co=0x7fff1c66a8f0, globals=<value optimized out>, locals=<value optimized out>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2741
#11 0x0000000000468d92 in PyEval_EvalCode (co=0x7dd190, globals=0x7dd190, locals=0x46e829e3) at Python/ceval.c:484
#12 0x000000000047d29a in PyImport_ExecCodeModuleEx (name=0x7fff1bc094c0 "inspect", co=0x7fff1c66a8f0, pathname=0x7fff1bc07380 "/usr/lib64/python2.4/inspect.pyc") at Python/import.c:636
#13 0x000000000047d7d0 in load_source_module (name=0x7fff1bc094c0 "inspect", pathname=0x7fff1bc07380 "/usr/lib64/python2.4/inspect.pyc", fp=<value optimized out>) at Python/import.c:915
#14 0x000000000047e7bd in import_submodule (mod=0x6ea570, subname=0x7fff1bc094c0 "inspect", fullname=0x7fff1bc094c0 "inspect") at Python/import.c:2276
#15 0x000000000047ec3c in load_next (mod=0x6ea570, altmod=0x6ea570, p_name=<value optimized out>, buf=0x7fff1bc094c0 "inspect", p_buflen=0x7fff1bc094bc) at Python/import.c:2096
#16 0x000000000047ee47 in PyImport_ImportModuleEx (name=0x7fff1c65dba0 "\002", globals=0x7fff1c65dbc4, locals=<value optimized out>, fromlist=0x6ea570) at Python/import.c:1931

It segfaults when trying to load string.pyc.

We have seen this several times. However, the script is long-running and we are not sure how it happened or how to reproduce it.

Does anyone have any ideas on this?
msg99428 - (view) Author: resc (Thomas.Smith) Date: 2010-02-16 18:03
I'm also getting segfaults in PyMarshal_ReadLastObjectFromFile in Python 2.6.2 (on Ubuntu Jaunty).  It's very sporadic; I've been reproducing it by running a minimal script 100,000 times and getting a few core dumps.  There are several Ubuntu bug reports in various packages that use Python:

https://bugs.launchpad.net/ubuntu/+source/apport/+bug/393022
https://bugs.launchpad.net/ubuntu/+source/gnome-python/+bug/432546
https://bugs.launchpad.net/ubuntu/+source/streamtuner/+bug/336331

I've attached a zip file with my test scripts and some gdb backtraces.  I am happy to spend time on this bug, although I only have a rudimentary knowledge of C, so I'd mainly be useful for testing.

The computer I'm having trouble on is a Dell PowerEdge T410, with a Xeon E5502, and it had another sporadic segfault problem in a should-be-reliable program, ImageMagick.  Switching to GraphicsMagick fixed that one, somehow.  If it's a hardware-specific bug, Python is the only program that's tickling it right now...
msg103370 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010-04-16 21:58
It's definitely a stack overflow.
Most of the backtraces show a large number of frames.
The last frame is this:
#0  PyMarshal_ReadLastObjectFromFile (fp=0x13e8200) at ../Python/marshal.c:1026
	filesize = <value optimized out>

and a disassembly shows that the segfault occurs on a callq:
0x4bd4d6 <PyMarshal_ReadLastObjectFromFile+54>:	callq  0x4168e8 <fileno@plt>

And if you look at the code, it's obvious what's happening:
PyObject *
PyMarshal_ReadLastObjectFromFile(FILE *fp)
{
/* 75% of 2.1's .pyc files can exploit SMALL_FILE_LIMIT.
 * REASONABLE_FILE_LIMIT is by defn something big enough for Tkinter.pyc.
 */
#define SMALL_FILE_LIMIT (1L << 14)
#define REASONABLE_FILE_LIMIT (1L << 18)
#ifdef HAVE_FSTAT
	off_t filesize;
#endif
#ifdef HAVE_FSTAT
	filesize = getfilesize(fp);
	if (filesize > 0) {
		char buf[SMALL_FILE_LIMIT];
		char* pBuf = NULL;
		if (filesize <= SMALL_FILE_LIMIT)
			pBuf = buf;
		else if (filesize <= REASONABLE_FILE_LIMIT)
			pBuf = (char *)PyMem_MALLOC(filesize);
		if (pBuf != NULL) {
[...]
}

SMALL_FILE_LIMIT is 1 << 14 which is roughly 16K (not that reasonable :-).
So when we enter PyMarshal_ReadLastObjectFromFile and allocate buf, we push around 16K onto the stack, which is a lot. That's why we segfault soon after, when we call a function (callq): there's no space left on the stack.
So there are several solutions:
- make buf static, but it would increase Python size by almost 16K
- reduce SMALL_FILE_LIMIT, or remove it altogether. I guess SMALL_FILE_LIMIT is there to speedup loading of small files, but I'm not sure that malloc() would incur an important overhead
- reading the whole file into memory sounds weird; we should probably be using mmap() here

Opinions?
msg103405 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-17 14:50
I agree that we can consider dropping the static buffer and always using PyMem_MALLOC().
It looks a bit strange for this bug to happen, though. Does Ubuntu use a small stack size?
msg103406 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-17 15:25
Oh, and the record of the original patch conversation (when this optimization was added) can be found here:
http://mail.python.org/pipermail/patches/2001-January/003500.html
msg103407 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-17 15:44
A small benchmark shows no difference in startup time when disabling the stack buffer. (this is on Linux: of course, the problem might be that the glibc is heavily optimized)

The benchmark was a simple:
$ time ./python -E -c "import logging, pydoc, xmlrpclib, urllib, urllib2, unittest, doctest, profile, smtplib, httplib, fractions, decimal, codecs, difflib, argparse, distutils, email, imaplib, idlelib, json, _pyio, poplib, ftplib"
msg103408 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010-04-17 16:35
> It looks a bit strange for this bug to happen, though. Does Ubuntu use a small stack size?

There are other possible reasons:
- the programs that crash (or the libraries they're using) use the stack a lot
- somehow, pthread_attr_setstacksize is called and set to a reduced thread stack size
- notice that this is reported on x86_64 architectures, and stack usage is higher on 64-bit systems

Since I don't have an Ubuntu box, it would be nice if one of the reporters could:
- return the result of "ulimit -s" on their box
- call "ulimit -s unlimited", and see if the bug goes away
- check that the attached patch marshal_stack.diff solves this
- return the result of "ltrace -e pthread_attr_setstacksize <command used to start program>"

> A small benchmark shows no difference in startup time when disabling the stack buffer. (this is on Linux: of course, the problem might be that the glibc is heavily optimized)

Yeah, there are some crappy systems out there (no names :-), that's why it would be nice to have some feedback and small benchmarks on various platforms. Anyway, even if compiled files are small most of the time, I'm not sure that this "let's copy the file to the stack/heap" approach is optimal, and maybe mmap would be worth considering if we find that the overhead is not negligible (I haven't looked at the code in detail, so maybe it's not possible to use in this case).
msg103417 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010-04-17 18:11
Ok, I've also done some trivial benchmarking on my Linux box, and I get this:
right now:
$ time ./python /tmp/test_import.py
real    0m1.258s
user    0m1.111s
sys     0m0.101s

with mmap:
$ time ./python /tmp/test_import.py
real    0m1.262s
user    0m1.170s
sys     0m0.090s

with malloc only:
$ time ./python /tmp/test_import.py
real    0m1.213s
user    0m1.111s
sys     0m0.099s

The test script just imports every module available.
So I'd agree with Antoine, and think we should just use malloc. The attached patch marshal_stack.diff just does that.
msg103702 - (view) Author: Matthias Klose (doko) * (Python committer) Date: 2010-04-20 12:58
> Does Ubuntu use a small stack size?

it's 8192 on all architectures.
msg103703 - (view) Author: Matthias Klose (doko) * (Python committer) Date: 2010-04-20 13:05
I'm told it's 10240 on Fedora 12, x86 and x86_64
msg103704 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-04-20 13:11
Allocating more than 16 KB on the stack is never a good idea. E.g. Linux never grows the stack beyond its limit automatically, and the only way to catch an "allocation failed" error is to handle the SIGSEGV signal...

Replacing the buffer allocated on the stack with a buffer allocated on the heap is definitely a good idea :-)
msg103705 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-20 13:14
A 16KB stack buffer is tiny compared to a 8MB stack. I'm not sure removing that buffer would really fix the problems.
Perhaps other threads get a smaller stack?
msg103707 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-20 13:43
What's the value of MAXPATHLEN and PATH_MAX on those systems?
msg103708 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010-04-20 13:46
The problem is highlighted with recursive imports:
a module which imports another module, which imports another module, etc. PyMarshal_ReadLastObjectFromFile is not the only function to use stack-allocated buffers; there are also load_source_module, load_package and import_module_level, which use char buf[MAXPATHLEN+1]: with MAXPATHLEN at 1024, you lose 2 or 3K every time you do a recursive import.
And, as has been said, it might very well happen that new threads get a reduced stack size.
msg103710 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-20 14:00
> The problem is highlighted with recursive imports:
> a module which imports another module, which imports another module,
> etc. PyMarshal_ReadLastObjectFromFile is not the only function to use
> stack-allocated buffers, there are also load_source_module,
> load_package, import_module_level, which use char buf[MAXPATHLEN+1]:
> with a MAXPATHLEN to 1024, you lose 2 or 3K every time you do a
> recursive import.

Let's assume we lose ten times 1024 bytes; that's still only 10KB. The stack is 8MB. We are arguing about less than 1% of the total stack size.

I just went through all of the functions highlighted in one of these
stack traces (*). The only big consumers of stack space seem to be the
stack buffer in PyMarshal_ReadLastObjectFromFile, and the various file
path buffers using MAXPATHLEN.

(*) https://bugs.launchpad.net/ubuntu/+source/python2.6/+bug/432546

And that report shows only a single thread, so I have to assume that the
8MB figure applies there.

Nevertheless, we can remove the stack buffer since it's probably
useless. It just seems unlikely to me to be the root cause of the stack
overflow.
msg103716 - (view) Author: resc (Thomas.Smith) Date: 2010-04-20 14:25
Hi,
I'm working on reproducing this again, but it's always been a very
sporadic bug, and I haven't gotten a bingo yet.

- return the result of "ulimit -s" on their box
8192

- return the result of "ltrace -e pthread_attr_setstacksize <command
used to start program>"
There's no output from ltrace when I do this (except "exited (status
0)"), so I guess that function isn't called.

I wish I had a test case that would trigger the bug more reliably...
-Thomas
msg103719 - (view) Author: Matthias Klose (doko) * (Python committer) Date: 2010-04-20 14:28
PATH_MAX/MAXPATHLEN is 4096
msg103812 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010-04-21 10:19
> And that report shows only a single thread, so I have to assume that the
8MB figure applies there.

> Nevertheless, we can remove the stack buffer since it's probably
useless. It just seems unlikely to me to be the root cause of the stack
overflow.

If we really have an 8MB stack, yes, it's unlikely. But the max stack size is inherited by child processes; see for example streamtuner (one of the reports):
http://bugs.gentoo.org/274056

--- src/streamtuner/st-thread.c
+++ src/streamtuner/st-thread.c
@@ -108,1 +108,1 @@
-			     0x18000, /* 96k, big enough for libcurl */
+			     0x40000, /* change from 96k to 256k */

So if we start with this stack size, we can run out of stack space really easily: I counted around 20 buf allocations in some backtraces, and with MAXPATHLEN at 4K, that's 20 * 4 + 16 = 96K used.

There might be another reason. I think that Ubuntu uses gcc's SSP feature by default, to prevent buffer overflows and friends, so maybe there's something going on with that. That would explain why it's only reported on Ubuntu (well, they also have more users, but let's assume there's really something specific to Ubuntu).

> I'm also getting segfaults in PyMarshal_ReadLastObjectFromFile in Python 2.6.2 (on Ubuntu Jaunty).  It's very sporadic, I've been reproducing it by running a minimal script 100,000 times, and getting a few core dumps.

I've had a look at your backtraces, and when it segfaults, the stack size is _really_ far from 8M. So there's really something fishy going on here. Are you getting any error message printed besides the usual segmentation fault? Could you try to reproduce with your test script with a Python compiled with -fno-stack-protector and -U_FORTIFY_SOURCE?
msg103817 - (view) Author: Matthias Klose (doko) * (Python committer) Date: 2010-04-21 10:57
> That would explain why it's only reported on Ubuntu

the original report is from the rPath distribution.
msg103819 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2010-04-21 11:22
> the original report is from the rPath distribution.

Never heard of this one, but http://wiki.rpath.com/wiki/rPath_Linux:rPath_Linux_2 states:

Compile with -fstack-protector and FORTIFY_SOURCE=2 (override in your recipes by modifying the securityflags Conary macro), link with GNU hash and -O1, and use -fPIE for some key executables.
msg103885 - (view) Author: Kees Cook (keescook) Date: 2010-04-21 18:44
The stack protector will add 8 (aligned, so possibly padded) bytes to each stack frame of functions with arrays of 8 or greater bytes.  So if things are marginal, this could make the difference between Pythons compiled with/without -fstack-protector.

N.B. if rPath is compiled with -D_FORTIFY_SOURCE=2 and -O1, then -D_FORTIFY_SOURCE=2 has no effect (it is only activated at -O2 or higher).

Details on Ubuntu's compiler flag defaults:
https://wiki.ubuntu.com/CompilerFlags

Putting MAXPATHLEN-sized buffers on the stack certainly seems like a big waste of space, though.  :)
msg103912 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-04-21 21:17
Here is a short shell script to reproduce the stack overflow:
 - create 100 Python modules: stack1 imports stack2, stack2 imports stack3, ...., and stack100 prints "hello"
 - each module calls os.system("cat /proc/%s/maps|grep stack" % os.getpid()) to display the stack map
 - set the max stack size to 128 KB

The stack starts with 86016 bytes and it crashes at import depth 6.

I don't know if my script is realistic (128 KB stack), but at least it shows a crash.

I think that most programs would crash with such a small stack.
msg103916 - (view) Author: Kees Cook (keescook) Date: 2010-04-21 22:32
So, digging a little further, I think this is a now-fixed kernel bug with stack growth.  There were known issues prior to Sep 2009 with 64bit stack growth with ASLR, which is enabled by default.  Upstream fix:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=80938332d8cf652f6b16e0788cf0ca136befe0b5

This was fixed in stable releases of the Ubuntu kernels on Mar 16, 2010 (though the fix was included in Ubuntu 9.10 when it was released Oct 29, 2009).

The Launchpad bugs 432546 and 393022 were both filed prior to these kernel fixes, and show an un-maximized stack segment that has bumped up against the next-lower segment, which is how this kernel bug was manifesting.  (See their attached ProcMaps.txt files.)

I don't believe this is a Python bug, and I think the issue is solved for any distro that contains the above kernel fix.
msg103920 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-21 22:48
Thank you Kees, this sounds quite likely. I will still commit the patch to remove the stack buffer, and then close this issue.
msg103921 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-04-21 23:01
Patch committed in trunk (r80325) and py3k (r80326). I won't backport it to 2.6/3.1 since it's not likely to fix anything in practice -- it's just a nice simplification. Thanks everyone for comments and patches.
msg103927 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-04-21 23:59
I tried to limit memory allocated on the stack while importing modules. Number of bytes allocated on the stack:
 - without my patch: 13792 bytes per import
 - with my patch: 1632 bytes per import
(using import_stackoverflow.sh, import a short Python module)

I guess that it will not fix the issue, only move the crash to another function.

I'm attaching the patch to this issue only to keep a copy of it. The patch is complex and there is no good reason to commit it since the problem doesn't come from Python.

The patch allocates filename buffers on the heap in import.c, zipimport.c and marshal.c.
msg103962 - (view) Author: resc (Thomas.Smith) Date: 2010-04-22 13:17
> This was fixed in stable releases of the Ubuntu
> kernels on Mar 16, 2010 (though the fix was
> included in Ubuntu 9.10 when it was released
> Oct 29, 2009).
msg103965 - (view) Author: resc (Thomas.Smith) Date: 2010-04-22 13:21
Argh, that e-mail didn't work.  Anyway, I just wanted to say that the kernel explanation is consistent with my experience, I had a crash every week up until recently, when I upgraded, but in the past few days I haven't been able to reproduce it.
History
Date User Action Args
2022-04-11 14:56:54  admin  set  github: 51581
2010-04-29 17:50:49  mark.dickinson  link  issue770280 superseder
2010-04-22 13:21:58  Thomas.Smith  set  messages: + msg103965
2010-04-22 13:17:58  Thomas.Smith  set  messages: + msg103962
2010-04-21 23:59:37  vstinner  set  files: + import_nostack_alloc.patch; messages: + msg103927
2010-04-21 23:01:16  pitrou  set  status: open -> closed; versions: - Python 2.6, Python 3.1; messages: + msg103921; resolution: works for me; stage: needs patch -> resolved
2010-04-21 22:48:18  pitrou  set  messages: + msg103920
2010-04-21 22:32:10  keescook  set  messages: + msg103916
2010-04-21 21:17:08  vstinner  set  files: + import_stackoverflow.sh; messages: + msg103912
2010-04-21 18:44:04  keescook  set  nosy: + keescook; messages: + msg103885
2010-04-21 11:22:47  neologix  set  messages: + msg103819
2010-04-21 10:57:37  doko  set  messages: + msg103817
2010-04-21 10:19:11  neologix  set  messages: + msg103812
2010-04-20 14:28:43  doko  set  messages: + msg103719
2010-04-20 14:25:09  Thomas.Smith  set  messages: + msg103716
2010-04-20 14:00:08  pitrou  set  messages: + msg103710
2010-04-20 13:46:59  neologix  set  messages: + msg103708
2010-04-20 13:43:58  pitrou  set  messages: + msg103707
2010-04-20 13:14:44  pitrou  set  messages: + msg103705
2010-04-20 13:11:10  vstinner  set  nosy: + vstinner; messages: + msg103704
2010-04-20 13:05:03  doko  set  messages: + msg103703
2010-04-20 12:58:54  doko  set  nosy: + doko; messages: + msg103702
2010-04-17 18:11:37  neologix  set  files: - marshal_stack.diff
2010-04-17 18:11:23  neologix  set  files: + marshal_stack.diff; messages: + msg103417
2010-04-17 16:35:28  neologix  set  files: + marshal_stack.diff; keywords: + patch; messages: + msg103408
2010-04-17 15:57:54  dmalcolm  set  nosy: + dmalcolm
2010-04-17 15:44:02  pitrou  set  messages: + msg103407
2010-04-17 15:25:44  pitrou  set  priority: normal -> high; messages: + msg103406; versions: + Python 3.1, Python 2.7, Python 3.2
2010-04-17 14:50:22  pitrou  set  nosy: + pitrou, tim.peters; messages: + msg103405
2010-04-17 08:45:10  neologix  set  nosy: + ezio.melotti
2010-04-16 21:59:01  neologix  set  nosy: + neologix; messages: + msg103370
2010-02-16 23:44:48  ezio.melotti  set  priority: normal; stage: needs patch; versions: - Python 2.4
2010-02-16 18:03:11  Thomas.Smith  set  files: + traces.zip; versions: + Python 2.6; nosy: + Thomas.Smith; messages: + msg99428
2009-11-16 07:50:32  liang  create