classification
Title: Crash on mac os x leopard in mimetypes.guess_type (or PyObject_Malloc)
Type: crash Stage: test needed
Components: Interpreter Core, macOS Versions: Python 2.6
process
Status: closed Resolution: duplicate
Dependencies: Superseder:
Assigned To: ronaldoussoren Nosy List: jab, jrus, ncoghlan, r.david.murray, ronaldoussoren, santagada
Priority: high Keywords:

Created on 2009-08-22 21:26 by santagada, last changed 2010-10-20 14:56 by ronaldoussoren. This issue is now closed.

Files
File name Uploaded Description Edit
threadboom.py santagada, 2009-08-23 02:49
Messages (9)
msg91876 - (view) Author: Leonardo Santagada (santagada) Date: 2009-08-22 21:26
Python 2.6.2 (and the maint branch, if using the old mimetypes.py) crashes
(with a bus error) on Mac OS X (10.5.7 & 10.5.8) with the file I posted.
The problem appears to be in the allocation of memory by the GC.

I call mimetypes.guess_type in more than one thread at the
same time, and I guess what is happening is this:

1. The first thread to run notices mimetypes.inited is false, so it calls
the init function.
2. Somehow the first thread loses the GIL while still executing init.
3. Another thread tries to execute guess_type. As the module is already
marked as inited, it calls itself, in vain, because init still hasn't
exchanged guess_type for the new function, so it goes into recursion.
4. Somehow the allocator fails during the recursion.

Here are the final pieces of my stack trace (it's a very long sequence of
recursions into guess_type):

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0xb0000ffc
[Switching to process 61544 thread 0x117]
0x96912122 in szone_malloc ()

#0  0x96912122 in szone_malloc ()
#1  0x969120d8 in malloc_zone_malloc ()
#2  0x9691206c in malloc ()
#3  0x0006f32c in PyObject_Malloc (nbytes=376) at Objects/obmalloc.c:913
913             return (void *)malloc(nbytes);
#4  0x0006fe61 in _PyObject_DebugMalloc (nbytes=360) at
Objects/obmalloc.c:1347
1347            p = (uchar *)PyObject_Malloc(total);
#5  0x00149b13 in _PyObject_GC_Malloc (basicsize=344) at
Modules/gcmodule.c:1351
1351            g = (PyGC_Head *)PyObject_MALLOC(
#6  0x00149c24 in _PyObject_GC_NewVar (tp=0x193500, nitems=5) at
Modules/gcmodule.c:1383
1383            PyVarObject *op = (PyVarObject *) _PyObject_GC_Malloc(size);
#7  0x00048a06 in PyFrame_New (tstate=0x33df30, code=0x473148,
globals=0x48e380, locals=0x0) at Objects/frameobject.c:642
642                         f = PyObject_GC_NewVar(PyFrameObject,
&PyFrame_Type,
#8  0x00100816 in PyEval_EvalCodeEx (co=0x473148, globals=0x48e380,
locals=0x0, args=0x374fb4, argcount=2, kws=0x374fbc, kwcount=0,
defs=0x4a6f9c, defcount=1, closure=0x0) at Python/ceval.c:2755
2755            f = PyFrame_New(tstate, co, globals, locals);
msg91883 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-08-23 02:15
Seems like issue 6626 could be helpful here.
msg91885 - (view) Author: Leonardo Santagada (santagada) Date: 2009-08-23 02:49
Well, the mimetypes module from the 2.6 maintenance branch makes this
problem not show up with mimetypes.guess_type, but I still think this is a
bug, because pure Python code should not crash the interpreter, right?

I'm attaching the file I mentioned, I hope this counts as a test (it
needs the mimetypes module from python 2.6.2). I can probably extract
just the needed functions from the old mimetypes module if requested.
msg93828 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2009-10-10 13:24
The thread safety problem comes from the fact that performing file IO as
mimetypes.init() does will release the GIL - if you want to ensure
thread safety in that context, you have to do your own locking.
mimetypes ignores this thread synchronisation problem completely, with
unhelpful results.

An attempt was made to address the race condition in r72045 by
eliminating the infinite recursion. Instead, you just get init() being
invoked multiple times on different MimeTypes instances, with the last
one "winning" and being kept as the _db module global (which does
eliminate the crash, but has problems of its own).

This crash probably involves hitting the recursion limit and that's
always a bit dicey as to whether we actually manage to trap it before
the C stack goes boom. With this case being an infinite recursion in an
__init__() method leading to an infinite number of a given object type
being allocated, that's a bit special since it raises the prospect of
potentially running out of heap memory (although that's unlikely with
the default recursion limit unless there are an awful lot of threads
involved).

To check the simple failure mechanism, I tried threading out the following:

class Broken():
  def __init__(self):
    break_it()

def break_it():
  Broken()

from threading import Thread
threads = [Thread(target=break_it) for x in range(100)]
for t in threads: t.start()

On my machine, the threads fail with "RuntimeError: maximum recursion
depth exceeded" for a recursion limit of 1000 or 10000, but segfault at
100,000. However, the 100k recursion limit segfaults even if I only use
a single thread (i.e. call break_it() directly without involving the
threading module at all).

For the OP:

What value do you get for sys.getrecursionlimit()?
Do you still get the segfault if you use sys.setrecursionlimit() to
lower the maximum allowed level of recursion? (e.g. limit it to 200 or
500 recursions)
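
A minimal sketch of the check being asked for, assuming Python 3 (where hitting the limit raises RecursionError, a subclass of the RuntimeError seen on 2.x). `probe_depth` is a hypothetical helper written for this illustration; it temporarily lowers the limit, recurses until the interpreter objects, and restores the previous limit.

```python
import sys


def probe_depth(limit):
    """Lower the recursion limit to `limit`, recurse until it trips,
    and return how many nested calls were entered."""
    old_limit = sys.getrecursionlimit()
    depth = [0]

    def recurse():
        depth[0] += 1
        recurse()

    sys.setrecursionlimit(limit)
    try:
        recurse()
    except RecursionError:
        pass                      # trapped cleanly instead of overrunning the C stack
    finally:
        sys.setrecursionlimit(old_limit)
    return depth[0]


print("current limit:", sys.getrecursionlimit())
print("calls entered with limit 500:", probe_depth(500))
```

If the crash disappears at a low limit such as 200 or 500, that points at stack exhaustion rather than a problem in the allocator.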
msg93830 - (view) Author: Leonardo Santagada (santagada) Date: 2009-10-10 15:46
I'm on os x 10.6 where threadboom.py doesn't segfault anymore at least
on the system provided python. The problem that I see is that it
shouldn't be segfaulting on mac os x 10.5 with the default recursion
limit (I think it is 1000) with 2 threads. IIRC in a simple recursive
function (not on object __init__ like you did) I could put 10000 or more
as a recursion limit and still get a traceback, so I thought that 2
threads each with 1000 recursion limit should not be using the whole
stack. Also I think I did try to raise the stack limit with ulimit, but
I could be wrong.

Nick Coghlan, did you do your experiments on OS X 10.5? Can you try
threadboom.py on a Python from before the corrected mimetypes lib landed
(somewhere between 2.6.2 and 2.6.3) or with an old version of the
mimetypes lib?

I got the same errors both on my old white macbook core duo machine and
with a macbook pro core 2 duo and with both python 2.6.2 or 2.6.3 with
the old mimetypes lib.

I was worried about this bug because I guessed that maybe there is a race
condition of some sort in object creation on Python 2.6. If someone can
reproduce the bug, understand it, and tell me it is a problem of stack
size, I would rest my case and be happy with the segfault :).

ps: I will try to compile python2.6.2 here and reproduce the errors, if
I can I will reply with more info.
msg93833 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2009-10-10 16:41
Knew I forgot to mention something - I'm not on OS X at all (Linux,
Ubuntu 8.04). I was only looking at this bug because RDM cross-linked it
to the mimetypes patch I was reviewing this evening.

Running the threadboom code, it passes fine for me on all of SVN head,
the 2.6 maintenance branch and the system Python (2.5.2).

I've added the OS X maintainer to the nosy list.
msg105081 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2010-05-05 19:47
The script works fine for me (OSX 10.6.3, /usr/bin/python2.5, /usr/bin/python2.6, a recent build of 2.6.x, a recent build of 3.2 and the trunk)

The break_it example in msg93828 works in 64-bit binaries, and fails on 32-bit ones. This is almost certainly a stack overrun: I can remove the crash by increasing the stack size using thread.stack_size(N) for a sufficiently large value of N.  (I don't mention a value for N because I don't know yet what the minimum size is to avoid the crash).

There are three possible actions w.r.t. this:

1) Ignore the issue (users can call thread.stack_size when they
   want deep recursion in threads)

2) Reduce the recursion limit on OSX to something that fits in the 
   default stack size of pthread

3) Increase the default stack size for new threads.

I have a slight preference for the first choice, although the last choice would be fine too. Reducing the recursion limit would also harm code that uses deep recursion on the main thread, which is why I'd be -1 on that.
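
Option (1) can be sketched as follows. This is an illustration only, written for Python 3, where the call is `threading.stack_size` (the counterpart of the 2.x `thread.stack_size` mentioned above). The helper names, the 32 MiB stack and the depth of 3000 are assumptions picked for the example, not measured minimums, and some platforms may reject a requested size with ValueError.

```python
import sys
import threading


def depth_reached(target_depth):
    """Recurse `target_depth` levels deep and return the depth reached."""
    def recurse(n):
        return n if n >= target_depth else recurse(n + 1)
    return recurse(1)


def run_deep_in_thread(target_depth, stack_bytes):
    """Run a deep recursion in a new thread that was given a larger stack."""
    result = []
    old_limit = sys.getrecursionlimit()
    # stack_size() only affects threads created after the call; it returns
    # the previous setting (0 means the platform default).
    old_stack = threading.stack_size(stack_bytes)
    sys.setrecursionlimit(max(old_limit, target_depth + 100))
    try:
        t = threading.Thread(
            target=lambda: result.append(depth_reached(target_depth)))
        t.start()
        t.join()
    finally:
        threading.stack_size(old_stack)
        sys.setrecursionlimit(old_limit)
    return result[0] if result else None


print(run_deep_in_thread(3000, 32 * 1024 * 1024))
```

The stack size has to be set before the thread is created, which is why the helper changes it, starts the thread, and only then restores the old value.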
msg114792 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-08-24 14:41
Issue1454481, which introduced the ability to set the thread stack size, indicates that the FreeBSD port maintainers were bumping the default limit higher.  So I think (3) is probably the correct solution.
msg119211 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2010-10-20 14:56
I'm closing this as a duplicate of #9670, that is: too deep recursion in a thread doesn't trigger the appropriate exception but causes a hard crash instead.

I have attached a patch to that issue (but haven't applied it yet, I'd like someone else to look at the patch as well).

BTW. I don't think this issue is serious enough to warrant a backport to 2.6.
History
Date User Action Args
2010-10-20 14:56:28 ronaldoussoren set status: open -> closed
resolution: duplicate
messages: + msg119211
2010-08-24 14:41:39 r.david.murray set messages: + msg114792
2010-05-05 19:47:55 ronaldoussoren set messages: + msg105081
2010-05-03 06:23:55 belopolsky set assignee: ronaldoussoren
components: + macOS
nosy: ronaldoussoren, ncoghlan, r.david.murray, jab, jrus, santagada
2009-10-10 16:41:14 ncoghlan set nosy: + ronaldoussoren
messages: + msg93833
2009-10-10 15:46:56 santagada set messages: + msg93830
2009-10-10 13:24:49 ncoghlan set nosy: + ncoghlan
messages: + msg93828
2009-08-23 02:49:23 santagada set files: + threadboom.py
messages: + msg91885
2009-08-23 02:17:03 r.david.murray set nosy: + jrus
2009-08-23 02:15:54 r.david.murray set priority: high
nosy: + r.david.murray
messages: + msg91883
stage: test needed
2009-08-23 00:46:28 jab set nosy: + jab
2009-08-22 21:26:34 santagada create