Author pitrou
Recipients Albert.Zeyer, neologix, pitrou, r.david.murray
Date 2013-02-23.22:35:38
Message-id <1361658728.3600.7.camel@localhost.localdomain>
In-reply-to <CAH_1eM2SEonbgXCmrwjAjCmXYZJMv76vfUpji=1yQa68iHzhwg@mail.gmail.com>
Content
> > - how do you know the crash really happens because of thread 5?
> 
> All other threads are blocked on locks or condition variables, it's
> the only runnable thread.

Hm, you are right.

> > Another question: are threads being started or stopped while the thread local object is being deleted?
> 
> From the stack trace, thread 2 is being stopped.
> 
> I guess the problem is similar to above: thread 2 is in the middle of
> stopping, its TLS dict is deallocated, which triggers the thread local
> object deallocation, which releases the GIL. Thread 5 becomes running,
> and must somehow access thread 2 tstate.

I've read the code several times and I find it unlikely that this
scenario is the cause of the problem:
- the thread state's thread-local dict (tstate->dict) is deallocated
using Py_CLEAR(), meaning it's unreachable from other threads when
deallocating one of the values releases the GIL
- the thread-local object's deallocator checks that tstate->dict is
non-NULL before using it; the only thing that could go wrong is if
PyDict_GetItem() releases the GIL, which sounds unlikely on tstate->dict

(also, I've checked that threadmodule.c holds the GIL when inserting and
removing thread states from the interpreter's thread states list; it
would be more future-proof for local_dealloc to use pystate.c's
HEAD_LOCK() and HEAD_UNLOCK() APIs, though)

I'm wondering if there's something else interfering here. My attempts at
writing a stress-test script have failed to produce any crash.
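
For reference, a stress test of this kind could look like the sketch
below. It is not the script I actually ran, only a minimal example under
stated assumptions: Noisy, worker and stress are hypothetical names, and
time.sleep(0) in the finalizer is meant to give other threads a chance to
run while a thread-local value is being deallocated.

```python
import threading
import time


class Noisy:
    """Thread-local value whose finalizer lets other threads run."""

    def __del__(self):
        # Intended to release the GIL briefly during deallocation.
        time.sleep(0)


def worker(local, done, lock):
    # Each thread stores its own value in the shared thread-local object.
    local.value = Noisy()
    with lock:
        done.append(1)


def stress(rounds=50, nthreads=8):
    """Start and stop threads while a threading.local object goes away.

    Returns the number of workers that ran to completion.
    """
    done = []
    lock = threading.Lock()
    for _ in range(rounds):
        local = threading.local()
        threads = [
            threading.Thread(target=worker, args=(local, done, lock))
            for _ in range(nthreads)
        ]
        for t in threads:
            t.start()
        # Drop the main thread's reference while workers may still be
        # starting or exiting; the remaining references are the ones held
        # by the threads themselves.
        del local
        for t in threads:
            t.join()
    return len(done)
```

Calling stress() with larger values of rounds and nthreads increases the
chance of hitting a narrow race, but a clean run proves nothing about the
absence of one.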
History
Date                 User    Action  Args
2013-02-23 22:35:38  pitrou  set     recipients: + pitrou, r.david.murray, neologix, Albert.Zeyer
2013-02-23 22:35:38  pitrou  link    issue17263 messages
2013-02-23 22:35:38  pitrou  create