Message189123
We have observed this crash with some frequency when running our compilation scripts using multiprocessing.Pool().
Analysing the crashes shows that this is what is happening:
1) The Pool has a "daemon" thread managing the pool.
2) The worker is asleep, waiting for the GIL.
3) The main thread exits and the system starts its shutdown. In PyInterpreterState_Clear, it has already cleared, among other things, the sys dict. In the process, it clears an old traceback that contains a multiprocessing.connection object.
4) The connection object is cleared. Its deallocator contains this code:
    Py_BEGIN_ALLOW_THREADS
    CLOSE(self->handle);
    Py_END_ALLOW_THREADS
5) The sleeping daemon thread is woken up and starts prancing around. Upon calling sys.exc_clear() it crashes, since tstate->interp->sysdict == NULL.
I have a workaround in place in our codebase:
static void
connection_dealloc(ConnectionObject* self)
{
    if (self->weakreflist != NULL)
        PyObject_ClearWeakRefs((PyObject*)self);
    if (self->handle != INVALID_HANDLE_VALUE) {
        /* CCP Change. Cannot release threads here, because this
         * deallocation may be running during process shutdown, and
         * releasing a daemon thread will cause a crash
        Py_BEGIN_ALLOW_THREADS
        CLOSE(self->handle);
        Py_END_ALLOW_THREADS
         */
        CLOSE(self->handle);
    }
    PyObject_Del(self);
}
In general, deallocators should have no side effects, I think. Releasing the GIL is certainly a side effect.
I realize that process shutdown is a delicate matter. One delicate thing is that we cannot allow worker threads to run anymore. I see no general mechanism for ensuring this, but surely not releasing the GIL in deallocators is at least a first step?

Date | User | Action | Args
2013-05-13 11:31:34 | kristjan.jonsson | set | recipients: + kristjan.jonsson
2013-05-13 11:31:34 | kristjan.jonsson | set | messageid: <1368444694.56.0.376585959821.issue17969@psf.upfronthosting.co.za>
2013-05-13 11:31:34 | kristjan.jonsson | link | issue17969 messages
2013-05-13 11:31:34 | kristjan.jonsson | create |