Message 189123 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	kristjan.jonsson
Recipients	kristjan.jonsson
Date	2013-05-13.11:31:33
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1368444694.56.0.376585959821.issue17969@psf.upfronthosting.co.za>
In-reply-to

Content

We have observed this crash fairly frequently when running our compilation scripts with multiprocessing.Pool().

Analysing the crashes, we found that this is what happens:
1) The Pool has a "daemon" thread managing the pool.
2) That worker (daemon) thread is asleep, waiting for the GIL.
3) The main thread exits and the interpreter starts shutting down.  During PyInterpreterState_Clear(), it clears, among other things, the sys dict.  While doing so, it releases an old traceback, and that traceback contains a multiprocessing.connection object.
4) The connection object is deallocated.  Its deallocator contains this code:
        Py_BEGIN_ALLOW_THREADS
        CLOSE(self->handle);
        Py_END_ALLOW_THREADS
5) Releasing the GIL wakes the sleeping daemon thread, which starts prancing around.  When it calls sys.exc_clear(), it crashes, because tstate->interp->sysdict is NULL.
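The five steps can be replayed in a deterministic, single-threaded toy model. This is a hypothetical sketch, not CPython code: every name in it is invented, a plain `int *` stands in for `tstate->interp->sysdict`, and calling a queued function pointer stands in for the OS scheduling a thread that was blocked on the GIL when `Py_BEGIN_ALLOW_THREADS` releases it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of the shutdown race; none of these names exist
 * in CPython. */
typedef struct {
    int *sysdict;              /* stands in for interp->sysdict; NULL once cleared */
} ToyInterp;

typedef void (*ToyThread)(ToyInterp *);

static ToyInterp interp;
static ToyThread thread_waiting_for_gil = NULL;   /* step 2: blocked on the GIL */
static bool daemon_saw_null_sysdict = false;

/* What the daemon thread does when it wakes: touch sys, as sys.exc_clear() does. */
static void daemon_thread_resume(ToyInterp *in) {
    if (in->sysdict == NULL) {
        /* The real interpreter does not check here; it dereferences NULL. */
        daemon_saw_null_sysdict = true;
        return;
    }
    *in->sysdict += 1;
}

/* Releasing the GIL lets whichever thread is waiting for it run. */
static void toy_release_gil(void) {
    ToyThread t = thread_waiting_for_gil;
    thread_waiting_for_gil = NULL;
    if (t != NULL)
        t(&interp);
}

/* Mirrors the deallocator from step 4. */
static void toy_connection_dealloc(void) {
    toy_release_gil();         /* Py_BEGIN_ALLOW_THREADS */
    /* CLOSE(self->handle); */
    /* Py_END_ALLOW_THREADS would reacquire the GIL here. */
}

static void toy_shutdown(void) {
    static int sys_storage = 0;
    interp.sysdict = &sys_storage;
    thread_waiting_for_gil = daemon_thread_resume;  /* step 2 */
    interp.sysdict = NULL;                          /* step 3: sys dict cleared */
    toy_connection_dealloc();                       /* steps 4 and 5 */
}
```

Calling `toy_shutdown()` sets `daemon_saw_null_sysdict`, which marks the point where the real daemon thread crashes. Removing the `toy_release_gil()` call from `toy_connection_dealloc()` leaves the daemon thread asleep, which is the idea behind the workaround that follows.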


I have a workaround in place in our codebase:


static void
connection_dealloc(ConnectionObject* self)
{
    if (self->weakreflist != NULL)
        PyObject_ClearWeakRefs((PyObject*)self);

    if (self->handle != INVALID_HANDLE_VALUE) {
        /* CCP Change.  Cannot release threads here, because this
         * deallocation may be running during process shutdown, and
         * releasing a daemon thread will cause a crash.
        Py_BEGIN_ALLOW_THREADS
        CLOSE(self->handle);
        Py_END_ALLOW_THREADS
         */
        CLOSE(self->handle);
    }
    PyObject_Del(self);
}
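If releasing the GIL around the close is worth keeping during normal operation, one possible refinement is to skip the release only once shutdown has begun. This is a hypothetical sketch, not the patch above: `toy_runtime_finalizing` is an invented global standing in for whatever way the extension would learn that shutdown has started, the GIL release is replaced by a counter, and the handle is assumed to be a POSIX file descriptor.

```c
#include <stdbool.h>
#include <unistd.h>

#define TOY_INVALID_HANDLE (-1)

/* Hypothetical stand-ins: a flag the runtime would set when shutdown
 * begins, and a counter in place of actually releasing the GIL. */
static bool toy_runtime_finalizing = false;
static int toy_gil_releases = 0;

static void toy_release_gil(void) { toy_gil_releases++; }
static void toy_acquire_gil(void) { /* a real runtime would block here */ }

typedef struct {
    int handle;                /* a POSIX file descriptor in this sketch */
} ToyConnection;

static void toy_connection_close(ToyConnection *self) {
    if (self->handle == TOY_INVALID_HANDLE)
        return;
    if (toy_runtime_finalizing) {
        /* Shutdown: keep the GIL so no daemon thread can wake up. */
        close(self->handle);
    } else {
        /* Normal operation: let other threads run around close(). */
        toy_release_gil();
        close(self->handle);
        toy_acquire_gil();
    }
    self->handle = TOY_INVALID_HANDLE;
}
```

Whether this is worth the extra branch depends on how long the close can take; the unconditional workaround above is simpler and never releases the GIL from the deallocator.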


In general, deallocators should have no side effects, I think.  Releasing the GIL is certainly a side effect.

I realize that process shutdown is a delicate matter.  One delicate point is that worker threads must not be allowed to run anymore.  I see no general mechanism for ensuring this, but surely not releasing the GIL in deallocators is at least a first step?
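One shape a general mechanism could take is a gate in the GIL acquisition path: once finalization starts, any thread other than the finalizing one that obtains the GIL hands it straight back and is told to exit instead of running Python code. The sketch below is a hypothetical toy; `ToyGil`, `toy_gil_acquire()` and the integer thread ids are invented, and a real runtime would block inside the acquire and terminate the thread rather than return a status.

```c
#include <stdbool.h>

/* Hypothetical toy GIL: a single owner field plus shutdown state. */
typedef struct {
    int owner;                 /* id of the thread holding the GIL, or -1 */
    bool finalizing;
    int finalizing_thread;     /* the thread running shutdown */
} ToyGil;

typedef enum { GIL_ACQUIRED, GIL_THREAD_MUST_EXIT } AcquireResult;

static AcquireResult toy_gil_acquire(ToyGil *g, int thread_id) {
    /* A real implementation would wait here until the GIL is free. */
    g->owner = thread_id;
    if (g->finalizing && thread_id != g->finalizing_thread) {
        /* Shutdown has begun: give the GIL back and make the caller
         * terminate instead of touching a half-cleared interpreter. */
        g->owner = -1;
        return GIL_THREAD_MUST_EXIT;
    }
    return GIL_ACQUIRED;
}

static void toy_gil_release(ToyGil *g) { g->owner = -1; }

static void toy_begin_finalization(ToyGil *g, int main_thread_id) {
    g->finalizing = true;
    g->finalizing_thread = main_thread_id;
}
```

With such a gate, a deallocator that releases the GIL during shutdown would still let the daemon thread wake up, but the thread would exit at once instead of calling into sys.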

History
Date	User	Action	Args
2013-05-13 11:31:34	kristjan.jonsson	set	recipients: + kristjan.jonsson
2013-05-13 11:31:34	kristjan.jonsson	set	messageid: <1368444694.56.0.376585959821.issue17969@psf.upfronthosting.co.za>
2013-05-13 11:31:34	kristjan.jonsson	link	issue17969 messages
2013-05-13 11:31:34	kristjan.jonsson	create