classification
Title: Thread local storage and PyGILState_* mucked up by os.fork()
Type: crash Stage:
Components: Interpreter Core Versions: Python 2.5.3
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: Nosy List: Rhamphoryncus, barry, benjamin.peterson, christian.heimes, gregory.p.smith, jnoller, loewis, roudkerk, schmir
Priority: release blocker Keywords:

Created on 2007-12-21 18:53 by roudkerk, last changed 2008-12-13 14:59 by loewis. This issue is now closed.

Files
File name Uploaded Description Edit
dbg2.py roudkerk, 2007-12-21 18:53
fork-thread-patch roudkerk, 2008-01-06 15:04
fork-thread-patch-2 Rhamphoryncus, 2008-06-11 23:27
Messages (13)
msg58954 - (view) Author: roudkerk (roudkerk) Date: 2007-12-21 18:53
I got a report that one of the tests for processing
(http://cheeseshop.python.org/pypi/processing) was failing with

    Fatal Python error: Invalid thread state for this thread

when run with a debug interpreter.  This appears to be caused by the
interaction between os.fork() and threads.  The following attached
program reliably reproduces the problem for me on Ubuntu edgy for x86.

All that happens is that the parent process starts a subthread then
forks, and then the child process starts a subthread.

With the normal interpreter I get

    started thread -1211028576 in process 18683
    I am thread -1211028576 in process 18683
    started thread -1211028576 in process 18685
    I am thread -1211028576 in process 18685

as expected, but with the debug interpreter I get

    started thread -1210782816 in process 18687
    I am thread -1210782816 in process 18687
    started thread -1210782816 in process 18689
    Fatal Python error: Invalid thread state for this thread
    [5817 refs]

Notice that the child process is reusing a thread id that was 
being used by the parent process at the time of the fork.

The code raising the error seems to be in pystate.c:

    PyThreadState *
    PyThreadState_Swap(PyThreadState *new)
    {
            PyThreadState *old = _PyThreadState_Current;

            _PyThreadState_Current = new;
            /* It should not be possible for more than one thread state
               to be used for a thread.  Check this the best we can in
debug 
               builds.
            */
    #if defined(Py_DEBUG) && defined(WITH_THREAD)
            if (new) {
                    PyThreadState *check = PyGILState_GetThisThreadState();
                    if (check && check != new)
                            Py_FatalError("Invalid thread state for this
thread");
            }
    #endif
            return old;
    }

It looks as though PyGILState_GetThisThreadState() is returning the
thread state of the thread from the parent process which has the same
id as the current thread.  Therefore the check fails.

I think the thread local storage implementation in thread.c should
provide a function _PyThread_ReInitTLS() which PyOS_AfterFork() can
call.  I think _PyThread_ReInitTLS() just needs to remove and free
each "struct key" in the linked list which does not match the id of
the calling thread.
msg59379 - (view) Author: roudkerk (roudkerk) Date: 2008-01-06 15:04
The included patch against python2.51 fixes the problem for me.
msg59576 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-01-09 00:06
Bug day task
msg68024 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-06-11 21:26
we need this in before 2.6 is released.
msg68031 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-06-11 22:59
Gregory, go ahead and apply and see if can stop the hell in the buildbots.
msg68036 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008-06-11 23:27
Updated version of roudkerk's patch.  Adds the new function to
pythread.h and is based off of current trunk.

Note that Parser/intrcheck.c isn't used on my box, so it's completely
untested.

roudkerk's original analysis is correct.  The TLS is never informed that
the old thread is gone, so when it seems the same id again it assumes it
is the old thread, which PyThreadState_Swap doesn't like.
msg68038 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008-06-11 23:57
Incidentally, it doesn't seem necessary to reinitialize the lock.  Posix
duplicates the lock, so if you hold it when you fork your child will be
able to unlock it and use it as normal.  Maybe there's some non-Posix
behaviour or something even more obscure when #401226 was done?

(reinitializing is essentially harmless though, so in no way should this
hold up release.)
msg68109 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-06-13 00:10
I applied this in r64212.
msg76535 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-11-28 15:15
The fix is required to run multiprocessing on Python 2.4 and 2.5, see
#4451. I suggest we fix the issue in 2.5.3. The fork-thread-patch-2
patch doesn't work on Python 2.5. I'm getting a segfault on my system:

test_connection (multiprocessing.tests.WithProcessesTestConnection) ...
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fa2e999f6e0 (LWP 10594)]
0x000000000052065f in PyCFunction_Call (func=Cannot access memory at
address 0x7ffeffffffd8
) at Objects/methodobject.c:73
73                              return (*meth)(self, arg);

Linux on AMD64 with Python 2.5 svn --with-pydebug.
msg76597 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-11-29 15:00
I was wrong and the patch is right. Something is wrong in
multiprocessings connection_recvbytes_into() function for the old buffer
protocol. Somehow PyArg_ParseTuple(args, "w#|" ...) fucks up the
process' memory.

Martin, are you fine with the patch? fork-thread-patch-2 still applies
with some fuzzying.
msg76627 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-11-29 22:44
Looks fine to me, somebody please backport it.

I'm concerned about the memory leak that this may cause, though (IIUC):
all the values are garbage, provided that they are pointers to dynamic
memory in the first place.

I think in the long run, we should allow to store a pointer to a release
function along with actual value to release.
msg76658 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2008-11-30 19:47
Since this is a Python 2.5.3 issue, I'm lowering to deferred blocker
until after 3.0 and 2.6.1 are released.
msg77739 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-12-13 14:59
Committed fork-thread-patch-2 as r67736 into 2.5 branch.
History
Date User Action Args
2008-12-13 14:59:36loewissetstatus: open -> closed
messages: + msg77739
2008-12-04 02:04:43barrysetpriority: deferred blocker -> release blocker
2008-11-30 19:47:10barrysetpriority: release blocker -> deferred blocker
nosy: + barry
messages: + msg76658
2008-11-29 22:44:04loewissetpriority: critical -> release blocker
assignee: loewis ->
resolution: accepted
messages: + msg76627
2008-11-29 15:00:13christian.heimessetassignee: christian.heimes -> loewis
messages: + msg76597
nosy: + loewis
2008-11-28 22:44:34gregory.p.smithsetassignee: gregory.p.smith -> christian.heimes
2008-11-28 17:58:30jnollersetnosy: + jnoller
2008-11-28 15:16:47benjamin.petersonsetversions: - Python 2.6, Python 2.5, Python 3.0
2008-11-28 15:15:33christian.heimessetstatus: closed -> open
resolution: fixed -> (no value)
messages: + msg76535
versions: + Python 2.5.3
2008-06-13 00:10:00benjamin.petersonsetstatus: open -> closed
resolution: fixed
messages: + msg68109
2008-06-11 23:57:51Rhamphoryncussetmessages: + msg68038
2008-06-11 23:27:30Rhamphoryncussetfiles: + fork-thread-patch-2
messages: + msg68036
2008-06-11 22:59:14benjamin.petersonsetnosy: + benjamin.peterson
messages: + msg68031
2008-06-11 21:26:35gregory.p.smithsetpriority: high -> critical
assignee: gregory.p.smith
messages: + msg68024
nosy: + gregory.p.smith
2008-06-11 21:23:52gregory.p.smithlinkissue3082 dependencies
2008-06-06 13:09:41benjamin.petersonlinkissue3050 dependencies
2008-03-07 17:18:25schmirsetnosy: + schmir
2008-01-09 07:12:18Rhamphoryncussetnosy: + Rhamphoryncus
2008-01-09 00:06:10christian.heimessetpriority: high
nosy: + christian.heimes
messages: + msg59576
versions: - Python 2.4
2008-01-06 15:04:00roudkerksetfiles: + fork-thread-patch
messages: + msg59379
versions: + Python 2.6, Python 3.0
2007-12-21 18:53:46roudkerkcreate