Author neologix
Recipients bquinlan, dmalcolm, jnoller, kristjan.jonsson, lukasz.langa, neologix, sandro.tosi, ysj.ray
Date 2011-04-15.19:00:05
This is caused by a bug in TLS (thread-local storage) key management when combined with fork().
Here's what happens:
When a thread is created, a tstate is allocated and stored in the thread's TLS:
thread_PyThread_start_new_thread -> t_bootstrap -> _PyThreadState_Init -> _PyGILState_NoteThreadState:

    if (PyThread_set_key_value(autoTLSkey, (void *)tstate) < 0)
        Py_FatalError("Couldn't create autoTLSkey mapping");

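    int
    PyThread_set_key_value(int key, void *value)
    {
        int fail;
        /* Value already bound to this key in the calling thread, if any. */
        void *oldValue = pthread_getspecific(key);
        if (oldValue != NULL)
            return 0;   /* keep the existing value and report success */
        fail = pthread_setspecific(key, value);
        return fail;
    }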
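To show what the early return does, here is a minimal sketch comparing that logic with a version where the pthread_getspecific() check is removed, as the patch described below does. Both functions are hypothetical standalone copies (set_key_value_original and set_key_value_patched are my names, and they take a pthread_key_t rather than CPython's int key); this is not the attached patch itself.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical copy of the original PyThread_set_key_value() logic.
 * If the calling thread already has a non-NULL value under `key`
 * (for example one inherited across fork()), the new value is
 * silently dropped and 0 (success) is still returned.
 */
int
set_key_value_original(pthread_key_t key, void *value)
{
    void *oldValue = pthread_getspecific(key);
    if (oldValue != NULL)
        return 0;
    return pthread_setspecific(key, value);
}

/*
 * Hypothetical patched version: the pthread_getspecific() check is
 * removed, so the last value stored always wins.
 */
int
set_key_value_patched(pthread_key_t key, void *value)
{
    return pthread_setspecific(key, value);
}
```

With the original logic, a second call on the same key still returns 0, yet pthread_getspecific() keeps returning the first pointer; the patched version stores the second one.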

pthread_getspecific(key) is called to check whether a value is already associated with this key in the calling thread.
The problem is the following: suppose a process has a thread with a given thread ID (with a tstate stored in its TLS), and the process then forks from another thread. If a new thread created in the child process happens to get the same thread ID as that thread, pthread_getspecific(key) returns the value stored by the parent's thread with that ID. In short, thread-specific values are inherited across fork, and if you're unlucky enough to create a thread whose ID already existed in the parent process, you're screwed.
As a result, PyThread_set_key_value returns 0 without storing the new thread's tstate, so PyGILState_GetThisThreadState, which calls PyThread_get_key_value(autoTLSkey), returns the other thread's tstate, which triggers this fatal error in PyThreadState_Swap.
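The failure chain can be modelled in a single process with a toy sketch. Everything here is hypothetical: toy_tstate stands in for PyThreadState, toy_tls_key for autoTLSkey, and toy_swap_check is only my assumption of the rough shape of the consistency check in PyThreadState_Swap (the real check also compares interpreters). Instead of forking, the test plants the stale value directly with pthread_setspecific().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for PyThreadState; only its identity matters. */
struct toy_tstate {
    int id;
};

/* Plays the role of autoTLSkey. */
pthread_key_t toy_tls_key;

int toy_init(void)
{
    return pthread_key_create(&toy_tls_key, NULL);
}

/* Mirrors _PyGILState_NoteThreadState() on top of the original
   set_key_value logic: an existing value is never replaced. */
int toy_note_thread_state(struct toy_tstate *ts)
{
    if (pthread_getspecific(toy_tls_key) != NULL)
        return 0;
    return pthread_setspecific(toy_tls_key, ts);
}

/* Mirrors PyGILState_GetThisThreadState(). */
struct toy_tstate *toy_get_this_thread_state(void)
{
    return pthread_getspecific(toy_tls_key);
}

/* Assumed shape of the sanity check: returns -1 where CPython would
   call Py_FatalError(), i.e. when the TLS lookup yields a different
   non-NULL state than the one being swapped in; 0 otherwise. */
int toy_swap_check(struct toy_tstate *newts)
{
    struct toy_tstate *check = toy_get_this_thread_state();
    if (newts != NULL && check != NULL && check != newts)
        return -1;
    return 0;
}
```

With a stale value in the key, toy_note_thread_state(&fresh) reports success, the lookup still returns the stale state, and toy_swap_check(&fresh) flags the mismatch.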

The attached patch fixes this issue by removing the call to pthread_getspecific(key) from PyThread_set_key_value. This solves the problem and doesn't seem to cause any regression in test_threading or test_multiprocessing, which now run without any fatal error. Besides, I think that calling PyThread_set_key_value twice on the same key is either an error, or a case where we want the latest value to be stored, not the old one.

Note that this is probably a bug in RHEL's pthread implementation, but given how widespread RHEL and its derived distros are, I think this should be fixed.
I've attached a patch and a small test program that checks whether thread-specific data is inherited across a fork.
Here's a sample run on a RHEL4U8 box:

$ /tmp/test
PID: 17922, TID: 3086187424, init value: (nil)
PID: 17924, TID: 3086187424, init value: 0xdeadbeef

The second thread was created in the child process and inherited the key value of the first thread (created by the parent). Of course, one condition for this to happen is that the second thread is allocated the same thread ID as the first one.
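The attached test program isn't reproduced here, but a program of that kind can be sketched roughly as follows, assuming the scenario described above: a parent thread stores 0xdeadbeef under a key and stays alive, the main thread forks, and the child starts a new thread that reports what it finds under the key. The names (tsd_inherited_across_fork, parent_thread, child_thread) are hypothetical, and the function is meant to be called once per process since its flags are never reset.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_key_t key;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int value_set, may_exit;

/* Print in the same format as the sample run above. */
static void report(void)
{
    printf("PID: %d, TID: %lu, init value: %p\n", (int)getpid(),
           (unsigned long)pthread_self(), pthread_getspecific(key));
    fflush(stdout);  /* keep buffered output from being duplicated by fork() */
}

/* Parent-side thread: sets the key, then stays alive across fork(). */
static void *parent_thread(void *arg)
{
    (void)arg;
    report();
    pthread_setspecific(key, (void *)(uintptr_t)0xdeadbeef);
    pthread_mutex_lock(&lock);
    value_set = 1;
    pthread_cond_broadcast(&cond);
    while (!may_exit)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Child-side thread: records the value it sees under the key at startup. */
static void *child_thread(void *arg)
{
    *(void **)arg = pthread_getspecific(key);
    report();
    return NULL;
}

/* Returns 1 if the child's new thread saw a non-NULL value, 0 if it saw
   NULL, -1 on error. */
int tsd_inherited_across_fork(void)
{
    pthread_t t;
    pid_t pid;
    int status;

    if (pthread_key_create(&key, NULL) != 0)
        return -1;
    if (pthread_create(&t, NULL, parent_thread, NULL) != 0)
        return -1;

    pthread_mutex_lock(&lock);
    while (!value_set)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);

    /* Fork from the main thread while parent_thread is still alive. */
    pid = fork();
    if (pid == 0) {
        pthread_t t2;
        void *seen = NULL;
        if (pthread_create(&t2, NULL, child_thread, &seen) != 0)
            _exit(2);
        pthread_join(t2, NULL);
        _exit(seen != NULL ? 1 : 0);
    }

    /* Let parent_thread finish whatever happened to the child. */
    pthread_mutex_lock(&lock);
    may_exit = 1;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);
    pthread_join(t, NULL);
    pthread_key_delete(key);

    if (pid < 0 || waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) > 1)
        return -1;
    return WEXITSTATUS(status);
}
```

On an affected system, when the child's thread is given the same thread ID, the second line should show 0xdeadbeef and the function should return 1; where the C library clears thread-specific data in the child, it should print (nil) and return 0. Strictly speaking, POSIX only guarantees async-signal-safe calls in the child of a multithreaded process until exec, so creating threads there is already outside what the standard promises.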