Issue13156
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2011-10-12 06:33 by grahamd, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
pystate.c.diff | grahamd, 2011-10-12 06:33 | |||
tstate_after_fork.diff | neologix, 2011-10-12 20:14 | review |
Messages (14) | |||
---|---|---|---|
msg145383 - (view) | Author: Graham Dumpleton (grahamd) | Date: 2011-10-12 06:33 | |
This is a followup bug report to fix wrong implementation of _PyGILState_Reinit() introduced by http://bugs.python.org/issue10517. I don't have a standalone test case yet. Problem occurs under mod_wsgi with Python 2.7.2 and thus similarly 3.2 where _PyGILState_Reinit() was also added. The Python code part which triggers the problem is: pid = os.fork() if pid: sys.stderr.write('Fork succeeded (PID=%s)\n' % pid) else: sys.stderr.write('Fork succeeded (child PID=%s)\n' % os.getpid()) time.sleep(60.0) os._exit(0) To trigger the problem requires this code be executed from a thread originally created outside of Python and then calling into a sub interpreter. This thread would have created its own thread state object for the sub interpreter call since auto thread states don't work for sub interpreters. Further, it would never have called into the main interpreter so auto thread state simply doesn't exist for main interpreter. When this thread has a fork() occur and _PyGILState_Reinit() is called, the call of PyGILState_GetThisThreadState() returns NULL because auto thread state for main interpreter was never initialised for this thread. When it then calls into PyThread_set_key_value() it is value of NULL and rather than set it, it thinks internally in find_key() you are doing a get which results in PyThread_set_key_value() returning -1 and so the fatal error. So _PyGILState_Reinit() is broken because it assumes that an auto thread state will always exist for the thread for it to reinit, which will not always be the case. The simple fix may be that if PyGILState_GetThisThreadState() returns NULL then don't do any reinit. Making that change does seem to fix the problem. Code that works then is: void _PyGILState_Reinit(void) { PyThreadState *tstate = PyGILState_GetThisThreadState(); if (tstate) { PyThread_delete_key(autoTLSkey); if ((autoTLSkey = PyThread_create_key()) == -1) Py_FatalError("Could not allocate TLS entry"); /* re-associate the current thread state with the new key */ if (PyThread_set_key_value(autoTLSkey, (void *)tstate) < 0) Py_FatalError("Couldn't create autoTLSkey mapping"); } } Diff file also attached. |
|||
msg145384 - (view) | Author: Graham Dumpleton (grahamd) | Date: 2011-10-12 06:37 | |
Whoops. Missed the error. The fatal error that occurs is: Fatal Python error: Couldn't create autoTLSkey mapping |
|||
msg145385 - (view) | Author: Charles-François Natali (neologix) * | Date: 2011-10-12 07:10 | |
Note that this doesn't apply to default: the problem is that 2.7 and 3.2 don't use native TLS, and with the ad-hoc TLS implementation, a NULL value isn't supported: """ /* Internal helper. * If the current thread has a mapping for key, the appropriate struct key* * is returned. NB: value is ignored in this case! * If there is no mapping for key in the current thread, then: * If value is NULL, NULL is returned. * Else a mapping of key to value is created for the current thread, * and a pointer to a new struct key* is returned; except that if * malloc() can't find room for a new struct key*, NULL is returned. * So when value==NULL, this acts like a pure lookup routine, and when * value!=NULL, this acts like dict.setdefault(), returning an existing * mapping if one exists, else creating a new mapping. """ So PyThread_set_key_value() has different semantics between 2.7/3.2 and default... > So _PyGILState_Reinit() is broken because it assumes that an auto > thread state will always exist for the thread for it to reinit, which > will not always be the case. Hmm... Please see http://docs.python.org/c-api/init.html#non-python-created-threads """ When threads are created using the dedicated Python APIs (such as the threading module), a thread state is automatically associated to them and the code showed above is therefore correct. However, when threads are created from C (for example by a third-party library with its own thread management), they don’t hold the GIL, nor is there a thread state structure for them. If you need to call Python code from these threads (often this will be part of a callback API provided by the aforementioned third-party library), you must first register these threads with the interpreter by creating a thread state data structure, then acquiring the GIL, and finally storing their thread state pointer, before you can start using the Python/C API. When you are done, you should reset the thread state pointer, release the GIL, and finally free the thread state data structure. The PyGILState_Ensure() and PyGILState_Release() functions do all of the above automatically. The typical idiom for calling into Python from a C thread is: """ I think you should be calling call PyGILState_Ensure() before (whoch does associate the thread state to the autoTLS key. I let Antoine answer, he's got much more experience than me. |
|||
msg145386 - (view) | Author: Graham Dumpleton (grahamd) | Date: 2011-10-12 08:04 | |
The PyGILState_Ensure() function is only for when working with the main interpreter. These external threads are not calling into the main interpreter. Because they are external threads, calling PyGILState_Ensure() and then PyGILState_Release() will cause a thread state to be created for the main interpreter, but it will also be destroyed on the PyGILState_Release(). The only way to avoid that situation and ensure that the thread state for the main interpreter is therefore maintained would be to call PyGILState_Ensure() and then call PyThreadState_Swap() to change to thread state for the sub interpreter. Problem is that you aren't supposed to use PyThreadState_Swap() any more and recollect that newer Python 3.X even prohibits it in some way through some checks. So, the documentation you quote is only to do with the main interpreter and is not how things work for sub interpreters. |
|||
msg145387 - (view) | Author: Charles-François Natali (neologix) * | Date: 2011-10-12 08:32 | |
> So, the documentation you quote is only to do with the main > interpreter and is not how things work for sub interpreters. You're right, my bad. However, it would probably be better to destroy/reset the autoTLSkey even if the current thread doesn't have an associated TLS key (to avoid stumbling upon the original libc bug of issue #10517): """ void _PyGILState_Reinit(void) { PyThreadState *tstate = PyGILState_GetThisThreadState(); PyThread_delete_key(autoTLSkey); if ((autoTLSkey = PyThread_create_key()) == -1) Py_FatalError("Could not allocate TLS entry"); /* re-associate the current thread state with the new key */ if (tstate && PyThread_set_key_value(autoTLSkey, (void *)tstate) < 0) Py_FatalError("Couldn't create autoTLSkey mapping"); } """ Now that i think about it, the problem is even simpler: this patch shouldn't have been applied to 2.7 and 3.2, it was only relevant for native pthread TLS implementation (which does allow NULL values). So the solution would be simply to backout this patch on 2.7 and 3.2. |
|||
msg145388 - (view) | Author: Charles-François Natali (neologix) * | Date: 2011-10-12 08:39 | |
> So the solution would be simply to backout this patch on 2.7 and 3.2. Actually, I just checked, and the native TLS implementation is present in 3.2, so this problem shouldn't show up: did you test it with 3.2? AFAICT, this should only affect 2.7 (for which this patch wasn't relevant). |
|||
msg145389 - (view) | Author: Graham Dumpleton (grahamd) | Date: 2011-10-12 09:51 | |
True. Doesn't appear to be an issue with Python 3.2.2. Only Python 2.7.2. I was not aware that the TLS mechanism was changed in Python 3.X so assumed would also affect it. So, looks like the change shouldn't have been applied to Python 2.7. How many moons before Python 2.7.3 though? |
|||
msg145391 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2011-10-12 11:51 | |
> How many moons before Python 2.7.3 though? If you convince python-dev that's it's a critical bug (is it?) I suppose it could happen soon. |
|||
msg145421 - (view) | Author: Roundup Robot (python-dev) | Date: 2011-10-12 19:07 | |
New changeset ee4fe16d9b48 by Charles-François Natali in branch '2.7': Issue #13156: revert changeset f6feed6ec3f9, which was only relevant for native http://hg.python.org/cpython/rev/ee4fe16d9b48 |
|||
msg145424 - (view) | Author: Charles-François Natali (neologix) * | Date: 2011-10-12 20:14 | |
Here's a patch for 3.2 and default which calls PyThread_set_key_value() only if there was an auto thread state previously associated (while the current code works with pthread TLS, there are other implementations which may behave strangely, and there's still the ad-hoc implementation in Python/thread.c). |
|||
msg145429 - (view) | Author: Roundup Robot (python-dev) | Date: 2011-10-12 20:26 | |
New changeset 3313ce92cef7 by Charles-François Natali in branch '2.7': Issue #13156: Add an entry in Misc/NEWS. http://hg.python.org/cpython/rev/3313ce92cef7 |
|||
msg148134 - (view) | Author: Roundup Robot (python-dev) | Date: 2011-11-22 18:52 | |
New changeset ba90839c4993 by Charles-François Natali in branch '3.2': Issue #13156: _PyGILState_Reinit(): Re-associate the auto thread state with the http://hg.python.org/cpython/rev/ba90839c4993 New changeset aa6ce09d2350 by Charles-François Natali in branch 'default': Issue #13156: _PyGILState_Reinit(): Re-associate the auto thread state with the http://hg.python.org/cpython/rev/aa6ce09d2350 |
|||
msg148135 - (view) | Author: Charles-François Natali (neologix) * | Date: 2011-11-22 18:54 | |
Alright, should be fixed now. Graham, thanks for the report! |
|||
msg160761 - (view) | Author: Daniel Franke (dfranke) | Date: 2012-05-15 20:39 | |
Just FYI, I've lately been trying to track down a different threading-related bug, and I think the fix for this one also fixed that one by accident. Leave the following program running for a while under python-2.7.2: ---snip--- import subprocess import threading import random class Forkfuzz(threading.Thread): def run(self): while random.randint(0,100) > threading.active_count(): Forkfuzz().start() p = subprocess.Popen(['/bin/sleep', '1']) p.communicate() if __name__ == "__main__": while True: Forkfuzz().run() ---snip--- Eventually, you'll see a bunch of python process that are deadlocked in the following state: #0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:86 #1 0x00000000004e02ad in PyThread_acquire_lock (lock=<value optimized out>, waitflag=<value optimized out>) at Python/thread_pthread.h:321 #2 0x00000000004e0447 in find_key (key=1, value=0x0) at Python/thread.c:268 #3 0x00000000004e053b in PyThread_get_key_value (key=38356064) at Python/thread.c:360 #4 0x00000000004cb2f8 in PyGILState_GetThisThreadState () at Python/pystate.c:598 #5 _PyGILState_Reinit () at Python/pystate.c:547 #6 0x00000000004e43f9 in PyOS_AfterFork () at ./Modules/signalmodule.c:979 ... The problem happens when one thread forks while another thread holds a lock on the TLS mutex. The child process then tries to acquire this mutex, and can never do so because the thread that holds it only exists in the parent. When the same program is run under python-2.7.3, the problem goes away. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:22 | admin | set | github: 57365 |
2013-02-04 17:09:40 | jcea | set | nosy:
+ jcea |
2012-05-15 20:39:19 | dfranke | set | nosy:
+ dfranke messages: + msg160761 |
2011-11-22 18:54:50 | neologix | set | status: open -> closed resolution: fixed messages: + msg148135 stage: resolved |
2011-11-22 18:52:20 | python-dev | set | messages: + msg148134 |
2011-10-12 20:26:27 | python-dev | set | messages: + msg145429 |
2011-10-12 20:14:08 | neologix | set | files:
+ tstate_after_fork.diff messages: + msg145424 |
2011-10-12 19:07:34 | python-dev | set | nosy:
+ python-dev messages: + msg145421 |
2011-10-12 11:51:16 | pitrou | set | messages: + msg145391 |
2011-10-12 09:51:26 | grahamd | set | messages:
+ msg145389 versions: - Python 3.2 |
2011-10-12 08:39:31 | neologix | set | messages: + msg145388 |
2011-10-12 08:33:07 | vstinner | set | nosy:
+ vstinner |
2011-10-12 08:32:30 | neologix | set | messages: + msg145387 |
2011-10-12 08:04:01 | grahamd | set | messages: + msg145386 |
2011-10-12 07:10:03 | neologix | set | messages: + msg145385 |
2011-10-12 06:57:48 | neologix | set | nosy:
+ pitrou |
2011-10-12 06:37:31 | grahamd | set | messages: + msg145384 |
2011-10-12 06:33:46 | grahamd | create |