msg181045 - (view) |
Author: Stefan Ring (Ringding) |
Date: 2013-01-31 21:39 |
After a fork, the list of thread states contains all the threads from before. It is especially unfortunate that, at least for me, it usually happens that threads created in the forked process reuse the same thread ids, and sys._current_frames will then return wrong stack frames for existing threads because the old entries occur towards the tail of the linked list and overwrite the earlier, correct ones, in _PyThread_CurrentFrames.
The attached patch is certainly not the most complete solution, but it solves my problem.
With Python 2.7.3 I get:
Exception in thread Thread-6:
Traceback (most recent call last):
File "/usr/lib64/python2.7/threading.py", line 551, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.7/threading.py", line 504, in run
self.__target(*self.__args, **self.__kwargs)
File "test-fork-frames.py", line 24, in do_verify
assert frame_from_sys is real_frame
AssertionError
With my patch applied, it passes silently. I can reproduce this on CentOS 6.3 x86_64 as well as Fedora 18 x86_64.
|
msg182897 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2013-02-24 21:46 |
Thanks, it's surprising this was never noticed before.
Your patch has two issues:
- it doesn't clear the thread state before deleting it, which would leak the frame, thread-specific dict, etc
- it only clears the thread states after the current thread state: if the current thread is not at the head of the linked list - if it's not the most recently created thread - some thread states won't get cleared
I'm attaching a patch doing the cleanup in PyEval_ReInitThreads(), with test.
|
msg183352 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-03-02 23:43 |
hi Charles-François,
> I'm attaching a patch doing the cleanup in PyEval_ReInitThreads(), with
> test.
Did you forget to attach the patch?
|
msg183377 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2013-03-03 14:17 |
> Did you forget to attach the patch?
Oops...
|
msg183440 - (view) |
Author: Stefan Ring (Ringding) |
Date: 2013-03-04 12:53 |
When I originally worked on this, I noticed that _PyThread_CurrentFrames also iterates over all interpreters. Because I have no experience with or use for multiple interpreters, I intentionally left it out of my patch, but shouldn't it be taken into account for a real patch?
|
msg183445 - (view) |
Author: Stefan Ring (Ringding) |
Date: 2013-03-04 13:14 |
(Regarding your test)
I have also noticed in the past that joining threads after a fork has caused hangs occasionally, although that might have resulted from the messed up _current_frames.
|
msg184343 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-03-16 18:42 |
Isn't the thread state clearing subject to a race condition? PyThreadState_Clear() will release a bunch of frames, deallocating arbitrary objects and therefore potentially releasing the GIL. This lets the main thread run and potentially spawn other threads, which may be wrongly deallocated in the following loop iteration.
A solution would be to detach the thread states from the linked list and clear them afterwards.
|
msg184344 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-03-16 19:10 |
Here is an updated patch.
Note that I think this patch could break some programs. For example, if you have a thread in your main process which has a database connection open, deleting the thread state in a child process might shutdown the database connection (depending on the exact protocol). Therefore, I think it would be better to only apply the patch in 3.4.
|
msg186011 - (view) |
Author: Charles-François Natali (neologix) *  |
Date: 2013-04-04 07:57 |
> Here is an updated patch.
_PyThreadState_DeleteExcept uses HEAD_LOCK: ISTM that
PyThreadState_Clear() can trigger arbitrary code execution: if a
thread ends up being created/destroyed, I think we can get a deadlock
when trying to acquire the head lock. I think it should be turned into
an open call if possible.
Also, as noted by Stefan, shouldn't we also iterate over other interpreters?
> Note that I think this patch could break some programs. For example, if you have a thread in your main process which has a database connection open, deleting the thread state in a child process might shutdown the database connection (depending on the exact protocol). Therefore, I think it would be better to only apply the patch in 3.4.
Indeed. For the database example, there's this other issue where the
database connection is stored in a thread-local storage... Some people
will definitely get bitten by this...
|
msg186210 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-04-07 13:55 |
> if a
> thread ends up being created/destroyed, I think we can get a deadlock
> when trying to acquire the head lock. I think it should be turned into
> an open call if possible.
How would you do that in a simple way?
> Also, as noted by Stefan, shouldn't we also iterate over other interpreters?
The problem is that, AFAIK, we don't know which thread states of the other interpreters should be kept alive.
|
msg188452 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-05-05 18:43 |
Here is an updated patch which doesn't hold the lock while calling PyThreadState_Clear(). It looks like it should be ok. Also, I've added some comments.
|
msg188469 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-05-05 21:47 |
New changeset 847692b9902a by Antoine Pitrou in branch 'default':
Issue #17094: Clear stale thread states after fork().
http://hg.python.org/cpython/rev/847692b9902a
|
msg188470 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2013-05-05 21:48 |
I've committed after the small change suggested by Charles-François. Hopefully there aren't any applications relying on the previous behaviour.
|
msg244353 - (view) |
Author: Devin Jeanpierre (Devin Jeanpierre) * |
Date: 2015-05-28 23:47 |
This bug also affects 2.7. The main problem I'm dealing with is "sys._current_frames will then return wrong stack frames for existing threads". One fix to just this would be to change how the dict is created, to keep newer threads rather than tossing them.
Alternatively, we could backport the 3.4 fix.
Thoughts?
|
msg244429 - (view) |
Author: Devin Jeanpierre (Devin Jeanpierre) * |
Date: 2015-05-29 23:09 |
The patch I'm providing with this comment has a ... really hokey test case, and a two line + whitespace diff for pystate.c . The objective of the patch is only to have _current_frames report the correct frame for any live thread. It continues to report dead threads' frames, up until they would conflict with a live thread. IMO it's the minimal possible fix for this aspect of the bug, and suitable for 2.7.x. Let me know what you think.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:41 | admin | set | github: 61296 |
2015-05-29 23:09:14 | Devin Jeanpierre | set | files:
+ _current_frames_27_setdefault.diff
messages:
+ msg244429 |
2015-05-28 23:47:23 | Devin Jeanpierre | set | nosy:
+ Devin Jeanpierre messages:
+ msg244353
|
2013-05-05 21:48:15 | pitrou | set | status: open -> closed versions:
- Python 2.7, Python 3.3 messages:
+ msg188470
resolution: fixed stage: patch review -> resolved |
2013-05-05 21:47:17 | python-dev | set | nosy:
+ python-dev messages:
+ msg188469
|
2013-05-05 18:43:32 | pitrou | set | files:
+ tstates-afterfork2.patch
messages:
+ msg188452 |
2013-04-07 13:55:09 | pitrou | set | messages:
+ msg186210 |
2013-04-04 07:57:23 | neologix | set | messages:
+ msg186011 |
2013-04-03 21:23:26 | pitrou | set | stage: patch review versions:
- Python 3.2 |
2013-03-17 14:51:26 | vstinner | set | nosy:
+ vstinner
|
2013-03-16 19:10:02 | pitrou | set | files:
+ tstates-afterfork.patch
messages:
+ msg184344 |
2013-03-16 18:42:47 | pitrou | set | messages:
+ msg184343 |
2013-03-04 13:14:48 | Ringding | set | messages:
+ msg183445 |
2013-03-04 12:53:26 | Ringding | set | messages:
+ msg183440 |
2013-03-03 14:17:11 | neologix | set | files:
+ tstate_after_fork.diff
messages:
+ msg183377 |
2013-03-02 23:43:05 | pitrou | set | messages:
+ msg183352 |
2013-02-24 21:46:09 | neologix | set | messages:
+ msg182897 |
2013-02-10 18:40:17 | pitrou | set | nosy:
+ pitrou, neologix
versions:
+ Python 3.2, Python 3.3, Python 3.4 |
2013-01-31 21:40:35 | Ringding | set | files:
+ test-fork-frames.py |
2013-01-31 21:39:23 | Ringding | create | |