classification
Title: sys._current_frames() reports too many/wrong stack frames
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Ringding, haypo, neologix, pitrou, python-dev
Priority: normal Keywords: patch

Created on 2013-01-31 21:39 by Ringding, last changed 2013-05-05 21:48 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
current-frames-cleanup.patch Ringding, 2013-01-31 21:39 Minimal patch review
test-fork-frames.py Ringding, 2013-01-31 21:40 Test program
tstate_after_fork.diff neologix, 2013-03-03 14:17
tstates-afterfork.patch pitrou, 2013-03-16 19:10 review
tstates-afterfork2.patch pitrou, 2013-05-05 18:43 review
Messages (13)
msg181045 - (view) Author: Stefan Ring (Ringding) Date: 2013-01-31 21:39
After a fork, the list of thread states contains all the threads from before. It is especially unfortunate that, at least for me, it usually happens that threads created in the forked process reuse the same thread ids, and sys._current_frames will then return wrong stack frames for existing threads because the old entries occur towards the tail of the linked list and overwrite the earlier, correct ones, in _PyThread_CurrentFrames.

The attached patch is certainly not the most complete solution, but it solves my problem.

With Python 2.7.3 I get:

Exception in thread Thread-6:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "test-fork-frames.py", line 24, in do_verify
    assert frame_from_sys is real_frame
AssertionError

With my patch applied, it passes silently. I can reproduce this on CentOS 6.3 x86_64 as well as Fedora 18 x86_64.
msg182897 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-02-24 21:46
Thanks, it's surprising this was never noticed before.
Your patch has two issues:
- it doesn't clear the thread state before deleting it, which would leak the frame, thread-specific dict, etc
- it only clears the thread states after the current thread state: if the current thread is not at the head of the linked list - if it's not the most recently created thread - some thread states won't get cleared

I'm attaching a patch doing the cleanup in PyEval_ReInitThreads(), with test.
msg183352 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-03-02 23:43
hi Charles-François,

> I'm attaching a patch doing the cleanup in PyEval_ReInitThreads(), with 
> test.

Did you forget to attach the patch?
msg183377 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-03-03 14:17
> Did you forget to attach the patch?

Oops...
msg183440 - (view) Author: Stefan Ring (Ringding) Date: 2013-03-04 12:53
When I originally worked on this, I noticed that _PyThread_CurrentFrames also iterates over all interpreters. Because I have no experience with or use for multiple interpreters, I intentionally left it out of my patch, but shouldn't it be taken into account for a real patch?
msg183445 - (view) Author: Stefan Ring (Ringding) Date: 2013-03-04 13:14
(Regarding your test)
I have also noticed in the past that joining threads after a fork has caused hangs occasionally, although that might have resulted from the messed up _current_frames.
msg184343 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-03-16 18:42
Isn't the thread state clearing subject to a race condition? PyThreadState_Clear() will release a bunch of frames, deallocating arbitrary objects and therefore potentially releasing the GIL. This lets the main thread run and potentially spawn other threads, which may be wrongly deallocated in the following loop iteration.

A solution would be to detach the thread states from the linked list and clear them afterwards.
msg184344 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-03-16 19:10
Here is an updated patch.

Note that I think this patch could break some programs. For example, if you have a thread in your main process which has a database connection open, deleting the thread state in a child process might shutdown the database connection (depending on the exact protocol). Therefore, I think it would be better to only apply the patch in 3.4.
msg186011 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-04-04 07:57
> Here is an updated patch.

_PyThreadState_DeleteExcept uses HEAD_LOCK: ISTM that
PyThreadState_Clear() can trigger arbitrary code execution: if a
thread ends up being created/destroyed, I think we can get a deadlock
when trying to acquire the head lock. I think it should be turned into
an open call if possible.

Also, as noted by Stefan, shouldn't we also iterate over other interpreters?

> Note that I think this patch could break some programs. For example, if you have a thread in your main process which has a database connection open, deleting the thread state in a child process might shutdown the database connection (depending on the exact protocol). Therefore, I think it would be better to only apply the patch in 3.4.

Indeed. For the database example, there's this other issue where the
database connection is stored in a thread-local storage... Some people
will definitely get bitten by this...
msg186210 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-07 13:55
> if a
> thread ends up being created/destroyed, I think we can get a deadlock
> when trying to acquire the head lock. I think it should be turned into
> an open call if possible.

How would you do that in a simple way?

> Also, as noted by Stefan, shouldn't we also iterate over other interpreters?

The problem is that, AFAIK, we don't know which thread states of the other interpreters should be kept alive.
msg188452 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-05 18:43
Here is an updated patch which doesn't hold the lock while calling PyThreadState_Clear(). It looks like it should be ok. Also, I've added some comments.
msg188469 - (view) Author: Roundup Robot (python-dev) Date: 2013-05-05 21:47
New changeset 847692b9902a by Antoine Pitrou in branch 'default':
Issue #17094: Clear stale thread states after fork().
http://hg.python.org/cpython/rev/847692b9902a
msg188470 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-05 21:48
I've committed after the small change suggested by Charles-François. Hopefully there aren't any applications relying on the previous behaviour.
History
Date User Action Args
2013-05-05 21:48:15pitrousetstatus: open -> closed
versions: - Python 2.7, Python 3.3
messages: + msg188470

resolution: fixed
stage: patch review -> resolved
2013-05-05 21:47:17python-devsetnosy: + python-dev
messages: + msg188469
2013-05-05 18:43:32pitrousetfiles: + tstates-afterfork2.patch

messages: + msg188452
2013-04-07 13:55:09pitrousetmessages: + msg186210
2013-04-04 07:57:23neologixsetmessages: + msg186011
2013-04-03 21:23:26pitrousetstage: patch review
versions: - Python 3.2
2013-03-17 14:51:26hayposetnosy: + haypo
2013-03-16 19:10:02pitrousetfiles: + tstates-afterfork.patch

messages: + msg184344
2013-03-16 18:42:47pitrousetmessages: + msg184343
2013-03-04 13:14:48Ringdingsetmessages: + msg183445
2013-03-04 12:53:26Ringdingsetmessages: + msg183440
2013-03-03 14:17:11neologixsetfiles: + tstate_after_fork.diff

messages: + msg183377
2013-03-02 23:43:05pitrousetmessages: + msg183352
2013-02-24 21:46:09neologixsetmessages: + msg182897
2013-02-10 18:40:17pitrousetnosy: + pitrou, neologix

versions: + Python 3.2, Python 3.3, Python 3.4
2013-01-31 21:40:35Ringdingsetfiles: + test-fork-frames.py
2013-01-31 21:39:23Ringdingcreate