New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tracebacks eat up memory by holding references to locals and globals when they are not wanted #44031
Comments
Attached is a file which demonstrates an oddity about The observed behaviour is that when the tuple from If you run the test with "s.e = sys.exc_info()" If you uncomment that line, the memory footprint If you uncomment the "gc.collect()" line, the process This was observed in production code, where exc_info |
Logged In: YES Your memory bloat is mostly due to the d = range(100000) line. Python has no problem collecting the cyclic trash,
Note that memory usage in your program remains low and There is no simple-minded way to "repair" this, BTW. For Note that the library reference manual warns against storing For example, no cyclic trash is created if you add this def get_traceback(self):
self.e = sys.exc_info() and inside foo() invoke: s.get_traceback() instead of doing: s.e = sys.exc_info() Is that unreasonable? Perhaps simpler is to define a def get_exc_info():
return sys.exc_info() and inside foo() do: s.e = get_exc_info() No cyclic trash gets created that way either. These are the |
Logged In: YES I have read the exc_info suggestions before, but they have Weakrefs might be slow, I offered them as an alternative to |
Logged In: YES I'm still having problems figuring out what the bug is that |
Logged In: YES The bug is the circular reference which is non-obvious and |
Logged In: YES I disagree that the circular reference is non-obvious. I'm If you are saying that it is unavoidable in your s.e = sys.exc_info()[:2] This would drop the traceback, and thus not create a cyclic As for the time of cleanup not being controllable: you can tim_one: Why do you think your proposed modification of |
Logged In: YES [Martin]
Sorry about that! It was an illusion, of course. I wanted For the OP, I had need last year of capturing a traceback BTW, I must repeat that there is no simple-minded way to |
Logged In: YES A quick grep of the stdlib turns up various uses of |
"This was observed in production code, where exc_info I would like the traceback object so that I can re-raise the error. I can stringify it as tim_one suggests, but that can't be used with 'raise' and 'try','except' later. It is not important for my application to have all the references that the traceback object contains, which is what is causing the massive memory requirement. If I could replace the exc_info()[2] with a traceback look-alike that only held file, line, etc information for printing a standard traceback, that would solve this problem. |
Or, being able to remove the references to the locals and globals from Something like this: try: |
Similar issue: bpo-4034 proposes to be able to set tb.tb_frame=None. |
I wrote a patch to support <traceback object>.tb_frame=None. It works I also tried: ... and it doesn't work because the tbi variable is also removed! A traceback object have to contain the full frame, but the frame |
But a list of strings is not re-raisable. The co_filename, co_name, and |
I tried to remove the frame from the traceback type (to keep only the Lib/unittest.py: Lib/traceback.py: Doc/tools/jinga/debugger.py: Lib/idlelib/StackViewer.py: Lib/logging/init.py: Lib/types.py: (...) co_name/co_filename can be stored directly in the traceback. But what |
Greg Hazel> But a list of strings is not re-raisable Do you need the original traceback? Why not only raising the exception? Eg. import sys
try:
raise Exception("hm!")
except:
t, v, tb = sys.exc_info()
raise v |
STINNER Victor> Do you need the original traceback? Why not only raising If the exception was captured in one stack, and is being re-raised in |
Given comments like "I'm still having problems figuring out what the bug is that you are reporting. (Martin)" and "I must repeat that there is no simple-minded way to 'repair' this. (Tim)", and subsequent changes to Python, should this still be open? If so, for which version(s)? |
This is still an issue. The bug I'm reporting had been explained well, I thought, but I'll repeat it in summary: Traceback raising typically does not use these references at all, so having some way to discard them would be very valuable. This allows storing and passing tracebacks between threads (or coroutines or async tasks) without dying quickly due to memory bloat. The simple-minded way to fix this is to allow the user to break the reference themselves. Fixing this bug would invalidate the need for hacks like the one Twisted has come up with in their twisted.python.Failure object which stringifies the traceback object, making it impossible to re-raise the exception. Failure has a lot of re-implementations of Exceptions and traceback objects as a result. |
I still don't understand the issue. You say that you want a traceback, but then you say you don't want the objects in the traceback. So what *precisely* is it that you want, and what is it that you don't want? In any case, whatever the solution, it is likely a new feature, which aren't acceptable anymore for 2.x release. So please don't target this report for any 2.x version. |
The objects I do want in the traceback are the objects necessary to print a traceback, but not the locals and globals of each frame. For example: def bar():
x = "stuff"
raise Exception("example!")
bar() prints:
Traceback (most recent call last):
Line 4, in <module>
bar()
Line 3, in bar
raise Exception("example!")
Exception: example! There is no reason in that example to have a reference to "x" in the traceback, since it's not used in the output. This becomes important when I try to save a reference to the traceback object and raise it later: import sys
def bar():
x = "stuff"
raise Exception("example!")
try:
bar()
except:
exc_info = sys.exc_info()
def foo(e):
raise e[0], e[1], e[2]
# sometime possibly much later...
foo(exc_info) Traceback (most recent call last):
Line 12, in <module>
foo(exc_info)
Line 6, in <module>
bar()
Line 4, in bar
raise Exception("example!")
Exception: example! During that "sometime possibly much later..." comment, a reference to "x" is held, when it will not be used in printing the traceback later. So, I would not like to keep a reference to "x", and currently there is no way to do that without also dropping a reference to the data needed to print the traceback. |
To call this a bug for tracker purposes, there would have to be a specific discrepancy between doc promise and observed behavior. Every feature request fixes a 'design bug' ;-). |
Excellent. As for twisted, I'm just repeating what I understood of what he said when I asked. It could well be that this feature would help them, I don't know. |
This seems to be about reducing internal resource usage in a way that would be mostly invisible to the normal user. A core surface feature request would have to be put off to 3.3, but I do not see that as such. |
frame.clear() was committed in bpo-17934. |
How should it be used to workaround this issue ("tracebacks eat up memory by holding references to locals and globals when they are not wanted")? We need maybe an helper to clear all frames referenced by a traceback? |
Yes. Something in the traceback module would be fine. |
Here's a patch implementing traceback.clear_tb_frames(). (Feel free to bikeshed about the name.) One more substantial question: the top frame of the traceback is possibly still running. Currently the code skips it by doing an initial 'tb = tb.tb_next'. Would it be better to catch and ignore the RuntimeError |
Yes, I think it would be better. |
Revised version of the patch: catches RuntimeError instead of skipping the first frame; adds versionadded tag; adds entry to NEWS and whatsnew files. |
I would prefer a clear_frames() method on the traceback object rather than |
Looks good to me, thank you. |
I tried to implement the feature as a new traceback.clear_frames() method. I tried to follow the chain of frame objects (using frame.f_back), but it does not work as expected. The method needs to follow the chain of traceback objects (tb.tb_next). So it makes sense to define a function instead of a method (a method usually only affect the object, not a chain of objects). clear-tb-frames-2.txt:
"F.clear(): clear most references held by the frame"); So I suggest a more permissive documentation: "Clear most reference held by frames." |
Actually, the documentation is right: "references" != "locals" ;-) |
I'm happy to change the function name, though I'll note that the traceback module does have print_tb(), format_tb() and extract_tb(). I'm OK with both of Victor's suggestions but personally slightly prefer traceback.clear_frames(tb). Rationale: People who are keeping tracebacks around and want to save memory are probably using other functions from the traceback module, and the module has fairly short names (print_tb, format_exception) so I doubt they'll often do 'from traceback import clear_traceback_frames'. |
Ping! I'd like to change the function name to clear_frames() and then commit this. Antoine or anyone, want to disagree with using clear_frames() as the name? |
clear_frames() sounds fine to me :-) |
New changeset 100606ef02cf by Andrew Kuchling in branch 'default': |
traceback.clear_frames() doesn't really clear all frames: see bpo-31321. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: