msg184515 - (view) |
Author: Anssi Kääriäinen (Anssi.Kääriäinen) |
Date: 2013-03-18 20:34 |
A generator is leaked to gc.garbage in a situation where `__del__` isn't defined. See the attached file for as-minimalistic test case as I could make. Tested on Python 3.3.0, 3.2.3 and 2.7.3.
Note that if the try-except is removed from iterator(), then there is no leak.
This is related to Django bug #19895 (https://code.djangoproject.com/ticket/19895).
|
msg184635 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * |
Date: 2013-03-19 14:32 |
Well, there is a __del__: the one from the generator function. And it's part of the cycle.
|
msg184671 - (view) |
Author: Anssi Kääriäinen (Anssi.Kääriäinen) |
Date: 2013-03-19 20:28 |
I am trying to read the code, and it seems objects of type generator are uncollectable if the code of the generator has a block of other type than SETUP_LOOP. I can see how try-finally for example in the generator would cause problems - the finally might reference objects which are already collected. But plain try-except? What is the problem with that? (Not saying there isn't one, I am new to the code...).
I tried to change the code to also allow collecting generators with plain try-except (add in a check for SETUP_EXCEPTION in addition to SETUP_LOOP in PyGen_NeedsFinalizing()). This resolves the leak, and all tests pass. I can't tell if it is actually safe to return 0 in SETUP_EXCEPTION case.
After the change try-finally will still leak but that seems correct.
Maybe this limitation should be documented? I think I understand now why finally or with statement inside a cyclic referenced generator will end up in gc.garbage. However the docs tell that only objects with __del__ defined will leak. From user perspective generator doesn't have __del__, at least it is not to be found with pdb. Generators have potentially unsafe finalizer code but that isn't technically equivalent to having __del__ method defined as far as I can tell.
|
msg187354 - (view) |
Author: Benjamin Peterson (benjamin.peterson) * |
Date: 2013-04-19 12:33 |
Generators exit using GeneratorExit, which you could possible catch.
|
msg187359 - (view) |
Author: Anssi Kääriäinen (Anssi.Kääriäinen) |
Date: 2013-04-19 13:51 |
True, except GeneratorExit will run at garbage collection time and this will result in reference cycle problems. Checking if there is no except GeneratorExit clause might be too complicated.
I still think this is worth a brief note in the gc docs. The gc docs talk only about __del__ methods, and there isn't one defined for the example generator. The generator does have something technically equivalent, but I don't spot any actual __del__ methods in the reference loop.
A docs mention will not help much in avoiding this problem, but it will at least make the code behaviour equivalent to documentation. A mention in the gc docs would have helped me when trying to debug the leak in Django.
|
msg187360 - (view) |
Author: Benjamin Peterson (benjamin.peterson) * |
Date: 2013-04-19 13:54 |
A documentation patch sounds good to me.
|
msg187367 - (view) |
Author: Alyssa Coghlan (ncoghlan) * |
Date: 2013-04-19 16:03 |
Adding Guido because this appears to be due to a longstanding difference between the handling of tp_del and most other slots
Specifically, the reason for the odd behaviour appears to be that generator objects define tp_del [1] (which is what the cyclic gc *really* looks for), but while defining __del__ in Python will populate tp_del in the type slots, defining tp_del directly in a builtin or extension type doesn't create an exposed __del__ at the Python level (there's no wrapper function identified in the slot definition).
So, at the very least, the fact that builtin and extension types can define tp_del without creating a visible __del__ method needs to be documented as a CPython implementation detail.
However, I'm not sure we actually have a good *reason* for tp_del failing to generate __del__ automatically - it's been that way since Guido implemented it [2], but Guido's comment in that commit message about not needing a wrapper because types written in C can't define tp_del is clearly no longer accurate.
[1] http://hg.python.org/cpython/file/2.7/Objects/genobject.c#l328
[2] http://hg.python.org/cpython/annotate/71dca7c76fa2/Objects/typeobject.c#3955
|
msg187369 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-04-19 16:25 |
With the advent of yield-based asynchronous programming, it is going to be problematic if a generator caught in a reference cycle can create a memory leak.
|
msg187372 - (view) |
Author: Benjamin Peterson (benjamin.peterson) * |
Date: 2013-04-19 16:33 |
I think the creation of __del__ wrappers for extension types is separate from this issue of generator finalization.
|
msg187374 - (view) |
Author: Benjamin Peterson (benjamin.peterson) * |
Date: 2013-04-19 16:37 |
Yes. Hopefully, the async framework using generators can properly can close() on them, though.
|
msg187375 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-04-19 16:39 |
> Yes. Hopefully, the async framework using generators can properly can
> close() on them, though.
With yield from-based generators, you don't need a trampoline anymore,
so the close() call is now left to the application developers (if I'm
not mistaken).
|
msg187403 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2013-04-20 02:59 |
Unless I'm the only person on earth who can understand this I beg to be left out of this bug. I just don't have the bandwidth right now to look into it. If it's really about GC and __del__ and generators, maybe Tim Peters would be willing to look into it? Or perhaps Neil Schemenauer, who IIRC did a lot of the early work on GC.
|
msg187408 - (view) |
Author: Alyssa Coghlan (ncoghlan) * |
Date: 2013-04-20 06:30 |
I'll create a separate issue for the tp_del -> __del__ question, since that's a language design decision that *does* need Guido's input, but doesn't relate to the broader question of generators, cycles and potential memory leaks.
|
msg187410 - (view) |
Author: Alyssa Coghlan (ncoghlan) * |
Date: 2013-04-20 06:40 |
Issue 17800 is anyone wants to weigh in on the tp_del -> __del__ question (I ended up not adding Guido back to that one either, since the original design was based on an assumption that's now demonstrably false, so it makes sense to update the behaviour)
|
msg187413 - (view) |
Author: Anssi Kääriäinen (Anssi.Kääriäinen) |
Date: 2013-04-20 09:44 |
I wonder if it would be better to reword the garbage collection docs to mention that Python can't collect objects if they are part of a reference cycle, and some of the objects in the reference cycle need to run code at gc time. Then mention that such objects include objects with __del__ and also generators if they do have some other blocks than loops in them (for example try or with blocks).
To me it seems this issue could be mitigated somewhat by collecting the objects if there is just one object with finalizer code in the cycle. It should be safe to run the single finalizer, then collect the whole cycle, right?
|
msg187415 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-04-20 11:31 |
> I wonder if it would be better to reword the garbage collection docs
> to mention that Python can't collect objects if they are part of a
> reference cycle, and some of the objects in the reference cycle need
> to run code at gc time. Then mention that such objects include objects
> with __del__ and also generators if they do have some other blocks
> than loops in them (for example try or with blocks).
>
> To me it seems this issue could be mitigated somewhat by collecting
> the objects if there is just one object with finalizer code in the
> cycle. It should be safe to run the single finalizer, then collect the
> whole cycle, right?
This is tricky because it relies on the fact that the object with
finalizer will be collected before any objects the finalizer relies on.
For a generator, this means it has to be finalized before its frame
object is cleared (otherwise, the finally block can't execute
correctly).
However, the GC doesn't collect those objects directly. It calls
tp_clear on each of them, hoping that tp_clear will break the reference
cycle (and at the same time collect some of the objects in the cycle).
But it's difficult to influence which objects are collected first.
In gcgen.py's case, the reference cycle is comprised of the MyObj
instance, the generator, and the generator's frame:
`self` (MyObj) -> generator -> frame -> `self` (MyObj)
So if `self` is collected first, it will collect the generator before
the frame, and the generator's finally block can execute fine.
But if the frame is collected first, it will clear itself and it will be
too late for the generator's finally block to execute.
And if the generator is collected first... well, it can't, as it doesn't
have a tp_clear slot; but if it had, that would clear the frame and make
the finally block uncallable. One might argue that a generator's
tp_clear should call the finally block *before* clearing the frame.
|
msg187419 - (view) |
Author: Anssi Kääriäinen (Anssi.Kääriäinen) |
Date: 2013-04-20 12:19 |
I was imagining that the collection should happen in two passes. First check for tp_dels and call them if safe (single tp_del in cycle allowed). Then free the memory. The first pass is already there, it just doesn't collect callable tp_dels.
If it would be possible to turn an object into inaccessible state after tp_del was called it would be possible to call multiple tp_dels in a cycle. If the del tries to use already deleted object you will get some sort of runtime exception indicating access of already collected object (another option is to allow access to object which has already had __del__ called, but that seems problematic).
Now, both of the above are likely way more complicated to implement than what I imagine. I have very little knowledge of the technical details involved.
If I understand correctly your idea was to do something similar to above, but only for generators.
The current situation is somewhat ugly. First, I can imagine people wishing to do try-finally or something "with self.lock:" in a generator. These will leak in circular reference cases. The generator case is surprising, so surprising that even experienced programmers will do such mistakes (see the django ticket for one example). Second, from application programmers perspective the fact that __del__ wasn't called is usually similar to having a silent failure in their code - for example this can result in leaking database connections or other resources, not to mention possible problems if one has a "with self.lock:" block in a generator.
|
msg187423 - (view) |
Author: Benjamin Peterson (benjamin.peterson) * |
Date: 2013-04-20 13:19 |
In a sense, doing something like "with self.lock" in a generator is already a leak. Even if there wasn't a cycle, collection could be arbitrarily delayed (in partincular on non-CPython VMs). I wonder if making generators context managers which call close() on exit would help.
|
msg187424 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-04-20 13:27 |
Those are two different issues:
- not calling the finalizer in a timely manner
- never calling the finalizer and creating a memory leak through
gc.garbage
|
msg187425 - (view) |
Author: Benjamin Peterson (benjamin.peterson) * |
Date: 2013-04-20 13:33 |
I realize, but if people were responsible and closed their generators, the second one would be as much of a problem.
|
msg187431 - (view) |
Author: Alyssa Coghlan (ncoghlan) * |
Date: 2013-04-20 14:55 |
We can't make ordinary generators innately context managers, as it makes the error too hard to detect when you accidentally leave out @contextmanager when using a generator to write a custom one.
You can already use contextlib.closing to forcibly close them when appropriate, so providing a decorator to implicitly map __exit__ to close wouldn't really save much.
|
msg187434 - (view) |
Author: Alyssa Coghlan (ncoghlan) * |
Date: 2013-04-20 15:12 |
To get back to Anssi's original suggestion... I think Anssi's proposal to allow finalisation to be skipped for try/except/else is legitimate. It's only finally clauses that we try to guarantee will execute, there's no such promise implied for ordinary except clauses.
We *don't care* if the generator *would* have caught the thrown GeneratorExit, we only care about ensuring that finally blocks are executed (including those implied by with statements). So if there aren't any finally clauses or with statements in the block stack, we should be able to just let the generator and frame get collected (as Anssi suggested), without trying to allow execution to complete.
|
msg187437 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-04-20 15:53 |
> We *don't care* if the generator *would* have caught the thrown
> GeneratorExit, we only care about ensuring that finally blocks are
> executed (including those implied by with statements). So if there
> aren't any finally clauses or with statements in the block stack, we
> should be able to just let the generator and frame get collected (as
> Anssi suggested), without trying to allow execution to complete.
That's a good point.
I'm also contemplating that the generator close() could be done from the
*frame*'s tp_clear, which would sidestep the issue of collect order
entirely: the finally block doesn't actually need the generator to
execute, it only needs the frame and its locals.
|
msg187483 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-04-21 01:38 |
I've opened issue17807 for an alternative generator cleanup scheme which solves the present issue.
|
msg188854 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-05-10 17:39 |
The issue17807 patch has now been committed and should solve this issue. I would like to encourage anyone to test this change, as it is quite a significant departure from the previous semantics:
http://mail.python.org/pipermail/python-dev/2013-May/126102.html
|
msg253331 - (view) |
Author: Vikas (Vikas) |
Date: 2015-10-22 14:18 |
I see this comment :
>> When iteration over a queryset raised an exception, the result cache remained initialized with an empty list, so subsequent iterations returned an empty list instead of raising an exception>>
What if we catch the exceptions? Will that cause not to leak memory?
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:43 | admin | set | github: 61670 |
2015-10-22 14:18:14 | Vikas | set | nosy:
+ Vikas messages:
+ msg253331
|
2013-05-10 17:39:50 | pitrou | set | status: open -> closed superseder: Generator cleanup without tp_del messages:
+ msg188854
assignee: docs@python -> components:
+ Interpreter Core, - Documentation resolution: duplicate |
2013-04-21 04:48:14 | ncoghlan | unlink | issue17800 dependencies |
2013-04-21 01:38:59 | pitrou | set | messages:
+ msg187483 |
2013-04-20 16:01:18 | ncoghlan | link | issue17800 dependencies |
2013-04-20 15:53:29 | pitrou | set | messages:
+ msg187437 |
2013-04-20 15:12:20 | ncoghlan | set | messages:
+ msg187434 |
2013-04-20 14:55:34 | ncoghlan | set | messages:
+ msg187431 |
2013-04-20 13:33:36 | benjamin.peterson | set | messages:
+ msg187425 |
2013-04-20 13:27:23 | pitrou | set | messages:
+ msg187424 |
2013-04-20 13:19:43 | benjamin.peterson | set | messages:
+ msg187423 |
2013-04-20 12:19:32 | Anssi.Kääriäinen | set | messages:
+ msg187419 |
2013-04-20 11:31:12 | pitrou | set | messages:
+ msg187415 |
2013-04-20 09:44:53 | Anssi.Kääriäinen | set | messages:
+ msg187413 |
2013-04-20 06:40:14 | ncoghlan | set | messages:
+ msg187410 |
2013-04-20 06:30:49 | ncoghlan | set | messages:
+ msg187408 |
2013-04-20 03:23:57 | benjamin.peterson | set | nosy:
- gvanrossum
|
2013-04-20 02:59:21 | gvanrossum | set | messages:
+ msg187403 |
2013-04-19 18:11:30 | isoschiz | set | nosy:
+ pconnell
|
2013-04-19 16:39:47 | pitrou | set | messages:
+ msg187375 |
2013-04-19 16:37:42 | benjamin.peterson | set | messages:
+ msg187374 |
2013-04-19 16:33:33 | benjamin.peterson | set | messages:
+ msg187372 |
2013-04-19 16:25:59 | pitrou | set | nosy:
+ pitrou messages:
+ msg187369
|
2013-04-19 16:03:24 | ncoghlan | set | nosy:
+ gvanrossum
messages:
+ msg187367 versions:
+ Python 3.4 |
2013-04-19 13:54:13 | benjamin.peterson | set | nosy:
+ docs@python messages:
+ msg187360
assignee: docs@python components:
+ Documentation type: resource usage -> behavior |
2013-04-19 13:51:43 | Anssi.Kääriäinen | set | messages:
+ msg187359 |
2013-04-19 12:33:04 | benjamin.peterson | set | messages:
+ msg187354 |
2013-04-19 09:06:47 | pitrou | set | nosy:
+ ncoghlan, benjamin.peterson
|
2013-04-18 22:28:47 | isoschiz | set | nosy:
+ isoschiz
|
2013-03-28 01:17:53 | vstinner | set | nosy:
+ vstinner
|
2013-03-19 20:28:18 | Anssi.Kääriäinen | set | messages:
+ msg184671 |
2013-03-19 14:32:08 | amaury.forgeotdarc | set | nosy:
+ amaury.forgeotdarc messages:
+ msg184635
|
2013-03-18 20:34:25 | Anssi.Kääriäinen | create | |