Issue9141
Created on 2010-07-02 11:30 by krisvale, last changed 2010-07-31 17:37 by dstanek.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| gc_collectable.patch | krisvale, 2010-07-02 11:30 | review | ||
| Messages (18) | |||
|---|---|---|---|
| msg109099 - (view) | Author: Kristján Valur Jónsson (krisvale) * | Date: 2010-07-02 11:30 | |
The GC module currently relies only on the presence of a __del__ method to decide if an object can be safely collected at all. Also, there is a special case for generator objects. This patch allows any object to call an api during its traversal, PyObject_GC_Collectable(), to indicate whether it is fit to be collected at this time, overriding any presence of a __del__ method or not. This mechanism is being put in place in stackless python 2.7 because tasklets cannot always be collected depending on their runtime state, and I thought this might be a useful addition for regular python, especially since there already is such a dynamic special case for generator objects. |
|||
| msg109100 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2010-07-02 11:40 | |
First, what's the use case? Just backporting an internal Stackless API isn't a reasonable request. Second, this tells that there *is* a finalization function, but not what the function is. How is the GC supposed to find it? Third, the implementation looks quite suboptimal. Better define a new slot method, such as tp_needs_finalizing, or tp_get_finalizer. |
|||
| msg109112 - (view) | Author: Kristján Valur Jónsson (krisvale) * | Date: 2010-07-02 13:59 | |
1. The use case is any C extension that may need to run non-trivial code when being deleted, but where this can not be statically known. An example in the code today is generators, and the PyGen_NeedsFinalizing() function, doing exactly this. This is an attempt to make this functionality available to any C extension. 2. No, this gives a C extension the oppertunity to say: It's okay to delete me from GC, or it's not okay to delete me from GC. There is no need to specify any particular finalizing function, it will be invoked when the refcount goes to 0, just as it happens with __del__(), or as it happens with PyGen objects. 3. This code is only invoked for garbage deemed collectable. As such it is not on any critical path, most gc collections don't actually find any garbage. The cost of a few extra indirect function calls is likely to drown in the memory allocator overhead when the objects are released. Also, this is designed to be a minimum impact patch. We really shouldn't be creating new slot methods when it can be avoided. |
|||
| msg109113 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2010-07-02 14:04 | |
> 1. The use case is any C extension that may need to run non-trivial > code when being deleted, but where this can not be statically known. tp_del is IMO a bad place to do it. I'd recommend tp_dealloc instead, precisely so that you don't end up with uncollectable objects tied to internal OS structures or other non-trivial resources. (also, tp_del seems to have problems with subclassing. I don't remember the specifics) > 3. This code is only invoked for garbage deemed collectable. As such > it is not on any critical path, most gc collections don't actually > find any garbage. The cost of a few extra indirect function calls is > likely to drown in the memory allocator overhead when the objects are > released. Ok. |
|||
| msg109226 - (view) | Author: Kristján Valur Jónsson (krisvale) * | Date: 2010-07-04 15:24 | |
I don't understand the tp_del / tp_dealloc argument here. This is really not the concern of gc. Gc only goes and finds unreachable objects, and if they are deemend collectable, makes them go away by clearing their references. GC only cares that during the final release of any objects in a reference cycle, no non-trivial code may run (for various technical reasons). So, objects deemed "sensitive" in this way, are left alone (and put in gc.garbage instead). GC has no other knowledge of how objects behave, if or when any finalizers are called. The only thing gc does is calling Py_DECREF (through the tp_clear slot). Everything else is up to the objects. The intent of this patch is for objects themselves to be able to provide better information to GC as to whether they are too "sensitive" to be cleared by GC, rather than GC relying solely on the presence of a __del__ method, or using the builtin special case knowledbe for PyGen objects. Whether finalizers are called from tp_del or tp_dealloc is beyond the scope of the gc module or, indeed, this patch. |
|||
| msg109229 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2010-07-04 15:34 | |
> GC only cares that during the final release of any objects in a > reference cycle, no non-trivial code may run (for various technical > reasons). So, objects deemed "sensitive" in this way, are left alone > (and put in gc.garbage instead). Which is really the problem, and why you should prefer tp_dealloc over tp_del. (unless you want gc.garbage to fill up too easily) |
|||
| msg109240 - (view) | Author: Kristján Valur Jónsson (krisvale) * | Date: 2010-07-04 17:11 | |
I'm confused, Antoine. _I_ am not preferring anything. I am not modifying the way finalizers are called. I'm merely providing a way for objects to signal that they _have_ (or don't have) non-trivial finalizers. A finalizer _may_ be tp_del, but it may be any code that is called as part of tp_dealloc(). This patch allows an object to say: Calling tp_dealloc() is currently unsafe from a gc.collect() run. |
|||
| msg109241 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2010-07-04 17:17 | |
> A finalizer _may_ be tp_del, but it may be any code that is called as > part of tp_dealloc(). This patch allows an object to say: Calling > tp_dealloc() is currently unsafe from a gc.collect() run. Well, I don't know what you mean. tp_dealloc should always be "safe" and is never checked for by the GC. By the way, tp_del isn't even documented, which reinforces my point that it's not really for use by third-party extension types. I think we are reaching the point where, if you want this to happen, you should ask for opinions on python-dev, because I really don't understand which problem you are trying to solve. |
|||
| msg109247 - (view) | Author: Kristján Valur Jónsson (krisvale) * | Date: 2010-07-04 17:56 | |
tp_del() is generally invoked by tp_dealloc(). See for example typeobject.c:849. gc.collect() never invokes tp_del() directy, that is left for the objects themselves (as part of tp_dealloc()) Now, gc already knows that if tp_del is present, it cannot cause break the cycle, (for various technical reasons, one of them being that tp_del would be invoked with various dependency objects already nerfed by tp_clear()). But this is not always enough: 1) it may be too pessimistic. Sometimes the tp_del method doesn't do anything significant, or isn't even called, depending on runtime state, so it is safe for gc to clear the object (see for example genobject.c:30). 2) Sometimes finalization behaviour is not defined by a tp_del method at all, but rather some custom code in tp_dealloc. All this patch does, is to generalize the mechanism already provided for genobject.c (by PyGen_NeedsFinalizing()), to any object: Any object can signal to gc that: a) it is ok to collect a cycle with me in it, or b) no, it is unsafe to do so. With this patch in place, PyGen_NeedsFinalizing() no longer needs to be a special case in gcmodule.c. Yes, I'll send a message to python-dev. |
|||
| msg109248 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2010-07-04 17:59 | |
> All this patch does, is to generalize the mechanism already provided > for genobject.c (by PyGen_NeedsFinalizing()), to any object: Any > object can signal to gc that: a) it is ok to collect a cycle with me > in it, or b) no, it is unsafe to do so. With this patch in place, > PyGen_NeedsFinalizing() no longer needs to be a special case in > gcmodule.c. Adding an API function and an additional traversal pass to the GC just for the sake of removing a special case doesn't seem reasonable to me. That's why I'm asking what specific problem you are trying to solve. (rather than simply trying to make things "nicer") |
|||
| msg109265 - (view) | Author: Daniel Stutzbach (stutzbach) ![]() |
Date: 2010-07-04 21:01 | |
Kristján, Can you compare and contrast your approach with calling PyObject_GC_UnTrack and PyObject_GC_Track to mark the object as uncollectable/collectable? |
|||
| msg109802 - (view) | Author: Kristján Valur Jónsson (krisvale) * | Date: 2010-07-09 22:41 | |
Hi Antoine. I am not adding another traversal pass. I am modifying the function that gets called for every object about to be collected, to call the object's traversal function. This is hardly an extra "pass" and is done only for objects that are about to be collected (i.e. only in the rare case). The particular case is this: I have objects (tasklets) that depending on their runtime state, need to run a finalizer or not. If not, they can be collected. If they do, then they end up on gc.garbage. This is exactly the same case as with generators. But instead of adding yet another special case into stackless python, I realized that it may be up to objects' runtime state whether they need to run a finalizer or not. So, I wrote a more generic way and decided to contribute it to C Python, so that a future developer, having a C extension that only sometimes needs to run a finalizer, could have a way to cleanly deal with garbage collection. This is twice now that this problem has been seen in python. The first time, a special case was coded into gcmodule. This second time, I'm offering a more generic solution. Now, I have a number of useful improvements and additions to C python sitting in our repository, the result of over 7 years work with Python at CCP games. I have another one sitting and waiting that adds useful functionality to gcmodule. Useful in my opinion anyway, and worth at least being kept for posterity as a patch in the tracker. But since these patches of mine seem to be met with rather limited enthusiasm, I wonder if I should be bothering. |
|||
| msg109803 - (view) | Author: Kristján Valur Jónsson (krisvale) * | Date: 2010-07-09 22:48 | |
Hi Daniel.
Your message was classified as spam, I have no idea why, but this is why I only noticed it now.
Your suggestion is interesting, I hadn't thought of that. Yes, it is possible to use the track/untrack functions (I think), but that would mean that you would have to monitor your object for every state change and reliably detect the transition from one state to another. Being able to query the current state and let gc know: "No, I cannot be collected as I am now" is a much more robust solution from the programmers perspective.
A further difference is this: If an object isn't tracked, it won't be collected if it is part of a cycle, but it will not be put in gc.garbage either. In effect, it will just remain in memory, unreachable, with no chance of it ever being released.
In contrast, a generator (or in my case, a tasklet) that needs to have a finalizer run when it goes away, will end up in gc.garbage, which a dilligent application will visit once in a while in order to clean it out. I'm not sure how you would clear out a generator like that, but in Stackless python, you could do something like this once in a while:
for g in gc.garbage:
if isinstance(g, stackless.tasklet):
g.kill()
del gc.garbage[:]
|
|||
| msg109804 - (view) | Author: Antoine Pitrou (pitrou) * ![]() |
Date: 2010-07-09 22:58 | |
> The particular case is this: I have objects (tasklets) that depending > on their runtime state, need to run a finalizer or not. If not, they > can be collected. If they do, then they end up on gc.garbage. But why don't you try using tp_dealloc as I suggested instead? It will allow you to run a finalizer in all cases, without anything ending up in gc.garbage. It's what we do in the new io lib. Also, I'm still worried that the mechanism you're proposing is a bit quirky. I hope other developers can give opinions or even suggestions. > But since these patches of mine seem to be met with rather limited > enthusiasm, I wonder if I should be bothering. Well, the GC is a rather delicate affair and moreover it doesn't have a dedicated maintainer. That doesn't mean your patches are not interesting, but I think everyone tries to be very careful with that area of the code. |
|||
| msg109807 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2010-07-09 23:16 | |
> I'm merely providing a way for objects to signal > that they _have_ (or don't have) non-trivial finalizers. +1 This is useful and non-hackish way to communicate with GC and solves a problem that doesn't otherwise have a clean solution. |
|||
| msg109810 - (view) | Author: Daniel Stutzbach (stutzbach) ![]() |
Date: 2010-07-10 00:33 | |
2010/7/9 Kristján Valur Jónsson <report@bugs.python.org> > Your message was classified as spam, I have no idea why, but this is why I > only noticed it now. > Yes, I just noticed that tonight as well. I filed a bug on the meta-tracker in the hopes that someone can dig into the underlying cause. > Your suggestion is interesting, I hadn't thought of that. Yes, it is > possible to use the track/untrack functions (I think), but that would mean > that you would have to monitor your object for every state change and > reliably detect the transition from one state to another. Being able to > query the current state and let gc know: "No, I cannot be collected as I am > now" is a much more robust solution from the programmers perspective. > > A further difference is this: If an object isn't tracked, it won't be > collected if it is part of a cycle, but it will not be put in gc.garbage > either. In effect, it will just remain in memory, unreachable, with no > chance of it ever being released. > Yes, I see. I have used the track/untrack approach, but it was in a very different situation. I have a long-running C function which keeps alive a large number of objects. At the start of the function, I untrack them all as a performance optimization, so the garbage collector does not have to spend time traversing them. Before the function returns, I track them again. I see now why that wouldn't work for your use-case. Thank you. I like the idea of your patch to give opportunities to tell the GC that they are uncollectable on the fly. I'm concerned about the performance impact of making tp_traverse do double-duty, though. Calling tp_traverse for every object in a cycle will have the effect of making an extra pass on every reference from every object participating in the cycle. For example, consider a large list that's part of a cycle. If we call the list's tp_traverse to establish if it's collectible, list's tp_traverse will call visit() on every item in the list. Even though you've made visit() a do-nothing function, that's still a function call per reference. It seems a shame to do all of that work unnecessarily. |
|||
| msg110973 - (view) | Author: Kristján Valur Jónsson (krisvale) * | Date: 2010-07-20 21:03 | |
I agree that this is not the _optimal_ solution, and that when you are collecting stuff, say a list of many items, it will cause X indirect calls to all of the things in the list. My counterargument is, however, that all those items will, if collected, be handed off to the final Py_DECREF with all the baggage that that entails, ultimately resulting in PyObject_Free or PyMem_Free. An extra indirect call, (only for unreachable objects, btw, which is only a small objects of all objects visited during a GC pass) should not play a significatn part in the process. An alternative to this extra tp_traverse() pass, would be to flag objects that report that they are or are not collectable, with a special value in gc_refs, probably a bitmask. But this would be a much more intrusive change in an algorithm that is far from simple, and so need very rigorous review and testing. |
|||
| msg110975 - (view) | Author: Daniel Stutzbach (stutzbach) ![]() |
Date: 2010-07-20 21:19 | |
Yes, I see your point. Even if more expensive than strictly necessary, the cost should be swamped by other existing costs. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2010-07-31 17:37:47 | dstanek | set | nosy:
+ dstanek |
| 2010-07-20 21:19:58 | stutzbach | set | keywords:
patch, patch, needs review messages: + msg110975 |
| 2010-07-20 21:08:48 | stutzbach | set | files: - unnamed |
| 2010-07-20 21:03:44 | krisvale | set | keywords:
patch, patch, needs review messages: + msg110973 |
| 2010-07-10 00:33:33 | stutzbach | set | files:
+ unnamed messages: + msg109810 |
| 2010-07-09 23:16:13 | rhettinger | set | keywords:
patch, patch, needs review nosy: + rhettinger messages: + msg109807 |
| 2010-07-09 23:12:38 | stutzbach | set | messages: - msg109806 |
| 2010-07-09 23:11:50 | stutzbach | set | keywords:
patch, patch, needs review messages: + msg109806 |
| 2010-07-09 22:58:51 | pitrou | set | keywords:
patch, patch, needs review nosy: + loewis, amaury.forgeotdarc messages: + msg109804 |
| 2010-07-09 22:48:32 | krisvale | set | keywords:
patch, patch, needs review messages: + msg109803 |
| 2010-07-09 22:41:09 | krisvale | set | keywords:
patch, patch, needs review messages: + msg109802 |
| 2010-07-04 21:01:40 | stutzbach | set | keywords:
patch, patch, needs review nosy: + stutzbach messages: + msg109265 |
| 2010-07-04 17:59:11 | pitrou | set | messages: + msg109248 |
| 2010-07-04 17:56:20 | krisvale | set | keywords:
patch, patch, needs review messages: + msg109247 |
| 2010-07-04 17:17:14 | pitrou | set | messages: + msg109241 |
| 2010-07-04 17:11:03 | krisvale | set | keywords:
patch, patch, needs review messages: + msg109240 |
| 2010-07-04 15:34:26 | pitrou | set | messages: + msg109229 |
| 2010-07-04 15:24:40 | krisvale | set | keywords:
patch, patch, needs review messages: + msg109226 |
| 2010-07-02 14:04:50 | pitrou | set | messages: + msg109113 |
| 2010-07-02 13:59:11 | krisvale | set | keywords:
patch, patch, needs review messages: + msg109112 |
| 2010-07-02 11:40:28 | pitrou | set | keywords:
patch, patch, needs review nosy: + pitrou, tim_one messages: + msg109100 |
| 2010-07-02 11:30:10 | krisvale | create | |
