classification
Title: Allow objects to decide if they can be collected by GC
Type: enhancement Stage:
Components: Interpreter Core Versions: Python 3.3
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Jim.Jewett, amaury.forgeotdarc, asvetlov, dstanek, isoschiz, kristjan.jonsson, loewis, pconnell, pitrou, rhettinger, stutzbach, tim.peters
Priority: normal Keywords: needs review, patch

Created on 2010-07-02 11:30 by kristjan.jonsson, last changed 2013-04-23 22:00 by isoschiz.

Files
File name Uploaded Description Edit
gc_collectable.patch kristjan.jonsson, 2010-07-02 11:30 review
ob_is_gc.patch kristjan.jonsson, 2012-04-05 11:55 review
ob_is_gc.patch kristjan.jonsson, 2012-04-06 10:47 review
Messages (41)
msg109099 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-07-02 11:30
The GC module currently relies only on the presence of a __del__ method to decide if an object can be safely collected at all.  Also, there is a special case for generator objects.

This patch allows any object to call an api during its traversal,
PyObject_GC_Collectable(), to indicate whether it is fit to be collected at this time, overriding any presence of a __del__ method or not.

This mechanism is being put in place in stackless python 2.7 because tasklets cannot always be collected depending on their runtime state, and I thought this might be a useful addition for regular python, especially since there already is such a dynamic special case for generator objects.
msg109100 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-02 11:40
First, what's the use case? Just backporting an internal Stackless API isn't a reasonable request.
Second, this tells that there *is* a finalization function, but not what the function is. How is the GC supposed to find it?
Third, the implementation looks quite suboptimal. Better define a new slot method, such as tp_needs_finalizing, or tp_get_finalizer.
msg109112 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-07-02 13:59
1. The use case is any C extension that may need to run non-trivial code when being deleted, but where this can not be statically known.  An example in the code today is generators, and the PyGen_NeedsFinalizing() function, doing exactly this.  This is an attempt to make this functionality available to any C extension.

2. No, this gives a C extension the oppertunity to say: It's okay to delete me from GC, or it's not okay to delete me from GC.  There is no need to specify any particular finalizing function, it will be invoked when the refcount goes to 0, just as it happens with __del__(), or as it happens with PyGen objects.

3. This code is only invoked for garbage deemed collectable.  As such it is not on any critical path, most gc collections don't actually find any garbage. The cost of a few extra indirect function calls is likely to drown in the memory allocator overhead when the objects are released. 

Also, this is designed to be a minimum impact patch.  We really shouldn't be creating new slot methods when it can be avoided.
msg109113 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-02 14:04
> 1. The use case is any C extension that may need to run non-trivial
> code when being deleted, but where this can not be statically known.

tp_del is IMO a bad place to do it. I'd recommend tp_dealloc instead,
precisely so that you don't end up with uncollectable objects tied to
internal OS structures or other non-trivial resources.

(also, tp_del seems to have problems with subclassing. I don't remember
the specifics)

> 3. This code is only invoked for garbage deemed collectable.  As such
> it is not on any critical path, most gc collections don't actually
> find any garbage. The cost of a few extra indirect function calls is
> likely to drown in the memory allocator overhead when the objects are
> released.

Ok.
msg109226 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-07-04 15:24
I don't understand the tp_del / tp_dealloc argument here.  This is really not the concern of gc.  Gc only goes and finds unreachable objects, and if they are deemend collectable, makes them go away by clearing their references.
GC only cares that during the final release of any objects in a reference cycle, no non-trivial code may run (for various technical reasons).  So, objects deemed "sensitive" in this way, are left alone (and put in gc.garbage instead).
GC has no other knowledge of how objects behave, if or when any finalizers are called.  The only thing gc does is calling Py_DECREF (through the tp_clear slot).  Everything else is up to the objects.

The intent of this patch is for objects themselves to be able to provide better information to GC as to whether they are too "sensitive" to be cleared by GC, rather than GC relying solely on the presence of a __del__ method, or using the builtin special case knowledbe for PyGen objects.

Whether finalizers are called from tp_del or tp_dealloc is beyond the scope of the gc module or, indeed, this patch.
msg109229 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-04 15:34
> GC only cares that during the final release of any objects in a
> reference cycle, no non-trivial code may run (for various technical
> reasons).  So, objects deemed "sensitive" in this way, are left alone
> (and put in gc.garbage instead).

Which is really the problem, and why you should prefer tp_dealloc over
tp_del.
(unless you want gc.garbage to fill up too easily)
msg109240 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-07-04 17:11
I'm confused, Antoine. _I_ am not preferring anything.  I am not modifying the way finalizers are called.  I'm merely providing a way for objects to signal that they _have_ (or don't have) non-trivial finalizers.

A finalizer _may_ be tp_del, but it may be any code that is called as part of tp_dealloc().  This patch allows an object to say:  Calling tp_dealloc() is currently unsafe from a gc.collect() run.
msg109241 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-04 17:17
> A finalizer _may_ be tp_del, but it may be any code that is called as
> part of tp_dealloc().  This patch allows an object to say:  Calling
> tp_dealloc() is currently unsafe from a gc.collect() run.

Well, I don't know what you mean. tp_dealloc should always be "safe" and
is never checked for by the GC.

By the way, tp_del isn't even documented, which reinforces my point that
it's not really for use by third-party extension types.

I think we are reaching the point where, if you want this to happen, you
should ask for opinions on python-dev, because I really don't understand
which problem you are trying to solve.
msg109247 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-07-04 17:56
tp_del() is generally invoked by tp_dealloc().  See for example typeobject.c:849.  gc.collect() never invokes tp_del() directy, that is left for the objects themselves (as part of tp_dealloc())

Now, gc already knows that if tp_del is present, it cannot cause break the cycle, (for various technical reasons, one of them being that tp_del would be invoked with various dependency objects already nerfed by tp_clear()).

But this is not always enough:
1) it may be too pessimistic.  Sometimes the tp_del method doesn't do anything significant, or isn't even called, depending on runtime state, so it is safe for gc to clear the object (see for example genobject.c:30).
2) Sometimes finalization behaviour is not defined by a tp_del method at all, but rather some custom code in tp_dealloc.

All this patch does, is to generalize the mechanism already provided for genobject.c (by PyGen_NeedsFinalizing()), to any object: Any object can signal to gc that: a) it is ok to collect a cycle with me in it, or b) no, it is unsafe to do so.  With this patch in place, PyGen_NeedsFinalizing() no longer needs to be a special case in gcmodule.c.

Yes, I'll send a message to python-dev.
msg109248 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-04 17:59
> All this patch does, is to generalize the mechanism already provided
> for genobject.c (by PyGen_NeedsFinalizing()), to any object: Any
> object can signal to gc that: a) it is ok to collect a cycle with me
> in it, or b) no, it is unsafe to do so.  With this patch in place,
> PyGen_NeedsFinalizing() no longer needs to be a special case in
> gcmodule.c.

Adding an API function and an additional traversal pass to the GC just
for the sake of removing a special case doesn't seem reasonable to me.
That's why I'm asking what specific problem you are trying to solve.
(rather than simply trying to make things "nicer")
msg109265 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-07-04 21:01
Kristján,

Can you compare and contrast your approach with calling PyObject_GC_UnTrack and PyObject_GC_Track to mark the object as uncollectable/collectable?
msg109802 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-07-09 22:41
Hi Antoine.
I am not adding another traversal pass.  I am modifying the function that gets called for every object about to be collected, to call the object's traversal function.  This is hardly an extra "pass" and is done only for objects that are about to be collected (i.e. only in the rare case).

The particular case is this:  I have objects (tasklets) that depending on their runtime state, need to run a finalizer or not.  If not, they can be collected.  If they do, then they end up on gc.garbage.

This is exactly the same case as with generators.  But instead of adding yet another special case into stackless python, I realized that it may be up to objects' runtime state whether they need to run a finalizer or not.  So, I wrote a more generic way and decided to contribute it to C Python, so that a future developer, having a C extension that only sometimes needs to run a finalizer, could have a way to cleanly deal with garbage collection.  This is twice now that this problem has been seen in python.  The first time, a special case was coded into gcmodule.  This second time, I'm offering a more generic solution.

Now, I have a number of useful improvements and additions to C python sitting in our repository, the result of over 7 years work with Python at CCP games.  I have another one sitting and waiting that adds useful functionality to gcmodule.  Useful in my opinion anyway, and worth at least being kept for posterity as a patch in the tracker.  But since these patches of mine seem to be met with rather limited enthusiasm,  I wonder if I should be bothering.
msg109803 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-07-09 22:48
Hi Daniel.
Your message was classified as spam, I have no idea why, but this is why I only noticed it now.

Your suggestion is interesting, I hadn't thought of that.  Yes, it is possible to use the track/untrack functions (I think), but that would mean that you would have to monitor your object for every state change and reliably detect the transition from one state to another.  Being able to query the current state and let gc know:  "No, I cannot be collected as I am now" is a much more robust solution from the programmers perspective.

A further difference is this:  If an object isn't tracked, it won't be collected if it is part of a cycle, but it will not be put in gc.garbage either.  In effect, it will just remain in memory, unreachable, with no chance of it ever being released.

In contrast, a generator (or in my case, a tasklet) that needs to have a finalizer run when it goes away, will end up in gc.garbage, which a dilligent application will visit once in a while in order to clean it out.  I'm not sure how you would clear out a generator like that, but in Stackless python, you could do something like this once in a while:
for g in gc.garbage:
  if isinstance(g, stackless.tasklet):
    g.kill()
del gc.garbage[:]
msg109804 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-09 22:58
> The particular case is this:  I have objects (tasklets) that depending 
> on their runtime state, need to run a finalizer or not.  If not, they
> can be collected.  If they do, then they end up on gc.garbage.

But why don't you try using tp_dealloc as I suggested instead? It will allow you to run a finalizer in all cases, without anything ending up in gc.garbage. It's what we do in the new io lib.

Also, I'm still worried that the mechanism you're proposing is a bit quirky. I hope other developers can give opinions or even suggestions.

> But since these patches of mine seem to be met with rather limited
> enthusiasm,  I wonder if I should be bothering.

Well, the GC is a rather delicate affair and moreover it doesn't have a dedicated maintainer. That doesn't mean your patches are not interesting, but I think everyone tries to be very careful with that area of the code.
msg109807 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-07-09 23:16
> I'm merely providing a way for objects to signal
> that they _have_ (or don't have) non-trivial finalizers.

+1 This is useful and non-hackish way to communicate
with GC and solves a problem that doesn't otherwise
have a clean solution.
msg109810 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-07-10 00:33
2010/7/9 Kristján Valur Jónsson <report@bugs.python.org>

> Your message was classified as spam, I have no idea why, but this is why I
> only noticed it now.
>

Yes, I just noticed that tonight as well.  I filed a bug on the meta-tracker
in the hopes that someone can dig into the underlying cause.

> Your suggestion is interesting, I hadn't thought of that.  Yes, it is
> possible to use the track/untrack functions (I think), but that would mean
> that you would have to monitor your object for every state change and
> reliably detect the transition from one state to another.  Being able to
> query the current state and let gc know:  "No, I cannot be collected as I am
> now" is a much more robust solution from the programmers perspective.
>
> A further difference is this:  If an object isn't tracked, it won't be
> collected if it is part of a cycle, but it will not be put in gc.garbage
> either.  In effect, it will just remain in memory, unreachable, with no
> chance of it ever being released.
>

Yes, I see.  I have used the track/untrack approach, but it was in a very
different situation.  I have a long-running C function which keeps alive a
large number of objects.  At the start of the function, I untrack them all
as a performance optimization, so the garbage collector does not have to
spend time traversing them.  Before the function returns, I track them
again.

I see now why that wouldn't work for your use-case.  Thank you.  I like the
idea of your patch to give opportunities to tell the GC that they are
uncollectable on the fly.

I'm concerned about the performance impact of making tp_traverse do
double-duty, though.  Calling tp_traverse for every object in a cycle will
have the effect of making an extra pass on every reference from every object
participating in the cycle.  For example, consider a large list that's part
of a cycle.  If we call the list's tp_traverse to establish if
it's collectible, list's tp_traverse will call visit() on every item in the
list.  Even though you've made visit() a do-nothing function, that's still a
function call per reference.  It seems a shame to do all of that work
unnecessarily.
msg110973 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-07-20 21:03
I agree that this is not the _optimal_ solution, and that when you are collecting stuff, say a list of many items, it will cause X indirect calls to all of the things in the list.

My counterargument is, however, that all those items will, if collected, be handed off to the final Py_DECREF with all the baggage that that entails, ultimately resulting in PyObject_Free or PyMem_Free.  An extra indirect call, (only for unreachable objects, btw, which is only a small objects of all objects visited during a GC pass) should not play a significatn part in the process.

An alternative to this extra tp_traverse() pass, would be to flag objects that report that they are or are not collectable, with a special  value in gc_refs, probably a bitmask.  But this would be a much more intrusive change in an algorithm that is far from simple, and so need very rigorous review and testing.
msg110975 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-07-20 21:19
Yes, I see your point.  Even if more expensive than strictly necessary, the cost should be swamped by other existing costs.
msg157464 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-04-04 10:56
I just noticed that PyTypeObjects have a gc slot already:
inquiry tp_is_gc; /* For PyObject_IS_GC */

Now, this is used in only one place in 2.7, for type objects:
return type->tp_flags & Py_TPFLAGS_HEAPTYPE;

This is thus used to dynamically query if an object was allocated with a GC header or not, since static types cannot have this.

Now, it would be perfectly possible to change the semantics of this function to return a bit flag, with bit 0 being the "has head" and bit 1 meaning "is collectable"

This might simplify some stuff.  Any thoughts?
msg157492 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-04 15:46
> This might simplify some stuff.  Any thoughts?

I think you should float it on python-dev.
msg157571 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-04-05 11:55
Here is a completely new patch.  This approach uses the already existing tp_is_gc enquiry slot to signal garbage collection.
The patch modifies the generator object to use this new mechanism.
The patch keeps the old PyGen_NeedsFinalizing() API, but this can now go away, unless people think it might be used in extension modules 

(why do we always expose all those internal apis from the dll? I wonder.)
msg157621 - (view) Author: Jim Jewett (Jim.Jewett) Date: 2012-04-05 20:56
I think the second patch is a fairly straightforward enhancement to the existing situation.  It still feels a bit like an awkward half-measure, though.

(1)  If a class does have a finalizer, instances should still be able to say "Go ahead and collect me, I don't happen to be in the problematic state."  (In other words, there should be a return value that means not to bother checking for the existence of a tp_del / __del__ method.)

(2)  It should be explicit whether or not this is:
  (2a) just an inquiry ("If you were in a cycle, could I collect you?),
  (2b) a statement plus inquiry ("You are in a cycle.  Can I collect you?"), or
  (2c) a request ("You are in a cycle, and I would like to collect you.  Clean up if you can, then tell me whether you are still uncollectable.")

(3)  It may be worth adding an additional slot for safe idempotent finalizers.  (Earlier suggestions called this __close__, but I suppose __finalize__ might be more general.)  Once a garbage cycle is detected, go ahead and call that slot's method before giving up and sticking the whole thing in garbage.  If python classes could also fill this slot, it would look a lot like (2c).
msg157657 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-04-06 10:47
New patch with explicit mutually exclusive flags and better comments.
msg157658 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-04-06 10:53
Thanks for your comments Jim.
They made me realize that the patch was incomplete.  What is needed are two mutually exclusive flags that override the presence of the tp_del slot.  This addresses your 1) point and is the case for Generators.

I don't think 2 is important.  Does the context of the call matter?  It is merely a question of whether a finalizer will or will not do something trivial.

Finalizers implemented in python can never run from garbage collection.  the potential to do harm is simply too great.  Which is why these extensions are purely the domain of specialized C extensions.  Hence I don't think adding another slot, or allowing self-declared idempotent python finalizers to run during cleanup is a good idea.
msg157681 - (view) Author: Jim Jewett (Jim.Jewett) Date: 2012-04-06 19:51
> I don't think 2 is important.  Does the context of the call matter?
> It is merely a question of whether a finalizer will or will not do
> something trivial.

It would affect how I would write such functions.  If I knew that it wouldn't be called until the object was already garbage, then I be inclined to move parts of tp_del there, and break the cycle.

> Finalizers implemented in python can never run from garbage 
> collection.  the potential to do harm is simply too great.

__del__ methods do run, even if an object was collected by the cycle detector.  And they can't do any harm that couldn't also be done by a C finalizer.

The only change from today's situation with __del__ is that once an object is known to be cyclic garbage, it would get a chance to break the cycles itself (or, admittedly, to rescue itself) before the cycle-breaker began making arbitrary decisions or gave up and stuffed it in the uncollectable list.

Even in your case, instead of setting a timer to clean out garbage, you could have the garbage itself notify your cleaner that it needed attention.
msg157682 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2012-04-06 20:03
On Fri, Apr 6, 2012 at 12:51 PM, Jim Jewett <report@bugs.python.org> wrote:

> __del__ methods do run, even if an object was collected by the cycle
> detector.  And they can't do any harm that couldn't also be done by a C
> finalizer.
>

No, if an object with a __del__ method is part of a cycle, it is not
collected.  The objects get appended to gc.garbage instead.

See:  http://docs.python.org/py3k/library/gc.html#gc.garbage
msg157685 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-04-06 20:24
Right Daniel, but currently an exception is made for Generator objects.  The generator can tell that the finalizer won't do anything and then allow collection regardless.  It knows the finalizer and that in its state it will at most do some Py_DECREF operations.

This patch aims to generalize this exception mechanism, and also its opposite, so that we don´t need to rely on the presence of the finalizer slot to decide.  We can then remove a silly function from the public (but undocumented) API.

Of course, you can only allow collectino of an object with a finalizer if you know for sure that it will not do anything but Py_DECREF.  This is possible for generators because they are not an inheritable class.  But we could never do this in general for a python class.  Even running .py code from the GC collector cycle would crash everything instantly.
msg157688 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2012-04-06 20:35
Are __del__ and tp_del getting conflated in this conversation?  I don't see a __del__ method on generator objects:

filfre:$ python3.1 
Python 3.1.2 (r312:79147, Dec  9 2011, 20:47:34) 
[GCC 4.4.3] on linux2
>>> (i for i in range(5)).__del__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'generator' object has no attribute '__del__'

but I may be missing something.
msg157689 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-04-06 20:52
typeobject.c:
TPSLOT("__del__", tp_del, slot_tp_del, NULL, ""),

I'm not super-familiar with how typeobjects and slots work, but I imagine that if a type is defined with a __del__ member, then the tp_del slot is automatically filled out.  The converse need not be true.

At any rate, the non-zero-ness of tp_del is what gcmodule.c uses to find out if an object has a finaliser.  There is a code rudiment present in gcmodule.c, that hints at an earlier time when '__del__' was looked up but that string is not actually used.  That can be removed to, eiter as part of this patch or separately since it should be quite uncontroversial.
msg157714 - (view) Author: Jim Jewett (Jim.Jewett) Date: 2012-04-07 02:24
On Fri, Apr 6, 2012 at 4:03 PM,
 Daniel Stutzbach <stutzbach@google.com> added the comment:

>> __del__ methods do run, even if an object was collected by the cycle
>> detector.  And they can't do any harm that couldn't also be done by a C
>> finalizer.

> No, if an object with a __del__ method is part of a cycle, it is not
> collected.  The objects get appended to gc.garbage instead.

> See:  http://docs.python.org/py3k/library/gc.html#gc.garbage

They can still be collected if there is only one object with a __del__
method in the cycle.

(Whether the code actually does that, it appears not to at the moment,
and I won't swear by by own memory of 2.3 era code.)
msg157719 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-04-07 08:50
I'm still unclear about the rationale for this change. krisvale says in the patch and in msg109099 that this is to determine whether an object can be collected "at this time". Is the intended usage that the result value may change over the lifetime of the object?

If so, I'm -1 on the patch. If an object cannot be collected "at this time", it means that it is added to gc.garbage, which in turn means that it will never be collected (unless somebody explicitly clears gc.garbage).

Supporting the case of objects that can be collected despite living in a cycle is fine to me, but those objects must not change their mind.

Supporting the case of objects that are not collectable now, but may be collectable later, may have its use case (which one?), but this is not addressed by the patch (AFAICT). To support it, processing of the entire cycle must be postponed (to the next collection? to the next generation?).

I'm -0 on recycling the is_gc slot. Having a GC header and having a non-trivial tp_del are two unrelated issues. If this is done, I think it would be best to rename the slot to tp_gc_flags or something. There is also the slight risk of some type in the wild returning non-1 currently, which then would get misinterpreted.
msg157722 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-04-07 10:30
Jim: The edge case of collecting an object that is alone in a cycle is something that isn't handled.  I'm also not sure that it is worth doing or even safe or possible.  but that is beside the point and not the topic of this patch.

Martin: This patch is formalizing a known fact about cpython objects, albeit an uncommon one:  Some objects with finalizers can be safely collected if they are in a certain state.  The canonical example is the PyGeneratorObject.  gccollect.c has special code for this type and there is a special evil API exposed to deal with the case.

_This patch is not changing any behaviour._  Generator objects that are in a special state of existance cannot be collected.  Rather, they are (and always have been) put in gc.garbage along with the rest of their cycle chain.  In other cases, the finalizer causes no side effects and simply clearing them is ok.  I couldn't tell you what those generator execution states are, but the fact is that they exist.

A similar problem exists in Stackless python with tasklets.  We solved it in this generic way there rather than add more exceptional code to gcmodule.c and this patch is the result of that work.  In Stackless, the inverse is true:  An object without an explicit finalizer can still be unsafe to collect, because the tp_dealloc can do evil things, but doesn't always, depending on object state.

So, objects who change their mind about whether they can be collected or not are a fact of life in python.  Yes, even cPython.  This patch aims to formalize that fact and give it an interface, rather than to have special code for generator objects in gcmodule.c and an inscrutable API exposed (PyGen_NeedsFinalizing())

About reusing the slot:  Slots are a precious commodity in python.  This particular slot is undocumented and used only for one known thing:  To distinguish PyTypeObjects from PyHeapTypeObjects.  In fact, creating a slot and not using special case code (like they did for PyGeneratorObjects) was forward thinking, and I'm trying to build on that.  Renaming the slot is a fine idea.

A brief search on google code (and google at large) showed no one using this slot.  It is one of those undocumented strange slots that one just initializes to 0.
msg157723 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-04-07 10:33
Btw. tangentially related to this discussion, issue 10576 aims to make the situation with uncollectable objects a little more bearable.  An application can listen for garbage collection, visit gc.garbage and deal with its problematic types in its own way.
msg158510 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-04-16 22:05
I've stumbled upon further cases of this problem, and a possible serious bug in python 3.
the _PyIOBase_finalize() will invoke a "close" method on the object if the object isn't closed.  This is the same as the object having a __del__ method.  Yet, these objects are not treated as having finalizers by the garbage collector, and indeed, _PyIOBase_finalize() is called by the tp_clear() slot of several derived types. Surely, if these objects define non-trivial 'close' members, they must not be called during garbage collection.
msg158537 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-17 10:33
> Surely, if these objects define non-trivial 'close' members, they must 
> not be called during garbage collection.

Define "non-trivial". There are various tests for it in test_io.

Not ending up in gc.garbage is *by design*. Making file objects uncollectable as soon as they appear in a reference cycle would be a serious regression. That's why the cleanup is done in tp_dealloc instead of having a __del__.
msg158538 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-04-17 10:37
I _think_ the only python related things you can do from tp_clear() is Py_DECREF(), this is what I mean by trivial.  This is the reason, for example, that special care was done with generators.
An IO object could of course do non-python operations such as closing files and freeing buffers.

If file.close() can be an arbitrary python method, then it can no more be called from gc, than an object's __del__ method.  This would not be a regression, this would be a fact of life.

The point of _this_ defect (issue 9141) is to allow objects to be smarter about this, being able to tell gc if their finalizers are trivial or not.


I will start a discussion on python-dev to see if anyone knows exactly why these limitations are in place, and what they are.  They are not documented in the source code.
msg158540 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-17 10:48
> I _think_ the only python related things you can do from tp_clear() is
> Py_DECREF(), this is what I mean by trivial.

Well, Py_DECREF is not trivial at all, since it can invoke arbitrary
Python code (through e.g. weakref callbacks, or by releasing the GIL).
Therefore, I would say any code is allowed from tp_clear :-)

> If file.close() can be an arbitrary python method, then it can no more
> be called from gc, than an object's __del__ method.  This would not be
> a regression, this would be a fact of life.

I don't believe it. I don't see what's magical about being called by the
gc. Again, a Py_DECREF in tp_dealloc can invoke arbitrary Python code.
msg158542 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-04-17 10:58
>I don't believe it. I don't see what's magical about being called by the
>gc. Again, a Py_DECREF in tp_dealloc can invoke arbitrary Python code.

Look again.  gcmodule specifically takes any objects reachable from ob_clear and sees if any of them have side effects when Py_DECREF'd.  If any object has a finalizer, the entire cycle is put in gc.garbage.

gcmodule is trickier than you might think.  I've spent quite a time with it.

Anyway, I've put the issue to python-dev, let's see if they have some autorative insight on the matter.
msg158543 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-17 11:08
> Look again.  gcmodule specifically takes any objects reachable from
> ob_clear and sees if any of them have side effects when Py_DECREF'd.

has_finalizer() in gcmodule.c doesn't check for weakref callbacks, so a
weakref callback can still be invoked from tp_dealloc.
msg158560 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2012-04-17 17:47
On Tue, Apr 17, 2012 at 4:08 AM, Antoine Pitrou <report@bugs.python.org> wrote:
> has_finalizer() in gcmodule.c doesn't check for weakref callbacks, so a
> weakref callback can still be invoked from tp_dealloc.

Unless I'm mistaken, weakrefs will be handled and removed by
handle_weakrefs() before delete_garbage() is called.
msg158561 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-17 17:58
> On Tue, Apr 17, 2012 at 4:08 AM, Antoine Pitrou <report@bugs.python.org> wrote:
> > has_finalizer() in gcmodule.c doesn't check for weakref callbacks, so a
> > weakref callback can still be invoked from tp_dealloc.
> 
> Unless I'm mistaken, weakrefs will be handled and removed by
> handle_weakrefs() before delete_garbage() is called.

Perhaps, but what does that change?
History
Date User Action Args
2013-04-23 22:00:21isoschizsetnosy: + pconnell, isoschiz
2012-04-17 17:58:15pitrousetmessages: + msg158561
2012-04-17 17:47:47stutzbachsetmessages: + msg158560
2012-04-17 11:08:39pitrousetmessages: + msg158543
2012-04-17 10:58:02kristjan.jonssonsetmessages: + msg158542
2012-04-17 10:48:30pitrousetmessages: + msg158540
2012-04-17 10:37:59kristjan.jonssonsetmessages: + msg158538
2012-04-17 10:33:12pitrousetmessages: + msg158537
2012-04-16 22:05:51kristjan.jonssonsetmessages: + msg158510
2012-04-07 10:33:59kristjan.jonssonsetmessages: + msg157723
2012-04-07 10:30:24kristjan.jonssonsetmessages: + msg157722
2012-04-07 08:50:22loewissetmessages: + msg157719
2012-04-07 02:24:39Jim.Jewettsetmessages: + msg157714
2012-04-06 20:52:51kristjan.jonssonsetmessages: + msg157689
2012-04-06 20:35:11stutzbachsetmessages: + msg157688
2012-04-06 20:24:41kristjan.jonssonsetmessages: + msg157685
2012-04-06 20:03:19stutzbachsetmessages: + msg157682
2012-04-06 19:51:50Jim.Jewettsetmessages: + msg157681
2012-04-06 10:53:44kristjan.jonssonsetmessages: + msg157658
2012-04-06 10:47:12kristjan.jonssonsetfiles: + ob_is_gc.patch

messages: + msg157657
2012-04-05 20:57:00Jim.Jewettsetnosy: + Jim.Jewett
messages: + msg157621
2012-04-05 12:09:23asvetlovsetnosy: + asvetlov
2012-04-05 11:55:27kristjan.jonssonsetfiles: + ob_is_gc.patch

messages: + msg157571
2012-04-04 15:46:51pitrousetmessages: + msg157492
versions: + Python 3.3, - Python 3.2
2012-04-04 10:56:02kristjan.jonssonsetmessages: + msg157464
2010-07-31 17:37:47dstaneksetnosy: + dstanek
2010-07-20 21:19:58stutzbachsetkeywords: patch, patch, needs review

messages: + msg110975
2010-07-20 21:08:48stutzbachsetfiles: - unnamed
2010-07-20 21:03:44kristjan.jonssonsetkeywords: patch, patch, needs review

messages: + msg110973
2010-07-10 00:33:33stutzbachsetfiles: + unnamed

messages: + msg109810
2010-07-09 23:16:13rhettingersetkeywords: patch, patch, needs review
nosy: + rhettinger
messages: + msg109807

2010-07-09 23:12:38stutzbachsetmessages: - msg109806
2010-07-09 23:11:50stutzbachsetkeywords: patch, patch, needs review

messages: + msg109806
2010-07-09 22:58:51pitrousetkeywords: patch, patch, needs review
nosy: + loewis, amaury.forgeotdarc
messages: + msg109804

2010-07-09 22:48:32kristjan.jonssonsetkeywords: patch, patch, needs review

messages: + msg109803
2010-07-09 22:41:09kristjan.jonssonsetkeywords: patch, patch, needs review

messages: + msg109802
2010-07-04 21:01:40stutzbachsetkeywords: patch, patch, needs review
nosy: + stutzbach
messages: + msg109265

2010-07-04 17:59:11pitrousetmessages: + msg109248
2010-07-04 17:56:20kristjan.jonssonsetkeywords: patch, patch, needs review

messages: + msg109247
2010-07-04 17:17:14pitrousetmessages: + msg109241
2010-07-04 17:11:03kristjan.jonssonsetkeywords: patch, patch, needs review

messages: + msg109240
2010-07-04 15:34:26pitrousetmessages: + msg109229
2010-07-04 15:24:40kristjan.jonssonsetkeywords: patch, patch, needs review

messages: + msg109226
2010-07-02 14:04:50pitrousetmessages: + msg109113
2010-07-02 13:59:11kristjan.jonssonsetkeywords: patch, patch, needs review

messages: + msg109112
2010-07-02 11:40:28pitrousetkeywords: patch, patch, needs review
nosy: + pitrou, tim.peters
messages: + msg109100

2010-07-02 11:30:10kristjan.jonssoncreate