This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Dynamic classes contain non-breakable reference cycles
Type: resource usage Stage:
Components: Interpreter Core Versions: Python 3.4
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, arigo, daniel.urban, fdrake, gvanrossum, isoschiz, kristjan.jonsson, ncoghlan, pconnell, pitrou
Priority: normal Keywords: patch

Created on 2013-05-10 20:22 by kristjan.jonsson, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
classleak.patch kristjan.jonsson, 2013-05-10 20:22 Patch file
classleak.py kristjan.jonsson, 2013-05-10 20:22
Messages (10)
msg188875 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-05-10 20:22
Classes contain two kinds of cycles back to themselves:
1) in their __mro_  list, the class itself is typically the first member.
2) in the descriptors for various fields such as __dict__.

The problem here is that there is no way to break the cycle.  A class that is dynamically created (e.g. in a function) and then not needed, will stick around until garbage collection is performed.

This happens in spite of attempts within the core to avoid such cycles.  For instance, the type's tp_subclasses list contains to avoid a cycle between a baseclass and its parent.

A .py file demonstrating the problem is attached.

A patch is attached that resolves the issue:
1) the mro tuple in the type object is "nerfed" to contain a Py_None reference in its first place, where it previously held the cyclic reference to the type object itself.  This is then "fixed" in place where required.  the __mro__ attribute becomes a getter that duplicates the tuple.

2) the descriptors are modified to hold a weak-reference to the target type, rather than a strong reference.

3) Fix process cleanup.  The thread state cannot be released until after the cleanup of e.g. PySet_Fini() because the freeing of objects in there requires the DUSTBIN_SAFE macros that require the thread state.  Cleanup behaviour probably changed since objects go away on their own now.

4) changes to test_gc.py in the testsuite reflecting the changed behaviour

The patched code passes all the testsuite.
msg188876 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-05-10 20:22
Adding file demonstraing the problem.
msg188882 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-10 21:05
> 1) the mro tuple in the type object is "nerfed" to contain a
> Py_None reference in its first place

I like the idea, but I find the implementation (the new macros) quite convoluted. I think special-casing the first tuple element where necessary would be cleaner.

> 2) the descriptors are modified to hold a weak-reference to the target 
> type, rather than a strong reference.

What would happen if someone keeps a reference to the descriptor and not to the class object? Is it a possible use case?

> 3) Fix process cleanup.

This was already done in changeset 6f4627a65c0a. You might want to update your working copy.

Does your patch cater to all possible implicit cycles, or only a subset of them?
msg188886 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-05-10 22:20
Hello Antoine.
1) Yes, I am not completely happy with the approach myself, but I'd like to get the ball rolling.  One particular nuance is that there appears to be a way to provide a custom MRO, one which does not enforce the rule that the self type come first.
(although, in the code there is a place that iterates over the mro tuple starting from index 1... the code isn't completely consistent.)

An alternative approach, that would replace 'self' in the mro with Py_None, and then special casing this when doing lookup, might that be ok?

2) I was a bit worried about it, but running the test suite showed no problems.  The only time this weak reference gets dropped is when the class is garbage collected, hence the need to add special code to repr the descriptor.  The descriptors are normally part of the class, and then de-referenced at runtime into concrete things, such as bound methods, and the like.  I suppose we could float this on python-dev, to see what people think.

3) thanks for the heads up.  it looked a bit weird to me.  But python 3 code is still getting the shakedown, and touching these parts of the code can illustrate unknown bugs.  There are some other assumptions I came across in the code that are weird, but don't appear to be relevant. In one place, it is presumed that tp_mro could be NULL, but later, it is INCREFd.  And in another place, (descriptors) the type is XINCREFd, but then in other places, assumed to be true....

Anyway, I'd like us to find a way to do this in a clean way.  I'll revisit the macro scheme, and see if I can do it more nicely and explicitly.
msg189165 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-05-13 18:47
I think turning the __mro__ tuple into a getter is fine.  As long as this works I'm okay:

class C: ...
mro = C.__mro__
del C
assert mro[0].__name__ == 'C'

(The last assert stands in for asserting that the class object must stay alive as long as the tuple returned by __mro__ is alive.)

I think it's also fine if the descriptors contain weak references.  Its hard to get a "raw" descriptor anyways -- you can't say C.desc, you'd have to say C.__dict__['desc'].
msg189207 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-05-14 11:41
Thanks Guido.
The current patch provides the property you ask for.  I will see if I can make the "fiddling" of the internal tuple less magical.

I have one other question for you:  The standard "mro" puts the class in the 0th position.
But apparently, there is a mechanism for a "custom" mro, by calling an mro() function on the type (as far as I understand).
However, the order of objects in the returned tuple is not verified, only the types of the objects therein (this is in mro_internal())
Yet there is code that manually skips the 0th element, e.g. this code 

/* Initialize tp_dict properly */
    bases = type->tp_mro;
    assert(bases != NULL);
    assert(PyTuple_Check(bases));
    n = PyTuple_GET_SIZE(bases);
    for (i = 1; i < n; i++) {
        PyObject *b = PyTuple_GET_ITEM(bases, i);
        if (PyType_Check(b))
            inherit_slots(type, (PyTypeObject *)b);
    }
(from PyType_Ready())

I'm not familiar with the mro() function.  What kind of black magic is it supposed to provide?  And how can we make sure that its free-form return value is reconciled with the 1-based loop above?
msg189231 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2013-05-14 16:10
Well, adding weak references left and right to break cycles is going to subtly change or break people's code and hasn't been done so far, but that's only my opinion.  Anyway, I want to correct what you say about tp_subclasses: yes, tp_subclasses is a list of weakrefs, but the reason is not (as I think you mean) to avoid cycles.  The reason is simply that if it were strong references, then all classes ever created would stay alive.
msg189238 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-05-14 17:03
Kristjan, it seems you're in over your head. :-)

The mro() function is documented here:
http://docs.python.org/3/library/stdtypes.html?highlight=mro#class.mro
It exists to be called instead of using __mro__ directly; a metaclass can then override what it returns.  (Though I don't know if anyone has ever done this.)

The code you quote is in PyType_Ready() and skips bases[0] because bases[0] has already been used as the basic "template" for the C-level layout of the instances.  This particular loop adds additional slots to the layout and (most importantly) verifies that no bases are in conflict with the layout enforced by bases[0].  (This is required because we don't actually implement "virtual slots" at this level -- the C-level layout can only extend the first base class's layout.)

BTW, Armin is also right about the reason for using weak references in the __subclasses__ list.

As for replacing references with weak references, I would be much more concerned if you were proposing this for e.g. bound methods or instances, but that doesn't mean I'm completely unconcerned...

In addition to worrying about breaking (obscure) code that might depend on those references, I worry that it is trying to optimize for an unusual case (dynamically created classes in a world where you don't want to call gc.collect()) but slowing down the common case (access of the class from the descriptor every time a descriptor is used).

Specifically about your patch, I'm pretty sure there are some calls in there that don't expect a NULL pointer, as d_type is never NULL; but adding the weak reference makes it possible that your macro will return a NULL pointer.
msg189247 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-05-14 19:55
Armin:  Of course you are right.  This is what weak references are for, in a gc world, although their convenience to avoid cycles and enable reference counting to work always makes me forget.

I have another ongoing issue regarding tp_subclasses, btw, in issue #17936.

Guido: Yes, it is complicated.  The reason I am doing this is because I'm working to incorporate these changes in our stackless 2.7 branch.  I figured this might be useful to Python at large, hence this submission.  relying on gc is not an option when using python in a performance sensitive environment.  there are a few places in the core that cause cycles and I'm always keen to try to remove those.  Of course we can avoid the dynamic creation of classes, which is perhaps somewhat exotic.  But it is a simpler problem than the reference cylcle inherent in recursive closures.  I had a crack at that a week ago and ran in to a wall, so I thought I'd set myself an easier target :)

Thanks for clarifying mro().  What I was driving at was that without mro(), then type == type->tp_mro[0].  and knowing this, it is easy to put a None in tp_mro[0].  With a custom mro(), this restriction is no longer valid.  But my patch deals with this by verifying the assumption.  So, there is no big problem there.

I hear you worry a bit about performance for descriptors.  Performance is indeed a valid concern, but I"m not sure that an extra C indirection will show up on any readings, given that the next thing that happens is typically the creation of a bound method or the like, with all the stuff that entails.

I am not too worried about possibly returning NULL.  That's a technical detail that is fixable.
A much better question is whether this is worth doing at all because the practice I'm trying to optimize is probably a rare practice as you point out.  When do you truly need to create throwaway classes?  Most places that do are simply defining classes in a function scope as a means of name scoping, not realizing that each function invocation will create a unique metaclass instance and cost a non-trivial amount of cpu.

So, after this interesting sojourn into the bowels of typeobject.c, and these most enlightening discussions with you (Armin, Guido, Antoine) I don't think I'll pursue this any further.

Cheers!
msg189248 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2013-05-14 20:10
Good call. I think it's perfectly fine for you to do this in your custom 2.7 branch. It feels too fragile to adopt the same approach for Python 3.4 though.
History
Date User Action Args
2022-04-11 14:57:45adminsetgithub: 62150
2013-05-14 20:10:14gvanrossumsetstatus: open -> closed

messages: + msg189248
2013-05-14 19:55:43kristjan.jonssonsetmessages: + msg189247
2013-05-14 17:03:25gvanrossumsetmessages: + msg189238
2013-05-14 16:10:02arigosetnosy: + arigo
messages: + msg189231
2013-05-14 11:41:17kristjan.jonssonsetmessages: + msg189207
2013-05-13 18:47:13gvanrossumsetnosy: + gvanrossum
messages: + msg189165
2013-05-13 15:53:29daniel.urbansetnosy: + daniel.urban
2013-05-13 13:25:56ncoghlansetnosy: + ncoghlan
2013-05-13 12:49:35fdrakesetnosy: + fdrake
2013-05-13 12:01:08isoschizsetnosy: + pconnell, isoschiz
2013-05-10 22:20:49kristjan.jonssonsetmessages: + msg188886
2013-05-10 21:05:16pitrousetnosy: + amaury.forgeotdarc, pitrou
messages: + msg188882
2013-05-10 20:22:51kristjan.jonssonsetfiles: + classleak.py

messages: + msg188876
2013-05-10 20:22:24kristjan.jonssoncreate