classification
Title: Declaring a class creates circular references
Type: resource usage Stage:
Components: Interpreter Core Versions: Python 3.5
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: BreamoreBoy, andrea.corbellini, belopolsky, daniel.urban, dstanek, georg.brandl, kristjan.jonsson, pitrou, terry.reedy, warp10, ysj.ray
Priority: normal Keywords:

Created on 2010-07-29 14:19 by andrea.corbellini, last changed 2014-10-06 12:41 by georg.brandl. This issue is now closed.

Files
File name Uploaded Description Edit
test.py andrea.corbellini, 2010-07-29 14:19 Test script to reproduce the problem
Messages (20)
msg111935 - (view) Author: Andrea Corbellini (andrea.corbellini) Date: 2010-07-29 14:19
Creating a class (either using the 'class' statement or using type()) creates a circular reference.

I've attached a script that simply demonstrates this. The problem is caused by the fact that MyClass.__dict__['__dict__'].__objclass__ is MyClass.

Although most of the times classes are never deleted when the interpreted exits, some programs (like the popular Django framework) create temporary classes. And this is a pain because you can't disable the garbage collector.
msg111945 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-29 14:53
Is there a reason why you want to disable the cyclic garbage collector?
msg111946 - (view) Author: ysj.ray (ysj.ray) Date: 2010-07-29 15:03
This should not be a problem, I think. We cannot avoid such a circular reference. 

I believe the main reason of the circular reference is in MyClass.__mro__, this is a list and contains a reference to MyClass. You can add "__slots__ = []" to MyClass's class body to avoid __dict__(the one you mentioned) and another member which contains two more references to MyClass, but the __mro__ still contains the last reference to MyClass.
msg111950 - (view) Author: Andrea Corbellini (andrea.corbellini) Date: 2010-07-29 15:23
Disabling the GC can increase performances (although not significantly). But this bug is the cause of other problems too: what if the metaclass contains a __del__() method?

An another issue that I've found is that debugging is harder. I always try to avoid to create ref cycles in my code, also if my objects are collectable. In this way, I'm sure that I'll always be able to add a __del__ method in the future without problems. However, I can't easily check ref cycles without manually inspecting `gc.garbage`.

And also, specifying DEBUG_SAVEALL will put all the deleted classes and their attributes in the garbage, which makes debugging *very* hard in case of a leaking program.
msg111951 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-07-29 15:25
I couldn't imagine why a metaclass would want to have a __del__ method...
msg111955 - (view) Author: Andrea Corbellini (andrea.corbellini) Date: 2010-07-29 15:53
Having a __del__ inside a metaclass is strange, I know... but probably there are situations where you need to do so. Why shouldn't a developer be able to add a __del__ to a metaclass without creating uncollectable objects? I don't think this behavior is by design.

Also, doing random contributions to various projects I've seen many odd things. However I don't think that the bug tracker is the right place to discuss development practices. A bug that causes problems in unusual situations is still a bug. :-)
msg111957 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-07-29 16:04
Whether this is a bug is not clear.  Sometimes it's just impossible not to create cycles, and classes may just be one instance of that.
msg111961 - (view) Author: Andrea Corbellini (andrea.corbellini) Date: 2010-07-29 16:32
This is an unwanted an unexpected behavior, so this is a bug by definition. If it's not easy to fix, it's a different matter.

However here's a proposed solution:

* for the __mro__: instead of using a tuple, use a new object that inherits from it. This new object should use weak reference for the first item and should return the real object (if available) only in __getitem__().

* __objclass__ can should become a property based on weak references.
msg111963 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-29 16:36
This was discussed in issue1545463 which was deemed not to be worth fixing.  The particular bug described in issue1545463 would also be fixed by the still open issue812369.  So yes, there are cases where these cycles are a problem even with gc.
msg111966 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2010-07-29 16:57
This is not so easy: the __mro__ tuple, as its name says, is used internally for method resolution, or finding attributes on the type's bases. __objclass__ is used whenever the descriptor is accessed.  These operations are involved in every method call.

If you want this to go somewhere, you will have to produce a patch and demonstrate that the slowdown does not unsuitably impact performance.
msg111989 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-07-29 19:22
Interesting.  Here is a case for disabling GC:  

We intentionally disable GC in Eve-online, because of the unpredictable lag spikes it produces.  In a server application with gigabytes of python objects, GC can cause hickups of some seconds, which is unacceptable when trying to keep the latency low.

We were considering using Django for our backend web framework once.  Good that we didn't, since it would have leaked.

We do try to keep our code free from circular references, and occasionally run single GC passes on our test cluster to weed out any that may have formed accidentally (gc.garbage with DEBUG_LEAKS)

IMHO, python should try to be as free from these as possible, although it is admittedly not always easy (see recursive closures, issue 7464)

An alternative, is to do something like the minidom module does:  Provide a "unlink" method (or similar) to manually nerf objects before forgetting them.  Perhaps this could be standardized with an unlink keyword and a __unlink__ special method?
msg112051 - (view) Author: ysj.ray (ysj.ray) Date: 2010-07-30 09:09
> However here's a proposed solution:

* for the __mro__: instead of using a tuple, use a new object that inherits from it. This new object should use weak reference for the first item and should return the real object (if available) only in __getitem__().

* __objclass__ can should become a property based on weak references.



I'm afraid doing so could cause some public C API change. For example, the PyDescr_TYPE, if we implemented all the Descriptor object's d_type(that is the __objclass__ of all types of descriptors) based on weakref, we could have all the callers who call the descriptor functions to check weather the weak-referented class object has gone away.
msg112055 - (view) Author: Andrea Colangelo (warp10) Date: 2010-07-30 09:25
I can confirm this bug should be addressed some way. On my high-traffic server, keeping GC enabled causes performance issues due to this bug, and disabling it causes an out-of-memory within hours.
msg112059 - (view) Author: ysj.ray (ysj.ray) Date: 2010-07-30 10:16
Is this due to your adopting of Django?
Are there other guys who using Django suffer the same problem?
msg113126 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-08-06 19:38
Andrea: for the purpose of this tracker, a 'bug' is a discrepancy between doc and behavior. The fact that (new-style) classes have circular references was known when they were introduced in 2.2. That fact is not a bug in the above sense. It is a design tradeoff that works OK for the typical case of classes remaining until the end of a program run. Changing the tradeoff embodied in type() is not likely to be accepted.

That said, an alternate no_cr (no circular reference) metaclass (perhaps a type subclass) that contructed cr-free classes, even at the cost of slower lookup, might be a good addition to the stdlib. Ray's ideas seems to me at least plausible. My first thought on location is in functools.
msg113133 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-08-06 20:34
What about my suggestion of simply providing a convention on how to disable links manually, similar to how minidom.document does it with an "unlink" method?  If these cases are documented, then there shouldn't be any extra bother to do so.
We could provide an builtin unlink() function that calls any __unlink__ special method if present.

This sounds like a feature request, really, and one that is unlikely to get accepted since it sort of exposes an implementation detail of C python.  Oh well.
msg113158 - (view) Author: ysj.ray (ysj.ray) Date: 2010-08-07 09:39
krisvale:

> What about my suggestion of simply providing a convention on how to disable links manually, similar to how minidom.document does it with an "unlink" method?  If these cases are documented, then there shouldn't be any extra bother to do so.
We could provide an builtin unlink() function that calls any __unlink__ special method if present.


Maybe through this way such problems can be solved, but it seems it's not graceful and pythonic enough. The reason such an "unlink" method appears in xml.dom.minidom is because usually large xml document can eat up so much memory that we had to do something to make the memory garbaged sooner. But adding this as a standard __unlink__ method and requiring explicitly call the unlink() method make it similar as using c & c++, which require explicitly memory release. 

Besides, as you mentioned, "exposes an implementation detail of C python", it's better that the object reference mechanism be fully controlled by python core and not be awared by users at all. That is also one of the purposes of designing python. 

I prefer terry's idea, which suggests adding a metaclass that produces rc-free class to stdlib. These class should behave almost the same as the ones "type" metaclass produces. In order not to introduce the slower lookup, we can also add a little differences on usage of such classes, for example, internally store mro list without the class itself, but construct a full mro list when getting the __mro__ attribute, and different "super" implementation, since explicitly getting the mro is not so common in practice.
msg113181 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-08-07 15:38
I am opposed to a new 'unlink' builtin and '__unlink__' special method. There are more than enough builtin functions now, and this need is specific to those who both make lots temporary classes that they need garbage collected AND need to run with gc turned off. The intersection of these two small groups is probably very small. Since turning gc off is only an option with CPython (and PyPy?), this need is also implementation specific.

Before anything were added to the stdlib (and the need *might* be considered too rare for that) it should be tested in real use by more than one person. It appears that there are at least a couple of options for implementing a no_cr class. They could be attached to this issue, posted on the cookbook, or uploaded to PyPI.
msg227969 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-09-30 18:30
Does anyone wish to take this forward, is it simply dead in the water or what?  You might like to note that msg111963 refers to #812369 which was superseded by #18214 which was fixed in 3.4.
msg228674 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2014-10-06 12:41
I don't think this can go somewhere useful.
History
Date User Action Args
2014-10-06 12:41:18georg.brandlsetstatus: open -> closed
resolution: wont fix
messages: + msg228674
2014-09-30 18:30:53BreamoreBoysetnosy: + BreamoreBoy

messages: + msg227969
versions: + Python 3.5, - Python 3.2
2010-08-18 00:15:54dstaneksetnosy: + dstanek
2010-08-07 15:38:42terry.reedysetmessages: + msg113181
2010-08-07 13:04:33daniel.urbansetnosy: + daniel.urban
2010-08-07 09:39:18ysj.raysetmessages: + msg113158
2010-08-06 20:34:01kristjan.jonssonsetmessages: + msg113133
2010-08-06 19:38:11terry.reedysetnosy: + terry.reedy
messages: + msg113126
2010-07-30 10:16:37ysj.raysetmessages: + msg112059
2010-07-30 09:25:05warp10setnosy: + warp10
messages: + msg112055
2010-07-30 09:09:16ysj.raysetmessages: + msg112051
2010-07-29 21:28:05georg.brandllinkissue9366 superseder
2010-07-29 19:22:30kristjan.jonssonsetmessages: + msg111989
2010-07-29 16:57:54georg.brandlsetmessages: + msg111966
2010-07-29 16:36:39belopolskysetnosy: + belopolsky
messages: + msg111963
2010-07-29 16:32:53andrea.corbellinisetmessages: + msg111961
2010-07-29 16:17:17pitrousetnosy: + kristjan.jonsson
2010-07-29 16:04:25georg.brandlsetmessages: + msg111957
2010-07-29 15:53:28andrea.corbellinisetmessages: + msg111955
2010-07-29 15:25:05georg.brandlsetnosy: + georg.brandl
messages: + msg111951
2010-07-29 15:23:09andrea.corbellinisetmessages: + msg111950
2010-07-29 15:03:01ysj.raysetnosy: + ysj.ray
messages: + msg111946
2010-07-29 14:53:52pitrousetversions: - Python 2.6, Python 2.5, Python 3.1, Python 2.7, Python 3.3
2010-07-29 14:53:29pitrousetnosy: + pitrou
messages: + msg111945
2010-07-29 14:19:44andrea.corbellinicreate