This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author mistasse
Recipients mistasse, pablogsal
Date 2019-12-16.14:31:31
Message-id <1576506693.01.0.0561513913872.issue39061@roundup.psfhosted.org>
In-reply-to
Content
> The fact that the GC will take longer time does not qualify this as memory leaks. A memory leak is by definition memory that cannot be reclaimed and in this case, once the collection of the old generation happens it will be collected, therefore is not a "leak" per se.

I completely agree; I did not know what else to call it. I really believed I was writing GC-friendly code, but my assumptions turned out to be very wrong. The result of my investigation seems so counter-intuitive that I tend to believe this is a bug introduced by a quick implementation shortcut and an optimization.

I wouldn't have reported it if I had been able to mitigate it by tuning the gc parameters. My first idea was to set threshold0 high enough that I'm certain I never keep references for that long. But whichever object I happen to be using at the moment of a second-generation collection will go to the old generation; it is the only one, but it will end up there for sure. Since those are the only "new" objects entering the old generation, it takes a really long time for it to grow enough to get collected.
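
For reference, the kind of tuning I tried looks roughly like the sketch below. The helper name `young_threshold` and the value 5000 are made up for illustration; only `gc.get_threshold` and `gc.set_threshold` are real API. It raises threshold0 for a block of code and restores the previous thresholds afterwards.

```python
import gc
from contextlib import contextmanager


@contextmanager
def young_threshold(threshold0):
    """Temporarily change threshold0 (hypothetical helper, not part of gc)."""
    previous = gc.get_threshold()           # (threshold0, threshold1, threshold2)
    gc.set_threshold(threshold0, *previous[1:])
    try:
        yield
    finally:
        gc.set_threshold(*previous)         # always restore the old settings


with young_threshold(5000):                 # 5000 is an arbitrary example value
    print(gc.get_threshold())
```

This only delays young-generation collections; it does not change where the survivors of a second-generation collection end up.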

> This has also other downsides, like objects that won't be collected will suffer more traversals and collections, that can be impactful in performance, so is not that simple.

I don't think the extra traversals and collections will be that impactful. I think it was mostly convenient to merge the generations in the "collect" function, and not merging them would be a bit more tedious.

If there is a second-gen collection every 10 young-gen collections, then not merging just carries some more objects into the next second-gen collection each time, rather than putting them in the old generation directly.

To me, it seems unintended that all the objects reachable during a second-gen collection are put in the old generation. There is a high probability that some of them are short-lived. It wouldn't be as problematic to traverse them once more in the next second-gen collection. I tend to believe the purpose of the old gen is that objects should not be able to reach it in fewer than 2 passes.
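
A minimal sketch of how this can be observed, assuming CPython 3.8 or later (for the `generation` argument of `gc.get_objects`). `Node` and `generation_of` are names invented for the example. It creates a fresh object, runs a collection of generations 0 and 1 while the object is still referenced, and reports which generation holds the object before and after.

```python
import gc


class Node:
    """Small container object; instances are tracked by the cyclic GC."""

    def __init__(self):
        self.payload = []


def generation_of(obj):
    """Return the index of the GC generation currently holding obj, or None."""
    for gen in range(3):
        # Identity check, so equal-but-distinct objects are not confused.
        if any(candidate is obj for candidate in gc.get_objects(generation=gen)):
            return gen
    return None


was_enabled = gc.isenabled()
gc.disable()  # keep automatic collections from interfering with the demo
try:
    gc.collect()                  # start from a freshly collected state
    node = Node()                 # brand-new object, still referenced below
    before = generation_of(node)
    gc.collect(1)                 # collect generations 0 and 1 only
    after = generation_of(node)
finally:
    if was_enabled:
        gc.enable()

print("before:", before, "after:", after)
```

With the collector as it is described in this issue, I would expect the object to be reported in the oldest generation after that single collection, although it had only just been created. The exact numbers may differ on interpreters with a different collector design.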

Fortunately, for my personal, non-artificial case, there are other assumptions I can make, so I used weakrefs. I have one parent with children pointing to it; they now just point to it with weakrefs, and I know I either always keep a reference to the parent or am OK with letting them all go once refcount(parent) == 0. I'm very grateful I don't have to call gc.collect, which indeed was the next option.
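
The workaround has roughly the following shape. This is a simplified sketch, not my actual code: the class names `Parent` and `Child` are placeholders. The children hold only a weak reference to the parent, so there is no reference cycle and, on CPython, everything is released by reference counting alone.

```python
import weakref


class Child:
    def __init__(self, parent):
        # Weak back-reference: does not keep the parent alive.
        self._parent_ref = weakref.ref(parent)

    @property
    def parent(self):
        # Returns None once the parent has been deallocated.
        return self._parent_ref()


class Parent:
    def __init__(self):
        self.children = []          # strong references, parent -> children

    def add_child(self):
        child = Child(self)
        self.children.append(child)
        return child


parent = Parent()
child = parent.add_child()
print(child.parent is parent)
```

The cost is that every access through `child.parent` has to handle the case where the parent is already gone.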

Thank you for taking the problem seriously, and for the time you may dedicate to it. No need to be quick, I just wanted to raise the question. I am of course interested in the bad consequences a change could bring (at the core of such a widely used language, I would expect there are some), but at the same time, it is such a rare event, and so localized and counter-intuitive in the implementation, that it would surprise me.
History
Date                 User      Action  Args
2019-12-16 14:31:33  mistasse  set     recipients: + mistasse, pablogsal
2019-12-16 14:31:33  mistasse  set     messageid: <1576506693.01.0.0561513913872.issue39061@roundup.psfhosted.org>
2019-12-16 14:31:32  mistasse  link    issue39061 messages
2019-12-16 14:31:31  mistasse  create