Author pitrou
Recipients barry, benjamin.peterson, davin, inada.naoki, lukasz.langa, nascheme, pitrou, rhettinger, tim.peters, vstinner, yselivanov
Date 2017-09-25.19:07:42
Message-id <f5d553a6-da2e-8a3e-e61c-339bfea4ed1b@free.fr>
In-reply-to <1506365751.51.0.196053174056.issue31558@psf.upfronthosting.co.za>
Content
On 25/09/2017 at 20:55, Neil Schemenauer wrote:
> 
> I think the basic idea makes a lot of sense, i.e. have a generation that is never collected.  An alternative way to implement it would be to have an extra generation, e.g. rather than just 0, 1, 2 also have generation 3.  The collection would by default never collect generation 3.  Generation 3 would then be equivalent to the frozen generation.  You could still force collection by calling gc.collect(3).
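
A toy model may help picture this alternative. Everything in it is hypothetical: `ToyGenerations`, its methods, and the use of plain strings as objects are illustrative only, reachability is faked with a set, and promotion of survivors between generations is left out. The point is just that a default collection stops at generation 2, so objects moved into generation 3 are skipped until someone explicitly asks for `collect(3)`.

```python
class ToyGenerations:
    """Hypothetical four-generation model; generation 3 plays the frozen role."""

    AUTO_OLDEST = 2   # default collections never go past generation 2
    FROZEN = 3        # the extra generation, only collected on explicit request

    def __init__(self):
        self.gens = [set() for _ in range(self.FROZEN + 1)]  # object names per generation
        self.reachable = set()                                # stand-in for "still referenced"

    def allocate(self, obj):
        self.gens[0].add(obj)
        self.reachable.add(obj)

    def drop(self, obj):
        # The object becomes garbage but stays in its generation until a collection scans it.
        self.reachable.discard(obj)

    def freeze(self):
        # Move every object in generations 0..2 into generation 3.
        for g in range(self.FROZEN):
            self.gens[self.FROZEN] |= self.gens[g]
            self.gens[g].clear()

    def collect(self, generation=2):
        """Scan generations 0..generation, free unreachable objects, return how many."""
        freed = 0
        for g in range(generation + 1):
            garbage = self.gens[g] - self.reachable
            self.gens[g] -= garbage
            freed += len(garbage)
        return freed
```

After `freeze()`, dropping a frozen object and calling `collect()` frees nothing; `collect(3)` reclaims it.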

API-wise it would sound better to have a separate gc.collect_frozen()...

Though I think a gc.unfreeze() that moves the frozen generation into the
oldest non-frozen generation would be useful too, at least for testing
and experimentation.
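
For reference, CPython 3.7 later added `gc.freeze()`, `gc.unfreeze()` and `gc.get_freeze_count()` to the gc module; `gc.unfreeze()` puts the frozen ("permanent") generation back into the oldest generation, much as suggested here. A minimal sketch of the round trip (Python 3.7 or newer; the helper name `freeze_then_unfreeze` is ours):

```python
import gc

def freeze_then_unfreeze():
    """Freeze all GC-tracked objects, count them, then unfreeze them again."""
    gc.freeze()                     # move every tracked object into the permanent generation
    frozen = gc.get_freeze_count()  # how many objects the collector will now ignore
    gc.unfreeze()                   # put them back into the oldest generation
    return frozen, gc.get_freeze_count()
```

The first value is positive in any running interpreter; the second is 0 once everything has been unfrozen.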

> I think issue 31105 (https://bugs.python.org/issue31105) is also related.  The current GC thresholds are not very good.  I've looked at what Go does, and there GC collection is based on a relative increase in memory usage.  Python could perhaps do something similar.  The accounting of actual bytes allocated and deallocated is tricky because the *_Del/Free functions don't actually know how much memory is being freed, at least not in a simple way.
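
Go exposes this policy through its GOGC setting: with the default GOGC=100, the next collection is triggered roughly when the heap has grown by 100% over the live heap left after the previous collection. A minimal sketch of such a trigger, assuming the runtime could report heap and live byte counts (which, as noted above, CPython cannot do cheaply). `RelativeGrowthTrigger` and its 4 MiB floor are hypothetical choices for illustration.

```python
class RelativeGrowthTrigger:
    """Hypothetical trigger: collect once the heap grows `percent`% over the live size."""

    def __init__(self, percent=100, minimum=4 * 1024 * 1024):
        self.percent = percent    # plays the role of Go's GOGC value
        self.minimum = minimum    # assumed floor so tiny heaps do not collect constantly
        self.target = minimum     # heap size at which the next collection fires

    def should_collect(self, heap_bytes):
        return heap_bytes >= self.target

    def after_collection(self, live_bytes):
        # Next target: live size plus `percent`% of it, never below the floor.
        self.target = max(self.minimum, live_bytes + live_bytes * self.percent // 100)
```

With `percent=100`, 10 MB surviving a collection moves the next trigger to 20 MB; with `percent=50` it would be 15 MB.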

Yeah... It's worse than that.  Take a bytearray object, for example.  The
basic object (the PyByteArrayObject structure) is quite small, but it
also has a separately allocated payload that is freed when
tp_dealloc is called.  The GC isn't aware of that payload.  Worse, the
payload can (and will) change size during the object's lifetime, and
the GC is never told about it. (*)
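
The effect is visible from Python with `sys.getsizeof()`: the fixed part of the object (`bytearray.__basicsize__`) never changes, while the reported size follows the separately allocated buffer as it grows. The helper name `bytearray_growth` is ours, and exact byte counts vary by platform and Python version, so only the relationship is worth relying on.

```python
import sys

def bytearray_growth(extra=1_000_000):
    """Return (fixed struct size, getsizeof before growth, getsizeof after growth)."""
    b = bytearray()
    before = sys.getsizeof(b)    # fixed struct plus an empty payload
    b.extend(b"x" * extra)       # payload is reallocated; the object itself stays the same
    after = sys.getsizeof(b)     # bytearray's __sizeof__ counts the allocated buffer
    return bytearray.__basicsize__, before, after
```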

IMHO, the only reliable way to use memory footprint to drive the GC
heuristic would be to force all allocations into our own allocator, and
reconcile the GC with that allocator (instead of having the GC be its
own separate thing as is the case nowadays).
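
To make the idea of reconciling the GC with the allocator concrete, here is a sketch of the bookkeeping such an allocator would need. `AccountingAllocator` and its `malloc`/`realloc`/`free` methods are hypothetical Python stand-ins, not CPython's PyMem or PyObject_Malloc APIs. The point is that when every allocation and resize goes through one place, the live byte count stays exact (including payload resizes like the bytearray case above), so a GC heuristic could read it directly.

```python
class AccountingAllocator:
    """Hypothetical allocator that knows the current size of every live block."""

    def __init__(self):
        self.sizes = {}       # block id -> current size in bytes
        self.live_bytes = 0   # exact total a GC heuristic could consult
        self._next_id = 0

    def malloc(self, size):
        block = self._next_id
        self._next_id += 1
        self.sizes[block] = size
        self.live_bytes += size
        return block

    def realloc(self, block, new_size):
        # A resize (e.g. a growing payload) updates the total immediately.
        self.live_bytes += new_size - self.sizes[block]
        self.sizes[block] = new_size
        return block

    def free(self, block):
        # Unlike a bare free(), the allocator already knows how many bytes go away.
        self.live_bytes -= self.sizes.pop(block)
```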

(*) And let's not talk about hairier cases, such as having multiple
memoryviews over the same very large object...
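
A short illustration of why that case is hairy: several memoryviews over one bytearray each report the full buffer through `nbytes`, so naively summing per-object sizes counts the same memory several times, even though only one buffer exists and each view object is itself small. The helper `count_views` is hypothetical.

```python
import sys

def count_views(size=1_000_000, n_views=3):
    """Return (naive byte total, largest view struct size, whether all views share one buffer)."""
    big = bytearray(size)
    views = [memoryview(big) for _ in range(n_views)]
    naive_total = sum(v.nbytes for v in views)           # counts the same buffer n_views times
    view_struct = max(sys.getsizeof(v) for v in views)   # each view object itself is small
    shared = all(v.obj is big for v in views)            # every view exports the same object
    return naive_total, view_struct, shared
```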

PS: every heuristic has its flaws.  As I noted on python-(dev|ideas),
full GC runtimes such as most Java implementations are well known for
requiring careful tuning of GC parameters for unusual workloads.  At
least reference counting makes CPython more robust in many cases.
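
A small demonstration of that robustness on CPython: with the cyclic GC disabled, an object without cycles is still freed the moment its last reference disappears, while an object caught in a reference cycle lingers until `gc.collect()` runs. The `Node` class and `refcount_vs_cycle` helper are illustrative only, and the result relies on CPython's reference counting (other implementations such as PyPy behave differently).

```python
import gc
import weakref

class Node:
    pass

def refcount_vs_cycle():
    """Return (plain object freed without GC, cycle survived, cycle freed by gc.collect())."""
    was_enabled = gc.isenabled()
    gc.disable()                      # no automatic cyclic collection during the demo
    try:
        plain = Node()
        plain_ref = weakref.ref(plain)
        del plain                     # last reference gone: refcounting frees it at once
        freed_without_gc = plain_ref() is None

        cyc = Node()
        cyc.self = cyc                # the cycle keeps the refcount above zero
        cyc_ref = weakref.ref(cyc)
        del cyc
        cycle_survived = cyc_ref() is not None
        gc.collect()                  # explicit collection still works while disabled
        cycle_freed = cyc_ref() is None
    finally:
        if was_enabled:
            gc.enable()
    return freed_without_gc, cycle_survived, cycle_freed
```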