Message 302972 - Python tracker

Author	pitrou
Recipients	barry, benjamin.peterson, davin, lukasz.langa, methane, nascheme, pitrou, rhettinger, tim.peters, vstinner, yselivanov
Date	2017-09-25.19:07:42
Message-id	<f5d553a6-da2e-8a3e-e61c-339bfea4ed1b@free.fr>
In-reply-to	<1506365751.51.0.196053174056.issue31558@psf.upfronthosting.co.za>

Content

On 25/09/2017 at 20:55, Neil Schemenauer wrote:
> 
> I think the basic idea makes a lot of sense, i.e. have a generation that is never collected.  An alternative way to implement it would be to have an extra generation, e.g. rather than just 0, 1, 2 also have generation 3.  The collection would by default never collect generation 3.  Generation 3 would be equivalent to the frozen generation.  You could still force collection by calling gc.collect(3).

API-wise it would sound better to have a separate gc.collect_frozen()...

Though I think a gc.unfreeze() that moves the frozen generation into the
oldest non-frozen generation would be useful too, at least for testing
and experimentation.
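
A minimal sketch of the freeze/unfreeze round trip discussed above. It assumes
Python 3.7 or later, where gc.freeze(), gc.unfreeze() and
gc.get_freeze_count() are available; the helper name freeze_roundtrip is
hypothetical and only exists for illustration.

```python
import gc


def freeze_roundtrip():
    """Freeze all tracked objects, then move them back (illustrative helper)."""
    gc.collect()                      # start from a clean state
    before = gc.get_freeze_count()    # objects already in the permanent generation
    gc.freeze()                       # move every tracked object out of the collector's reach
    frozen = gc.get_freeze_count()
    gc.unfreeze()                     # move them back into the oldest generation
    after = gc.get_freeze_count()
    return before, frozen, after


before, frozen, after = freeze_roundtrip()
print(f"before={before} frozen={frozen} after={after}")
```

Being able to unfreeze makes it straightforward to compare collection
behaviour with and without a frozen generation in the same process.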

> I think issue 31105 (https://bugs.python.org/issue31105) is also related.  The current GC thresholds are not very good.  I've looked at what Go does and the GC collection is based on a relative increase in memory usage.  Python could perhaps do something similar.  The accounting of actual bytes allocated and deallocated is tricky because the *_Del/Free functions don't actually know how much memory is being freed, at least not in a simple way.

Yeah... It's worse than that.  Take for example a bytearray object.  The
basic object (the PyByteArrayObject structure) is quite small.  But it
also has a separately-allocated payload that is deleted whenever
tp_dealloc is called.  The GC isn't aware of that payload.  Worse, the
payload can (and will) change size during the object's lifetime, and the
GC is never told about the change. (*)
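
The gap between the fixed-size structure and the payload can be observed from
Python. This is only a sketch, assuming CPython, where object.__sizeof__()
reports the fixed structure size and sys.getsizeof() also counts the
bytearray's separately-allocated buffer.

```python
import sys

ba = bytearray(16)

# Size of the fixed structure only (no payload).
struct_before = object.__sizeof__(ba)
# Structure plus the separately-allocated payload.
total_before = sys.getsizeof(ba)

# Grow the payload in place: same object, same structure, much larger buffer.
ba.extend(b"x" * 1_000_000)

struct_after = object.__sizeof__(ba)
total_after = sys.getsizeof(ba)

print("structure:", struct_before, "->", struct_after)
print("total:    ", total_before, "->", total_after)
```

The structure size stays constant while the total grows by roughly the number
of bytes appended, and nothing in this sequence notifies the collector.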

IMHO, the only reliable way to use memory footprint to drive the GC
heuristic would be to force all allocations into our own allocator, and
reconcile the GC with that allocator (instead of having the GC be its
own separate thing as is the case nowadays).
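
For experimentation only, a relative-growth trigger in the spirit of what Go
does can be approximated at the Python level. This is a sketch and not a
proposal for the actual implementation: the class name RelativeGrowthTrigger
and its parameters are hypothetical, and it assumes tracemalloc is tracing so
that allocations going through Python's allocators are counted.

```python
import gc
import tracemalloc


def _traced_bytes():
    # Current size of memory blocks traced by tracemalloc (0 if not tracing).
    return tracemalloc.get_traced_memory()[0]


class RelativeGrowthTrigger:
    """Run a full collection once memory grows by `growth` relative to a baseline.

    Hypothetical helper: `get_usage` is injectable so the policy can be
    exercised with synthetic readings.
    """

    def __init__(self, growth=1.0, get_usage=_traced_bytes):
        self.growth = growth
        self._get_usage = get_usage
        self.baseline = max(self._get_usage(), 1)

    def maybe_collect(self):
        current = self._get_usage()
        if current < self.baseline * (1.0 + self.growth):
            return False              # not enough relative growth yet
        gc.collect()                  # full collection
        # Re-measure after collecting; this becomes the new baseline.
        self.baseline = max(self._get_usage(), 1)
        return True


if __name__ == "__main__":
    tracemalloc.start()
    trigger = RelativeGrowthTrigger(growth=1.0)
    data = [bytearray(1024) for _ in range(10_000)]
    print("collected:", trigger.maybe_collect())
```

Memory obtained behind the allocators' back (for example by a C extension
calling malloc() directly) remains invisible to such a scheme, which is why
all allocations would have to go through the same allocator.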

(*) And let's not talk about hairier cases, such as having multiple
memoryviews over the same very large object...
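
As a small illustration of the memoryview case (a sketch, assuming CPython):
each view is a tiny object of its own, yet every view reports the full size of
the shared buffer, so attributing the payload per object would count it
several times. The function name shared_buffer_report is hypothetical.

```python
import sys


def shared_buffer_report(payload_size=1_000_000, n_views=3):
    """Return (per-view object sizes, naive sum of nbytes, real payload length)."""
    payload = bytearray(payload_size)
    views = [memoryview(payload) for _ in range(n_views)]
    try:
        view_sizes = [sys.getsizeof(v) for v in views]   # small, fixed-size objects
        naive_total = sum(v.nbytes for v in views)       # counts the buffer n_views times
        return view_sizes, naive_total, len(payload)
    finally:
        for v in views:
            v.release()                                  # drop the buffer exports


view_sizes, naive_total, real_size = shared_buffer_report()
print(view_sizes, naive_total, real_size)
```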

PS: every heuristic has its flaws.  As I noted on python-(dev|ideas),
full GC runtimes such as most Java implementations are well-known for
requiring careful tuning of GC parameters for "non-usual" workloads.  At
least reference counting makes CPython more robust in many cases.

History
Date	User	Action	Args
2017-09-25 19:07:42	pitrou	set	recipients: + pitrou, tim.peters, barry, nascheme, rhettinger, vstinner, benjamin.peterson, methane, lukasz.langa, yselivanov, davin
2017-09-25 19:07:42	pitrou	link	issue31558 messages
2017-09-25 19:07:42	pitrou	create