classification
Title: Add a progress callback to gcmodule
Type: enhancement Stage: patch review
Components: Interpreter Core Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Jim.Jewett, Yury.Selivanov, asvetlov, ebfe, jimjjewett, kristjan.jonsson, lehmannro, michael.foord, pitrou, python-dev, stutzbach
Priority: normal Keywords: patch

Created on 2010-11-29 12:33 by kristjan.jonsson, last changed 2012-04-15 11:43 by kristjan.jonsson. This issue is now closed.

Files
File name Uploaded Description Edit
gccallback1.patch kristjan.jonsson, 2010-11-29 12:33 review
gccallback2.patch kristjan.jonsson, 2010-11-30 13:16 review
gccallback3.patch kristjan.jonsson, 2010-12-05 08:45 review
gccallback4.patch kristjan.jonsson, 2012-03-20 20:05 review
gccallback4a.patch Jim.Jewett, 2012-03-21 01:31 Based on krisvale's patch 4; see below
gccallback.patch kristjan.jonsson, 2012-04-07 13:00 review
gccallback.patch kristjan.jonsson, 2012-04-08 13:45 review
Messages (31)
msg122795 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-11-29 12:33
As discussed here: http://mail.python.org/pipermail/python-ideas/2010-November/008813.html:

Adding the ability to register callbacks to be invoked before and after garbage collection runs.  This can be used to gather run-time statistics such as timing information and frequency of garbage collection runs, and to perform application-specific cleanup of uncollecatable objects from gc.garbage.

The first patch is the code as currently in use in our codebase at CCP (ported from 2.7).  There is only one callback registered and the callback signature is perhaps a bit lame.  Also, no error checking.  But it is shown here for reference and as a basis for discussion.
msg122840 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-11-29 18:32
These functions will be very useful for any long-running program.  Thank you for the patch.

Would you be willing to write tests and documentation?

Would it make more sense for the callback to take a boolean instead of an integer as the first argument?
msg122844 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-11-29 18:40
Well, as you point out, it would make more sense to have two separate callbacks. Also, PyErr_WriteUnraisable() is better than PyErr_Clear().

Finally, you accidentally recoded the file; it should be kept utf-8, not latin-whatever.
msg122869 - (view) Author: Jim Jewett (jimjjewett) Date: 2010-11-29 20:47
I like the idea, but I do quibble with the signature.  As nearly as I can tell, you're assuming

(a)  Only one callback.  I would prefer a sequence of callbacks, to make cooperation easier.  (This does mean you would need a callback removal, instead of just setting it to None.)

(b)  The callback will be called once before collecting generations, and once after (with the number of objects that weren't collected).  Should these be separate callbacks, rather than the same one with a boolean?  And why does it need the number of uncollected objects?  (This might be a case where Practicality Beats Purity, but it is worth documenting.)
msg122901 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-11-30 13:16
Hi, as I stated, the original patch was simply our original implementation.
Here is a new patch.  It is simpler:
1) it exposes a gc.callbacks list where users can register themselves, in the spirit of sys.meta_path
2) One can have multiple callbacks
3) Improve error handling
4) The callback is called with a "phase" argument, currently 0 for start, and 1 for the end.

Let's start bikeshedding the calling signature.  I like having a single callback, since multiple callables are a nuisance to manage.

Once we agree, I'll post a patch for the documentation, and unittests.
msg122902 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-11-30 13:24
> Let's start bikeshedding the calling signature.  I like having a
> single callback, since multiple callables are a nuisance to manage.

IMO the callback should have a second argument as a dict containing
various statistics that we can expand over time. The generation number,
for example, should be present.

As for the phase number, if 0 means start and 1 means end, you can't
decently add another phase anyway (having 2 mean "somewhere between 0
and 1" would be completely confusing).

PS : please don't use C++-style comments in your patch
msg122903 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-11-30 13:27
You are right, Antoine.
How about a string and a dict?  the string can be "start" and "stop" and we can add interesting information to the dict as you suggest.
msg122913 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-11-30 16:10
> You are right, Antoine.
> How about a string and a dict?  the string can be "start" and "stop"
> and we can add interesting information to the dict as you suggest.

Looks good to me.
msg122914 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-11-30 16:32
> How about a string and a dict?  the string can be "start" and "stop"
> and we can add interesting information to the dict as you suggest.

I like where this is headed.  How about putting the string in the dict, too?

d['phase'] = 'start'
msg123224 - (view) Author: Robert Lehmann (lehmannro) * Date: 2010-12-03 10:06
A few issues I'd like to raise:

(1) Multiple callback chains.  Is there any code in your existing use case of GC callbacks where you don't check for the phase argument and follow different code paths depending on it?  If not, having two callback chains should be fine and takes the burden from the programmer to the implementors.  (This is feasible if we *only ever* have two values for the phase.)

(2) Single collections.  Currently, neither PyGC_Collect nor gc.collect() invoke the callbacks (since they do not call collect_generations).  Is this an oversight or intentional?

(3) Error checking.  What about callbacks which are bound to fail on each and every invocation, ie. because of wrong signatures.  Should these be flat-out rejected in some way *on registration*, automagically removed when first encountered, or are we okay with errors slammed into the user's face every so often because he should REALLY fix them?

(4) Interop.  Can this be supported as easily on other VMs?  (That's perhaps a good reason for the statistics to be a dict, for GCs providing vastly different amounts of information.)
msg123225 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-12-03 10:13
1) I'm not sure what you are asking.  Does anyone think that it is simpler to register two different callbacks than one?  IMHO it complicates the interface and creates so many oppertunities to do things incorrectly.

2)No, it is an oversight, let me verify the code.

3)I don't think we ought to be smart and try to remove callbacks.  I don't think that is common practice, the developer will know soon enough if things don't work.  Just keep it simple.

4)Interop, I think, is not an issue.  the gc module is a C-Python only implementation detail.  Every implementation has the freedom to collect garbage in its own way.

I'll post an updated version soon.
msg123253 - (view) Author: Jim Jewett (jimjjewett) Date: 2010-12-03 15:29
<blockquote>Does anyone think that it is simpler to register two different callbacks than one? </blockquote>

Moderately, yes.

Functions that actually help with cleanup should normally be run only in one phase; it is just stats-gathering and logging functions that might run both times, and I don't mind registering those twice.

For functions that are run only once (which I personally think is the more normal case), the choices are between

    @register_gc
    def my_callback(actually_run_flag, mydict):
        if not actually_run_flag:
            return
        ...

vs

    @register_gc_before
    def my_callback(mydict):
        ...
msg123415 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-12-05 08:45
Here is a third patch.  The callback now gets two argument, phase and info.
I've added documentation and unittests.
msg124508 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-12-22 13:56
Uh oh.  I forgot about this and there now we have passed beta 2.
Didn't anyone want to review the patch?
msg124570 - (view) Author: Lukas Lueg (ebfe) Date: 2010-12-23 21:26
Why not make the start-callback be able to return a boolean value to the gcmodule that indicates if garbage collection should take place or not.

For example, any value returned from the callback that evaluates to False (like null) will cause the module to evaluate any other callback and possibly collect garbage objects. Any value that evaluates to True (like True) returned from any callback causes all further callbacks to not be called and garbage collection not to take place now.
msg124631 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-12-25 06:52
Well, the idea is good and it did cross my mind.  Particularly it could be useful for performance sensitive applications.
However it complicates things.
1) If a GC is rejected, when do we make the next attempt?
2) If a callback cancels GC, then what about the other callbacks that have already been called?  They would need to get some "canceled" call.  But this is ok, I suppose.

Since we have already passed the beta 2, I'll see if I can come up with a simple change, particularly if we don't descend into some infinite loop wrt. 1) above.
msg124666 - (view) Author: Lukas Lueg (ebfe) Date: 2010-12-26 17:05
Collection may re-occur at any time, there is no promise to the callback code. However, the callback can disable the gc, preventing further collection.

I don't think we need the other callbacks to be informed. As the callbacks are worked down in the order they registered, whoever comes first is served first. Returning True from the callback is mereley a "I dont mind if gc happens now..."
msg124698 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-12-27 02:35
1) what I mean is that if a callback rejects GC, the GC algorithm may find its condition for invoking GC in the first place to be still valid immediately afterwards, so doing a GC will be immediately retried.  I have to check, but it could mean that more changes would be required.
2) Of course callbacks have to know, e.g. those that intend to gather statisctic or measure the time of GC.  They have started a timer on the "start" opcode, and expect a "stop" code to follow.  They have to get some "canceled" code for their bookkeeping to work.

Then additionally we have the question: Should you be able to cancel a direct gc request (like calling gc.collect()) or just the automatic one?

This then starts to be a much more complicated change, perhaps one that requires a PEP so I don't think we should do all of that in one gulp.  Once the callback mechanism is in, there is every oppertunity to extend it.
msg124705 - (view) Author: Lukas Lueg (ebfe) Date: 2010-12-27 07:25
Agreed, let's have the simple callback first.

To solve 2) later on, we could have the callback proposed here be the 'execution'-callback. It neither has nor will have the capability to prevent garbage-collection.
We can introduce another 'prepare'-callback later which is called when the gc-modules decides that it is time for collection. Callbacks may react with a negative value so execution does not happen and the execution-callbacks are also never called.
msg156450 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-03-20 19:26
Hi!
Michael Foord reminded me of this little gem.  I'm getting this started again, hopefully for inclusion in 3.3
msg156453 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-03-20 20:05
Hm, actually it wasn't Michael, but Martin. No matter!  Here is a proposed patch, as promised without all the bells and whistles, in particular, there is no feature to cancel the garbage collection.

Can we get this into 3.3?
msg156471 - (view) Author: Jim Jewett (Jim.Jewett) Date: 2012-03-21 01:31
gccallback4a.patch is a few changes to gccallback4.patch.  Most are just spelling or grammar in comments, but I did modify the test case so that the Uncollectable loop had *multiple* objects with __del__ methods.
msg157172 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-03-31 11:36
Thanks, Jim.
Unless anyone objects, I'll commit this then.
msg157176 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-03-31 11:57
Comments:

- the tests look fragile. How can you know a garbage collection will only collect your own objects? So you should call gc.collect() first at the beginning of each test and then initialize the self.visit list. We don't want weird failures because of the unittest machinery or anything else.

- I also don't understand the logic in testCollect. Why can't you directly check the contents of self.visit instead of that convoluted code?

- In invoke_gc_callback(), "i" should be a Py_ssize_t, not an int

- In invoke_gc_callback(), in which situation can callbacks be something else than a list? I think the PyList_Check() should be an assert (and probably use PyList_CheckExact()).

- Finally, *please* try to follow PEP 8. Comments should have a space after the "#". Otherwise they look unreadable.
msg157187 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-03-31 13:27
Thank you Antoine. Your points:
- Yes, I can robustify this.
- I think I worried that the actual contents might be too complex to test for it. I'll see if I can't just simplify it as you suggest.
- right, thanks.
- gc.callbacks is a simple module attribute.  Anyone can set it to anything else, e.g. gc.callbacks=None.  We have to accomodate this possibility or else introduce annoying api functions to edit the list.  I thought it best to do things similarly to sys.import_hooks etc, simply expose a list object and trust the user to treat this list object with care, but check it regardless to avoid crashing.
- Pep8, yes sorry.  I do a lot of programming in other projects that are more lenient in comment styles, so it doesn't come automatically.  I personally have no problem whatsoever #to #read #such #comments :)
msg157188 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-03-31 13:35
> - gc.callbacks is a simple module attribute.  Anyone can set it to
> anything else, e.g. gc.callbacks=None.  We have to accomodate this
> possibility or else introduce annoying api functions to edit the list.
> I thought it best to do things similarly to sys.import_hooks etc,
> simply expose a list object and trust the user to treat this list
> object with care, but check it regardless to avoid crashing.

The way I read it, you don't fetch it from the module dictionary,
though, you just use the static C variable, which shouldn't change when
the dict is mutated.
msg157660 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-04-06 11:46
You are right, I was thinking more of pyobject attributes.  I'll fix this then.
msg157733 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-04-07 13:00
Here is an updated patch, taking Jim's and Antoine's comments into account.

Jim, I´d like to comment that I think the reason __del__ objects are uncollectable is more subtle than there being no defined order of calling the __del__ functions.  More significantly, no python code may be executed during an implicit garbage collection.
Now, it is possible that one could clean up cycles containing only one __del__ method during _expcicit_ collections (calling gc.collect()) but it hardly seems worth the effort.
msg157734 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-04-07 13:48
Uploaded another review.
I also notice you didn't really address my point, since self.visit is still initialized too early. IMO it should be initialized after the first gc.collect() at the beginning of each test (not in setUp()).
msg157788 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2012-04-08 13:45
A new patch, taking Antoine's review and comments into account.
msg158320 - (view) Author: Roundup Robot (python-dev) Date: 2012-04-15 11:42
New changeset 88f8ef5785d7 by Kristján Valur Jónsson in branch 'default':
Issue #10576: Add a progress callback to gcmodule
http://hg.python.org/cpython/rev/88f8ef5785d7
History
Date User Action Args
2012-04-15 11:43:36kristjan.jonssonsetstatus: open -> closed
resolution: fixed
2012-04-15 11:42:22python-devsetnosy: + python-dev
messages: + msg158320
2012-04-08 13:45:19kristjan.jonssonsetfiles: + gccallback.patch

messages: + msg157788
2012-04-07 13:48:35pitrousetmessages: + msg157734
2012-04-07 13:00:09kristjan.jonssonsetfiles: + gccallback.patch

messages: + msg157733
2012-04-06 11:46:29kristjan.jonssonsetmessages: + msg157660
2012-03-31 13:35:27pitrousetmessages: + msg157188
2012-03-31 13:27:42kristjan.jonssonsetmessages: + msg157187
2012-03-31 11:57:56pitrousetmessages: + msg157176
2012-03-31 11:36:55kristjan.jonssonsetmessages: + msg157172
2012-03-21 01:31:45Jim.Jewettsetfiles: + gccallback4a.patch
nosy: + Jim.Jewett
messages: + msg156471

2012-03-20 20:05:01kristjan.jonssonsetfiles: + gccallback4.patch

messages: + msg156453
2012-03-20 19:26:57kristjan.jonssonsetnosy: + michael.foord
messages: + msg156450
2010-12-27 07:25:26ebfesetnosy: jimjjewett, pitrou, kristjan.jonsson, lehmannro, stutzbach, ebfe, asvetlov, Yury.Selivanov
messages: + msg124705
2010-12-27 02:35:59kristjan.jonssonsetnosy: jimjjewett, pitrou, kristjan.jonsson, lehmannro, stutzbach, ebfe, asvetlov, Yury.Selivanov
messages: + msg124698
2010-12-26 17:05:06ebfesetnosy: jimjjewett, pitrou, kristjan.jonsson, lehmannro, stutzbach, ebfe, asvetlov, Yury.Selivanov
messages: + msg124666
2010-12-25 06:52:05kristjan.jonssonsetnosy: jimjjewett, pitrou, kristjan.jonsson, lehmannro, stutzbach, ebfe, asvetlov, Yury.Selivanov
messages: + msg124631
2010-12-23 21:26:22ebfesetnosy: + ebfe
messages: + msg124570
2010-12-22 13:56:14kristjan.jonssonsetnosy: jimjjewett, pitrou, kristjan.jonsson, lehmannro, stutzbach, asvetlov, Yury.Selivanov
messages: + msg124508
2010-12-05 14:09:46pitrousetstage: patch review
versions: + Python 3.3, - Python 3.2
2010-12-05 08:45:03kristjan.jonssonsetfiles: + gccallback3.patch

messages: + msg123415
2010-12-03 20:01:02Yury.Selivanovsetnosy: + Yury.Selivanov
2010-12-03 15:29:11jimjjewettsetmessages: + msg123253
2010-12-03 10:13:52kristjan.jonssonsetmessages: + msg123225
2010-12-03 10:06:58lehmannrosetnosy: + lehmannro
messages: + msg123224
2010-11-30 16:32:54stutzbachsetmessages: + msg122914
2010-11-30 16:10:46pitrousetmessages: + msg122913
2010-11-30 13:27:40kristjan.jonssonsetmessages: + msg122903
2010-11-30 13:24:19pitrousetmessages: + msg122902
2010-11-30 13:16:22kristjan.jonssonsetfiles: + gccallback2.patch

messages: + msg122901
2010-11-29 20:47:00jimjjewettsetnosy: + jimjjewett
messages: + msg122869
2010-11-29 18:40:19pitrousetnosy: + pitrou
messages: + msg122844
2010-11-29 18:32:48stutzbachsetnosy: + stutzbach
messages: + msg122840
2010-11-29 14:06:36asvetlovsetnosy: + asvetlov
2010-11-29 12:33:06kristjan.jonssoncreate