classification
Title: Consider integration of PyObject_GC_UnTrack() with the trashcan C API
Type: enhancement Stage: needs patch
Components: Versions:
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: nascheme, pablogsal, tim.peters, vstinner
Priority: normal Keywords: patch

Created on 2021-08-10 21:07 by nascheme, last changed 2021-08-16 21:40 by nascheme.

Pull Requests
URL Status Linked Edit
PR 27718 open nascheme, 2021-08-10 21:49
Messages (15)
msg399356 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2021-08-10 21:07
The fix for bpo-33930 includes a somewhat mysterious comment:

    // The Py_TRASHCAN mechanism requires that we be able to
    // call PyObject_GC_UnTrack twice on an object.

I wonder if we can just integrate the untrack into the body of the trashcan code.  Then, the explicit call to untrack in the dealloc function body can be removed.  That removes the risk of incorrectly using the macro version.

I suspect the reason this was not done originally is because the original trashcan mechanism did not use the GC header pointers to store objects.  Now, any object that uses the trashcan *must* be a GC object.
msg399358 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-08-10 21:28
We probably will need a new set of macros because these macros are public IIRC correctly. Although as PyObject_GC_UnTrack has a check, we probably can just absorb it and let backwards compat work by being idempotent if called twice
msg399359 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2021-08-10 22:02
Extensions that call PyObject_GC_UnTrack before calling Py_TRASHCAN_BEGIN will still work, they will just take a very minor performance hit.  I don't think it is worth the trouble to introduce new macros for that reason.  Extensions that really care about performance can wrap the call in a Python version ifdef.

There is an issue if someone writes and tests their extension with the new API, i.e. without having the explicit PyObject_GC_UnTrack() call in their dealloc method.  If they compile with an older Python, they likely get a crash.  If they compile with asserts enable, they would get an assert fail in _PyTrash_thread_deposit_object, i.e.:

    _PyObject_ASSERT(op, !_PyObject_GC_IS_TRACKED(op));

I guess that's an argument for new macros.
msg399360 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2021-08-10 22:06
For examples of bugs caused by forgetting the untrack calls, see bpo-31095.
msg399367 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2021-08-10 23:38
Since C99 is now allowed, perhaps we should suggest that declarations go after Py_TRASHCAN_BEGIN, e.g.

	mytype_dealloc(mytype *p)
	{
	    Py_TRASHCAN_BEGIN(p, mytype_dealloc)
	    ... declarations go here ...
	    ... The body of the deallocator goes here, including all calls ...
	    ... to Py_DECREF on contained objects.                         ...
	    Py_TRASHCAN_END                // there should be no code after this
	}

The only dealloc method that doesn't fit this pattern is subtype_dealloc() and that's because the object might not be a GC object.

Given the pattern, it occurs to me that that _Py_Dealloc() could do the trashcan begin/end work.  However, the authors of the dealloc methods would have to realize the special rules of the trashcan (e.g. no returns from the dealloc method body).  Also, there would be some overhead added to _Py_Dealloc() to check if the trashcan is supported.  E.g. checking a type flag.

Another idea would be to add a new slot for the method, e.g. tp_dealloc_trash.  Then, _Py_Dealloc() could check if it exists and if so do the trashcan begin/end dance around it.  That would still add some overhead to _Py_Dealloc() so I think the current macros are the best approach.
msg399377 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-08-11 09:36
Please don't add new macros. The TRASHCAN C API is already complex enough.
msg399378 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-08-11 09:42
Moreover, before making any change, I would suggest by starting with documenting the existing trashcan C API. Right now, there is no documentation in Doc/c-api/. Documentation can be copied from Include/cpython/object.h.

Last year, I modified the trashcan API to hide implementation details. There were also issues with the stable ABI... thing which is unclear to me, since the trashcan C API is excluded from the limited C API. Well, I excluded it. Previously, it was part of it, but it did not work with the limited C API.

commit 0fa4f43db086ac3459811cca4ec5201ffbee694a
Author: Victor Stinner <vstinner@python.org>
Date:   Wed Feb 5 12:23:27 2020 +0100

    bpo-39542: Exclude trashcan from the limited C API (GH-18362)
    
    Exclude trashcan mechanism from the limited C API: it requires access to
    PyTypeObject and PyThreadState structure fields, whereas these structures
    are opaque in the limited C API.
    
    The trashcan mechanism never worked with the limited C API. Move it
    from object.h to cpython/object.h.

commit 38965ec5411da60d312b59be281f3510d58e0cf1
Author: Victor Stinner <vstinner@python.org>
Date:   Fri Mar 13 16:51:52 2020 +0100

    bpo-39947: Hide implementation detail of trashcan macros (GH-18971)
    
    Py_TRASHCAN_BEGIN_CONDITION and Py_TRASHCAN_END macro no longer
    access PyThreadState attributes, but call new private
    _PyTrash_begin() and _PyTrash_end() functions which hide
    implementation details.

commit db532a09990c837ec1348e6e6bd2719f5d4a8216
Author: Victor Stinner <vstinner@python.org>
Date:   Wed Jun 23 15:51:47 2021 +0200

    bpo-39947: Remove old private trashcan C API functions (GH-26869)
    
    Remove 4 C API private trashcan functions which were only kept for
    the backward compatibility of the stable ABI with Python 3.8 and
    older, since the trashcan API was not usable with the limited C API
    on Python 3.8 and older. The trashcan API was excluded from the
    limited C API in Python 3.9.
    
    Removed functions:
    
    * _PyTrash_deposit_object()
    * _PyTrash_destroy_chain()
    * _PyTrash_thread_deposit_object()
    * _PyTrash_thread_destroy_chain()
    
    The trashcan C API was never usable with the limited C API, since old
    trashcan macros accessed directly PyThreadState members like
    "_tstate->trash_delete_nesting", whereas the PyThreadState structure
    is opaque in the limited C API.
    
    Exclude also the PyTrash_UNWIND_LEVEL constant from the C API.
    
    The trashcan C API was modified in Python 3.9 by commit
    38965ec5411da60d312b59be281f3510d58e0cf1 and in Python 3.10 by commit
    ed1a5a5baca8f61e9a99c5be3adc16b1801514fe to hide implementation
    details.
msg399379 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-08-11 09:46
> Now, any object that uses the trashcan *must* be a GC object.

It would be nice to add an assertion in _PyTrash_begin().
msg399426 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2021-08-12 00:19
I was thinking about this more today and I think the better fix is to actually build the trashcan mechanism into _Py_Dealloc().  Requiring that types opt-in to the trashcan mechanism by using the trashcan macros is not ideal.  

First, the trashcan macros are a bit tricky to use correctly.  Second, every "container" type is potentially a part of a long ref chain and could blow up the stack on deallocation (i.e. triggered from DECREF).  So, for correctness/robustness, every type that supports cyclic GC should get trashcan-style deallocation.

We would have to find a way to do this without incurring a (significant) performance hit.  Also, it would have to be done without breaking C extensions.  Ideally they would get trashcan-style deallocation without any source code changes.  I'm not sure if that can be done but I'm hopeful it's possible.
msg399434 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2021-08-12 06:38
I wrote a proof-of-concept as bpo-44897.  PR 27718 (this issue) might a slightly better performance but I like the other one better because it doesn't expose so much implementation detail to extension types.  Either of them require careful review before merging.
msg399437 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-08-12 10:00
The problem of PyObject_GC_UnTrack() is just the most visible effect of the trashcan mecanism: tp_dealloc can be called twice, and this is *not* expected by the tp_dealloc API.

Putting trashcan mecanism outside tp_dealloc can allow to make sure that tp_dealloc is called exactly once. _Py_Dealloc() sounds like a good candidate, but I didn't check if it's the only way to call tp_dealloc. Are there other places where tp_dealloc is called *directly*?

Using military grade regex, I found the following functions calling tp_dealloc:

grep 'dealloc[) ]*([a-zA-Z]\+)' */*.c

* _Py_Dealloc() obviously
* _PyTrash_thread_destroy_chain()
* subtype_dealloc()
* Modules/_testcapimodule.c: check_pyobject_freed_is_freed() <= unit test, it can be ignored

So if we move the trashcan usage inside functions, 3 functions must be modified:

* _Py_Dealloc()
* _PyTrash_thread_destroy_chain()
* subtype_dealloc()
msg399495 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2021-08-13 06:44
> The problem of PyObject_GC_UnTrack() is just the most visible effect of the
> trashcan mecanism: tp_dealloc can be called twice, and this is *not* expected
> by the tp_dealloc API.

The fact that Py_TRASHCAN_BEGIN and Py_TRASHCAN_END can cause tp_dealloc to be
called more than once is not a huge issue, IMHO.  The statements executed more
than once are the ones before or after the trashcan macros.  If you use them
properly, the multiple calls doesn't cause issues.  The big trap is
to use _PyObject_GC_UNTRACK before Py_TRASHCAN_BEGIN.  It is not safe to call
that macro multiple times.  Anywhere you see Py_TRASHCAN_BEGIN and
_PyObject_GC_UNTRACK before it, that's a bug.

The point of this PR is to make the macros harder to use incorrectly.  If there
is no need to call untrack, Py_TRASHCAN_BEGIN and Py_TRASHCAN_END can normally
be the first and last lines of the tp_dealloc function (declarations can be
inside).  Then, the fact that tp_dealloc is actually called more than once by the
trashcan doesn't cause an issue.

tp_dealloc can get called more than once for another reason.  If the object is
resurrected by tp_finalize or tp_del, then tp_dealloc can be called again.
It's pretty mind bending to try to understand all of implications of that. Most
of the ugliness is inside PyObject_CallFinalizerFromDealloc().  The type with
the finalizer has to be careful not to put the object in an invalid state
before it gets resurrected.

> Putting trashcan mecanism outside tp_dealloc can allow to make sure that
> tp_dealloc is called exactly once. _Py_Dealloc() sounds like a good
> candidate, but I didn't check if it's the only way to call tp_dealloc. Are
> there other places where tp_dealloc is called *directly*?

tp_dealloc is called from a few more places, as you found.  As for providing
C-stack exhaustion protection via the trashcan, I think we are only interested
in places were tp_dealloc is called as a result of a DECREF.  And I think that
only goes through _Py_Dealloc().

I suppose it would be possible to blow up the C stack via a call chain like:

    subtype_dealloc -> basedealloc -> subtype_dealloc -> basedealloc -> ...

I think you would have to make a long chain of subclasses.  Seems unlikely in
real code so I'm not worried about trying to fix that.

> Using military grade regex, I found the following functions calling tp_dealloc:
> 
> grep 'dealloc[) ]*([a-zA-Z]\+)' */*.c
> 
> * _Py_Dealloc() obviously
> * _PyTrash_thread_destroy_chain()
> * subtype_dealloc()
> * Modules/_testcapimodule.c: check_pyobject_freed_is_freed() <= unit test, it can be ignored
> 
> So if we move the trashcan usage inside functions, 3 functions must be modified:
> 
> * _Py_Dealloc()
> * _PyTrash_thread_destroy_chain()
> * subtype_dealloc()

Take a look at bpo-44897.  It implements the _Py_Dealloc approach. You can see
what needs to be modified.  I removed the Py_TRASHCAN_BEGIN/Py_TRASHCAN_END
macros as well, just because they are not needed anymore.  _PyObject_GC_UNTRACK
calls inside tp_dealloc methods need to be removed because _Py_Dealloc already
does the untrack.  For external extensions, they must be using
PyObject_GC_UnTrack() and so that's safe (object is already untracked and so it
does nothing).

I think the big change is calling tp_dealloc methods with the object already
untracked.  If an extension did something like:

mytype_dealloc(PyObject *self)
{
	...
	assert(PyObject_GC_IsTracked(self));
	...
}

That would break after PR 27738, at least as it's currently written.

The other issue I'm not quite sure about is that
PyObject_CallFinalizerFromDealloc insists that GC objects are tracked when that
function is called.  I think it is okay to just call _PyObject_GC_TRACK on the
ones that are not tracked, to ensure that.  Maybe Tim Peters will jump in and
correct the errors of my thinking.
msg399519 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-08-13 10:41
Thanks Neil for the thorough explanation!

I think in any case we should benchmark this because this will affect *all* GC types if I understand correctly and the TS mechanism had shown slowdowns before
msg399548 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2021-08-13 17:31
> I think in any case we should benchmark this because this will affect *all*
> GC types if I understand correctly and the TS mechanism had shown slowdowns
> before

We definitely need to benchmark. I would guess that adding trashcan protection
to all GC types would not be a performance issue.  We already had the macros
for some performance critical ones (frame, tuple, list, dict).

The performance hit will likely come as a result of adding a branch that
uses _PyType_IS_GC() to the DECREF code path.  It means any time an object hits
zero refcount, we call _PyType_IS_GC() on it.  Previously, we would just call
tp_dealloc. 

Because of PEP 442, _PyType_IS_GC() checks not only a type flag but might also
call tp_is_gc.  So, benchmarks will need to be done.   We might get a small win
because the trashcan logic can be better optimized now that it's in a single
complied unit.

Small correction for my explaination above: if tp_dealloc gets called mutiple
times because of the trashcan, it is the code before the BEGIN macro that gets
called mutiple times.
msg399686 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2021-08-16 21:40
Another small correction, _PyType_IS_GC() checks only the type flag (Py_TPFLAGS_HAVE_GC).  _PyObject_IS_GC() checks both the type flag and calls tp_is_gc(), if it exists.  tp_is_gc() is wart, IMHO, and it would be nice if we can kill it off so only the type flag is needed.  That's a topic for a different issue though.
History
Date User Action Args
2021-08-16 21:40:17naschemesetmessages: + msg399686
2021-08-13 17:31:19naschemesetmessages: + msg399548
2021-08-13 10:41:08pablogsalsetmessages: + msg399519
2021-08-13 06:44:02naschemesetnosy: + tim.peters
messages: + msg399495
2021-08-12 10:00:41vstinnersetmessages: + msg399437
2021-08-12 06:38:42naschemesetmessages: + msg399434
2021-08-12 00:19:22naschemesetmessages: + msg399426
2021-08-11 09:46:27vstinnersetmessages: + msg399379
2021-08-11 09:43:05vstinnersettitle: Consider integration of GC_UNTRACK with TRASHCAN -> Consider integration of PyObject_GC_UnTrack() with the trashcan C API
2021-08-11 09:42:32vstinnersetmessages: + msg399378
2021-08-11 09:36:59vstinnersetnosy: + vstinner
messages: + msg399377
2021-08-10 23:38:25naschemesetmessages: + msg399367
2021-08-10 22:06:12naschemesetmessages: + msg399360
2021-08-10 22:02:12naschemesetmessages: + msg399359
stage: patch review -> needs patch
2021-08-10 21:49:40naschemesetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request26201
2021-08-10 21:28:41pablogsalsetnosy: + pablogsal
messages: + msg399358
2021-08-10 21:07:59naschemecreate