Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add C API for gc.enable, gc.disable, and gc.isenabled #72441

Closed
llllllllll mannequin opened this issue Sep 23, 2016 · 12 comments
Closed

Add C API for gc.enable, gc.disable, and gc.isenabled #72441

llllllllll mannequin opened this issue Sep 23, 2016 · 12 comments
Labels
3.10 only security fixes performance Performance or resource usage topic-C-API

Comments

@llllllllll
Copy link
Mannequin

llllllllll mannequin commented Sep 23, 2016

BPO 28254
Nosy @gpshead, @scoder, @vstinner, @encukou, @llllllllll, @da-woods
PRs
  • bpo-28254: Add a C-API for controlling the state of the garbage collector #25687
  • bpo-28254: _posixsubprocess uses PyGC_Enable/PyGC_Disable #25693
  • bpo-28254: Cleanup test_subprocess.test_preexec_gc_module_failure() #25709
  • bpo-28254: Add PyGC_ functions to the stable ABI manifest #25720
  • Files
  • gc-capi.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-04-28.16:15:19.646>
    created_at = <Date 2016-09-23.06:21:47.553>
    labels = ['expert-C-API', '3.10', 'performance']
    title = 'Add C API for gc.enable, gc.disable, and gc.isenabled'
    updated_at = <Date 2021-04-29.13:47:06.944>
    user = 'https://github.com/llllllllll'

    bugs.python.org fields:

    activity = <Date 2021-04-29.13:47:06.944>
    actor = 'petr.viktorin'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-04-28.16:15:19.646>
    closer = 'scoder'
    components = ['C API']
    creation = <Date 2016-09-23.06:21:47.553>
    creator = 'llllllllll'
    dependencies = []
    files = ['44787']
    hgrepos = []
    issue_num = 28254
    keywords = ['patch']
    message_count = 12.0
    messages = ['277245', '277295', '277557', '277569', '358330', '391274', '391291', '392195', '392230', '392239', '392289', '392308']
    nosy_count = 6.0
    nosy_names = ['gregory.p.smith', 'scoder', 'vstinner', 'petr.viktorin', 'llllllllll', 'da-woods']
    pr_nums = ['25687', '25693', '25709', '25720']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue28254'
    versions = ['Python 3.10']

    @llllllllll
    Copy link
    Mannequin Author

    llllllllll mannequin commented Sep 23, 2016

    I was writing an extension module that was working with weakrefs and wanted to ensure that the GC would not run for a block of code. I noticed that gc.enable/gc.disable are not exposed to C and the state of the gc is in a static variable so it cannot be mutated from an extension.

    This change adds an easier way to access this functionality in the C api without going through an import or PyObject_Call.

    I am wondering where to document these functions as well as PyGC_Collect. I didn't see that function anywhere in the docs (and didn't know about it) so I would like to make them more visible. My first thought was Doc/c-api/gcsupport.rst but this is not in service of supporting the GC, it is about the state of the GC itself. If that seems like a decent place I could add a small section at the bottom about interacting with the GC and document these new functions and PyGC_Collect.

    I'm sorry if this is exposed elsewhere. If it is I can try to add some links to i

    @llllllllll llllllllll mannequin added 3.7 (EOL) end of life extension-modules C modules in the Modules dir labels Sep 23, 2016
    @rhettinger
    Copy link
    Contributor

    What use case do you have that warrants a C API expansion rather than just using PyObject_Call? I don't recall this ever having been requested which suggests that it is either unnecessary or that people fear opening a can of worms.

    @llllllllll
    Copy link
    Mannequin Author

    llllllllll mannequin commented Sep 27, 2016

    I definitely could have used PyImport_Import and PyObject_Call to accomplish this task; however, when I looked at at the implementation of these functions it seemed like a lot of overhead and book keeping just to set a boolean. Since I was already in a C extension it would be nicer to not need to worry about PyObjects and ref counts just to set this value so I thought it would be nice to expose these small functions to users. Since this is equivalent to gc.enable or gc.disable I don't think there is a deep can of worms here. The only difference in the implementation is that it doesn't return None. There is already PyGC_Collect, so I figured these were just omitted because no one thought to add them. I don't have a burning desire to get this in, I just thought it would be a nice to have.

    @rhettinger
    Copy link
    Contributor

    I don't have a burning desire to get this in,
    I just thought it would be a nice to have.

    In that case, I think we should decline.

    @superbobry
    Copy link
    Mannequin

    superbobry mannequin commented Dec 13, 2019

    I know this patch has already been rejected, but I wanted to give another potential use-case for accessing GC status from C: JIT compilers.

    Imagine a JIT compiler which uses alternative storage for instance attributes. In order to maintain correctness, it should "materialize" the stored attributes whenever __dict__ (or rather a pointer to __dict__) is accessed. In this context materialization means something like:

        __dict__ = {}
        for key, value in zip(keys, values):
            __dict__[key] = value

    Now, what if a __dict__ is accessed during a GC collection (which is possible: collect->deduce_unreachable->subtract_refs->subtype_traverse via tp_traverse)? The JIT compiler should be able to detect such calls and avoid allocating anything:

        if collecting:
            return
    
        __dict__ = {}
        # ...

    This is possible to implement in pure Python using gc.isenabled and gc.callbacks, but there is no existing API to do that in C.

    Does this sounds convincing enough to motivate adding

    int PyGC_IsEnabled(void)
    int PyGC_IsCollecting(void)

    to the C API?

    @da-woods
    Copy link
    Mannequin

    da-woods mannequin commented Apr 17, 2021

    Cython currently does an elaborate process of importing the GC module and loading the "isenabled", "enable" and "disable" attributes each time it creates a cdef type

    https://github.com/cython/cython/blob/master/Cython/Utility/ExtensionTypes.c#L73-L107

    Admittedly that's partly because it's abusing the Py_TPFLAGS_HEAPTYPE to allow multiple inheritance where Python doesn't really allow it.

    But the up-shot is that Cython definite has a use-case for doing this directly in the C API, and would use these functions if provided.

    @scoder
    Copy link
    Contributor

    scoder commented Apr 17, 2021

    I support that there should be a simple way to do this from C. The way via importing the "gc" module and then looking up an attribute (possibly building a Unicode string) and calling a function in it involves several operations that can take some time and require useless error handling since each of them will practically never fail but may. An operation as simple as changing the GC status shouldn't require that.

    A use case is building large data structures, or critical rearrangements of data structures that involve object operations that might trigger a GC run, maybe even including temporarily invalid object states and graphs. When larger data structures are involved but no collectable cycles, then the GC will probably not do anything useful except slowing down the creation process and/or running arbitrary code in between.

    I consider the need to disable garbage collection in critical sections of C code similar to locking and GIL handling. It isn't quite the same as locking, since other code can enable it again when it runs, which isn't the case for a lock, but especially in that case, being able to detect quickly whether it was re-enabled when my own code gains back control seems beneficial. Having to call a Python function for that and taking care of the object result is way too much overhead for this case.

    The fact that this is a rare request may not necessarily mean that it's rarely needed. There is certainly a bunch of C code out there that would benefit from temporarily disabling the GC in critical sections. I would imagine that people simply don't think of doing it and fail to notice any resulting slow-downs or even crashes since those often require elaborate circumstances to occur, and thus may not become visible at all in test or benchmark scenarios.

    Note that the GC state is now part of the PyInterpreterState, so a patch would need to do what "gc_enable_impl" and "gc_disable_impl" do in Modules/gcmodule.c, or, rather, become their implementation to share code.

    @scoder scoder added 3.10 only security fixes topic-C-API and removed 3.7 (EOL) end of life extension-modules C modules in the Modules dir labels Apr 17, 2021
    @scoder scoder reopened this Apr 17, 2021
    @scoder scoder added the performance Performance or resource usage label Apr 17, 2021
    @scoder
    Copy link
    Contributor

    scoder commented Apr 28, 2021

    I just remembered that it's usually helpful to return the previous state, so that callers know whether they need to re-enable or disable the GC when they're done. I'll add that.

    Also, returning an "int" may allow us to add a "-1" error code at some point, if it turns out to become necessary.

    @vstinner
    Copy link
    Member

    New changeset 3cc481b by scoder in branch 'master':
    bpo-28254: Add a C-API for controlling the GC state (GH-25687)
    3cc481b

    @scoder scoder closed this as completed Apr 28, 2021
    @vstinner
    Copy link
    Member

    New changeset 103d5e4 by Victor Stinner in branch 'master':
    bpo-28254: _posixsubprocess uses PyGC_Enable/PyGC_Disable (GH-25693)
    103d5e4

    @vstinner
    Copy link
    Member

    New changeset b1f413e by Victor Stinner in branch 'master':
    bpo-28254: Cleanup test_subprocess.test_preexec_gc_module_failure() (GH-25709)
    b1f413e

    @encukou
    Copy link
    Member

    encukou commented Apr 29, 2021

    New changeset 14fc2bd by Petr Viktorin in branch 'master':
    bpo-28254: Add PyGC_ functions to the stable ABI manifest (GH-25720)
    14fc2bd

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes performance Performance or resource usage topic-C-API
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants