Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that objects entering the GC are valid #82573

Closed
vstinner opened this issue Oct 7, 2019 · 11 comments
Closed

Ensure that objects entering the GC are valid #82573

vstinner opened this issue Oct 7, 2019 · 11 comments
Labels
3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@vstinner
Copy link
Member

vstinner commented Oct 7, 2019

BPO 38392
Nosy @tim-one, @Yhg1s, @nascheme, @scoder, @vstinner, @serhiy-storchaka, @pablogsal
PRs
  • bpo-38392: PyObject_GC_Track() validates object in debug mode #16615
  • bpo-38392: Only declare visit_validate() if Py_DEBUG is defined #16689
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-10-07.22:26:18.535>
    created_at = <Date 2019-10-07.10:48:58.568>
    labels = ['interpreter-core', '3.9']
    title = 'Ensure that objects entering the GC are valid'
    updated_at = <Date 2019-10-10.07:32:19.614>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2019-10-10.07:32:19.614>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-10-07.22:26:18.535>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2019-10-07.10:48:58.568>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 38392
    keywords = ['patch']
    message_count = 11.0
    messages = ['354074', '354075', '354080', '354081', '354082', '354083', '354087', '354148', '354149', '354173', '354318']
    nosy_count = 7.0
    nosy_names = ['tim.peters', 'twouters', 'nascheme', 'scoder', 'vstinner', 'serhiy.storchaka', 'pablogsal']
    pr_nums = ['16615', '16689']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue38392'
    versions = ['Python 3.9']

    @vstinner
    Copy link
    Member Author

    vstinner commented Oct 7, 2019

    A bug in Python/hamt.c was only discovered 4 months after the code was added to Python:

    The problem was that an object was tracked by the GC, whereas calling its traverse method crashed Python... depending when the traverse method is called exactly.

    Pseudo-code:

      PyObject_GC_Track(<partially initialized object>);
      ...
      obj2 = PyObject_GC_NewVar(...);
      ...
      <finish to initialize the first object>

    The problem is that PyObject_GC_NewVar() can trigger a GC collection which uses the partially initialized object, and then we get a cryptic crash in the GC (usually seen in visit_decref()).

    I propose to make PyObject_GC_Track() stricter: validate that the object seems to be "valid" or "fully initialized". From the GC point of view, it means that calling tp_traverse must not crash.

    Attached PR is a concrete implementation.

    This issue is a follow-up of my previously failed attempt bpo-36389 "Add gc.enable_object_debugger(): detect corrupted Python objects in the GC". While bpo-36389 attempts to discoverd corrupted objects once a GC collection is triggered, this issue is restricted to objects entering the GC.

    First problem: my PR breaks Modules/pyexpat.c whereas the code is not strictly a bug: in Python 3.8, it's ok to put a partially initialized object into the GC if no GC collection can be triggered before the object is fully initialized. In Modules/pyexpat.c case, the code looks safe from that point of view.

    It means that such change is a backward incompatible change. IMHO it's a good step forward, since it is likely to discovered hidden bugs.

    @vstinner vstinner added 3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Oct 7, 2019
    @vstinner
    Copy link
    Member Author

    vstinner commented Oct 7, 2019

    Pablo, Tim, Neal: what do you think of this idea?

    @Yhg1s
    Copy link
    Member

    Yhg1s commented Oct 7, 2019

    I'm pretty sure you meant nascheme, not nnorwitz.

    @serhiy-storchaka
    Copy link
    Member

    It could be possible to do this in backward compatible way. PyObject_GC_Track() could add the object to the list of new objects (all objects are already linked in bi-linked list, so it would need to just move the object to the specified list), and PyObject_GC_NewVar() could check all new objects and clear the list of new objects (just move the head).

    I am not sure it is worth to do such complication.

    @serhiy-storchaka
    Copy link
    Member

    From documentation:

    .. c:function:: void PyObject_GC_Track(PyObject *op)

    Adds the object *op* to the set of container objects tracked by the
    collector. The collector can run at unexpected times so objects must be
    valid while being tracked. This should be called once all the fields
    followed by the :c:member:`~PyTypeObject.tp_traverse` handler become valid, usually near the
    end of the constructor.

    So there was a bug in Modules/pyexpat.c.

    @vstinner
    Copy link
    Member Author

    vstinner commented Oct 7, 2019

    I'm pretty sure you meant nascheme, not nnorwitz.

    Oops, right :-)

    @vstinner
    Copy link
    Member Author

    vstinner commented Oct 7, 2019

    It could be possible to do this in backward compatible way. PyObject_GC_Track() could add the object to the list of new objects (all objects are already linked in bi-linked list, so it would need to just move the object to the specified list), and PyObject_GC_NewVar() could check all new objects and clear the list of new objects (just move the head).

    That's different than what I'm proposing here. Checking "later" is more on the bpo-36389 side: my bpo-36389 PR modified PyObject_GC_NewVar() to check that young object are valid.

    Moreover, my bpo-36389 PR required threshold: checking all young objects at each PyObject_GC_NewVar() simply killed performance which made the whole approach unusable in practice. Because of the threshold, my implementation failed to detect the hamt.c bug bpo-33803: the partially initialized was already fully initialized when my "object debugger" ran. PR 16615 detects immediately the hamt.c bug bpo-33803.

    I designed bpo-36389 to attempt to detect when an object is corrupted after its creation, corrupted by code running "later".

    We had a customer with a bug in OpenStack Nova. Nova crashed in visit_decref() but I failed to debug the issue. It motivated me to modify the debug ABI in Python 3.8, so later it will become possible to switch from release to debug mode to attempt to debug a program using additional checks added by the debug build.

    Here my intent is to ensure that objects entering the GC are valid to help to keep the GC list consistent and valid. But it doesn't prevent an object from being corrupted after its creation. We should try to find a different approach for that (maybe revisit bpo-36389).

    @vstinner
    Copy link
    Member Author

    vstinner commented Oct 7, 2019

    New changeset 1b18455 by Victor Stinner in branch 'master':
    bpo-38392: PyObject_GC_Track() validates object in debug mode (GH-16615)
    1b18455

    @vstinner
    Copy link
    Member Author

    vstinner commented Oct 7, 2019

    Ok, I pushed a change. Let's see how third party projects like it :-) We still have time to revert it if it causes too many damage.

    @vstinner vstinner closed this as completed Oct 7, 2019
    @serhiy-storchaka
    Copy link
    Member

    I am sure this will break third-party projects. But I thing that we should not revert this change, as it completely matches the documentation. It is good that it was applied so early at development cycle.

    Stefan, could you please test Cython and lxml with this?

    @vstinner
    Copy link
    Member Author

    New changeset a544773 by Victor Stinner in branch 'master':
    bpo-38392: Only declare visit_validate() if Py_DEBUG is defined (GH-16689)
    a544773

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants