Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multiple issues in _elementtree module #72050

Closed
tehybel mannequin opened this issue Aug 25, 2016 · 8 comments
Closed

multiple issues in _elementtree module #72050

tehybel mannequin opened this issue Aug 25, 2016 · 8 comments
Assignees
Labels
extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error

Comments

@tehybel
Copy link
Mannequin

tehybel mannequin commented Aug 25, 2016

BPO 27863
PRs
  • bpo-27863: Fixed multiple crashes in ElementTree. #765
  • bpo-27863: Fixed multiple crashes in ElementTree. (#765) #903
  • bpo-27863: Fixed multiple crashes in ElementTree. (#765) #904
  • bpo-27863: Fixed multiple crashes in ElementTree. (#765) (#903) #963
  • Dependencies
  • bpo-15083: Rewrite ElementTree tests in a cleaner and safer way
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2017-04-20.07:36:48.868>
    created_at = <Date 2016-08-25.16:17:10.532>
    labels = ['extension-modules', 'type-bug']
    title = 'multiple issues in _elementtree module'
    updated_at = <Date 2021-11-04.14:14:05.394>
    user = 'https://bugs.python.org/tehybel'

    bugs.python.org fields:

    activity = <Date 2021-11-04.14:14:05.394>
    actor = 'eryksun'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2017-04-20.07:36:48.868>
    closer = 'serhiy.storchaka'
    components = ['Extension Modules']
    creation = <Date 2016-08-25.16:17:10.532>
    creator = 'tehybel'
    dependencies = ['15083']
    files = []
    hgrepos = []
    issue_num = 27863
    keywords = []
    message_count = 7.0
    messages = ['273662', '273664', '290825', '290828', '290845', '290849', '291039']
    nosy_count = 0.0
    nosy_names = []
    pr_nums = ['765', '903', '904', '963']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue27863'
    versions = ['Python 3.5']

    @tehybel
    Copy link
    Mannequin Author

    tehybel mannequin commented Aug 25, 2016

    I'll describe 7 issues in the /Modules/_elementtree.c module here. They
    include multiple use-after-frees, type confusions and instances of
    out-of-bounds array indexing.

    Issue 1: use-after-free in element_get_text

    The problematic code looks like this:

    LOCAL(PyObject*)
    element_get_text(ElementObject* self)
    { 
        /* return borrowed reference to text attribute \*/
    
            PyObject* res = self->text;
    
            if (JOIN_GET(res)) { 
                res = JOIN_OBJ(res);
                if (PyList_CheckExact(res)) {
                    res = list_join(res);
                    if (!res)
                        return NULL;
                    self->text = res;
                }
            }
    
            return res;
        }

    As we can see, if res is a list, we call list_join with res, which is
    self->text. list_join will decrease self->text's reference count. When that
    happens, arbitrary python code can run. If that code uses self->text, a
    use-after-free occurs.

    PoC (Proof-of-Concept segfaulting script):

    ---

    import _elementtree as et
    
    class X(str):
        def __del__(self):
            print(elem.text)
    
    b = et.TreeBuilder()
    b.start("test")
    b.data(["AAAA", X("BBBB")])
    b.start("test2")
    
    elem = b.close()
    print(elem.text)

    Issue 2: use-after-free in element_get_tail

    The same type of issue also exists in element_get_tail and should be fixed
    there, too.

    Issue 3: type confusion in elementiter_next

    The function elementiter_next iterates over a tree consisting of elements.
    Each element has an array of children.

    Before doing any casts, most of the elementtree code is careful to check that
    these children are, indeed, elements; that is, that their type is correct. The
    problem is that elementiter_next does not validate these child objects before
    performing a cast. Here is the relevant line:

    elem = (ElementObject \*)cur_parent-\>extra-\>children[child_index];
    

    If the child is not an element, a type confusion occurs. Here's a PoC:

    -----

    import _elementtree as et
    
    state = {
        "tag": "tag",
        "_children": [1,2,3],
        "attrib": "attr",
        "tail": "tail",
        "text": "text",
    }
    
    e = et.Element("ttt")
    e.__setstate__(state)
    
    for x in e.iter():
        print(x)

    Issue 4: array-out-of-bounds indexing in element_subscr

    This issue occurs when taking the subscript of an element. This is done using
    the element_subscr function. The function calls PySlice_GetIndicesEx:

        if (PySlice_GetIndicesEx(item,
                self->extra->length,
                &start, &stop, &step, &slicelen) < 0) {
            return NULL; 
        }

    The problem is that to evaluate the indices, PySlice_GetIndicesEx might call
    back into python code. That code might cause the element's self->extra->length
    to change. If this happens, the variables "start", "stop" and "step" might no
    longer be appropriate.

    The code then uses these variables for array indexing:

    for (cur = start, i = 0; i < slicelen;
         cur += step, i++) {
        PyObject* item = self->extra->children[cur]; 
        ...
    }
    

    But this could go out of bounds and interpret arbitrary memory as a PyObject.
    Here's a PoC that reproduces this:

    ---

    import _elementtree as et
    
    class X:
        def __index__(self):
            e[:] = []
            return 1
    
    e = et.Element("elem")
    for _ in range(10):
        e.insert(0, et.Element("child"))

    print(e[0:10:X()])

    ---

    Running it results in a segfault:

    (gdb) r ./poc14.py
    Starting program: /home/xx/Python-3.5.2/python ./poc14.py

    Program received signal SIGSEGV, Segmentation fault.
    0x000000000049f933 in PyObject_Repr (v=0x7ffff68af058) at Objects/object.c:471
    471 if (Py_TYPE(v)->tp_repr == NULL)
    (gdb) print *v
    $37 = {_ob_next = 0xdbdbdbdbdbdbdbdb, _ob_prev = 0xdbdbdbdbdbdbdbdb, ob_refcnt = 0xdbdbdbdbdbdbdbdc,
    ob_type = 0xdbdbdbdbdbdbdbdb}

    As we can see, "v" is freed memory with arbitrary contents.

    Issue 5: array-out-of-bounds indexing in element_ass_subscr

    A separate issue of the same type, also due to a call to PySlice_GetIndicesEx,
    exists in element_ass_subscr. Here's a proof-of-concept script for that:

    ---

    import _elementtree as et
    
    class X:
        def __index__(self):
            e[:] = []
            return 1
    
    e = et.Element("elem")
    for _ in range(10):
        e.insert(0, et.Element("child"))

    e[0:10:X()] = []

    ---

    To fix these, I believe we could check whether self->extra->length changed
    after calling PySlice_GetIndicesEx, and bail out if so. (You can grep the
    codebase for "changed size during iteration" for examples of some similarish
    cases.)

    Issue 6: use-after-free in treebuilder_handle_start

    In the function treebuilder_handle_start we see these lines:

    if (treebuilder_set_element_text(self->last, self->data))
        return NULL;
    

    Here self->last is the most recent element, and we are setting its text to
    self->data. This assignment happens via the function
    treebuilder_set_element_text which in turn calls
    treebuilder_set_element_text_or_tail. There, if the element self->last is not
    an exact instance of Element_Type, we run these lines:

        PyObject *joined = list_join(data);
        int r;
        if (joined == NULL)
            return -1;
        r = _PyObject_SetAttrId(element, name, joined);
        Py_DECREF(joined);
        return r;

    The comment for list_join says this:

    /* join list elements (destroying the list in the process) */
    

    So note that we destroy the given list. If we go back to the initial function,
    treebuilder_handle_start, then we recall that "element" is really self->last and
    "data" is actually self->data. When list_join(data) is called, self->data will
    therefore be destroyed. So we have to be careful and ensure that self->data is
    overwritten; otherwise it could get used after being freed.

    Unfortunately the call to _PyObject_SetAttrId could fail -- for example if
    "element" is an integer. Then treebuilder_handle_start just bails out without
    clearing self->data. This results in a use-after-free.

    Here's a PoC:

    ---

    import _elementtree as et
    
    b = et.TreeBuilder(element_factory=lambda x, y: 123)
    
    b.start("tag", {})
    b.data("ABCD")
    b.start("tag2", {})

    It sets self->last to be an integer, 123. When we try to set the attribute
    "text" on this integer, it fails. At that point, self->data has been freed but
    the reference isn't cleared. When treebuilder_dealloc is called by the garbage
    collector, self->data is excessively decref'd and we get a segfault:

    (gdb) run ./poc16.py
    Traceback (most recent call last):
      File "./poc16.py", line 7, in <module>
        b.start("tag2", {})
    AttributeError: 'int' object has no attribute 'text'

    Program received signal SIGSEGV, Segmentation fault.
    0x0000000000435122 in visit_decref (op=0x7ffff6d52380, data=0x0) at Modules/gcmodule.c:373
    373 if (PyObject_IS_GC(op)) {
    (gdb) print *op
    $52 = {_ob_next = 0xdbdbdbdbdbdbdbdb, _ob_prev = 0xdbdbdbdbdbdbdbdb, ob_refcnt = 0xdbdbdbdbdbdbdbdb,
    ob_type = 0xdbdbdbdbdbdbdbdb}

    Note that this issue exists in two places inside treebuilder_handle_start -- the
    call to treebuilder_set_element_tail has the same problem and both should be
    fixed.

    Issue 7: use-after-free in treebuilder_handle_end

    The same sort of issue also exists in treebuilder_handle_end and should be
    fixed there as well.

    All these issues were found in Python-3.5.2 and tested with --with-pydebug
    enabled. I did not check the 2.7 branch, but some (though not all) of the
    issues likely exist there as well.

    @tehybel tehybel mannequin added the extension-modules C modules in the Modules dir label Aug 25, 2016
    @SilentGhost SilentGhost mannequin added the type-bug An unexpected behavior, bug, or error label Aug 25, 2016
    @serhiy-storchaka
    Copy link
    Member

    Thank you for your report tehybel.

    @serhiy-storchaka serhiy-storchaka self-assigned this Aug 25, 2016
    @serhiy-storchaka
    Copy link
    Member

    New changeset 576def0 by Serhiy Storchaka in branch 'master':
    bpo-27863: Fixed multiple crashes in ElementTree. (#765)
    576def0

    @serhiy-storchaka
    Copy link
    Member

    New changeset c90ff1b by Serhiy Storchaka in branch '3.5':
    bpo-27863: Fixed multiple crashes in ElementTree. (#765) (#904)
    c90ff1b

    @serhiy-storchaka
    Copy link
    Member

    Since it is hard to backport the bugfix to 2.7 without test, bpo-15083 is a dependence.

    @serhiy-storchaka
    Copy link
    Member

    New changeset a6b4e19 by Serhiy Storchaka in branch '3.6':
    bpo-27863: Fixed multiple crashes in ElementTree. (#765) (#903)
    a6b4e19

    @serhiy-storchaka
    Copy link
    Member

    New changeset 9c2c42c by Serhiy Storchaka in branch '2.7':
    bpo-27863: Fixed multiple crashes in ElementTree. (#765) (#903) (#963)
    9c2c42c

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @panukulv
    Copy link

    Merge

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants