This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: multiple issues in _elementtree module
Type: behavior Stage: resolved
Components: Extension Modules Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: 15083 Superseder:
Assigned To: serhiy.storchaka Nosy List:
Priority: normal Keywords:

Created on 2016-08-25 16:17 by tehybel, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 765 merged serhiy.storchaka, 2017-03-22 07:36
PR 903 merged serhiy.storchaka, 2017-03-30 06:50
PR 904 merged serhiy.storchaka, 2017-03-30 07:05
PR 963 merged serhiy.storchaka, 2017-04-02 15:55
Messages (7)
msg273662 - (view) Author: tehybel (tehybel) Date: 2016-08-25 16:17
I'll describe 7 issues in the /Modules/_elementtree.c module here. They
include multiple use-after-frees, type confusions and instances of
out-of-bounds array indexing.



Issue 1: use-after-free in element_get_text

The problematic code looks like this:
    
    LOCAL(PyObject*)
    element_get_text(ElementObject* self)
    { 
        /* return borrowed reference to text attribute */

        PyObject* res = self->text;

        if (JOIN_GET(res)) { 
            res = JOIN_OBJ(res);
            if (PyList_CheckExact(res)) {
                res = list_join(res);
                if (!res)
                    return NULL;
                self->text = res;
            }
        }

        return res;
    }

As we can see, if res is a list, we call list_join with res, which is
self->text. list_join will decrease self->text's reference count. When that
happens, arbitrary python code can run. If that code uses self->text, a
use-after-free occurs.

PoC (Proof-of-Concept segfaulting script):

---

import _elementtree as et

class X(str):
    def __del__(self):
        print(elem.text)

b = et.TreeBuilder()
b.start("test")
b.data(["AAAA", X("BBBB")])
b.start("test2")

elem = b.close()
print(elem.text)

---


Issue 2: use-after-free in element_get_tail

The same type of issue also exists in element_get_tail and should be fixed
there, too.



Issue 3: type confusion in elementiter_next

The function elementiter_next iterates over a tree consisting of elements.
Each element has an array of children. 
    
Before doing any casts, most of the elementtree code is careful to check that
these children are, indeed, elements; that is, that their type is correct. The
problem is that elementiter_next does not validate these child objects before
performing a cast. Here is the relevant line:
    
    elem = (ElementObject *)cur_parent->extra->children[child_index];
    
If the child is not an element, a type confusion occurs. Here's a PoC:

-----

import _elementtree as et

state = {
    "tag": "tag",
    "_children": [1,2,3],
    "attrib": "attr",
    "tail": "tail",
    "text": "text",
}

e = et.Element("ttt")
e.__setstate__(state)

for x in e.iter():
    print(x)

-----



Issue 4: array-out-of-bounds indexing in element_subscr

This issue occurs when taking the subscript of an element. This is done using
the element_subscr function. The function calls PySlice_GetIndicesEx:

    if (PySlice_GetIndicesEx(item,
            self->extra->length,
            &start, &stop, &step, &slicelen) < 0) {
        return NULL; 
    }

The problem is that to evaluate the indices, PySlice_GetIndicesEx might call
back into python code. That code might cause the element's self->extra->length
to change. If this happens, the variables "start", "stop" and "step" might no
longer be appropriate.

The code then uses these variables for array indexing:

    for (cur = start, i = 0; i < slicelen;
         cur += step, i++) {
        PyObject* item = self->extra->children[cur]; 
        ...
    }

But this could go out of bounds and interpret arbitrary memory as a PyObject.
Here's a PoC that reproduces this:

---

import _elementtree as et

class X:
    def __index__(self):
        e[:] = []
        return 1

e = et.Element("elem")
for _ in range(10):
    e.insert(0, et.Element("child"))

print(e[0:10:X()])

---

Running it results in a segfault:

(gdb) r ./poc14.py
Starting program: /home/xx/Python-3.5.2/python ./poc14.py

Program received signal SIGSEGV, Segmentation fault.
0x000000000049f933 in PyObject_Repr (v=0x7ffff68af058) at Objects/object.c:471
471     if (Py_TYPE(v)->tp_repr == NULL)
(gdb) print *v
$37 = {_ob_next = 0xdbdbdbdbdbdbdbdb, _ob_prev = 0xdbdbdbdbdbdbdbdb, ob_refcnt = 0xdbdbdbdbdbdbdbdc,
  ob_type = 0xdbdbdbdbdbdbdbdb}

As we can see, "v" is freed memory with arbitrary contents.



Issue 5: array-out-of-bounds indexing in element_ass_subscr

A separate issue of the same type, also due to a call to PySlice_GetIndicesEx,
exists in element_ass_subscr. Here's a proof-of-concept script for that:

---

import _elementtree as et

class X:
    def __index__(self):
        e[:] = []
        return 1

e = et.Element("elem")
for _ in range(10):
    e.insert(0, et.Element("child"))

e[0:10:X()] = []

---

To fix these, I believe we could check whether self->extra->length changed
after calling PySlice_GetIndicesEx, and bail out if so. (You can grep the
codebase for "changed size during iteration" for examples of some similarish
cases.)





Issue 6: use-after-free in treebuilder_handle_start

In the function treebuilder_handle_start we see these lines:

    if (treebuilder_set_element_text(self->last, self->data))
        return NULL;

Here self->last is the most recent element, and we are setting its text to
self->data. This assignment happens via the function
treebuilder_set_element_text which in turn calls
treebuilder_set_element_text_or_tail. There, if the element self->last is not
an exact instance of Element_Type, we run these lines:

    PyObject *joined = list_join(data);
    int r;
    if (joined == NULL)
        return -1;
    r = _PyObject_SetAttrId(element, name, joined);
    Py_DECREF(joined);
    return r;

The comment for list_join says this:

    /* join list elements (destroying the list in the process) */

So note that we destroy the given list. If we go back to the initial function,
treebuilder_handle_start, then we recall that "element" is really self->last and
"data" is actually self->data. When list_join(data) is called, self->data will
therefore be destroyed. So we have to be careful and ensure that self->data is
overwritten; otherwise it could get used after being freed.

Unfortunately the call to _PyObject_SetAttrId could fail -- for example if
"element" is an integer. Then treebuilder_handle_start just bails out without
clearing self->data. This results in a use-after-free.

Here's a PoC:

---

import _elementtree as et

b = et.TreeBuilder(element_factory=lambda x, y: 123)

b.start("tag", {})
b.data("ABCD")
b.start("tag2", {})

---

It sets self->last to be an integer, 123. When we try to set the attribute
"text" on this integer, it fails. At that point, self->data has been freed but
the reference isn't cleared. When treebuilder_dealloc is called by the garbage
collector, self->data is excessively decref'd and we get a segfault:

(gdb) run ./poc16.py
Traceback (most recent call last):
  File "./poc16.py", line 7, in <module>
    b.start("tag2", {})
AttributeError: 'int' object has no attribute 'text'

Program received signal SIGSEGV, Segmentation fault.
0x0000000000435122 in visit_decref (op=0x7ffff6d52380, data=0x0) at Modules/gcmodule.c:373
373     if (PyObject_IS_GC(op)) {
(gdb) print *op
$52 = {_ob_next = 0xdbdbdbdbdbdbdbdb, _ob_prev = 0xdbdbdbdbdbdbdbdb, ob_refcnt = 0xdbdbdbdbdbdbdbdb,
  ob_type = 0xdbdbdbdbdbdbdbdb}


Note that this issue exists in two places inside treebuilder_handle_start -- the
call to treebuilder_set_element_tail has the same problem and both should be
fixed.


Issue 7: use-after-free in treebuilder_handle_end

The same sort of issue also exists in treebuilder_handle_end and should be
fixed there as well.




All these issues were found in Python-3.5.2 and tested with --with-pydebug
enabled. I did not check the 2.7 branch, but some (though not all) of the
issues likely exist there as well.
msg273664 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-08-25 16:25
Thank you for your report tehybel.
msg290825 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-30 06:47
New changeset 576def096ec7b64814e038f03290031f172886c3 by Serhiy Storchaka in branch 'master':
bpo-27863: Fixed multiple crashes in ElementTree. (#765)
https://github.com/python/cpython/commit/576def096ec7b64814e038f03290031f172886c3
msg290828 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-30 07:32
New changeset c90ff1b78cb79bc3762184e03fa81f11984aaa45 by Serhiy Storchaka in branch '3.5':
bpo-27863: Fixed multiple crashes in ElementTree. (#765) (#904)
https://github.com/python/cpython/commit/c90ff1b78cb79bc3762184e03fa81f11984aaa45
msg290845 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-30 14:56
Since it is hard to backport the bugfix to 2.7 without test, issue15083 is a dependence.
msg290849 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-30 15:08
New changeset a6b4e1902250d6f28ca6d083ce1c8d7e9b91974b by Serhiy Storchaka in branch '3.6':
bpo-27863: Fixed multiple crashes in ElementTree. (#765) (#903)
https://github.com/python/cpython/commit/a6b4e1902250d6f28ca6d083ce1c8d7e9b91974b
msg291039 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-04-02 17:37
New changeset 9c2c42c221d7996070c0c0a2a114ab42fe3ddb9d by Serhiy Storchaka in branch '2.7':
bpo-27863: Fixed multiple crashes in ElementTree. (#765) (#903) (#963)
https://github.com/python/cpython/commit/9c2c42c221d7996070c0c0a2a114ab42fe3ddb9d
History
Date User Action Args
2022-04-11 14:58:35adminsetgithub: 72050
2021-11-04 14:14:05eryksunsetnosy: - ahmedsayeed1982
2021-11-04 14:13:57eryksunsetmessages: - msg405690
2021-11-04 12:09:06ahmedsayeed1982setnosy: + ahmedsayeed1982, - scoder, ericvw, eli.bendersky, serhiy.storchaka, xiang.zhang, tehybel

messages: + msg405690
versions: - Python 2.7, Python 3.6
2017-04-20 07:36:48serhiy.storchakasetstatus: open -> closed
dependencies: - various issues due to misuse of PySlice_GetIndicesEx
resolution: fixed
stage: backport needed -> resolved
2017-04-02 17:37:05serhiy.storchakasetmessages: + msg291039
2017-04-02 15:55:32serhiy.storchakasetpull_requests: + pull_request1142
2017-03-30 15:08:25serhiy.storchakasetmessages: + msg290849
2017-03-30 14:56:47serhiy.storchakasetdependencies: + Rewrite ElementTree tests in a cleaner and safer way
messages: + msg290845
stage: needs patch -> backport needed
2017-03-30 07:32:21serhiy.storchakasetmessages: + msg290828
2017-03-30 07:05:58serhiy.storchakasetpull_requests: + pull_request804
2017-03-30 06:50:01serhiy.storchakasetpull_requests: + pull_request803
2017-03-30 06:47:33serhiy.storchakasetmessages: + msg290825
2017-03-22 07:36:06serhiy.storchakasetpull_requests: + pull_request674
2016-08-27 10:42:55serhiy.storchakasetdependencies: + various issues due to misuse of PySlice_GetIndicesEx
2016-08-26 06:12:51xiang.zhangsetnosy: + xiang.zhang
2016-08-25 17:28:05ericvwsetnosy: + ericvw
2016-08-25 16:25:14serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg273664

assignee: serhiy.storchaka
stage: needs patch
2016-08-25 16:19:55SilentGhostsetnosy: + scoder, eli.bendersky
type: behavior
2016-08-25 16:17:10tehybelcreate