This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author tehybel
Recipients tehybel
Date 2016-09-02.21:48:01
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1472852881.19.0.987799736928.issue27946@psf.upfronthosting.co.za>
In-reply-to
Content
I would like to describe an issue in the _elementtree module, and then
propose a fix which would prevent this type of bug everywhere in the
codebase.

The issue exists in _elementtree_Element_get_impl in
/Modules/_elementtree.c. Here is the code:

    static PyObject *
    _elementtree_Element_get_impl(ElementObject *self, PyObject *key,
                                  PyObject *default_value)
    {
        PyObject* value;

        if (...)
            value = default_value;
        else {
            value = PyDict_GetItem(self->extra->attrib, key);
            ...
        }
        ...
    }

We look up "key" in the dictionary, that is, in self->extra->attrib. This is 
done with the call to PyDict_GetItem(self->extra->attrib, key).

We need to hash the "key" object in order to find it in the dictionary. This
sometimes requires calling the key's __hash__ function, i.e., it might call on
to python code.

What happens if the python code gets the dictionary (self->extra->attrib) freed?
Then PyDict_GetItem will use it after it has been freed.

I've attached a proof-of-concept script which causes a use-after-free through
_elementtree_Element_get_impl due to this issue.

We could fix this by calling Py_INCREF on self->extra->attrib before calling
PyDict_GetItem. But grepping for "PyDict_GetItem.*\->" shows that there are many
places in the codebase where this situation occurs. Some of these may not have
much impact, but some of them likely will.

All these bugs could be fixed at once by changing PyDict_GetItem to call 
Py_INCREF on the dictionary before it hashes the key.



Here's a PoC for the _elementtree module.

--- begin script ---

import _elementtree as et

state = {
    "tag": "tag",
    "_children": [],
    "attrib": "attr",
    "tail": "tail",
    "text": "text",
}

class X:
    def __hash__(self):
        e.__setstate__(state) # this frees e->extra->attrib
        return 13

e = et.Element("elem", {1: 2})
e.get(X())

--- end script ---

Running it (64-bits Python 3.5.2, --with-pydebug) causes a use-after-free with
control of the program counter:

(gdb) r ./poc13.py
Starting program: /home/xx/Python-3.5.2/python ./poc13.py

Program received signal SIGSEGV, Segmentation fault.
0x00000000004939af in PyDict_GetItem (op=0x7ffff6d5b1a8, key=0x7ffff6d528e0) at Objects/dictobject.c:1079
1079            ep = (mp->ma_keys->dk_lookup)(mp, key, hash, &value_addr);
(gdb) p mp->ma_keys->dk_lookup
$8 = (dict_lookup_func) 0x7b7b7b7b7b7b7b7b
History
Date User Action Args
2016-09-02 21:48:01tehybelsetrecipients: + tehybel
2016-09-02 21:48:01tehybelsetmessageid: <1472852881.19.0.987799736928.issue27946@psf.upfronthosting.co.za>
2016-09-02 21:48:01tehybellinkissue27946 messages
2016-09-02 21:48:01tehybelcreate