Author tehybel
Recipients tehybel
Date 2016-09-02.21:33:28
Message-id <1472852008.61.0.354257945943.issue27945@psf.upfronthosting.co.za>
Here I'll describe five distinct issues I found. Common to them all is that they
reside in the built-in dictionary object. 

Four of them are use-after-frees and one is an array-out-of-bounds indexing bug.


All of the functions described below reside in Objects/dictobject.c.


Issue 1: use-after-free when initializing a dictionary

Initialization of a dictionary happens via the function dict_init which calls
dict_update_common. From there, PyDict_MergeFromSeq2 may be called, and that is
where this issue resides.

In PyDict_MergeFromSeq2 we retrieve a sequence of size 2 with this line:

	fast = PySequence_Fast(item, "");

After checking its size, we take out a key and value:

	key = PySequence_Fast_GET_ITEM(fast, 0);
	value = PySequence_Fast_GET_ITEM(fast, 1);

Both "key" and "value" are borrowed references: only the sequence keeps them
alive. Then we call PyDict_GetItem. This calls back into Python code if the key
defines a __hash__ method. From there the "item" sequence could get modified,
resulting in "key" or "value" getting used after having been freed.
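
The lifetime problem can be observed from pure Python without crashing
anything. What follows is only a minimal sketch, assuming CPython's immediate
reference counting; the names Key and probe are invented for this illustration.
It shows that the list holds the only reference to its first element, so
emptying the list deallocates the object that a borrowed pointer would still
refer to.

```python
import weakref


class Key:
    """Hypothetical stand-in for the key object stored in the pair."""


pair = [Key(), 123]
probe = weakref.ref(pair[0])     # observes the key without owning a reference

alive_before = probe() is not None
pair[:] = []                     # the same mutation a hostile __hash__ can perform
alive_after = probe() is not None

print(alive_before, alive_after)
```

In the C code the role of the weak reference is played by the raw "key"
pointer, which has no way of noticing that the object is gone.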

Here's a PoC:

---

class X:
    def __hash__(self):
        pair[:] = []
        return 13

pair = [X(), 123]
dict([pair])

---

It crashes while trying to use freed memory as a PyObject:

(gdb) run ./poc24.py 
Program received signal SIGSEGV, Segmentation fault.
0x000000000048fe25 in insertdict (mp=mp@entry=0x7ffff6d5c4b8, key=key@entry=0x7ffff6d52538, hash=0xd, 
    value=value@entry=0x8d1ac0 <small_ints+6144>) at Objects/dictobject.c:831
831	    MAINTAIN_TRACKING(mp, key, value);
(gdb) print *key
$26 = {_ob_next = 0xdbdbdbdbdbdbdbdb, _ob_prev = 0xdbdbdbdbdbdbdbdb, ob_refcnt = 0xdbdbdbdbdbdbdbdb, 
  ob_type = 0xdbdbdbdbdbdbdbdb}




Issue 2: use-after-free in dictitems_contains

In the function dictitems_contains we call PyDict_GetItem to look up a value in
the dictionary:

	found = PyDict_GetItem((PyObject *)dv->dv_dict, key);

However, "found" is a borrowed reference. We then go ahead and compare it:

	return PyObject_RichCompareBool(value, found, Py_EQ);

But PyObject_RichCompareBool could call back into Python code and e.g. release
the GIL. As a result, the dictionary may be mutated. Thus "found" could get
freed. 

Then, inside PyObject_RichCompareBool (actually in do_richcompare), the "found"
variable gets used after being freed.
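
To see the callback happen safely, here is a small sketch in which an extra
strong reference (the hypothetical variable "stored") keeps the value alive, so
nothing is freed even on an unpatched build. Probe and events are likewise
names made up for this example. It shows that the membership test hands the
dictionary's own value to user code, and that this code is free to empty the
dictionary in the middle of the comparison.

```python
events = []
stored = set()        # extra strong reference: keeps the value alive throughout
d = {0: stored}


class Probe:
    def __eq__(self, other):
        events.append(other is stored)   # we were handed the dict's own value
        d.clear()                        # the dict drops its reference here
        return NotImplemented


result = (0, Probe()) in d.items()
print(events, len(d), result)
```

Without the extra reference in "stored", the d.clear() call would drop the last
reference to the value while the comparison is still using it.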

PoC:

---

class X:
    def __eq__(self, other):
        d.clear()
        return NotImplemented

d = {0: set()}
(0, X()) in d.items()

---

Result:

(gdb) run ./poc25.py 
Program received signal SIGSEGV, Segmentation fault.
0x00000000004a03b6 in do_richcompare (v=v@entry=0x7ffff6d52468, w=w@entry=0x7ffff6ddf7c8, op=op@entry=0x2)
    at Objects/object.c:673
673	    if (!checked_reverse_op && (f = w->ob_type->tp_richcompare) != NULL) {
(gdb) print w->ob_type
$26 = (struct _typeobject *) 0xdbdbdbdbdbdbdbdb




Issue 3: use-after-free in dict_equal

In the function dict_equal, we call the "lookdict" function via
b->ma_keys->dk_lookup to look up a value:

	if ((b->ma_keys->dk_lookup)(b, key, ep->me_hash, &vaddr) == NULL)

This value's address is stored into the "vaddr" variable and the value is
fetched into the "bval" variable:

	bval = *vaddr;

Then we call Py_DECREF(key) which can call back into Python code. This could
release the GIL and mutate dictionary b. Therefore "bval" could become freed at
this point. We then proceed to use "bval":

	cmp = PyObject_RichCompareBool(aval, bval, Py_EQ);

This results in a use-after-free.
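
The step that makes this possible is that dropping the last reference to an
object runs its __del__ method immediately, and that method can do anything. A
minimal sketch of just that step, assuming CPython's reference counting (Key,
Value and probe are hypothetical names, and the weak reference stands in for
the borrowed "bval" pointer):

```python
import weakref


class Value:
    """Hypothetical stand-in for the value stored in dict b."""


class Key:
    def __del__(self):
        dict_b.clear()            # runs synchronously when the last reference dies


dict_b = {"k": Value()}
probe = weakref.ref(dict_b["k"])  # plays the role of the borrowed "bval"

key = Key()
del key                           # analogous to Py_DECREF(key) reaching zero

value_survived = probe() is not None
print(value_survived, dict_b)
```

By the time the statement after "del key" runs, the value that "bval" would
point to is already gone.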

PoC:

---

class X():
    def __del__(self): 
        dict_b.clear()
    def __eq__(self, other):
        dict_a.clear()
        return True
    def __hash__(self): 
        return 13
        
dict_a = {X(): 0}
dict_b = {X(): X()}
dict_a == dict_b

---

Result:

(gdb) run ./poc26.py 
Program received signal SIGSEGV, Segmentation fault.
PyType_IsSubtype (a=0xdbdbdbdbdbdbdbdb, b=0x87ec60 <PyLong_Type>)
    at Objects/typeobject.c:1343
1343	    mro = a->tp_mro;
(gdb) print a
$59 = (PyTypeObject *) 0xdbdbdbdbdbdbdbdb



Issue 4: use-after-free in _PyDict_FromKeys

The function _PyDict_FromKeys takes an iterable as argument. If the iterable is
a dict, _PyDict_FromKeys loops over it like this:

	while (_PyDict_Next(iterable, &pos, &key, &oldvalue, &hash)) {
		if (insertdict(mp, key, hash, value)) {
			...
		}
	}

However if we look at the comment for PyDict_Next, we see this:

	 * CAUTION:  In general, it isn't safe to use PyDict_Next in a loop that
	 * mutates the dict.

But insertdict can call into Python code, which might mutate the dict. In that
case we perform a use-after-free of the "key" variable.
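
For contrast, the Python-level dict iterator checks for a size change on every
step, a guard that the C loop above does not appear to have. The sketch below
shows that guard firing, and then the usual workaround of iterating over a
snapshot that owns its own references to the keys. The variable names are
invented for illustration.

```python
d = {1: "a", 2: "b"}
error = None
try:
    for key in d:
        d.clear()                # mutate while iterating
except RuntimeError as exc:
    error = str(exc)

# Iterating over a snapshot sidesteps the problem: the list holds its own
# references to the keys, so clearing the dict cannot invalidate them.
d = {1: "a", 2: "b"}
seen = []
for key in list(d):
    seen.append(key)
    d.clear()

print(error, seen)
```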

Here's a PoC:

---

class X(int):
    def __hash__(self):
        return 13 
    def __eq__(self, other):
        if len(d) > 1:
            d.clear()
        return False

d = {}  # must exist first: building the dict below already calls __eq__, which reads d
d = {X(1): 1, X(2): 2}
x = {}.fromkeys(d)

---

And the result:

(gdb) run ./poc27.py 
Program received signal SIGSEGV, Segmentation fault.
0x0000000000435122 in visit_decref (op=0x7ffff6d5ca68, data=0x0) at Modules/gcmodule.c:373
373	    if (PyObject_IS_GC(op)) {
(gdb) print *op
$115 = {_ob_next = 0xdbdbdbdbdbdbdbdb, _ob_prev = 0xdbdbdbdbdbdbdbdb, ob_refcnt = 0xdbdbdbdbdbdbdbdb, 
  ob_type = 0xdbdbdbdbdbdbdbdb}


An almost identical issue also exists further down in the function when calling
_PySet_NextEntry. To see this crash, just change "d" to be a set in the PoC
above:

	d = set()
	d = set([X(1), X(2)])

This likewise crashes with a use-after-free.



(Note: if you grep for PyDict_Next you will find more similar cases, although
many are in obscure modules or deprecated functions. I'm not sure those are
worth fixing? E.g. here's a crasher for BaseException_setstate which also calls
PyDict_Next:

---

class X(str):
    def __hash__(self):
        d.clear()
        return 13

d = {}
d[X()] = X()

e = Exception()
e.__setstate__(d)

---

end note.)




Issue 5: out-of-bounds indexing in dictiter_iternextitem

The function dictiter_iternextitem is used to iterate over a dictionary's items.
dictiter_iternextitem is careful to check that the dictionary did not change
size during iteration. However after performing this check, it calls Py_DECREF:

	Py_DECREF(PyTuple_GET_ITEM(result, 0));
	Py_DECREF(PyTuple_GET_ITEM(result, 1));

This can execute Python code and mutate the dict. If that happens, the index "i"
previously computed by dictiter_iternextitem could become invalid. It would then
index out of bounds with this line:

	key = d->ma_keys->dk_entries[i].me_key;

Furthermore the "value_ptr" variable would have gone stale, too. Taking the
"value" variable out of it uses memory that has been freed:

	value = *value_ptr;
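
The underlying pattern is a check followed by a callback followed by a use of
state that the check was supposed to protect. Here is a rough pure-Python model
of that ordering. Everything in it is hypothetical: a list stands in for
dk_entries, release_old stands in for the Py_DECREF calls, and an IndexError
stands in for the out-of-bounds read. The second function shows one possible
reordering, not necessarily the fix that should be applied.

```python
def next_item_unsafe(entries, i, expected_len, release_old):
    """Check, run the callback, then index: mirrors the ordering described above."""
    if len(entries) != expected_len:
        raise RuntimeError("changed size during iteration")
    release_old()                # stands in for Py_DECREF; may run arbitrary code
    return entries[i]            # index was computed before the callback ran


def next_item_safe(entries, i, expected_len, release_old):
    """Same check, but fetch the entry before anything can run user code."""
    if len(entries) != expected_len:
        raise RuntimeError("changed size during iteration")
    item = entries[i]            # take our own reference first
    release_old()
    return item


entries = [("k0", "v0"), ("k1", "v1")]
try:
    next_item_unsafe(entries, 1, 2, entries.clear)
    outcome = "no error"
except IndexError:
    outcome = "stale index"      # Python raises; the C code reads out of bounds

entries = [("k0", "v0"), ("k1", "v1")]
item = next_item_safe(entries, 1, 2, entries.clear)
print(outcome, item)
```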

Here's a PoC which crashes with the "value" variable being an arbitrary pointer:

---

class X(int):
    def __del__(self):
        d.clear()
    
d = {i: X(i) for i in range(8)}
    
for result in d.items():
    if result[0] == 2:
        d[2] = None # free d[2] --> X(2).__del__ is called

---

The result:

(gdb) run ./poc29.py 
Program received signal SIGSEGV, Segmentation fault.
dictiter_iternextitem (di=0x7ffff6d49cd8) at Objects/dictobject.c:3187
3187        Py_INCREF(key);
(gdb) print value
$12 = (PyObject *) 0x7b7b7b7b7b7b7b7b