This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: array.fromstring Use After Free
Type: crash Stage: resolved
Components: Extension Modules Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Arfrever, JohnLeitch, benjamin.peterson, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2015-07-12 00:32 by JohnLeitch, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
array.fromstring-Use-After-Free.py JohnLeitch, 2015-07-12 00:32 repro file
arraymodule.c.patch JohnLeitch, 2015-07-12 00:33 patch
array.fromstring-Use-After-Free.patch JohnLeitch, 2015-07-25 20:58
Messages (11)
msg246621 - (view) Author: John Leitch (JohnLeitch) * Date: 2015-07-12 00:32
The Python array.fromstring() method suffers from a use after free caused by unsafe realloc use. The issue is triggered when an array is concatenated to itself via fromstring() call:

static PyObject *
array_fromstring(arrayobject *self, PyObject *args)
{
    char *str;
    Py_ssize_t n;
    int itemsize = self->ob_descr->itemsize;
    if (!PyArg_ParseTuple(args, "s#:fromstring", &str, &n)) <<<< The str buffer is parsed from args. In cases where an array is passed to itself, self->ob_item == str.
        return NULL;
    if (n % itemsize != 0) {
        PyErr_SetString(PyExc_ValueError,
                   "string length not a multiple of item size");
        return NULL;
    }
    n = n / itemsize;
    if (n > 0) {
        char *item = self->ob_item; <<<< If str == self->ob_item, item == str.
        if ((n > PY_SSIZE_T_MAX - Py_SIZE(self)) ||
            ((Py_SIZE(self) + n) > PY_SSIZE_T_MAX / itemsize)) {
                return PyErr_NoMemory();
        }
        PyMem_RESIZE(item, char, (Py_SIZE(self) + n) * itemsize); <<<< A realloc call occurs here with item passed as the ptr argument. Because realloc sometimes calls free(), this means that item may be freed. If item was equal to str, str is now pointing to freed memory.
        if (item == NULL) {
            PyErr_NoMemory();
            return NULL;
        }
        self->ob_item = item;
        Py_SIZE(self) += n;
        self->allocated = Py_SIZE(self);
        memcpy(item + (Py_SIZE(self) - n) * itemsize,
               str, itemsize*n); <<<< If str is dangling at this point, a use after free occurs here.
    }
    Py_INCREF(Py_None);
    return Py_None;
}

In most cases when this occurs, the function behaves as expected; while the dangling str pointer is technically pointing to deallocated memory, given the timing it is highly likely the memory contains the expected data. However, ocassionally, an errant allocation will occur between the realloc and memcpy, leading to unexpected contents in the str buffer.

In applications that expose otherwise innocuous indirect object control of arrays as attack surface, it may be possible for an attacker to trigger the corruption of arrays. This could potentially be exploited to exfiltrate data or achieve privilege escalation, depending on subsequent operations performed using corrupted arrays.

A proof-of-concept follows:

import array
import sys
import random

testNumber = 0

def dump(value):
	global testNumber
	i = 0
	for x in value:
		y = ord(x)
		if (y != 0x41): 
			end = ''.join(value[i:]).index('A' * 0x10)
			sys.stdout.write("%08x a[%08x]: " % (testNumber, i))
			for z in value[i:i+end]: sys.stdout.write(hex(ord(z))[2:])
			sys.stdout.write('\r\n')
			break			
		i += 1

def copyArray():
	global testNumber
	while True:
		a=array.array("c",'A'*random.randint(0x0, 0x10000))
		a.fromstring(a)
		dump(a)
		testNumber += 1
	
print "Starting..."	
copyArray()

The script repeatedly creates randomly sized arrays filled with 0x41, then calls fromstring() and checks the array for corruption. If any is found, the relevant bytes are written to the console as hex. The output should look something like this:

Starting...
00000007 a[00000cdc]: c8684d0b0f54c0
0000001d a[0000f84d]: b03f4f0b8be620
00000027 a[0000119f]: 50724d0b0f54c0
0000004c a[00000e53]: b86b4d0b0f54c0
0000005a a[000001e1]: d8ab4609040620
00000090 a[0000015b]: 9040620104e5f0
0000014d a[000002d6]: 10ec620d8ab460
00000153 a[000000f7]: 9040620104e5f0
0000023c a[00000186]: 50d34c0f8b65a0
00000279 a[000001c3]: d8ab4609040620
000002ee a[00000133]: 9040620104e5f0
000002ff a[00000154]: 9040620104e5f0
0000030f a[00000278]: 10ec620d8ab460
00000368 a[00000181]: 50d34c0f8b65a0
000003b2 a[0000005a]: d0de5f0d05e5f0
000003b5 a[0000021c]: b854d00d3620
00000431 a[000001d8]: d8ab4609040620
0000044b a[000002db]: 10ec620d8ab460
00000461 a[000000de]: 9040620104e5f0
000004fb a[0000232f]: 10f74d0c0ce620
00000510 a[0000014a]: 9040620104e5f0

In some applications, such as those that are web-based, similar circumstances may manifest that would allow for remote exploitation.

To fix the issue, array_fromstring should check if self->ob_item is pointing to the same memory as str, and handle the copy accordingly. A proposed patch is attached.
msg246622 - (view) Author: John Leitch (JohnLeitch) * Date: 2015-07-12 00:33
Attaching patch.
msg247051 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-07-21 19:28
Minimal example:

import array
a = array.array("B")
a.fromstring(b'x'*0x10000)
a.fromstring(a)
a.fromstring(a)

In 3.x it doesn't work. An exception is raised:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BufferError: cannot resize an array that is exporting buffers

I think it would be better to raise an exception in 2.7 too.
msg247297 - (view) Author: John Leitch (JohnLeitch) * Date: 2015-07-24 18:23
I understand the desire for consistency and I will create such a patch when I get some slack space (hopefully tonight), but I believe it will constitute a breaking change; in 2.7, passing self to array.fromstring works as expected most of the time.
msg247298 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-07-24 18:45
This is not about consistency, this is about that don't encourage users to write new code incompatible with 3.x. For now passing self to array.fromstring() doesn't work in 3.x and doesn't work (sporadically crashes) and never worked in 2.7.

What you think about this Benjamin?
msg247300 - (view) Author: John Leitch (JohnLeitch) * Date: 2015-07-24 20:07
To clarify one point, passing self to array.fromstring works as expected almost all the time in 2.7. My testing revealed anomalous behavior <1% of the time, and it was almost always non-fatal corruption of the buffer. It stands to reason that legacy code may exist that relies on similar operations, and such code would be broken by the requested change.
msg247312 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2015-07-25 04:18
I think it should raise an exception. It's hard to feel too bad about preventing corruption even if only "occasional".
msg247392 - (view) Author: John Leitch (JohnLeitch) * Date: 2015-07-25 20:58
Attached is a patch that updates array.fromstring to throw a ValueError when self is passed. It also updates the unit tests to cover this new behavior.
msg247407 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-07-26 05:40
The patch doesn't apply correctly. Looks as it is encoded with UTF-16. For future please provide a patch in the encoding of the source file (should be ASCII compatible, without BOM).
msg247409 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-07-26 05:50
New changeset 2d39777f3477 by Serhiy Storchaka in branch '2.7':
Issue #24613: Calling array.fromstring() with self is no longer allowed
https://hg.python.org/cpython/rev/2d39777f3477
msg247411 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-07-26 06:04
Thank you for your contribution John.
History
Date User Action Args
2022-04-11 14:58:18adminsetgithub: 68801
2015-08-17 04:33:15Arfreversetnosy: + Arfrever
2015-07-26 06:05:51serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2015-07-26 06:04:57serhiy.storchakasetmessages: + msg247411
2015-07-26 05:50:58python-devsetnosy: + python-dev
messages: + msg247409
2015-07-26 05:40:17serhiy.storchakasetmessages: + msg247407
2015-07-25 20:58:32JohnLeitchsetfiles: + array.fromstring-Use-After-Free.patch

messages: + msg247392
2015-07-25 04:18:53benjamin.petersonsetmessages: + msg247312
2015-07-24 20:07:29JohnLeitchsetmessages: + msg247300
2015-07-24 18:45:00serhiy.storchakasetnosy: + benjamin.peterson
messages: + msg247298
2015-07-24 18:23:26JohnLeitchsetmessages: + msg247297
2015-07-21 19:28:59serhiy.storchakasettype: security -> crash
messages: + msg247051
2015-07-12 03:25:37serhiy.storchakasetassignee: serhiy.storchaka

nosy: + serhiy.storchaka
components: + Extension Modules
stage: patch review
2015-07-12 00:33:14JohnLeitchsetfiles: + arraymodule.c.patch
keywords: + patch
messages: + msg246622
2015-07-12 00:32:50JohnLeitchcreate