Message 295930 - Python tracker



Author	jaybosamiya
Recipients	jaybosamiya
Date	2017-06-13.15:35:28
Message-id	<1497368129.36.0.20181989843.issue30657@psf.upfronthosting.co.za>
In-reply-to

Content

In Python 2.7, there is a possible integer overflow in the
PyString_DecodeEscape function in stringobject.c, which can be
abused to cause a heap overflow, possibly leading to arbitrary code
execution.

The relevant parts of the code are highlighted below:

    PyObject *PyString_DecodeEscape(const char *s,
                                    Py_ssize_t len,
                                    const char *errors,
                                    Py_ssize_t unicode,
                                    const char *recode_encoding)
    {
        int c;
        char *p, *buf;
        const char *end;
        PyObject *v;
(1)     Py_ssize_t newlen = recode_encoding ? 4*len:len;
(2)     v = PyString_FromStringAndSize((char *)NULL, newlen);
        if (v == NULL)
            return NULL;
(3)     p = buf = PyString_AsString(v);
        end = s + len;
        while (s < end) {
            if (*s != '\\') {
              non_esc:
    #ifdef Py_USING_UNICODE
    [...]
    #else
                *p++ = *s++;
    #endif
                continue;
    [...]
            }
        }
(4)     if (p-buf < newlen)
            _PyString_Resize(&v, p - buf); /* v is cleared on error */
        return v;
      failed:
        Py_DECREF(v);
        return NULL;
    }


(1) If recode_encoding is true (i.e., non-null), the multiplication
      4*len can overflow, which can set newlen to a very small value
(2) This causes a small string to be allocated for v
(3) Now p (and buf) point into that small string
(4) By this point the loop has copied the larger input string into
      the small string, giving a heap buffer overflow
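
The arithmetic behind step (1) can be sketched as follows. This is a minimal illustration, not CPython code: the helper name wrapped_newlen is hypothetical, and it assumes a build where Py_ssize_t is a 32-bit signed integer. Because signed overflow is undefined behavior in C, the sketch models the wraparound with unsigned arithmetic instead of reproducing it directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not CPython code): models `factor * len` as it
 * would come out in a 32-bit signed Py_ssize_t. Unsigned arithmetic is
 * defined to wrap modulo 2^32, so it is used to model the overflow, and
 * the result is then mapped back to a signed value. */
static int32_t wrapped_newlen(int32_t len, uint32_t factor)
{
    assert(len >= 0);
    uint32_t low = (uint32_t)len * factor;   /* wraps modulo 2^32 */
    if (low <= (uint32_t)INT32_MAX)
        return (int32_t)low;                 /* non-negative result */
    /* Two's-complement reinterpretation without relying on
     * implementation-defined narrowing casts. */
    return (int32_t)(-1 - (int64_t)(UINT32_MAX - low));
}
```

Under this model, wrapped_newlen(1073741824 + 4, 4) evaluates to 16, so an input slightly over 1 GiB would be given a 16-byte buffer, while a small input such as wrapped_newlen(10, 4) gives the expected 40.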

In the highly unlikely but definitely possible situation that we pass
it a very large string (on the order of 1GB on a 32-bit Python
install), one can reliably get heap corruption. It is possible to
reach this function (and the condition in line (1)) through the
function parsestr in ast.c, when the file encoding of an input .py
file is anything other than utf-8 or iso-8859-1. This can be trivially
done by placing the following at the start of the file:
    # -*- coding: us-ascii -*-

The attached file (poc-gen.py) produces a poc.py file which satisfies
these constraints and shows the vulnerability.

Note: To see the vulnerability in action, it is necessary to have an
ASAN build of Python, compiled for 32-bit on a 64-bit machine.
Additionally, the generated poc.py file can take an extremely long
time to load (over a few hours) before it finally crashes. To see the
proof of vulnerability more quickly, change the constant 4 in line (1)
to 65536 (just for simplicity's sake), and change the
multiplication_constant in poc-gen.py to the same value (i.e. 65536).
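
To see why a larger constant shortens the experiment, the input size needed to trigger the wraparound can be estimated. The sketch below is an illustration under stated assumptions, not part of poc-gen.py: the helper name smallest_wrapping_len is hypothetical, Py_ssize_t is assumed to be 32 bits wide, and the factor is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: for a power-of-two factor, returns the smallest
 * length whose product factor * len wraps past 2^32 and lands on a
 * strictly positive 32-bit value (namely, on `factor` itself). */
static int64_t smallest_wrapping_len(uint32_t factor)
{
    /* Assumption: factor is a power of two between 4 and 2^30. */
    assert(factor >= 4 && factor <= (UINT32_C(1) << 30));
    assert((factor & (factor - 1)) == 0);
    return (int64_t)((UINT64_C(1) << 32) / factor) + 1;
}
```

Under these assumptions, a factor of 4 needs an input of 1073741825 bytes (just over 1 GiB), while a factor of 65536 needs only 65537 bytes, which is consistent with the much shorter run described above.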

Proposed fix: Confirm that the multiplication will not overflow
before actually performing the multiplication and relying on the
result.
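
One way such a check could look is sketched below. This is a minimal sketch of the general technique (compare against the maximum divided by the factor before multiplying), not the patch that was applied to CPython. The names ssize_sketch_t, SSIZE_SKETCH_MAX and checked_newlen are hypothetical stand-ins so the example compiles without Python.h; in stringobject.c the corresponding names would be Py_ssize_t and PY_SSIZE_T_MAX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for Py_ssize_t and PY_SSIZE_T_MAX (assumptions made so the
 * sketch is self-contained). */
typedef ptrdiff_t ssize_sketch_t;
#define SSIZE_SKETCH_MAX PTRDIFF_MAX

/* Computes the buffer length used in line (1) of the report.
 * Returns 0 and stores the length in *out on success, or -1 if the
 * length is negative or 4*len would overflow. */
static int checked_newlen(ssize_sketch_t len, int recode,
                          ssize_sketch_t *out)
{
    assert(out != NULL);
    if (len < 0)
        return -1;
    if (!recode) {
        *out = len;               /* no multiplication, cannot overflow */
        return 0;
    }
    if (len > SSIZE_SKETCH_MAX / 4)
        return -1;                /* 4*len would exceed the maximum */
    *out = 4 * len;               /* safe: bounded by the check above */
    return 0;
}
```

The division happens on the constant side, so the check itself never overflows. A caller would treat a -1 return as an error and bail out before allocating the string.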

History
Date	User	Action	Args
2017-06-13 15:35:29	jaybosamiya	set	recipients: + jaybosamiya
2017-06-13 15:35:29	jaybosamiya	set	messageid: <1497368129.36.0.20181989843.issue30657@psf.upfronthosting.co.za>
2017-06-13 15:35:29	jaybosamiya	link	issue30657 messages
2017-06-13 15:35:28	jaybosamiya	create