This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author jaybosamiya
Recipients jaybosamiya
Date 2017-06-13.15:35:28
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1497368129.36.0.20181989843.issue30657@psf.upfronthosting.co.za>
In-reply-to
Content
In Python 2.7, there is a possible integer overflow in
PyString_DecodeEscape function of the file stringobject.c, which can
be abused to gain a heap overflow, possibly leading to arbitrary code
execution.

The relevant parts of the code are highlighted below:

    PyObject *PyString_DecodeEscape(const char *s,
                                    Py_ssize_t len,
                                    const char *errors,
                                    Py_ssize_t unicode,
                                    const char *recode_encoding)
    {
        int c;
        char *p, *buf;
        const char *end;
        PyObject *v;
(1)     Py_ssize_t newlen = recode_encoding ? 4*len:len;
(2)     v = PyString_FromStringAndSize((char *)NULL, newlen);
        if (v == NULL)
            return NULL;
(3)     p = buf = PyString_AsString(v);
        end = s + len;
        while (s < end) {
            if (*s != '\\') {
              non_esc:
    #ifdef Py_USING_UNICODE
    [...]
    #else
                *p++ = *s++;
    #endif
                continue;
    [...]
            }
        }
(4)     if (p-buf < newlen)
            _PyString_Resize(&v, p - buf); /* v is cleared on error */
        return v;
      failed:
        Py_DECREF(v);
        return NULL;
    }


(1) If recode_encoding is true (i.e., non-null), we have an integer
      overflow here which can set newlen to be some very small value
(2) This allows a small string to be created into v
(3) Now p (and buf) use that small string
(4) The small string is copied into with a larger string, thereby
      giving a heap buffer overflow

In the highly unlikely but definitely possible situation that we pass
it a very large string (in the order of ~1GB on a 32-bit Python
install), one can reliably get heap corruption. It is possible to
access this function (and condition in line(1)) through function
parsestr from ast.c, when the file encoding of an input .py file is
something apart from utf-8 and iso-8859-1. This can be trivially done
using the following at the start of the file:
    # -*- coding: us-ascii -*-

The attached file (poc-gen.py) produces a poc.py file which satisfies
these constraints and shows the vulnerability.

Note: To see the vulnerability in action, it is necessary to have an
ASAN build of Python, compiled for 32 bit on a 64 bit machine.
Additionally, the poc.py file generated can take an extremely long
time to load (over a few hours), and finally crash. Instead, if one
wishes to see the proof of vulnerability quicker, then it might be
better to change the constant 4 in line (1) to 65536 (just for
simplicity sake), and change the multiplication_constant in poc-gen.py
file to be the same (i.e. 65536).

Proposed fix: Confirm that the multiplication will not overflow,
before actually performing the multiplication and depending on the
result.
History
Date User Action Args
2017-06-13 15:35:29jaybosamiyasetrecipients: + jaybosamiya
2017-06-13 15:35:29jaybosamiyasetmessageid: <1497368129.36.0.20181989843.issue30657@psf.upfronthosting.co.za>
2017-06-13 15:35:29jaybosamiyalinkissue30657 messages
2017-06-13 15:35:28jaybosamiyacreate