Heap overwrite in Python/fileutils.c:_Py_char2wchar() on 32 bit systems due to malloc parameter overflow #67354

Closed
Guido mannequin opened this issue Jan 4, 2015 · 6 comments
Labels
type-security A security issue

Comments

@Guido

Guido mannequin commented Jan 4, 2015

BPO 23165
Nosy @vstinner, @benjaminp
Files
  • _py_char2wchar_patches.tar.gz: Patches for 3.4, 3.3, 3.2 (untested)
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2015-01-04.22:06:59.184>
    created_at = <Date 2015-01-04.16:48:31.800>
    labels = ['type-security']
    title = 'Heap overwrite in Python/fileutils.c:_Py_char2wchar() on 32 bit systems due to malloc parameter overflow'
    updated_at = <Date 2015-01-11.00:47:07.211>
    user = 'https://bugs.python.org/Guido'

    bugs.python.org fields:

    activity = <Date 2015-01-11.00:47:07.211>
    actor = 'Arfrever'
    assignee = 'none'
    closed = True
    closed_date = <Date 2015-01-04.22:06:59.184>
    closer = 'python-dev'
    components = []
    creation = <Date 2015-01-04.16:48:31.800>
    creator = 'Guido'
    dependencies = []
    files = ['37597']
    hgrepos = []
    issue_num = 23165
    keywords = []
    message_count = 6.0
    messages = ['233424', '233428', '233430', '233431', '233433', '233435']
    nosy_count = 5.0
    nosy_names = ['vstinner', 'benjamin.peterson', 'Arfrever', 'python-dev', 'Guido']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue23165'
    versions = ['Python 3.2', 'Python 3.3', 'Python 3.4']

    @Guido

    Guido mannequin commented Jan 4, 2015

    The vulnerability described here is exceedingly difficult to exploit, since there is no straightforward way for an "attacker" (someone who controls the contents of a Python script but not other values such as system environment variables) to control a relevant parameter of the vulnerable function (_Py_char2wchar in Python/fileutils.c). It is nevertheless important that it is remediated, since an author unaware of this vulnerability may, in a future version of Python, unwittingly establish a link between user input and the function parameter.

    As stated, the vulnerability is caused by code in the _Py_char2wchar function. This function is reached indirectly through Objects/unicodeobject.c:PyUnicode_DecodeLocaleAndSize(), PyUnicode_DecodeFSDefaultAndSize(), PyUnicode_DecodeLocale(), and some other functions.

    As far as I know this can only be exploited on 32-bit architectures (where size_t arithmetic wraps around at 2**32). The following description is based on the latest Python 3.4 code retrieved from https://hg.python.org/cpython .

    The problem lies in the computation of size of the buffer that will hold the wide char version of the input string:

    --
    Python/fileutils.c

    296 #ifdef HAVE_BROKEN_MBSTOWCS
    297 /* Some platforms have a broken implementation of
    298 * mbstowcs which does not count the characters that
    299 * would result from conversion. Use an upper bound.
    300 */
    301 argsize = strlen(arg);
    302 #else
    303 argsize = mbstowcs(NULL, arg, 0);
    304 #endif
    ...
    ...
    306 res = (wchar_t *)PyMem_RawMalloc((argsize+1)*sizeof(wchar_t));

    and:

    331 argsize = strlen(arg) + 1;
    332 res = (wchar_t*)PyMem_RawMalloc(argsize*sizeof(wchar_t));

    Neither call to PyMem_RawMalloc is preceded by code that checks that no overflow will occur when the length of 'arg' is multiplied by sizeof(wchar_t), which is typically 4 bytes. It follows that on a 32-bit architecture it is possible to cause an integer overflow by supplying a string whose size is >= ((2**32)-1) / 4, which is 1 gigabyte. Supplying a 1 GB (minus one byte) string will therefore result in a value of 0 being passed to PyMem_RawMalloc, because:

            argsize = 1024*1024*1024-1
            malloc_argument = (argsize+1) * 4
            print(malloc_argument & 0xFFFFFFFF)
            # prints 0
            
    Effectively this will result in an allocation of exactly 1 byte, since a parameter of 0 is automatically adjusted to 1 by the underlying _PyMem_RawMalloc():

    --
    Objects/obmalloc.c

    51 static void *
    52 _PyMem_RawMalloc(void *ctx, size_t size)
    53 {
    54 /* PyMem_Malloc(0) means malloc(1). Some systems would return NULL
    55 for malloc(0), which would be treated as an error. Some platforms would
    56 return a pointer with no memory behind it, which would break pymalloc.
    57 To solve these problems, allocate an extra byte. */
    58 if (size == 0)
    59 size = 1;
    60 return malloc(size);
    61 }
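The combined effect of the wrapped multiplication and the zero-size adjustment can be modelled in a few lines. This is only a sketch under stated assumptions: a 32-bit size_t and sizeof(wchar_t) == 4. The helper names (requested_size, effective_malloc_size, needed_size) are hypothetical and do not exist in CPython.

```python
# Model of the allocation-size arithmetic on a 32-bit build.
# Assumptions: size_t is 32 bits wide and sizeof(wchar_t) == 4.
SIZE_T_BITS = 32
SIZE_T_MASK = (1 << SIZE_T_BITS) - 1
SIZEOF_WCHAR_T = 4


def requested_size(argsize):
    """Value of (argsize + 1) * sizeof(wchar_t) after size_t wraparound."""
    return ((argsize + 1) * SIZEOF_WCHAR_T) & SIZE_T_MASK


def effective_malloc_size(size):
    """Mirror the quoted _PyMem_RawMalloc(): a request of 0 becomes 1."""
    return 1 if size == 0 else size


def needed_size(argsize):
    """Bytes really required for argsize wide chars plus a terminator."""
    return (argsize + 1) * SIZEOF_WCHAR_T


argsize = 1024 * 1024 * 1024 - 1  # 1 GB minus one byte, as in the report
allocated = effective_malloc_size(requested_size(argsize))
print(allocated)             # 1
print(needed_size(argsize))  # 4294967296
```

Under these assumptions the buffer is 1 byte long while the conversion expects room for 2**32 bytes, which is the mismatch the rest of the report relies on.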

    Once the memory has been allocated, mbstowcs() is invoked:

    --
    Python/fileutils.c

    306 res = (wchar_t *)PyMem_RawMalloc((argsize+1)*sizeof(wchar_t));
    307 if (!res)
    308 goto oom;
    309 count = mbstowcs(res, arg, argsize+1);

    In my test setup (latest 32-bit Debian), mbstowcs returns '0', meaning no wide characters were written to 'res'.

    Then, 'res' is iterated over, and the iteration halts as soon as a null wchar or a surrogate wchar is encountered:

    --
    Python/fileutils.c

    310 if (count != (size_t)-1) {
    311 wchar_t *tmp;
    312 /* Only use the result if it contains no
    313 surrogate characters. */
    314 for (tmp = res; *tmp != 0 &&
    315 !Py_UNICODE_IS_SURROGATE(*tmp); tmp++)
    316 ;
    317 if (*tmp == 0) {
    318 if (size != NULL)
    319 *size = count;
    320 return res;
    321 }
    322 }
    323 PyMem_RawFree(res);

    Py_UNICODE_IS_SURROGATE is defined as follows:

    --
    Include/unicodeobject.h

    183 #define Py_UNICODE_IS_SURROGATE(ch) (0xD800 <= (ch) && (ch) <= 0xDFFF)

    In the iteration over 'res', control is transferred back to the caller of _Py_char2wchar() if a null wchar is encountered first. If, however, a wchar that satisfies the expression in Py_UNICODE_IS_SURROGATE() is encountered first, *tmp is not null and thus the conditional code on lines 318-320 is skipped.
    The space that 'res' points to is uninitialized. Uninitialized, however, does not entail randomness in this case. If an attacker has sufficient freedom to manipulate the contents of the process memory prior to the call to _Py_char2wchar(), they could scatter it with values that satisfy Py_UNICODE_IS_SURROGATE() and so increase the odds that _Py_char2wchar() encounters such a value before a null wchar. These kinds of details are highly dependent on system architecture, operating system, libc implementation and so forth.
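The two outcomes of the scan on lines 314-321 can be illustrated with a small model. It is a sketch only: the list stands in for whatever wide-char values happen to sit in memory starting at 'res', and the names is_surrogate and scan are hypothetical.

```python
def is_surrogate(ch):
    """Same range test as the Py_UNICODE_IS_SURROGATE macro."""
    return 0xD800 <= ch <= 0xDFFF


def scan(words):
    """Walk wide-char values until a null or a surrogate is found.

    Returns "null" when the loop stops on a terminator (the C function
    would return 'res' to its caller) and "surrogate" when it stops on a
    surrogate (the C function would free 'res' and continue with the
    escaping fallback).
    """
    for ch in words:
        if ch == 0:
            return "null"
        if is_surrogate(ch):
            return "surrogate"
    raise ValueError("no terminator or surrogate in the modelled memory")


print(scan([0x41, 0x42, 0]))    # null
print(scan([0x41, 0xDC80, 0]))  # surrogate
```

Which of the two values comes first in the memory following the undersized buffer decides whether the fallback path, and with it the out-of-bounds write, is reached.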

    The remainder of the function performs a byte-by-byte conversion in a loop, to manually convert the entire input string. Especially relevant to this vulnerability are lines 332, 339, 356, 365 and 366:

    On line 332 memory is allocated, effectively only 1 byte as explained above. 'argsize', however, is 0x40000000 in our case, and the entire routine is repeated until argsize is 0.
    On line 339 one or more characters are converted, and stored into 'out', which is 'res'. Lines 356 and 366 do the same.

    --
    Python/fileutils.c

    325 /* Conversion failed. Fall back to escaping with surrogateescape. */
    326 #ifdef HAVE_MBRTOWC
    327 /* Try conversion with mbrtwoc (C99), and escape non-decodable bytes. */
    328
    329 /* Overallocate; as multi-byte characters are in the argument, the
    330 actual output could use less memory. */
    331 argsize = strlen(arg) + 1;
    332 res = (wchar_t*)PyMem_RawMalloc(argsize*sizeof(wchar_t));
    333 if (!res)
    334 goto oom;
    335 in = (unsigned char*)arg;
    336 out = res;
    337 memset(&mbs, 0, sizeof mbs);
    338 while (argsize) {
    339 size_t converted = mbrtowc(out, (char*)in, argsize, &mbs);
    340 if (converted == 0)
    341 /* Reached end of string; null char stored. */
    342 break;
    343 if (converted == (size_t)-2) {
    344 /* Incomplete character. This should never happen,
    345 since we provide everything that we have -
    346 unless there is a bug in the C library, or I
    347 misunderstood how mbrtowc works. */
    348 PyMem_RawFree(res);
    349 if (size != NULL)
    350 *size = (size_t)-2;
    351 return NULL;
    352 }
    353 if (converted == (size_t)-1) {
    354 /* Conversion error. Escape as UTF-8b, and start over
    355 in the initial shift state. */
    356 *out++ = 0xdc00 + *in++;
    357 argsize--;
    358 memset(&mbs, 0, sizeof mbs);
    359 continue;
    360 }
    361 if (Py_UNICODE_IS_SURROGATE(*out)) {
    362 /* Surrogate character. Escape the original
    363 byte sequence with surrogateescape. */
    364 argsize -= converted;
    365 while (converted--)
    366 *out++ = 0xdc00 + *in++;
    367 continue;
    368 }
    369 /* successfully converted some bytes */
    370 in += converted;
    371 argsize -= converted;
    372 out++;
    373 }
    374 if (size != NULL)
    375 *size = out - res;
    376 #else /* HAVE_MBRTOWC */
    377 /* Cannot use C locale for escaping; manually escape as if charset
    378 is ASCII (i.e. escape all bytes > 128. This will still roundtrip
    379 correctly in the locale's charset, which must be an ASCII superset. */
    380 res = decode_ascii_surrogateescape(arg, size);
    381 if (res == NULL)
    382 goto oom;
    383 #endif /* HAVE_MBRTOWC */

    Suffice it to say that this leads to writing to memory that has not been allocated, thereby making this a heap overflow vulnerability. decode_ascii_surrogateescape() seems to suffer from the same issue.
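For readers unfamiliar with the escaping done on lines 356 and 366, the sketch below compares the `0xdc00 + byte` arithmetic with Python's built-in "surrogateescape" error handler. The helper escape_byte is hypothetical and only restates the C expression; the ASCII codec is chosen here purely so that the byte 0xff is undecodable.

```python
def escape_byte(byte):
    """Same arithmetic as `*out++ = 0xdc00 + *in++` in the quoted C code."""
    return 0xDC00 + byte


raw = b"\xff"
# Python's own "surrogateescape" error handler, for comparison.
decoded = raw.decode("ascii", errors="surrogateescape")

print(hex(escape_byte(raw[0])))  # 0xdcff
print(hex(ord(decoded)))         # 0xdcff

# The escape is reversible: encoding restores the original byte.
print(decoded.encode("ascii", errors="surrogateescape") == raw)  # True
```

Each escaped byte occupies one full wchar_t in the output, so in the worst case the fallback loop writes one wide character per input byte into the undersized buffer.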

    Guido Vranken,

    Intelworks
    http://www.intelworks.com/

    @Guido Guido mannequin added the type-security A security issue label Jan 4, 2015
    @python-dev

    python-dev mannequin commented Jan 4, 2015

    New changeset 1ce98e85929d by Benjamin Peterson in branch '3.2':
    add some overflow checks before multiplying (closes bpo-23165)
    https://hg.python.org/cpython/rev/1ce98e85929d

    New changeset d1af6f3a8ce3 by Benjamin Peterson in branch '3.3':
    merge 3.2 (closes bpo-23165)
    https://hg.python.org/cpython/rev/d1af6f3a8ce3

    New changeset d45e16b1ed86 by Benjamin Peterson in branch '3.4':
    merge 3.3 (closes bpo-23165)
    https://hg.python.org/cpython/rev/d45e16b1ed86

    New changeset 8c4fb312e15d by Benjamin Peterson in branch 'default':
    merge 3.4 (bpo-23165)
    https://hg.python.org/cpython/rev/8c4fb312e15d

    @python-dev python-dev mannequin closed this as completed Jan 4, 2015
    @vstinner

    vstinner commented Jan 4, 2015

    + size_t argsize = strlen(arg) + 1;
    + if (argsize > PY_SSIZE_T_MAX/sizeof(wchar_t))
    + return NULL;
    + res = PyMem_Malloc(argsize*sizeof(wchar_t));

    The code doesn't check for integer overflow on "+1". I suggest instead:

    + size_t arglen = strlen(arg);
    + if (arglen > PY_SSIZE_T_MAX / sizeof(wchar_t) - 1)
    + return NULL;
    + res = PyMem_Malloc((arglen + 1) * sizeof(wchar_t));

    @benjaminp

    Presumably strlen can't return SIZE_T_MAX because the trailing '\0' has to have been allocated somewhere.

    @vstinner

    vstinner commented Jan 4, 2015

    PY_SSIZE_T_MAX is usually smaller than SIZE_T_MAX ;-)

    (strlen result is not signed.)

    @benjaminp

    Right, but there's still no danger of overflow.
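The exchange above can be checked with a small model of the two guards. This is a sketch under assumptions, not CPython code: a 32-bit size_t, PY_SSIZE_T_MAX == 2**31 - 1, sizeof(wchar_t) == 4, and hypothetical helper names. The first guard is the one quoted with "+" markers in the review, the second is the suggested alternative.

```python
SIZE_T_MASK = (1 << 32) - 1      # assumed 32-bit size_t
PY_SSIZE_T_MAX = (1 << 31) - 1   # assumed 32-bit Py_ssize_t
SIZEOF_WCHAR_T = 4               # assumed sizeof(wchar_t)
LIMIT = PY_SSIZE_T_MAX // SIZEOF_WCHAR_T


def quoted_guard_accepts(arglen):
    """Guard quoted in the review: the +1 happens before the comparison."""
    argsize = (arglen + 1) & SIZE_T_MASK
    return argsize <= LIMIT


def suggested_guard_accepts(arglen):
    """Guard suggested in the review: compare first, add 1 afterwards."""
    return arglen <= LIMIT - 1


def allocation_wraps(arglen):
    """True if (arglen + 1) * sizeof(wchar_t) does not fit in size_t."""
    return (arglen + 1) * SIZEOF_WCHAR_T > SIZE_T_MASK


boundary = [LIMIT - 1, LIMIT, (1 << 30) - 1, SIZE_T_MASK - 1, SIZE_T_MASK]
disagreements = [n for n in boundary
                 if quoted_guard_accepts(n) != suggested_guard_accepts(n)]
print(disagreements)  # [4294967295]
```

In this model the two guards differ only for a length equal to the maximum size_t value, where the "+1" wraps to 0. That is the case argued above to be unreachable, because the string's terminating '\0' must itself occupy a byte.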


    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022