This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Integer overflow in strop
Type: security Stage:
Components: Extension Modules, Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, jcea
Priority: critical Keywords:

Created on 2014-05-19 00:45 by benjamin.peterson, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (2)
msg218774 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2014-05-19 00:45
(copy paste from security list)

"""
Integer overflow in strop.expandtabs:

Aside:
I took a look at python over the weekend as I saw the
the Internet Bug Bounty [ https://hackerone.com/python ]
are now offering bounties for security bugs + patches for
python.

It's quite unclear from their site what they consider to be
a valid bug under their program (they say 'arbitrary code
execution' but I guess just saying "ctypes" would be cheating.)

Anyway, here's a bug in a string handling function which might be
reachable in some "sandboxed python" environments, and maybe
(at a stretch) remotely if someone were to offer
"expanding-tabs-in-strings"-as-a-service...

Bug:

Modules/stropmodule.c

static PyObject *
strop_expandtabs(PyObject *self, PyObject *args)
...
  i = j = old_j = 0;
  e = string + stringlen;
  for (p = string; p < e; p++) {
      if (*p == '\t') {
          j += tabsize - (j%tabsize);
          if (old_j > j) {
              PyErr_SetString(PyExc_OverflowError,
                              "new string is too long");
              return NULL;
          }
          old_j = j;
      } else {
          j++;
          if (*p == '\n') {
              i += j;               <-- missing check here
              j = 0;
          }
      }
  }
...
  out = PyString_FromStringAndSize(NULL, i+j);
...
  i = 0;
  q = PyString_AS_STRING(out);

  for (p = string; p < e; p++) {
      if (*p == '\t') {
          j = tabsize - (i%tabsize);
          i += j;
          while (j-- > 0)
              *q++ = ' ';
      } else {
          *q++ = *p;
          i++;
          if (*p == '\n')
              i = 0;
      }
  }
...

There's no check preventing i from overflowing, meaning that a
string consisting of multiple tabs spread over multiple lines
combined with a large tabsize can cause the allocation of an
undersized string buffer.

With some simple heap manipulation the length of the copy into
this buffer can be controlled, and it's pretty easy to corrupt
memory in such a way as to gain native code execution:

[h1|----------][h2|\t\n\t\n....][h3|----------]

h1: PyStringObject header of undersized buffer
h2: PyStringObject header of tabstring

By grooming the heap such that this allocation pattern is
achieved when the expandtabs function starts expanding the h2 string
into the h1 inline buffer it will overflow into the string being
expanded, overwriting the tabs in the original string with spaces
so that the second loop won't expand them anymore.

By carefully crafting the string to expand and choosing the tabsize
you can limit the extent of the memory corruption to chosen objects.

Getting code execution is simply a matter of pointing the ob_type
field of the h2 string header to a controlled address with a fake
struct _typeobject. The struct contains the following function pointers
which will be called when their corrisponing python function
is called:

  destructor tp_dealloc;
  printfunc tp_print;
  getattrfunc tp_getattr;
  setattrfunc tp_setattr;
  cmpfunc tp_compare;
  reprfunc tp_repr;

Run this email as a script for a very simple crashing PoC for 32-bit python 2.7
which should crash at at address near 0x20202020 (since the ob_type field
will be overwritten with spaces.) No idea if the heap manipulation used
here will work on other platforms but it should be easy to do.


Patch:
You've actually already patched this bug in a copy-and-paste version of
this function... In fact there seem to be at least three versions of expandtabs;
transmogrify.h and stropmodule.c are both vulnerable; stringobject.c isn't. I'm
not familiar enough with the code to know when each version will be used.

The patch is to use the stringobject.c implementation which does the overflow
check correctly, but here's a quick patch (for the 2.7 branch) which will do
the job:

--- a/Modules/stropmodule.c Sun Mar 30 16:43:11 2014 -0400
+++ b/Modules/stropmodule.c Mon Mar 31 00:36:57 2014 +0200
@@ -624,6 +624,11 @@
         } else {
             j++;
             if (*p == '\n') {
+                if (i > PY_SSIZE_T_MAX - j){
+                    PyErr_SetString(PyExc_OverflowError,
+                                    "new string is too long");
+                    return NULL;
+                }
                 i += j;
                 j = 0;
             }

Cheers,

Ian Beer
"""

import strop

strs = []
for i in range(20):
  strs.append('\t\n' * 0x10000 + 'A' * 0x1000000)
for i in range(20):
  print hex(id(strs[i]))
strs[14] = None
strop.expandtabs(strs[15], 0x10001)
print strs[15]
msg218775 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2014-05-19 00:45
5dabc2d2f776
History
Date User Action Args
2022-04-11 14:58:03adminsetgithub: 65729
2014-05-19 18:51:59jceasetnosy: + jcea
2014-05-19 00:45:40benjamin.petersonsetstatus: open -> closed
resolution: fixed
messages: + msg218775
2014-05-19 00:45:05benjamin.petersoncreate