Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple buffer overflows in unicode processing #46872

Closed
jnferguson mannequin opened this issue Apr 11, 2008 · 21 comments
Closed

Multiple buffer overflows in unicode processing #46872

jnferguson mannequin opened this issue Apr 11, 2008 · 21 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-security A security issue

Comments

@jnferguson
Copy link
Mannequin

jnferguson mannequin commented Apr 11, 2008

BPO 2620
Nosy @malemburg, @terryjreedy, @gpshead, @abalkin, @vstinner
Files
  • python-2.5.2-unicode_resize-utf7.py
  • python-2.5.2-unicode_resize-utf8.py
  • python-2.5.2-unicode_resize-utf16.py
  • issue2620-gps02-patch.txt: always check for overflow in _New and _Resize
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/gpshead'
    closed_at = <Date 2010-08-03.18:06:26.772>
    created_at = <Date 2008-04-11.22:35:36.662>
    labels = ['type-security', 'interpreter-core']
    title = 'Multiple buffer overflows in unicode processing'
    updated_at = <Date 2010-08-03.18:06:26.771>
    user = 'https://bugs.python.org/jnferguson'

    bugs.python.org fields:

    activity = <Date 2010-08-03.18:06:26.771>
    actor = 'terry.reedy'
    assignee = 'gregory.p.smith'
    closed = True
    closed_date = <Date 2010-08-03.18:06:26.772>
    closer = 'terry.reedy'
    components = ['Interpreter Core']
    creation = <Date 2008-04-11.22:35:36.662>
    creator = 'jnferguson'
    dependencies = []
    files = ['10011', '10012', '10013', '10825']
    hgrepos = []
    issue_num = 2620
    keywords = ['patch']
    message_count = 21.0
    messages = ['65379', '65382', '65384', '65386', '65387', '65389', '65393', '65395', '65397', '65398', '65441', '65457', '65458', '69319', '70061', '70103', '70137', '70337', '92146', '105545', '107420']
    nosy_count = 8.0
    nosy_names = ['lemburg', 'nnorwitz', 'terry.reedy', 'gregory.p.smith', 'belopolsky', 'vstinner', 'jnferguson', 'boya']
    pr_nums = []
    priority = 'high'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue2620'
    versions = ['Python 2.5']

    @jnferguson
    Copy link
    Mannequin Author

    jnferguson mannequin commented Apr 11, 2008

    174 static
    175 int unicode_resize(register PyUnicodeObject *unicode,
    176 Py_ssize_t length)
    177 {
    [...]
    201
    202 oldstr = unicode->str;
    203 PyMem_RESIZE(unicode->str, Py_UNICODE, length + 1);
    [...]
    209 unicode->str[length] = 0;
    210 unicode->length = length;
    211

    95 #define PyMem_RESIZE(p, type, n) \
    96 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
    97 ( (p) = (type *) PyMem_REALLOC((p), (n) * sizeof(type)) ) )

    The unicode_resize() function acts essentially as a wrapper to
    realloc(), it accomplishes this via the PyMem_RESIZE() macro which
    factors the size with the size of the type, in this case it multiplies
    by two as Py_UNICODE is typedef'd to a wchar_t. When resizing large
    strings, this results in an incorrect allocation that in turn leads to
    buffer overflow.

    This is specific to the Unicode objects, however I would not be
    surprised to see that other types have this complication as well. Please
    see attached proof of concepts.

    @jnferguson jnferguson mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-security A security issue labels Apr 11, 2008
    @malemburg
    Copy link
    Member

    You are probably referring to 32-bit platforms. At least on 64-bit
    platforms, there's no problem with your test cases:

    >>> # this is to get the unicode_freelist initialized
    ... # the length of the string must be <= 9 to keep
    ... # unicode->str from being deallocated and set to
    ... # NULL
    ... bla = unicode('IOActive')
    >>> del bla
    >>>
    >>>
    >>> msg = 'A'*2147483647
    >>>
    >>> msg.decode('utf7')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    MemoryError

    The code does check for success of the realloc():

        PyMem_RESIZE(unicode->str, Py_UNICODE, length + 1);
        if (!unicode->str) {
    	unicode->str = (Py_UNICODE *)oldstr;
            PyErr_NoMemory();
            return -1;
        }

    Are you after the integer overflow and the fact that realloc() would (if
    possible) allocate a buffer smaller than needed ?

    @jnferguson
    Copy link
    Mannequin Author

    jnferguson mannequin commented Apr 12, 2008

    Yes, excuse me-- this should be 32-bit specific as I believe Python will
    not let me get a string long enough to overflow the integer on 64-bit.
    It's a big string, the only realistic scenario I can see is XML parsing
    or similar.

    theory$ ./python -V
    Python 2.5.2
    theory$ cat /proc/cpuinfo | grep -i model
    model : 4
    model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
    theory$ ./python python-2.5.2-unicode_resize-utf7.py
    Segmentation fault

    @abalkin
    Copy link
    Member

    abalkin commented Apr 12, 2008

    Note that in r61458 Neal replaced PyMem_RESIZE with a direct call to PyMem_REALLOC thus eliminating integer overflow check even from the debug
    builds.

    @abalkin
    Copy link
    Member

    abalkin commented Apr 12, 2008

    Justin,

    Where did you find the definition that you cited:

    95 #define PyMem_RESIZE(p, type, n) \
    96 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
    97 ( (p) = (type *) PyMem_REALLOC((p), (n) * sizeof(type)) ) )

    ?

    Current Include/pymem.h does not have the assert:

     94 #define PyMem_RESIZE(p, type, n) \\
     95         ( (p) = (type \*) PyMem_REALLOC((p), (n) * sizeof(type)) 
    

    )

    and it did not change for a while.

    @abalkin
    Copy link
    Member

    abalkin commented Apr 12, 2008

    The following simple change should be enough for this issue, but I would
    consider implementing the overflow check in the PyMem_RESIZE and PyMem_NEW macros and de-deprecate their use.

    ===================================================================

    --- Objects/unicodeobject.c     (revision 62237)
    +++ Objects/unicodeobject.c     (working copy)
    @@ -261,8 +261,8 @@
            it contains). */
     
         oldstr = unicode->str;
    -    unicode->str = PyObject_REALLOC(unicode->str,
    -                                   sizeof(Py_UNICODE) * (length + 1));
    +    unicode->str = SIZE_MAX/sizeof(Py_UNICODE) - 1 < length ? NULL :
    +        PyObject_REALLOC(unicode->str, sizeof(Py_UNICODE) * (length + 
    1));
         if (!unicode->str) {
            unicode->str = (Py_UNICODE *)oldstr;
             PyErr_NoMemory();

    @jnferguson
    Copy link
    Mannequin Author

    jnferguson mannequin commented Apr 12, 2008

    i pulled the Macros out of pymem.h in a Vanille 2.5.2?

    @jnferguson
    Copy link
    Mannequin Author

    jnferguson mannequin commented Apr 12, 2008

    sorry didnt mean to change components and version-- I'm typing this from
    my phone and its being uncooperative at the moment

    @jnferguson jnferguson mannequin added the stdlib Python modules in the Lib dir label Apr 12, 2008
    @jnferguson
    Copy link
    Mannequin Author

    jnferguson mannequin commented Apr 12, 2008

    just fixing the modifications my phone made earlier tonight

    @jnferguson jnferguson mannequin removed the stdlib Python modules in the Lib dir label Apr 12, 2008
    @jnferguson
    Copy link
    Mannequin Author

    jnferguson mannequin commented Apr 12, 2008

    Additionally-- the PyMem_NEW()/PyMem_New() macro's need to be fixed:

    231 static
    232 PyUnicodeObject *_PyUnicode_New(Py_ssize_t length)
    233 {
    234 register PyUnicodeObject *unicode;
    235
    236 /* Optimization for empty strings */
    237 if (length == 0 && unicode_empty != NULL) {
    238 Py_INCREF(unicode_empty);
    239 return unicode_empty;
    240 }
    241
    242 /* Unicode freelist & memory allocation */
    243 if (unicode_freelist) {
    244 unicode = unicode_freelist;
    245 unicode_freelist = *(PyUnicodeObject **)unicode;
    246 unicode_freelist_size--;
    247 if (unicode->str) {
    248 /* Keep-Alive optimization: we only upsize the buffer,
    249 never downsize it. */
    250 if ((unicode->length < length) &&
    251 unicode_resize(unicode, length) < 0) {
    252 PyMem_DEL(unicode->str);
    253 goto onError;
    254 }
    255 }
    256 else {
    257 unicode->str = PyMem_NEW(Py_UNICODE, length + 1);
    258 }

    85 #define PyMem_New(type, n) \
    86 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
    87 ( (type *) PyMem_Malloc((n) * sizeof(type)) ) )
    88 #define PyMem_NEW(type, n) \
    89 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
    90 ( (type *) PyMem_MALLOC((n) * sizeof(type)) ) )
    91
    92 #define PyMem_Resize(p, type, n) \
    93 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
    94 ( (p) = (type *) PyMem_Realloc((p), (n) * sizeof(type)) ) )
    95 #define PyMem_RESIZE(p, type, n) \
    96 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
    97 ( (p) = (type *) PyMem_REALLOC((p), (n) * sizeof(type)) ) )

    It looks like much of this is reachable from modules, unicode objects,
    dict objects, set objects, et cetera. Also, if a 2G string across the
    network seems unrealistic consider what 2 billion A's compresses down
    to.(Macros again taken from 2.5.2 vanilla)

    @malemburg
    Copy link
    Member

    On 32-bit platforms, it's probably best to add a size check. I don't
    it's worth doing that on 64-bit platforms - overflows are rather
    unlikely there.

    @gpshead gpshead self-assigned this Apr 14, 2008
    @gpshead
    Copy link
    Member

    gpshead commented Apr 14, 2008

    Here's a patch that fixes this by making both Python's malloc and
    realloc return NULL if (0 <= size <= PY_SSIZE_T_MAX).

    A side effect of this is that strings on 32bit platforms can no longer
    be allocated up to 2**31-1 in length as the malloc includes the internal
    python object structure overhead. The maximum string size becomes
    2147483609 with an optimized build on this system.

    I do not think that is a problem. A 32-bit process by definition can
    only ever have one such object allocated at a time anyways. ;)

    any objections?

    @abalkin
    Copy link
    Member

    abalkin commented Apr 14, 2008

    On Sun, Apr 13, 2008 at 11:12 PM, Gregory P. Smith
    <report@bugs.python.org> wrote:
    ..

    Here's a patch that fixes this by making both Python's malloc and
    realloc return NULL if (0 <= size <= PY_SSIZE_T_MAX).

    This will not solve the original problem completely: multiplicative
    overflow may produce size in the 0 to PY_SSIZE_T_MAX range.
    Furthemore, malloc and realloc take unsigned arguments and I believe
    there are cases when they are called with unsigned arguments in python
    code. Using the proposed macro definitions in these cases will lead
    to compiler warnings.

    I don't object to limiting the allowed malloc/realoc size, but the
    check should be expressed as unsigned comparison: (size_t)(n) >
    (size_t)PY_SSIZE_T_MAX and multiplications by n > 2 should still be
    checked for overflow before the result can be used for malloc.

    @gpshead
    Copy link
    Member

    gpshead commented Jul 6, 2008

    here's an updated patch. It makes PyMem_NEW and PyMem_RESIZE macros
    always do explicit an overflow check before doing the multiplication.

    Uses of the the macros have been cleaned up in the couple places I
    noticed that would leak memory or corrupt their own state by replacing
    the original pointer to their memory with NULL on error before raising
    MemoryError. This bug was already present in the existing code if
    realloc ever returned NULL.

    (IMHO PyMem_RESIZE & PyMem_Resize are a poorly designed macros. The
    blind pointer assignment should never have been done within the macro.
    But given that it is exposed as an API and presumably used by third
    party extension modules the broken API must be maintained.)

    @gpshead
    Copy link
    Member

    gpshead commented Jul 19, 2008

    diff up for review on http://codereview.appspot.com/2599

    @malemburg
    Copy link
    Member

    The patch looks good to me.

    @gpshead
    Copy link
    Member

    gpshead commented Jul 22, 2008

    Commited to trunk. r65182.

    This still needs backporting to release25-maint. (and release24-maint
    if anyone is maintaining that)

    @nnorwitz
    Copy link
    Mannequin

    nnorwitz mannequin commented Jul 28, 2008

    Committed revision 65261 for 2.5
    Committed revision 65262 for 2.4.

    @nnorwitz nnorwitz mannequin closed this as completed Jul 28, 2008
    @boya
    Copy link
    Mannequin

    boya mannequin commented Sep 1, 2009

    In Python/pyarena.c:

    block_new(size_t size)
    {
    	/* Allocate header and block as one unit.
    	   ab_mem points just past header. */
    	block *b = (block *)malloc(sizeof(block) + size);
            ...
    }

    Should a check for overflow of "size" also be performed before calling
    "malloc"?

    @brettcannon brettcannon reopened this Sep 2, 2009
    @terryjreedy
    Copy link
    Member

    Neal committed changes for 2.4,2.5, so I removed those.
    3.0 is dead. Is this an issue for 3.1,3.2 or should it be closed?

    @terryjreedy
    Copy link
    Member

    Brett, open and fixed are contradictory? for what version did you reopen this?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-security A security issue
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants