Multiple buffer overflows in unicode processing #46872

jnferguson · 2008-04-11T22:35:37Z

BPO	2620
Nosy	@malemburg, @terryjreedy, @gpshead, @abalkin, @vstinner
Files	python-2.5.2-unicode_resize-utf7.py python-2.5.2-unicode_resize-utf8.py python-2.5.2-unicode_resize-utf16.py issue2620-gps02-patch.txt: always check for overflow in _New and _Resize

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/gpshead'
closed_at = <Date 2010-08-03.18:06:26.772>
created_at = <Date 2008-04-11.22:35:36.662>
labels = ['type-security', 'interpreter-core']
title = 'Multiple buffer overflows in unicode processing'
updated_at = <Date 2010-08-03.18:06:26.771>
user = 'https://bugs.python.org/jnferguson'

bugs.python.org fields:

activity = <Date 2010-08-03.18:06:26.771>
actor = 'terry.reedy'
assignee = 'gregory.p.smith'
closed = True
closed_date = <Date 2010-08-03.18:06:26.772>
closer = 'terry.reedy'
components = ['Interpreter Core']
creation = <Date 2008-04-11.22:35:36.662>
creator = 'jnferguson'
dependencies = []
files = ['10011', '10012', '10013', '10825']
hgrepos = []
issue_num = 2620
keywords = ['patch']
message_count = 21.0
messages = ['65379', '65382', '65384', '65386', '65387', '65389', '65393', '65395', '65397', '65398', '65441', '65457', '65458', '69319', '70061', '70103', '70137', '70337', '92146', '105545', '107420']
nosy_count = 8.0
nosy_names = ['lemburg', 'nnorwitz', 'terry.reedy', 'gregory.p.smith', 'belopolsky', 'vstinner', 'jnferguson', 'boya']
pr_nums = []
priority = 'high'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = 'security'
url = 'https://bugs.python.org/issue2620'
versions = ['Python 2.5']

jnferguson · 2008-04-11T22:35:34Z

174 static
175 int unicode_resize(register PyUnicodeObject *unicode,
176 Py_ssize_t length)
177 {
[...]
201
202 oldstr = unicode->str;
203 PyMem_RESIZE(unicode->str, Py_UNICODE, length + 1);
[...]
209 unicode->str[length] = 0;
210 unicode->length = length;
211

95 #define PyMem_RESIZE(p, type, n) \
96 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
97 ( (p) = (type *) PyMem_REALLOC((p), (n) * sizeof(type)) ) )

The unicode_resize() function acts essentially as a wrapper to
realloc(), it accomplishes this via the PyMem_RESIZE() macro which
factors the size with the size of the type, in this case it multiplies
by two as Py_UNICODE is typedef'd to a wchar_t. When resizing large
strings, this results in an incorrect allocation that in turn leads to
buffer overflow.

This is specific to the Unicode objects, however I would not be
surprised to see that other types have this complication as well. Please
see attached proof of concepts.

malemburg · 2008-04-11T23:29:28Z

You are probably referring to 32-bit platforms. At least on 64-bit
platforms, there's no problem with your test cases:

>>> # this is to get the unicode_freelist initialized
... # the length of the string must be <= 9 to keep
... # unicode->str from being deallocated and set to
... # NULL
... bla = unicode('IOActive')
>>> del bla
>>>
>>>
>>> msg = 'A'*2147483647
>>>
>>> msg.decode('utf7')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

The code does check for success of the realloc():

    PyMem_RESIZE(unicode->str, Py_UNICODE, length + 1);
    if (!unicode->str) {
	unicode->str = (Py_UNICODE *)oldstr;
        PyErr_NoMemory();
        return -1;
    }

Are you after the integer overflow and the fact that realloc() would (if
possible) allocate a buffer smaller than needed ?

jnferguson · 2008-04-12T00:20:33Z

Yes, excuse me-- this should be 32-bit specific as I believe Python will
not let me get a string long enough to overflow the integer on 64-bit.
It's a big string, the only realistic scenario I can see is XML parsing
or similar.

theory$ ./python -V
Python 2.5.2
theory$ cat /proc/cpuinfo | grep -i model
model : 4
model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
theory$ ./python python-2.5.2-unicode_resize-utf7.py
Segmentation fault

abalkin · 2008-04-12T03:26:51Z

Note that in r61458 Neal replaced PyMem_RESIZE with a direct call to PyMem_REALLOC thus eliminating integer overflow check even from the debug
builds.

abalkin · 2008-04-12T03:42:10Z

Justin,

Where did you find the definition that you cited:

95 #define PyMem_RESIZE(p, type, n) \
96 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
97 ( (p) = (type *) PyMem_REALLOC((p), (n) * sizeof(type)) ) )

?

Current Include/pymem.h does not have the assert:

 94 #define PyMem_RESIZE(p, type, n) \\
 95         ( (p) = (type \*) PyMem_REALLOC((p), (n) * sizeof(type))

)

and it did not change for a while.

abalkin · 2008-04-12T04:18:46Z

The following simple change should be enough for this issue, but I would
consider implementing the overflow check in the PyMem_RESIZE and PyMem_NEW macros and de-deprecate their use.

===================================================================

--- Objects/unicodeobject.c     (revision 62237)
+++ Objects/unicodeobject.c     (working copy)
@@ -261,8 +261,8 @@
        it contains). */
 
     oldstr = unicode->str;
-    unicode->str = PyObject_REALLOC(unicode->str,
-                                   sizeof(Py_UNICODE) * (length + 1));
+    unicode->str = SIZE_MAX/sizeof(Py_UNICODE) - 1 < length ? NULL :
+        PyObject_REALLOC(unicode->str, sizeof(Py_UNICODE) * (length + 
1));
     if (!unicode->str) {
        unicode->str = (Py_UNICODE *)oldstr;
         PyErr_NoMemory();

jnferguson · 2008-04-12T06:04:21Z

i pulled the Macros out of pymem.h in a Vanille 2.5.2?

jnferguson · 2008-04-12T06:16:06Z

sorry didnt mean to change components and version-- I'm typing this from
my phone and its being uncooperative at the moment

jnferguson · 2008-04-12T10:54:57Z

just fixing the modifications my phone made earlier tonight

jnferguson · 2008-04-12T12:13:55Z

Additionally-- the PyMem_NEW()/PyMem_New() macro's need to be fixed:

231 static
232 PyUnicodeObject *_PyUnicode_New(Py_ssize_t length)
233 {
234 register PyUnicodeObject *unicode;
235
236 /* Optimization for empty strings */
237 if (length == 0 && unicode_empty != NULL) {
238 Py_INCREF(unicode_empty);
239 return unicode_empty;
240 }
241
242 /* Unicode freelist & memory allocation */
243 if (unicode_freelist) {
244 unicode = unicode_freelist;
245 unicode_freelist = *(PyUnicodeObject **)unicode;
246 unicode_freelist_size--;
247 if (unicode->str) {
248 /* Keep-Alive optimization: we only upsize the buffer,
249 never downsize it. */
250 if ((unicode->length < length) &&
251 unicode_resize(unicode, length) < 0) {
252 PyMem_DEL(unicode->str);
253 goto onError;
254 }
255 }
256 else {
257 unicode->str = PyMem_NEW(Py_UNICODE, length + 1);
258 }

85 #define PyMem_New(type, n) \
86 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
87 ( (type *) PyMem_Malloc((n) * sizeof(type)) ) )
88 #define PyMem_NEW(type, n) \
89 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
90 ( (type *) PyMem_MALLOC((n) * sizeof(type)) ) )
91
92 #define PyMem_Resize(p, type, n) \
93 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
94 ( (p) = (type *) PyMem_Realloc((p), (n) * sizeof(type)) ) )
95 #define PyMem_RESIZE(p, type, n) \
96 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
97 ( (p) = (type *) PyMem_REALLOC((p), (n) * sizeof(type)) ) )

It looks like much of this is reachable from modules, unicode objects,
dict objects, set objects, et cetera. Also, if a 2G string across the
network seems unrealistic consider what 2 billion A's compresses down
to.(Macros again taken from 2.5.2 vanilla)

malemburg · 2008-04-13T12:22:16Z

On 32-bit platforms, it's probably best to add a size check. I don't
it's worth doing that on 64-bit platforms - overflows are rather
unlikely there.

gpshead · 2008-04-14T03:12:42Z

Here's a patch that fixes this by making both Python's malloc and
realloc return NULL if (0 <= size <= PY_SSIZE_T_MAX).

A side effect of this is that strings on 32bit platforms can no longer
be allocated up to 2**31-1 in length as the malloc includes the internal
python object structure overhead. The maximum string size becomes
2147483609 with an optimized build on this system.

I do not think that is a problem. A 32-bit process by definition can
only ever have one such object allocated at a time anyways. ;)

any objections?

abalkin · 2008-04-14T03:29:33Z

On Sun, Apr 13, 2008 at 11:12 PM, Gregory P. Smith
<report@bugs.python.org> wrote:
..

Here's a patch that fixes this by making both Python's malloc and
realloc return NULL if (0 <= size <= PY_SSIZE_T_MAX).

This will not solve the original problem completely: multiplicative
overflow may produce size in the 0 to PY_SSIZE_T_MAX range.
Furthemore, malloc and realloc take unsigned arguments and I believe
there are cases when they are called with unsigned arguments in python
code. Using the proposed macro definitions in these cases will lead
to compiler warnings.

I don't object to limiting the allowed malloc/realoc size, but the
check should be expressed as unsigned comparison: (size_t)(n) >
(size_t)PY_SSIZE_T_MAX and multiplications by n > 2 should still be
checked for overflow before the result can be used for malloc.

gpshead · 2008-07-06T06:42:28Z

here's an updated patch. It makes PyMem_NEW and PyMem_RESIZE macros
always do explicit an overflow check before doing the multiplication.

Uses of the the macros have been cleaned up in the couple places I
noticed that would leak memory or corrupt their own state by replacing
the original pointer to their memory with NULL on error before raising
MemoryError. This bug was already present in the existing code if
realloc ever returned NULL.

(IMHO PyMem_RESIZE & PyMem_Resize are a poorly designed macros. The
blind pointer assignment should never have been done within the macro.
But given that it is exposed as an API and presumably used by third
party extension modules the broken API must be maintained.)

gpshead · 2008-07-19T23:17:52Z

diff up for review on http://codereview.appspot.com/2599

malemburg · 2008-07-21T11:52:01Z

The patch looks good to me.

gpshead · 2008-07-22T04:47:46Z

Commited to trunk. r65182.

This still needs backporting to release25-maint. (and release24-maint
if anyone is maintaining that)

nnorwitz · 2008-07-28T05:26:48Z

Committed revision 65261 for 2.5
Committed revision 65262 for 2.4.

boya · 2009-09-01T22:59:04Z

In Python/pyarena.c:

block_new(size_t size)
{
	/* Allocate header and block as one unit.
	   ab_mem points just past header. */
	block *b = (block *)malloc(sizeof(block) + size);
        ...
}

Should a check for overflow of "size" also be performed before calling
"malloc"?

terryjreedy · 2010-05-11T21:01:12Z

Neal committed changes for 2.4,2.5, so I removed those.
3.0 is dead. Is this an issue for 3.1,3.2 or should it be closed?

terryjreedy · 2010-06-09T21:34:23Z

Brett, open and fixed are contradictory? for what version did you reopen this?

jnferguson mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-security A security issue labels Apr 11, 2008

jnferguson mannequin added the stdlib Python modules in the Lib dir label Apr 12, 2008

jnferguson mannequin removed the stdlib Python modules in the Lib dir label Apr 12, 2008

gpshead self-assigned this Apr 14, 2008

nnorwitz mannequin closed this as completed Jul 28, 2008

brettcannon reopened this Sep 2, 2009

terryjreedy closed this as completed Aug 3, 2010

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multiple buffer overflows in unicode processing #46872

Multiple buffer overflows in unicode processing #46872

jnferguson mannequin commented Apr 11, 2008

jnferguson mannequin commented Apr 11, 2008

malemburg commented Apr 11, 2008

jnferguson mannequin commented Apr 12, 2008

abalkin commented Apr 12, 2008

abalkin commented Apr 12, 2008

abalkin commented Apr 12, 2008

jnferguson mannequin commented Apr 12, 2008

jnferguson mannequin commented Apr 12, 2008

jnferguson mannequin commented Apr 12, 2008

jnferguson mannequin commented Apr 12, 2008

malemburg commented Apr 13, 2008

gpshead commented Apr 14, 2008

abalkin commented Apr 14, 2008

gpshead commented Jul 6, 2008

gpshead commented Jul 19, 2008

malemburg commented Jul 21, 2008

gpshead commented Jul 22, 2008

nnorwitz mannequin commented Jul 28, 2008

boya mannequin commented Sep 1, 2009

terryjreedy commented May 11, 2010

terryjreedy commented Jun 9, 2010

Multiple buffer overflows in unicode processing #46872

Multiple buffer overflows in unicode processing #46872

Comments

jnferguson mannequin commented Apr 11, 2008

jnferguson mannequin commented Apr 11, 2008

malemburg commented Apr 11, 2008

jnferguson mannequin commented Apr 12, 2008

abalkin commented Apr 12, 2008

abalkin commented Apr 12, 2008

abalkin commented Apr 12, 2008

jnferguson mannequin commented Apr 12, 2008

jnferguson mannequin commented Apr 12, 2008

jnferguson mannequin commented Apr 12, 2008

jnferguson mannequin commented Apr 12, 2008

malemburg commented Apr 13, 2008

gpshead commented Apr 14, 2008

abalkin commented Apr 14, 2008

gpshead commented Jul 6, 2008

gpshead commented Jul 19, 2008

malemburg commented Jul 21, 2008

gpshead commented Jul 22, 2008

nnorwitz mannequin commented Jul 28, 2008

boya mannequin commented Sep 1, 2009

terryjreedy commented May 11, 2010

terryjreedy commented Jun 9, 2010