This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Multiple buffer overflows in unicode processing
Type: security Stage:
Components: Interpreter Core Versions: Python 2.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: belopolsky, boya, gregory.p.smith, jnferguson, lemburg, nnorwitz, terry.reedy, vstinner
Priority: high Keywords: patch

Created on 2008-04-11 22:35 by jnferguson, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
python-2.5.2-unicode_resize-utf7.py jnferguson, 2008-04-11 22:35
python-2.5.2-unicode_resize-utf8.py jnferguson, 2008-04-11 22:35
python-2.5.2-unicode_resize-utf16.py jnferguson, 2008-04-11 22:36
issue2620-gps02-patch.txt gregory.p.smith, 2008-07-06 06:42 always check for overflow in _New and _Resize
Messages (21)
msg65379 - (view) Author: Justin Ferguson (jnferguson) Date: 2008-04-11 22:35
174 static
 175 int unicode_resize(register PyUnicodeObject *unicode,
 176                       Py_ssize_t length)
 177 {
 [...]
 201 
 202     oldstr = unicode->str;
 203     PyMem_RESIZE(unicode->str, Py_UNICODE, length + 1);
[...]
 209     unicode->str[length] = 0;
 210     unicode->length = length;
 211  

 95 #define PyMem_RESIZE(p, type, n) \
 96   ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
 97         ( (p) = (type *) PyMem_REALLOC((p), (n) * sizeof(type)) ) )

The unicode_resize() function acts essentially as a wrapper to
realloc(), it accomplishes this via the PyMem_RESIZE() macro which
factors the size with the size of the type, in this case it multiplies
by two as Py_UNICODE is typedef'd to a wchar_t. When resizing large
strings, this results in an incorrect allocation that in turn leads to
buffer overflow.

This is specific to the Unicode objects, however I would not be
surprised to see that other types have this complication as well. Please
see attached proof of concepts.
msg65382 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2008-04-11 23:29
You are probably referring to 32-bit platforms. At least on 64-bit
platforms, there's no problem with your test cases:

>>> # this is to get the unicode_freelist initialized
... # the length of the string must be <= 9 to keep
... # unicode->str from being deallocated and set to
... # NULL
... bla = unicode('IOActive')
>>> del bla
>>>
>>>
>>> msg = 'A'*2147483647
>>>
>>> msg.decode('utf7')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

The code does check for success of the realloc():

    PyMem_RESIZE(unicode->str, Py_UNICODE, length + 1);
    if (!unicode->str) {
	unicode->str = (Py_UNICODE *)oldstr;
        PyErr_NoMemory();
        return -1;
    }

Are you after the integer overflow and the fact that realloc() would (if
possible) allocate a buffer smaller than needed ?
msg65384 - (view) Author: Justin Ferguson (jnferguson) Date: 2008-04-12 00:20
Yes, excuse me-- this should be 32-bit specific as I believe Python will
not let me get a string long enough to overflow the integer on 64-bit.
It's a big string, the only realistic scenario I can see is XML parsing
or similar.

theory$ ./python -V
Python 2.5.2
theory$ cat /proc/cpuinfo | grep -i model
model           : 4
model name      : Intel(R) Pentium(R) 4 CPU 3.00GHz
theory$ ./python python-2.5.2-unicode_resize-utf7.py
Segmentation fault
msg65386 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-04-12 03:26
Note that in r61458 Neal replaced PyMem_RESIZE with a direct call to PyMem_REALLOC thus eliminating integer overflow check even from the debug 
builds.
msg65387 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-04-12 03:42
Justin,

Where did you find the definition that you cited:

 95 #define PyMem_RESIZE(p, type, n) \
 96   ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
 97         ( (p) = (type *) PyMem_REALLOC((p), (n) * sizeof(type)) ) )

?

Current Include/pymem.h does not have the assert:

     94 #define PyMem_RESIZE(p, type, n) \
     95         ( (p) = (type *) PyMem_REALLOC((p), (n) * sizeof(type)) 
)

and it did not change for a while.
msg65389 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-04-12 04:18
The following simple change should be enough for this issue, but I would 
consider implementing the overflow check in the PyMem_RESIZE and PyMem_NEW macros and de-deprecate their use.

===================================================================
--- Objects/unicodeobject.c     (revision 62237)
+++ Objects/unicodeobject.c     (working copy)
@@ -261,8 +261,8 @@
        it contains). */
 
     oldstr = unicode->str;
-    unicode->str = PyObject_REALLOC(unicode->str,
-                                   sizeof(Py_UNICODE) * (length + 1));
+    unicode->str = SIZE_MAX/sizeof(Py_UNICODE) - 1 < length ? NULL :
+        PyObject_REALLOC(unicode->str, sizeof(Py_UNICODE) * (length + 
1));
     if (!unicode->str) {
        unicode->str = (Py_UNICODE *)oldstr;
         PyErr_NoMemory();
msg65393 - (view) Author: Justin Ferguson (jnferguson) Date: 2008-04-12 06:04
i pulled the Macros out of pymem.h in a Vanille 2.5.2?
msg65395 - (view) Author: Justin Ferguson (jnferguson) Date: 2008-04-12 06:16
sorry didnt mean to change components and version-- I'm typing this from 
my phone and its being uncooperative at the moment
msg65397 - (view) Author: Justin Ferguson (jnferguson) Date: 2008-04-12 10:54
just fixing the modifications my phone made earlier tonight
msg65398 - (view) Author: Justin Ferguson (jnferguson) Date: 2008-04-12 12:13
Additionally-- the PyMem_NEW()/PyMem_New() macro's need to be fixed:

 231 static
 232 PyUnicodeObject *_PyUnicode_New(Py_ssize_t length)
 233 {
 234     register PyUnicodeObject *unicode;
 235 
 236     /* Optimization for empty strings */
 237     if (length == 0 && unicode_empty != NULL) {
 238         Py_INCREF(unicode_empty);
 239         return unicode_empty;
 240     }
 241 
 242     /* Unicode freelist & memory allocation */
 243     if (unicode_freelist) {
 244         unicode = unicode_freelist;
 245         unicode_freelist = *(PyUnicodeObject **)unicode;
 246         unicode_freelist_size--;
 247         if (unicode->str) {
 248             /* Keep-Alive optimization: we only upsize the buffer,
 249                never downsize it. */
 250             if ((unicode->length < length) &&
 251                 unicode_resize(unicode, length) < 0) {
 252                 PyMem_DEL(unicode->str);
 253                 goto onError;
 254             }
 255         }
 256         else {
 257             unicode->str = PyMem_NEW(Py_UNICODE, length + 1);
 258         }

 85 #define PyMem_New(type, n) \
 86   ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
 87         ( (type *) PyMem_Malloc((n) * sizeof(type)) ) )
 88 #define PyMem_NEW(type, n) \
 89   ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
 90         ( (type *) PyMem_MALLOC((n) * sizeof(type)) ) )
 91 
 92 #define PyMem_Resize(p, type, n) \
 93   ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
 94         ( (p) = (type *) PyMem_Realloc((p), (n) * sizeof(type)) ) )
 95 #define PyMem_RESIZE(p, type, n) \
 96   ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
 97         ( (p) = (type *) PyMem_REALLOC((p), (n) * sizeof(type)) ) )


It looks like much of this is reachable from modules, unicode objects,
dict objects, set objects, et cetera. Also, if a 2G string across the
network seems unrealistic consider what 2 billion A's compresses down
to.(Macros again taken from 2.5.2 vanilla)
msg65441 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2008-04-13 12:22
On 32-bit platforms, it's probably best to add a size check. I don't
it's worth doing that on 64-bit platforms - overflows are rather
unlikely there.
msg65457 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-04-14 03:12
Here's a patch that fixes this by making both Python's malloc and
realloc return NULL if (0 <= size <= PY_SSIZE_T_MAX).

A side effect of this is that strings on 32bit platforms can no longer
be allocated up to 2**31-1 in length as the malloc includes the internal
python object structure overhead.  The maximum string size becomes
2147483609 with an optimized build on this system.

I do not think that is a problem.  A 32-bit process by definition can
only ever have one such object allocated at a time anyways. ;)

any objections?
msg65458 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-04-14 03:29
On Sun, Apr 13, 2008 at 11:12 PM, Gregory P. Smith
<report@bugs.python.org> wrote:
..
>  Here's a patch that fixes this by making both Python's malloc and
>  realloc return NULL if (0 <= size <= PY_SSIZE_T_MAX).
>
This will not solve the original problem completely: multiplicative
overflow may produce size in the 0 to PY_SSIZE_T_MAX range.
Furthemore, malloc and realloc take unsigned arguments and I believe
there are cases when they are called with unsigned arguments in python
code.  Using the proposed macro definitions in these cases will lead
to compiler warnings.

I don't object to limiting the allowed malloc/realoc size, but the
check should be expressed as unsigned comparison:  (size_t)(n) >
(size_t)PY_SSIZE_T_MAX and multiplications by n > 2 should still be
checked for overflow before the result can be used for malloc.
msg69319 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-07-06 06:42
here's an updated patch.  It makes PyMem_NEW and PyMem_RESIZE macros
always do explicit an overflow check before doing the multiplication.

Uses of the the macros have been cleaned up in the couple places I
noticed that would leak memory or corrupt their own state by replacing
the original pointer to their memory with NULL on error before raising
MemoryError.  This bug was already present in the existing code if
realloc ever returned NULL.

(IMHO PyMem_RESIZE & PyMem_Resize are a poorly designed macros.  The
blind pointer assignment should never have been done within the macro. 
But given that it is exposed as an API and presumably used by third
party extension modules the broken API must be maintained.)
msg70061 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-07-19 23:17
diff up for review on http://codereview.appspot.com/2599
msg70103 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2008-07-21 11:52
The patch looks good to me.
msg70137 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2008-07-22 04:47
Commited to trunk.  r65182.

This still needs backporting to release25-maint.  (and release24-maint
if anyone is maintaining that)
msg70337 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2008-07-28 05:26
Committed revision 65261 for 2.5
Committed revision 65262 for 2.4.
msg92146 - (view) Author: Boya Sun (boya) Date: 2009-09-01 22:59
In Python/pyarena.c:

block_new(size_t size)
{
	/* Allocate header and block as one unit.
	   ab_mem points just past header. */
	block *b = (block *)malloc(sizeof(block) + size);
        ...
}

Should a check for overflow of "size" also be performed before calling
"malloc"?
msg105545 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-05-11 21:01
Neal committed changes for 2.4,2.5, so I removed those.
3.0 is dead. Is this an issue for 3.1,3.2 or should it be closed?
msg107420 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-06-09 21:34
Brett, open and fixed are contradictory? for what version did you reopen this?
History
Date User Action Args
2022-04-11 14:56:33adminsetgithub: 46872
2010-08-03 18:06:26terry.reedysetstatus: open -> closed
versions: - Python 2.4, Python 3.0
2010-06-09 21:34:23terry.reedysetmessages: + msg107420
2010-05-12 11:54:48r.david.murraysetnosy: + vstinner
2010-05-11 21:01:12terry.reedysetnosy: + terry.reedy
messages: + msg105545
2009-09-02 20:42:30brett.cannonsetstatus: closed -> open
2009-09-01 22:59:04boyasetnosy: + boya
messages: + msg92146
2008-09-17 01:05:11brett.cannonsetresolution: fixed
2008-07-28 05:26:53nnorwitzsetstatus: open -> closed
messages: + msg70337
2008-07-22 04:47:46gregory.p.smithsetkeywords: + patch
messages: + msg70137
versions: + Python 3.0, - Python 2.6
2008-07-21 11:52:02lemburgsetmessages: + msg70103
2008-07-19 23:17:52gregory.p.smithsetmessages: + msg70061
2008-07-06 06:42:31gregory.p.smithsetfiles: + issue2620-gps02-patch.txt
messages: + msg69319
2008-07-06 05:37:19gregory.p.smithsetfiles: - issue2620-gps01-patch.txt
2008-04-14 03:29:33belopolskysetmessages: + msg65458
2008-04-14 03:12:43gregory.p.smithsetfiles: + issue2620-gps01-patch.txt
messages: + msg65457
2008-04-14 01:08:58gregory.p.smithsetassignee: gregory.p.smith
versions: + Python 2.6
2008-04-13 12:22:16lemburgsetmessages: + msg65441
2008-04-12 12:13:56jnfergusonsetmessages: + msg65398
2008-04-12 10:54:57jnfergusonsetmessages: + msg65397
components: - Library (Lib), None
versions: - Python 3.0
2008-04-12 06:16:05jnfergusonsetmessages: + msg65395
components: + Library (Lib)
2008-04-12 06:04:21jnfergusonsetmessages: + msg65393
components: + None
versions: + Python 3.0
2008-04-12 04:18:46belopolskysetmessages: + msg65389
2008-04-12 03:55:42gregory.p.smithsetpriority: high
nosy: + gregory.p.smith
versions: + Python 2.4
2008-04-12 03:42:10belopolskysetmessages: + msg65387
2008-04-12 03:26:51belopolskysetnosy: + belopolsky, nnorwitz
messages: + msg65386
2008-04-12 00:20:34jnfergusonsetmessages: + msg65384
2008-04-11 23:29:29lemburgsetnosy: + lemburg
messages: + msg65382
2008-04-11 22:36:03jnfergusonsetfiles: + python-2.5.2-unicode_resize-utf16.py
2008-04-11 22:35:51jnfergusonsetfiles: + python-2.5.2-unicode_resize-utf8.py
2008-04-11 22:35:37jnfergusoncreate