msg65379 - (view) |
Author: Justin Ferguson (jnferguson) |
Date: 2008-04-11 22:35 |
174 static
175 int unicode_resize(register PyUnicodeObject *unicode,
176 Py_ssize_t length)
177 {
[...]
201
202 oldstr = unicode->str;
203 PyMem_RESIZE(unicode->str, Py_UNICODE, length + 1);
[...]
209 unicode->str[length] = 0;
210 unicode->length = length;
211
95 #define PyMem_RESIZE(p, type, n) \
96 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
97 ( (p) = (type *) PyMem_REALLOC((p), (n) * sizeof(type)) ) )
The unicode_resize() function acts essentially as a wrapper to
realloc(), it accomplishes this via the PyMem_RESIZE() macro which
factors the size with the size of the type, in this case it multiplies
by two as Py_UNICODE is typedef'd to a wchar_t. When resizing large
strings, this results in an incorrect allocation that in turn leads to
buffer overflow.
This is specific to the Unicode objects, however I would not be
surprised to see that other types have this complication as well. Please
see attached proof of concepts.
|
msg65382 - (view) |
Author: Marc-Andre Lemburg (lemburg) * |
Date: 2008-04-11 23:29 |
You are probably referring to 32-bit platforms. At least on 64-bit
platforms, there's no problem with your test cases:
>>> # this is to get the unicode_freelist initialized
... # the length of the string must be <= 9 to keep
... # unicode->str from being deallocated and set to
... # NULL
... bla = unicode('IOActive')
>>> del bla
>>>
>>>
>>> msg = 'A'*2147483647
>>>
>>> msg.decode('utf7')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
MemoryError
The code does check for success of the realloc():
PyMem_RESIZE(unicode->str, Py_UNICODE, length + 1);
if (!unicode->str) {
unicode->str = (Py_UNICODE *)oldstr;
PyErr_NoMemory();
return -1;
}
Are you after the integer overflow and the fact that realloc() would (if
possible) allocate a buffer smaller than needed ?
|
msg65384 - (view) |
Author: Justin Ferguson (jnferguson) |
Date: 2008-04-12 00:20 |
Yes, excuse me-- this should be 32-bit specific as I believe Python will
not let me get a string long enough to overflow the integer on 64-bit.
It's a big string, the only realistic scenario I can see is XML parsing
or similar.
theory$ ./python -V
Python 2.5.2
theory$ cat /proc/cpuinfo | grep -i model
model : 4
model name : Intel(R) Pentium(R) 4 CPU 3.00GHz
theory$ ./python python-2.5.2-unicode_resize-utf7.py
Segmentation fault
|
msg65386 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2008-04-12 03:26 |
Note that in r61458 Neal replaced PyMem_RESIZE with a direct call to PyMem_REALLOC thus eliminating integer overflow check even from the debug
builds.
|
msg65387 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2008-04-12 03:42 |
Justin,
Where did you find the definition that you cited:
95 #define PyMem_RESIZE(p, type, n) \
96 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
97 ( (p) = (type *) PyMem_REALLOC((p), (n) * sizeof(type)) ) )
?
Current Include/pymem.h does not have the assert:
94 #define PyMem_RESIZE(p, type, n) \
95 ( (p) = (type *) PyMem_REALLOC((p), (n) * sizeof(type))
)
and it did not change for a while.
|
msg65389 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2008-04-12 04:18 |
The following simple change should be enough for this issue, but I would
consider implementing the overflow check in the PyMem_RESIZE and PyMem_NEW macros and de-deprecate their use.
===================================================================
--- Objects/unicodeobject.c (revision 62237)
+++ Objects/unicodeobject.c (working copy)
@@ -261,8 +261,8 @@
it contains). */
oldstr = unicode->str;
- unicode->str = PyObject_REALLOC(unicode->str,
- sizeof(Py_UNICODE) * (length + 1));
+ unicode->str = SIZE_MAX/sizeof(Py_UNICODE) - 1 < length ? NULL :
+ PyObject_REALLOC(unicode->str, sizeof(Py_UNICODE) * (length +
1));
if (!unicode->str) {
unicode->str = (Py_UNICODE *)oldstr;
PyErr_NoMemory();
|
msg65393 - (view) |
Author: Justin Ferguson (jnferguson) |
Date: 2008-04-12 06:04 |
i pulled the Macros out of pymem.h in a Vanille 2.5.2?
|
msg65395 - (view) |
Author: Justin Ferguson (jnferguson) |
Date: 2008-04-12 06:16 |
sorry didnt mean to change components and version-- I'm typing this from
my phone and its being uncooperative at the moment
|
msg65397 - (view) |
Author: Justin Ferguson (jnferguson) |
Date: 2008-04-12 10:54 |
just fixing the modifications my phone made earlier tonight
|
msg65398 - (view) |
Author: Justin Ferguson (jnferguson) |
Date: 2008-04-12 12:13 |
Additionally-- the PyMem_NEW()/PyMem_New() macro's need to be fixed:
231 static
232 PyUnicodeObject *_PyUnicode_New(Py_ssize_t length)
233 {
234 register PyUnicodeObject *unicode;
235
236 /* Optimization for empty strings */
237 if (length == 0 && unicode_empty != NULL) {
238 Py_INCREF(unicode_empty);
239 return unicode_empty;
240 }
241
242 /* Unicode freelist & memory allocation */
243 if (unicode_freelist) {
244 unicode = unicode_freelist;
245 unicode_freelist = *(PyUnicodeObject **)unicode;
246 unicode_freelist_size--;
247 if (unicode->str) {
248 /* Keep-Alive optimization: we only upsize the buffer,
249 never downsize it. */
250 if ((unicode->length < length) &&
251 unicode_resize(unicode, length) < 0) {
252 PyMem_DEL(unicode->str);
253 goto onError;
254 }
255 }
256 else {
257 unicode->str = PyMem_NEW(Py_UNICODE, length + 1);
258 }
85 #define PyMem_New(type, n) \
86 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
87 ( (type *) PyMem_Malloc((n) * sizeof(type)) ) )
88 #define PyMem_NEW(type, n) \
89 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
90 ( (type *) PyMem_MALLOC((n) * sizeof(type)) ) )
91
92 #define PyMem_Resize(p, type, n) \
93 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
94 ( (p) = (type *) PyMem_Realloc((p), (n) * sizeof(type)) ) )
95 #define PyMem_RESIZE(p, type, n) \
96 ( assert((n) <= PY_SIZE_MAX / sizeof(type)) , \
97 ( (p) = (type *) PyMem_REALLOC((p), (n) * sizeof(type)) ) )
It looks like much of this is reachable from modules, unicode objects,
dict objects, set objects, et cetera. Also, if a 2G string across the
network seems unrealistic consider what 2 billion A's compresses down
to.(Macros again taken from 2.5.2 vanilla)
|
msg65441 - (view) |
Author: Marc-Andre Lemburg (lemburg) * |
Date: 2008-04-13 12:22 |
On 32-bit platforms, it's probably best to add a size check. I don't
it's worth doing that on 64-bit platforms - overflows are rather
unlikely there.
|
msg65457 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2008-04-14 03:12 |
Here's a patch that fixes this by making both Python's malloc and
realloc return NULL if (0 <= size <= PY_SSIZE_T_MAX).
A side effect of this is that strings on 32bit platforms can no longer
be allocated up to 2**31-1 in length as the malloc includes the internal
python object structure overhead. The maximum string size becomes
2147483609 with an optimized build on this system.
I do not think that is a problem. A 32-bit process by definition can
only ever have one such object allocated at a time anyways. ;)
any objections?
|
msg65458 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2008-04-14 03:29 |
On Sun, Apr 13, 2008 at 11:12 PM, Gregory P. Smith
<report@bugs.python.org> wrote:
..
> Here's a patch that fixes this by making both Python's malloc and
> realloc return NULL if (0 <= size <= PY_SSIZE_T_MAX).
>
This will not solve the original problem completely: multiplicative
overflow may produce size in the 0 to PY_SSIZE_T_MAX range.
Furthemore, malloc and realloc take unsigned arguments and I believe
there are cases when they are called with unsigned arguments in python
code. Using the proposed macro definitions in these cases will lead
to compiler warnings.
I don't object to limiting the allowed malloc/realoc size, but the
check should be expressed as unsigned comparison: (size_t)(n) >
(size_t)PY_SSIZE_T_MAX and multiplications by n > 2 should still be
checked for overflow before the result can be used for malloc.
|
msg69319 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2008-07-06 06:42 |
here's an updated patch. It makes PyMem_NEW and PyMem_RESIZE macros
always do explicit an overflow check before doing the multiplication.
Uses of the the macros have been cleaned up in the couple places I
noticed that would leak memory or corrupt their own state by replacing
the original pointer to their memory with NULL on error before raising
MemoryError. This bug was already present in the existing code if
realloc ever returned NULL.
(IMHO PyMem_RESIZE & PyMem_Resize are a poorly designed macros. The
blind pointer assignment should never have been done within the macro.
But given that it is exposed as an API and presumably used by third
party extension modules the broken API must be maintained.)
|
msg70061 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2008-07-19 23:17 |
diff up for review on http://codereview.appspot.com/2599
|
msg70103 - (view) |
Author: Marc-Andre Lemburg (lemburg) * |
Date: 2008-07-21 11:52 |
The patch looks good to me.
|
msg70137 - (view) |
Author: Gregory P. Smith (gregory.p.smith) * |
Date: 2008-07-22 04:47 |
Commited to trunk. r65182.
This still needs backporting to release25-maint. (and release24-maint
if anyone is maintaining that)
|
msg70337 - (view) |
Author: Neal Norwitz (nnorwitz) * |
Date: 2008-07-28 05:26 |
Committed revision 65261 for 2.5
Committed revision 65262 for 2.4.
|
msg92146 - (view) |
Author: Boya Sun (boya) |
Date: 2009-09-01 22:59 |
In Python/pyarena.c:
block_new(size_t size)
{
/* Allocate header and block as one unit.
ab_mem points just past header. */
block *b = (block *)malloc(sizeof(block) + size);
...
}
Should a check for overflow of "size" also be performed before calling
"malloc"?
|
msg105545 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2010-05-11 21:01 |
Neal committed changes for 2.4,2.5, so I removed those.
3.0 is dead. Is this an issue for 3.1,3.2 or should it be closed?
|
msg107420 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2010-06-09 21:34 |
Brett, open and fixed are contradictory? for what version did you reopen this?
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:33 | admin | set | github: 46872 |
2010-08-03 18:06:26 | terry.reedy | set | status: open -> closed versions:
- Python 2.4, Python 3.0 |
2010-06-09 21:34:23 | terry.reedy | set | messages:
+ msg107420 |
2010-05-12 11:54:48 | r.david.murray | set | nosy:
+ vstinner
|
2010-05-11 21:01:12 | terry.reedy | set | nosy:
+ terry.reedy messages:
+ msg105545
|
2009-09-02 20:42:30 | brett.cannon | set | status: closed -> open |
2009-09-01 22:59:04 | boya | set | nosy:
+ boya messages:
+ msg92146
|
2008-09-17 01:05:11 | brett.cannon | set | resolution: fixed |
2008-07-28 05:26:53 | nnorwitz | set | status: open -> closed messages:
+ msg70337 |
2008-07-22 04:47:46 | gregory.p.smith | set | keywords:
+ patch messages:
+ msg70137 versions:
+ Python 3.0, - Python 2.6 |
2008-07-21 11:52:02 | lemburg | set | messages:
+ msg70103 |
2008-07-19 23:17:52 | gregory.p.smith | set | messages:
+ msg70061 |
2008-07-06 06:42:31 | gregory.p.smith | set | files:
+ issue2620-gps02-patch.txt messages:
+ msg69319 |
2008-07-06 05:37:19 | gregory.p.smith | set | files:
- issue2620-gps01-patch.txt |
2008-04-14 03:29:33 | belopolsky | set | messages:
+ msg65458 |
2008-04-14 03:12:43 | gregory.p.smith | set | files:
+ issue2620-gps01-patch.txt messages:
+ msg65457 |
2008-04-14 01:08:58 | gregory.p.smith | set | assignee: gregory.p.smith versions:
+ Python 2.6 |
2008-04-13 12:22:16 | lemburg | set | messages:
+ msg65441 |
2008-04-12 12:13:56 | jnferguson | set | messages:
+ msg65398 |
2008-04-12 10:54:57 | jnferguson | set | messages:
+ msg65397 components:
- Library (Lib), None versions:
- Python 3.0 |
2008-04-12 06:16:05 | jnferguson | set | messages:
+ msg65395 components:
+ Library (Lib) |
2008-04-12 06:04:21 | jnferguson | set | messages:
+ msg65393 components:
+ None versions:
+ Python 3.0 |
2008-04-12 04:18:46 | belopolsky | set | messages:
+ msg65389 |
2008-04-12 03:55:42 | gregory.p.smith | set | priority: high nosy:
+ gregory.p.smith versions:
+ Python 2.4 |
2008-04-12 03:42:10 | belopolsky | set | messages:
+ msg65387 |
2008-04-12 03:26:51 | belopolsky | set | nosy:
+ belopolsky, nnorwitz messages:
+ msg65386 |
2008-04-12 00:20:34 | jnferguson | set | messages:
+ msg65384 |
2008-04-11 23:29:29 | lemburg | set | nosy:
+ lemburg messages:
+ msg65382 |
2008-04-11 22:36:03 | jnferguson | set | files:
+ python-2.5.2-unicode_resize-utf16.py |
2008-04-11 22:35:51 | jnferguson | set | files:
+ python-2.5.2-unicode_resize-utf8.py |
2008-04-11 22:35:37 | jnferguson | create | |