msg103333 - (view) |
Author: Evgeny Kapun (abacabadabacaba) |
Date: 2010-04-16 15:43 |
I've noticed that set_lookkey (in Objects/setobject.c) does some unsafe things:
Objects/setobject.c:
> if (entry->hash == hash) {
> startkey = entry->key;
> Py_INCREF(startkey);
> cmp = PyObject_RichCompareBool(startkey, key, Py_EQ);
> Py_DECREF(startkey);
At this point, object pointed to by startkey could be deallocated, and then new object may be allocated at the same address.
> if (cmp < 0)
> return NULL;
> if (table == so->table && entry->key == startkey) {
At this point, the table may be reallocated at the same address but with different (possibly smaller) size, so entry->key may be in deallocated memory. Also, entry->key may be equal to startkey but still point to an object other than one key was compared with.
> if (cmp > 0)
> return entry;
> }
> else {
> /* The compare did major nasty stuff to the
> * set: start over.
> */
> return set_lookkey(so, key, hash);
This can lead to infinite recursion.
> }
|
msg103424 - (view) |
Author: Evgeny Kapun (abacabadabacaba) |
Date: 2010-04-17 19:05 |
I've found more unsafe code in Objects/setobject.c.
This code makes Python 3.1.2 segfault by using a bug in function set_merge:
class bad:
def __eq__(self, other):
if be_bad:
set2.clear()
raise Exception
return self is other
def __hash__(self):
return 0
be_bad = False
set1 = {bad()}
set2 = {bad() for i in range(2000)}
be_bad = True
set1.update(set2)
Function set_symmetric_difference_update has a similar bug.
Another bug in set_symmetric_difference_update:
class bad:
def __init__(self):
print("Creating", id(self))
def __del__(self):
print("Deleting", id(self))
def __eq__(self, other):
print("Comparing", id(self), "and", id(other))
if be_bad:
dict2.clear()
return self is other
def __hash__(self):
return 0
be_bad = False
set1 = {bad()}
dict2 = {bad(): None}
be_bad = True
set1.symmetric_difference_update(dict2)
|
msg103473 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2010-04-18 08:46 |
Attaching a patch for set_lookkey() that does a DECREF only when the refcnt > 1 so that the DECREF won't trigger any state changes.
The merge crasher still needs a patch.
|
msg103486 - (view) |
Author: Evgeny Kapun (abacabadabacaba) |
Date: 2010-04-18 12:42 |
This patch still assumes that if so->table didn't change then the table wasn't reallocated (see http://en.wikipedia.org/wiki/ABA_problem). One solution is to check that so->mask didn't change as well. Also, checking that refcnt > 1 is redundant because if entry->key == startkey then there are at least two references: one from entry->key and another from startkey.
These functions have a bug that may cause them to refer to deallocated memory when both arguments are sets: set_intersection, set_isdisjoint, set_difference_update_internal, set_difference, set_symmetric_difference_update, set_issubset.
These functions may also do the same if the first argument is a set and the second argument is a dict: set_difference, set_symmetric_difference_update.
Bugs in set_repr:
> keys = PySequence_List((PyObject *)so);
> if (keys == NULL)
> goto done;
>
> listrepr = PyObject_Repr(keys);
> Py_DECREF(keys);
List pointed to by keys is already deallocated at this point.
> if (listrepr == NULL) {
> Py_DECREF(keys);
But this code tries to DECREF it.
> goto done;
> }
> newsize = PyUnicode_GET_SIZE(listrepr);
> result = PyUnicode_FromUnicode(NULL, newsize);
> if (result) {
> u = PyUnicode_AS_UNICODE(result);
> *u++ = '{';
> /* Omit the brackets from the listrepr */
> Py_UNICODE_COPY(u, PyUnicode_AS_UNICODE(listrepr)+1,
> PyUnicode_GET_SIZE(listrepr)-2);
> u += newsize-2;
> *u++ = '}';
> }
> Py_DECREF(listrepr);
> if (Py_TYPE(so) != &PySet_Type) {
result may be NULL here.
> PyObject *tmp = PyUnicode_FromFormat("%s(%U)",
> Py_TYPE(so)->tp_name,
> result);
I think PyUnicode_FromFormat won't like it.
> Py_DECREF(result);
> result = tmp;
> }
|
msg103505 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2010-04-18 17:33 |
> One solution is to check that so->mask didn't
> change as well.
I saw that and agree it would make a tighter check, but haven't convinced myself that it is necessary.
> Also, checking that refcnt > 1 is redundant
> because if entry->key == startkey then there
> are at least two references: one from entry->key
> and another from startkey.
It is a meaningful check. We have our own INCREF
and one for the key being in the table. If the
count is 1, then it means that the comparison
check deleted the key from the table or replaced
its value (see issue 1517).
|
msg103510 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2010-04-18 18:21 |
>Bugs in set_repr:
>> keys = PySequence_List((PyObject *)so);
>> if (keys == NULL)
>> goto done;
>>
>> listrepr = PyObject_Repr(keys);
>> Py_DECREF(keys);
>List pointed to by keys is already deallocated at this point.
>> if (listrepr == NULL) {
>> Py_DECREF(keys);
>>But this code tries to DECREF it.
>> goto done;
>> }
I don't follow why you think keys is already deallocated.
When assigned by PySequence_List() without a NULL return, the refcnt is one. The call to PyObject_Repr(keys) does not change the refcnt of keys,
so the Py_DECREF(keys) is correct.
|
msg103516 - (view) |
Author: Evgeny Kapun (abacabadabacaba) |
Date: 2010-04-18 19:11 |
> > One solution is to check that so->mask didn't
> > change as well.
>
> I saw that and agree it would make a tighter check, but haven't convinced myself that it is necessary.
Without this check, it is possible that the comparison shrinks the table, so entry becomes out of bounds. However, if both so->table and so->mask didn't change then entry is still a pointer to one of table elements so it can be used safely.
> > Also, checking that refcnt > 1 is redundant
> > because if entry->key == startkey then there
> > are at least two references: one from entry->key
> > and another from startkey.
>
> It is a meaningful check. We have our own INCREF
> and one for the key being in the table. If the
> count is 1, then it means that the comparison
> check deleted the key from the table or replaced
> its value (see issue 1517).
If the comparison deleted or changed the key then the check entry->key == startkey would fail so refcnt check won't be reached. Checking refcounts is also bad because someone else may have references to the key.
> I don't follow why you think keys is already deallocated.
> When assigned by PySequence_List() without a NULL return, the refcnt is one. The call to PyObject_Repr(keys) does not change the refcnt of keys,
> so the Py_DECREF(keys) is correct.
Look at the code again. If listrepr happens to be NULL, you do Py_DECREF(keys) twice (this bug is only present in py3k branch).
|
msg103524 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2010-04-18 20:30 |
Fixed set_repr() issue in r80197 and r80196. Looks like someone futzed the unicode updates for 3.x.
|
msg103541 - (view) |
Author: Evgeny Kapun (abacabadabacaba) |
Date: 2010-04-18 22:25 |
This code crashes python by using another bug in set_repr. This only affects py3k. This code relies on out-of-memory condition, so run it like:
$ (ulimit -v 65536 && python3 test.py)
Otherwise, it will eat all your free memory before crashing.
val = "a" * 10000
class big:
def __repr__(self):
return val
i = 16
while True:
repr(frozenset(big() for j in range(i)))
i = (i * 5) >> 2
|
msg103543 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2010-04-18 22:28 |
Patch please.
|
msg115430 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2010-09-03 10:07 |
Fixed in r84447
|
msg115667 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2010-09-05 19:23 |
I’m doing the backport, I’ll assign back to Raymond when the patch is ready.
|
msg131610 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2011-03-21 02:37 |
I was able to reproduce the crash in 2.7 and fix it with the patch. I only had to remove a cast to long to make the patch apply. I also changed tabs to spaces in two lines.
Please review and assign back to me for commit. :)
|
msg131757 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2011-03-22 17:37 |
This looks okay.
On the first lines for set_merge() where there is:
key = entry->key;
also do:
hash = entry->key;
so that the two get handled in a parallel fashion.
Thanks for doing the backporting.
|
msg131783 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2011-03-22 21:20 |
I assume you meant “hash = entry->hash”, not “entry->key”. Updated patch attached.
|
msg131784 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2011-03-22 21:22 |
BTW, set_add_entry still uses entry->hash.
|
msg131789 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2011-03-22 21:34 |
The revised patch looks good.
Please make the same change for set_add_entry.
If the tests run, go ahead and apply.
|
msg131801 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2011-03-22 22:49 |
Is it a candidate for 3.1 too?
|
msg131804 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2011-03-22 23:06 |
Sure, go ahead and apply to 3.1
|
msg131821 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2011-03-23 00:20 |
Here is a part of my 3.1 patch:
+ PyObject *key = entry->key;
+ long hash = entry->hash;
assert(so->fill <= so->mask); /* at least one empty slot */
n_used = so->used;
- Py_INCREF(entry->key);
- if (set_insert_key(so, entry->key, (long) entry->hash) == -1) {
- Py_DECREF(entry->key);
+ Py_INCREF(key);
+ if (set_insert_key(so, key, hash) == -1) {
+ Py_DECREF(key);
You’ll notice that I declare hash as long, but I don’t need to cast it to long to make it compile and run. However, the previous code did cast; was that done on purpose, maybe for 32-bit platforms, or is it okay to not have the cast?
|
msg131832 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2011-03-23 00:56 |
Looks good.
Can this be closed now?
|
msg131835 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2011-03-23 01:13 |
Used wrong number in the commit message :/ Fixed in 24179f82b7de for 2.7.
|
msg131849 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2011-03-23 03:53 |
New changeset 57657393ceaf by Éric Araujo in branch '3.1':
Fix obscure set crashers (#8420). Backport of d56b3cafb1e6, reviewed by Raymond.
http://hg.python.org/cpython/rev/57657393ceaf
|
msg131852 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2011-03-23 04:10 |
All done.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:59 | admin | set | github: 52667 |
2011-03-23 04:10:56 | eric.araujo | set | status: open -> closed
messages:
+ msg131852 stage: commit review -> resolved |
2011-03-23 03:53:54 | python-dev | set | status: pending -> open
messages:
+ msg131849 nosy:
+ python-dev |
2011-03-23 01:13:08 | eric.araujo | set | status: open -> pending
messages:
+ msg131835 |
2011-03-23 00:56:51 | rhettinger | set | messages:
+ msg131832 |
2011-03-23 00:20:47 | eric.araujo | set | messages:
+ msg131821 |
2011-03-22 23:06:31 | rhettinger | set | messages:
+ msg131804 |
2011-03-22 22:49:38 | eric.araujo | set | messages:
+ msg131801 |
2011-03-22 21:34:53 | rhettinger | set | messages:
+ msg131789 |
2011-03-22 21:22:11 | eric.araujo | set | messages:
+ msg131784 |
2011-03-22 21:20:59 | eric.araujo | set | files:
+ fix-set-crashers-2.7.diff
messages:
+ msg131783 |
2011-03-22 21:18:42 | eric.araujo | set | files:
- fix-set-crashers-2.7.diff |
2011-03-22 17:37:32 | rhettinger | set | assignee: rhettinger -> eric.araujo messages:
+ msg131757 |
2011-03-21 02:37:28 | eric.araujo | set | files:
+ fix-set-crashers-2.7.diff
messages:
+ msg131610 assignee: eric.araujo -> rhettinger stage: needs patch -> commit review |
2010-09-08 03:00:57 | eric.araujo | set | status: closed -> open |
2010-09-05 19:23:02 | eric.araujo | set | versions:
- Python 2.6 nosy:
+ eric.araujo
messages:
+ msg115667
assignee: rhettinger -> eric.araujo |
2010-09-03 10:07:41 | rhettinger | set | status: open -> closed resolution: fixed messages:
+ msg115430
|
2010-04-18 23:16:19 | abacabadabacaba | set | files:
+ repr.diff |
2010-04-18 22:28:25 | rhettinger | set | messages:
+ msg103543 |
2010-04-18 22:25:47 | abacabadabacaba | set | messages:
+ msg103541 |
2010-04-18 20:30:30 | rhettinger | set | messages:
+ msg103524 |
2010-04-18 19:11:01 | abacabadabacaba | set | messages:
+ msg103516 |
2010-04-18 18:21:39 | rhettinger | set | messages:
+ msg103510 |
2010-04-18 17:33:15 | rhettinger | set | messages:
+ msg103505 |
2010-04-18 12:42:59 | abacabadabacaba | set | messages:
+ msg103486 |
2010-04-18 08:46:16 | rhettinger | set | files:
+ lookkey.diff priority: normal versions:
+ Python 2.6, Python 2.7, Python 3.2 messages:
+ msg103473
keywords:
+ patch stage: needs patch |
2010-04-17 19:34:49 | abacabadabacaba | set | type: crash |
2010-04-17 19:05:36 | abacabadabacaba | set | messages:
+ msg103424 title: set_lookkey is unsafe -> Objects/setobject.c contains unsafe code |
2010-04-16 16:15:49 | rhettinger | set | assignee: rhettinger
nosy:
+ rhettinger |
2010-04-16 15:43:34 | abacabadabacaba | create | |