This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Objects/setobject.c contains unsafe code
Type: crash Stage: resolved
Components: Interpreter Core Versions: Python 3.1, Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: eric.araujo Nosy List: abacabadabacaba, eric.araujo, python-dev, rhettinger
Priority: normal Keywords: patch

Created on 2010-04-16 15:43 by abacabadabacaba, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
lookkey.diff rhettinger, 2010-04-18 08:46 lookkey.diff review
repr.diff abacabadabacaba, 2010-04-18 23:16 patch for set_repr
fix-set-crashers-2.7.diff eric.araujo, 2011-03-22 21:20
Messages (24)
msg103333 - (view) Author: Evgeny Kapun (abacabadabacaba) Date: 2010-04-16 15:43
I've noticed that set_lookkey (in Objects/setobject.c) does some unsafe things:
Objects/setobject.c:
> if (entry->hash == hash) {
> 	startkey = entry->key;
> 	Py_INCREF(startkey);
> 	cmp = PyObject_RichCompareBool(startkey, key, Py_EQ);
> 	Py_DECREF(startkey);
At this point, object pointed to by startkey could be deallocated, and then new object may be allocated at the same address.
> 	if (cmp < 0)
> 		return NULL;
> 	if (table == so->table && entry->key == startkey) {
At this point, the table may be reallocated at the same address but with different (possibly smaller) size, so entry->key may be in deallocated memory. Also, entry->key may be equal to startkey but still point to an object other than one key was compared with.
> 		if (cmp > 0)
> 			return entry;
> 	}
> 	else {
> 		/* The compare did major nasty stuff to the
> 		 * set:  start over.
> 		 */
> 		return set_lookkey(so, key, hash);
This can lead to infinite recursion.
> 	}
msg103424 - (view) Author: Evgeny Kapun (abacabadabacaba) Date: 2010-04-17 19:05
I've found more unsafe code in Objects/setobject.c.
This code makes Python 3.1.2 segfault by using a bug in function set_merge:

class bad:
	def __eq__(self, other):
		if be_bad:
			set2.clear()
			raise Exception
		return self is other
	def __hash__(self):
		return 0
be_bad = False
set1 = {bad()}
set2 = {bad() for i in range(2000)}
be_bad = True
set1.update(set2)

Function set_symmetric_difference_update has a similar bug.

Another bug in set_symmetric_difference_update:

class bad:
	def __init__(self):
		print("Creating", id(self))
	def __del__(self):
		print("Deleting", id(self))
	def __eq__(self, other):
		print("Comparing", id(self), "and", id(other))
		if be_bad:
			dict2.clear()
		return self is other
	def __hash__(self):
		return 0
be_bad = False
set1 = {bad()}
dict2 = {bad(): None}
be_bad = True
set1.symmetric_difference_update(dict2)
msg103473 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-04-18 08:46
Attaching a patch for set_lookkey() that does a DECREF only when the refcnt > 1 so that the DECREF won't trigger any state changes.

The merge crasher still needs a patch.
msg103486 - (view) Author: Evgeny Kapun (abacabadabacaba) Date: 2010-04-18 12:42
This patch still assumes that if so->table didn't change then the table wasn't reallocated (see http://en.wikipedia.org/wiki/ABA_problem). One solution is to check that so->mask didn't change as well. Also, checking that refcnt > 1 is redundant because if entry->key == startkey then there are at least two references: one from entry->key and another from startkey.

These functions have a bug that may cause them to refer to deallocated memory when both arguments are sets: set_intersection, set_isdisjoint, set_difference_update_internal, set_difference, set_symmetric_difference_update, set_issubset.
These functions may also do the same if the first argument is a set and the second argument is a dict: set_difference, set_symmetric_difference_update.

Bugs in set_repr:
> keys = PySequence_List((PyObject *)so);
> if (keys == NULL)
> 	goto done;
> 
> listrepr = PyObject_Repr(keys);
> Py_DECREF(keys);
List pointed to by keys is already deallocated at this point.
> if (listrepr == NULL) {
> 	Py_DECREF(keys);
But this code tries to DECREF it.
> 	goto done;
> }
> newsize = PyUnicode_GET_SIZE(listrepr);
> result = PyUnicode_FromUnicode(NULL, newsize);
> if (result) {
> 	u = PyUnicode_AS_UNICODE(result);
> 	*u++ = '{';
> 	/* Omit the brackets from the listrepr */
> 	Py_UNICODE_COPY(u, PyUnicode_AS_UNICODE(listrepr)+1,
> 			   PyUnicode_GET_SIZE(listrepr)-2);
> 	u += newsize-2;
> 	*u++ = '}';
> }
> Py_DECREF(listrepr);
> if (Py_TYPE(so) != &PySet_Type) {
result may be NULL here.
> 	PyObject *tmp = PyUnicode_FromFormat("%s(%U)",
> 					     Py_TYPE(so)->tp_name,
> 					     result);
I think PyUnicode_FromFormat won't like it.
> 	Py_DECREF(result);
> 	result = tmp;
> }
msg103505 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-04-18 17:33
> One solution is to check that so->mask didn't 
> change as well. 

I saw that and agree it would make a tighter check, but haven't convinced myself that it is necessary.

> Also, checking that refcnt > 1 is redundant 
> because if entry->key == startkey then there 
> are at least two references: one from entry->key 
> and another from startkey.

It is a meaningful check.  We have our own INCREF
and one for the key being in the table.  If the
count is 1, then it means that the comparison
check deleted the key from the table or replaced
its value (see issue 1517).
msg103510 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-04-18 18:21
>Bugs in set_repr:
>> keys = PySequence_List((PyObject *)so);
>> if (keys == NULL)
>> 	goto done;
>> 
>> listrepr = PyObject_Repr(keys);
>> Py_DECREF(keys);
>List pointed to by keys is already deallocated at this point.
>> if (listrepr == NULL) {
>> 	Py_DECREF(keys);
>>But this code tries to DECREF it.
>> 	goto done;
>> }

I don't follow why you think keys is already deallocated.
When assigned by PySequence_List() without a NULL return, the refcnt is one. The call to PyObject_Repr(keys) does not change the refcnt of keys,
so the Py_DECREF(keys) is correct.
msg103516 - (view) Author: Evgeny Kapun (abacabadabacaba) Date: 2010-04-18 19:11
> > One solution is to check that so->mask didn't 
> > change as well. 
> 
> I saw that and agree it would make a tighter check, but haven't convinced myself that it is necessary.

Without this check, it is possible that the comparison shrinks the table, so entry becomes out of bounds. However, if both so->table and so->mask didn't change then entry is still a pointer to one of table elements so it can be used safely.

> > Also, checking that refcnt > 1 is redundant 
> > because if entry->key == startkey then there 
> > are at least two references: one from entry->key 
> > and another from startkey.
> 
> It is a meaningful check.  We have our own INCREF
> and one for the key being in the table.  If the
> count is 1, then it means that the comparison
> check deleted the key from the table or replaced
> its value (see issue 1517).

If the comparison deleted or changed the key then the check entry->key == startkey would fail so refcnt check won't be reached. Checking refcounts is also bad because someone else may have references to the key.

> I don't follow why you think keys is already deallocated.
> When assigned by PySequence_List() without a NULL return, the refcnt is one. The call to PyObject_Repr(keys) does not change the refcnt of keys,
> so the Py_DECREF(keys) is correct.
Look at the code again. If listrepr happens to be NULL, you do Py_DECREF(keys) twice (this bug is only present in py3k branch).
msg103524 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-04-18 20:30
Fixed set_repr() issue in r80197 and r80196.  Looks like someone futzed the unicode updates for 3.x.
msg103541 - (view) Author: Evgeny Kapun (abacabadabacaba) Date: 2010-04-18 22:25
This code crashes python by using another bug in set_repr. This only affects py3k. This code relies on out-of-memory condition, so run it like:
$ (ulimit -v 65536 && python3 test.py)
Otherwise, it will eat all your free memory before crashing.

val = "a" * 10000
class big:
	def __repr__(self):
		return val
i = 16
while True:
	repr(frozenset(big() for j in range(i)))
	i = (i * 5) >> 2
msg103543 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-04-18 22:28
Patch please.
msg115430 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-09-03 10:07
Fixed in r84447
msg115667 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-09-05 19:23
I’m doing the backport, I’ll assign back to Raymond when the patch is ready.
msg131610 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-03-21 02:37
I was able to reproduce the crash in 2.7 and fix it with the patch.  I only had to remove a cast to long to make the patch apply.  I also changed tabs to spaces in two lines.

Please review and assign back to me for commit. :)
msg131757 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-03-22 17:37
This looks okay.

On the first lines for set_merge() where there is:
  key = entry->key;
also do:
  hash = entry->key;
so that the two get handled in a parallel fashion.

Thanks for doing the backporting.
msg131783 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-03-22 21:20
I assume you meant “hash = entry->hash”, not “entry->key”.  Updated patch attached.
msg131784 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-03-22 21:22
BTW, set_add_entry still uses entry->hash.
msg131789 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-03-22 21:34
The revised patch looks good.
Please make the same change for set_add_entry.
If the tests run, go ahead and apply.
msg131801 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-03-22 22:49
Is it a candidate for 3.1 too?
msg131804 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-03-22 23:06
Sure, go ahead and apply to 3.1
msg131821 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-03-23 00:20
Here is a part of my 3.1 patch:

+    PyObject *key = entry->key;
+    long hash = entry->hash;

     assert(so->fill <= so->mask);  /* at least one empty slot */
     n_used = so->used;
-    Py_INCREF(entry->key);
-    if (set_insert_key(so, entry->key, (long) entry->hash) == -1) {
-        Py_DECREF(entry->key);
+    Py_INCREF(key);
+    if (set_insert_key(so, key, hash) == -1) {
+        Py_DECREF(key);

You’ll notice that I declare hash as long, but I don’t need to cast it to long to make it compile and run.  However, the previous code did cast; was that done on purpose, maybe for 32-bit platforms, or is it okay to not have the cast?
msg131832 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-03-23 00:56
Looks good.
Can this be closed now?
msg131835 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-03-23 01:13
Used wrong number in the commit message :/  Fixed in 24179f82b7de for 2.7.
msg131849 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-03-23 03:53
New changeset 57657393ceaf by Éric Araujo in branch '3.1':
Fix obscure set crashers (#8420).  Backport of d56b3cafb1e6, reviewed by Raymond.
http://hg.python.org/cpython/rev/57657393ceaf
msg131852 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-03-23 04:10
All done.
History
Date User Action Args
2022-04-11 14:56:59adminsetgithub: 52667
2011-03-23 04:10:56eric.araujosetstatus: open -> closed

messages: + msg131852
stage: commit review -> resolved
2011-03-23 03:53:54python-devsetstatus: pending -> open


messages: + msg131849
nosy: + python-dev
2011-03-23 01:13:08eric.araujosetstatus: open -> pending

messages: + msg131835
2011-03-23 00:56:51rhettingersetmessages: + msg131832
2011-03-23 00:20:47eric.araujosetmessages: + msg131821
2011-03-22 23:06:31rhettingersetmessages: + msg131804
2011-03-22 22:49:38eric.araujosetmessages: + msg131801
2011-03-22 21:34:53rhettingersetmessages: + msg131789
2011-03-22 21:22:11eric.araujosetmessages: + msg131784
2011-03-22 21:20:59eric.araujosetfiles: + fix-set-crashers-2.7.diff

messages: + msg131783
2011-03-22 21:18:42eric.araujosetfiles: - fix-set-crashers-2.7.diff
2011-03-22 17:37:32rhettingersetassignee: rhettinger -> eric.araujo
messages: + msg131757
2011-03-21 02:37:28eric.araujosetfiles: + fix-set-crashers-2.7.diff

messages: + msg131610
assignee: eric.araujo -> rhettinger
stage: needs patch -> commit review
2010-09-08 03:00:57eric.araujosetstatus: closed -> open
2010-09-05 19:23:02eric.araujosetversions: - Python 2.6
nosy: + eric.araujo

messages: + msg115667

assignee: rhettinger -> eric.araujo
2010-09-03 10:07:41rhettingersetstatus: open -> closed
resolution: fixed
messages: + msg115430
2010-04-18 23:16:19abacabadabacabasetfiles: + repr.diff
2010-04-18 22:28:25rhettingersetmessages: + msg103543
2010-04-18 22:25:47abacabadabacabasetmessages: + msg103541
2010-04-18 20:30:30rhettingersetmessages: + msg103524
2010-04-18 19:11:01abacabadabacabasetmessages: + msg103516
2010-04-18 18:21:39rhettingersetmessages: + msg103510
2010-04-18 17:33:15rhettingersetmessages: + msg103505
2010-04-18 12:42:59abacabadabacabasetmessages: + msg103486
2010-04-18 08:46:16rhettingersetfiles: + lookkey.diff
priority: normal
versions: + Python 2.6, Python 2.7, Python 3.2
messages: + msg103473

keywords: + patch
stage: needs patch
2010-04-17 19:34:49abacabadabacabasettype: crash
2010-04-17 19:05:36abacabadabacabasetmessages: + msg103424
title: set_lookkey is unsafe -> Objects/setobject.c contains unsafe code
2010-04-16 16:15:49rhettingersetassignee: rhettinger

nosy: + rhettinger
2010-04-16 15:43:34abacabadabacabacreate