classification
Title: Automatic set-to-frozenset conversions not thread-safe
Type: behavior Stage:
Components: Interpreter Core Versions: Python 3.1, Python 3.2, Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: dstanek, ezio.melotti, flox, rhettinger, stutzbach
Priority: low Keywords:

Created on 2010-05-18 21:07 by stutzbach, last changed 2010-08-06 13:57 by stutzbach. This issue is now closed.

Files
File name Uploaded Description Edit
set-race.py stutzbach, 2010-05-18 21:07 Script to demonstrate the race condition
Messages (8)
msg106007 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-18 21:07
"some_set in some_set_of_sets" contains a race condition that can lead to odd behavior.  To perform the test, the set_contains() function in setobject.c creates a temporary frozenset object (t), swaps the bodies of some_set and t, checks if t is in some_set_of_sets, then swaps the bodies back.

Unfortunately, comparisons or hash functions may release the GIL, so the swapped bodies may be exposed on a different thread, i.e., "some_set in some_set_of_sets" may cause "some_set" to be empty on some other thread.

The same race condition exists in set_discard() and set_remove().

Attached is a short script that demonstrates the problem and could be easily converted to a unit test.
msg112833 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-08-04 16:36
FWIW, I'm am considering removing this functionality in Py3.3 after the language moratorium ends. 

The swap-bodies technique had been included in the original sets.py and the technique is similar to the one list.sort() uses to protect against mutation during sorting.  However, the technique is open to exploits like the one in this bug report.

I've retitled this report because the set code itself does not have a race condition.  The race is in the provided exploit code which falsely assumes that the set-in-set operation is either atomic or non-mutating, so it doesn't put locks around it as you would with pure python code like that in sets.py.
msg112846 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-08-04 18:05
The swap bodies technique has been used in list.sort for a long time, but users expect list.sort to mutate the list.

Even with a pure-Python implementation of sets, I would not expect __contains__ to be a mutating method (unless I have gone out of way to write an evil __eq__ or __hash__ function).

I grant you that I would not expect atomicity, but the example script does not assume atomicity.
msg112855 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-08-04 18:35
I didn't disagree with you.  The swap bodies implementation technique is not expected and is exploitable in a number of ways.

What outcome to you want?  Eliminate the implicit set-to-frozenset conversion feature; introduce a non-mutating version that copies the entire set; or just document that the automatic conversion is non-atomic, temporarily mutating, and not thread-safe?

Personally, I've never been convinced of the value of this feature and would not feel sad to have it removed.
msg112931 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-08-05 00:14
FWIW, the current docs adequately cover the existing situation:
"Note, the elem argument to the __contains__(), remove(), and discard() methods may be a set. To support searching for an equivalent frozenset, the elem set is temporarily mutated during the search and then restored. During the search, the elem set should not be read or mutated since it does not have a meaningful value."

That being said, I'm going to get rid of set_swap_bodies technique.  Instead, will use an atomic full-copy of the set into a separate frozenset.  That will cure the most egregious problems at the expense of some speed and of creating a new frozenset object that will be visible to a user determined to see it.
msg113090 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2010-08-06 10:06
Can the attached script be converted in a unittest that tests that the fix is correct?
msg113095 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-08-06 10:38
See r83757, r83756, and r83755.

The existing unittests prove the script does what it is supposed to do.  The new code is just a different approach to the same problem so that it is less likely to be user visible.

The attached set-race.py fragment is an incorrect test.  The docs state that the object should not be read by another thread.   So, this "test" would be validating a behavior that is not guaranteed either in this implementation or in others.
msg113100 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-08-06 13:57
It is easy in online communications to interpret a response as a disagreement.  I apologize for falling into that trap.

I am +1 on removing implicit set-to-frozenset conversions, which would more effectively "fix" issue8752 for me (this is the only aspect of set behavior that I cannot emulate with a carefully designed collections.Set subclass).  Also, I think forcing users to be explicit is more Pythonic.

In the meantime, a full copy is the solution I had in mind.  While it's less efficient, it's only slower by a constant factor.  The swap method still required O(n) steps to compute the hash.
History
Date User Action Args
2010-08-06 13:57:28stutzbachsetmessages: + msg113100
2010-08-06 10:38:05rhettingersetstatus: open -> closed
resolution: not a bug
messages: + msg113095
2010-08-06 10:06:28ezio.melottisetnosy: + ezio.melotti
messages: + msg113090
2010-08-05 00:14:55rhettingersetmessages: + msg112931
versions: - Python 2.6
2010-08-04 18:35:59rhettingersetmessages: + msg112855
2010-08-04 18:05:47stutzbachsetmessages: + msg112846
2010-08-04 16:36:44rhettingersetpriority: normal -> low

messages: + msg112833
title: Race condition when checking for set in set -> Automatic set-to-frozenset conversions not thread-safe
2010-08-04 07:41:21floxsetnosy: + flox
2010-08-04 01:32:00dstaneksetnosy: + dstanek
2010-08-01 06:45:28georg.brandlsetassignee: rhettinger

nosy: + rhettinger
2010-05-18 21:07:32stutzbachcreate