This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: set_swap_bodies is unsafe
Type: behavior Stage:
Components: Versions:
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Rhamphoryncus, rhettinger
Priority: low Keywords: patch

Created on 2008-05-07 02:19 by Rhamphoryncus, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
brokensetswap.py Rhamphoryncus, 2008-05-07 02:19
python-setswap.diff Rhamphoryncus, 2008-05-08 03:10
python-setswap-2.diff Rhamphoryncus, 2008-05-13 06:31
python-setswap-3.diff Rhamphoryncus, 2008-05-14 17:08
Messages (16)
msg66349 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008-05-07 02:19
set_swap_bodies() is used to cheaply create a frozenset from a set,
which is then used for lookups within a set.  It does this by creating a
temporary empty frozenset, swapping its contents with the original set,
doing the lookup using the frozenset, then swapping the contents back
and releasing the temporary frozenset.

Unfortunately, the lookup can invoke arbitrary code, which may examine
the original set and find it empty (until the lookup completes).  The
lookup may also save a reference to the temporary frozenset, which
mutates to being empty after the lookup completes.

The purpose seems to be allowing "someset in someotherset" to
automatically convert someset to a frozenset.  A brief search didn't
reveal a rationale for this, and in fact PEP 218's history section
claims the opposite: "Auto-conversion between mutable and immutable sets
was dropped."  Perhaps this is a forgotten vestige of that?

set_intersection_update uses set_swap_bodies for a different purpose and
it may be safe.  It depends on whether subclasses of set may retain a
reference to the tmp set somehow.
msg66358 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-05-07 12:07
set_swap_bodies() is fine as it involves pure C with no possible 
callbacks.  The issue is in its use in the __contains__() check.  I'll 
take a look at it and see what if anything needs to be changed.  

Am lowering the priority because you really have to be trying to create 
this behavior.
msg66371 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008-05-07 20:11
The intended use is unsafe.  contains, remove, and discard all use it
for a lookup, which can't be fixed.

Upon further inspection, intersection_update is fine.  Only a temporary
set (not frozenset!) is given junk, which I don't see as a problem.

If someone can confirm that contains/remove/discard's usage is an
artifact I'll submit a patch that removes it.
msg66395 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008-05-08 03:10
I decided not to wait.  Here's a patch.

Several of set's unit tests covered the auto-conversion, so I've
modified them.
msg66401 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-05-08 04:06
Rejecting this patch which simply disables a feature that some users 
consider to be important.

I will look at it further.  Right now, I'm inclined to simply document 
that the three temporary autoconversions deny meaningful 
contemporaneous access to a set used as a key.

The set_swap_bodies() function itself is fine -- it behaves just like 
an atomic version of the pure python sequence:  t=set(a); a.clear(); 
a.update(b); b.clear(); b.update(t); del t.

The issue is simply that the swap/search/swap dance allows the 
possibility that a determined user could graft onto the search step and 
access but not modify the temporary swapped-in frozenset. It doesn't 
crash; it simply produces an undefined result.  I'm not losing sleep 
over this scenario.

I'm am entertaining an alternative where contains/discard/remove would 
duplicate instead of swap the set bodies; however, that approach may do 
more harm than good.
msg66404 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008-05-08 04:29
PEP 218 explicitly dropped auto-conversion as a feature.  Why should
this be an exception?
msg66406 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-05-08 04:52
If needed, I'll update the PEP to be more clear.  The sets.py module 
had two protocols: __as_immutable__() and __as_temporarily_immutable__
().  The first protocol was the one that was dropped -- it triggered 
with something like "s1.add(s2)" and it automatically created a 
frozenset copy of s2.  The second protocol is used in 
contains/remove/discard.  Alex Martelli successfully lobbied for its 
retention.
msg66426 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-05-08 17:21
Added documentation in r62873.  Leaving the code as-is.
msg66427 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008-05-08 17:29
I've been unable to find any discussion on this feature.  If anything, I
think when PEP 218 was discussed and accepted (and PEP 351 rejected),
the assumption was it didn't exist.  Adding it now should be regarded as
a new feature and discussed on python-dev.
msg66428 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008-05-08 17:31
Nevermind that the current implementation *is* broken, even if you
consider fixing it to be a low priority.  Closing the report with a doc
tweak isn't right.
msg66429 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-05-08 17:38
Sorry, you don't like the search with autopromotion feature. It has 
been around since sets were first coded in C.  It is a natural 
extension/consequence of the idea that frozenset('abc')==set('abc').
msg66430 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008-05-08 17:43
So why doesn't set() in {} work?  Why was PEP 351 rejected when it would
do this properly?
msg66772 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008-05-13 06:30
Here's another approach to avoiding set_swap_bodies.  The existing
semantics are retained.  Rather than creating a temporary frozenset and
swapping the contents, I check for a set and call the internal hash
function directly (bypassing PyObject_Hash).

I even retain the current semantics of PySet_Discard and PySet_Contains,
which do NOT do the implicit conversion (and have unit tests to verify
that!)

I do have some concern that calling PySet_Check on every call may be too
slow.  It may be better to only call it on failure (which is
more-or-less what the old code did.)

set_swap_bodies has only one remaining caller, and their use case could
probably be significantly simplified.
msg66824 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008-05-14 17:08
Revised again.  sets are only hashed after PyObject_Hash raises a TypeError.

This also fixes a regression in test_subclass_with_custom_hash.  Oddly,
it doesn't show up in trunk, but does when my previous patch is applied
to py3k.
msg66833 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-05-14 20:59
By replacing temporary immutability with temporary hashability, does 
this approach create the possibility that someone could mutate the key-
set during a search?  Is it possible to get the __eq__ check out-of-
sync with the __hash__ value?

Also, after a quick look at the patch, I'm not too keen on any 
modifications to set_swap_bodies() nor with changing the signature of 
set_contains_key().
msg66835 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008-05-14 22:11
There is no temporary hashability.  The hash value is calculated, but
never stored in the set's hash field, so it will never become out of
sync.  Modification while __hash__ or __eq__ is running is possible, but
for __eq__ that applies to any mutable type.

set_contains_key only has two callers, one for each value of the
treat_set_key_as_frozen argument, so I could inline it if you'd prefer that?

set_swap_bodies has only one remaining caller, which uses a normal set,
not a frozenset.  Using set_swap_bodies on a frozenset would be visible
except in a few special circumstances (ie it only contains builtin
types), so a sanity check against that seems appropriate.  The old code
reset ->hash to -1 in case one of the arguments was a frozenset -
impossible now, so I sanity check that it's always -1.
History
Date User Action Args
2022-04-11 14:56:34adminsetgithub: 47027
2008-05-14 22:11:52Rhamphoryncussetmessages: + msg66835
2008-05-14 20:59:11rhettingersetmessages: + msg66833
2008-05-14 17:08:09Rhamphoryncussetfiles: + python-setswap-3.diff
messages: + msg66824
2008-05-13 06:31:15Rhamphoryncussetfiles: + python-setswap-2.diff
messages: + msg66772
2008-05-08 17:43:11Rhamphoryncussetmessages: + msg66430
2008-05-08 17:38:27rhettingersetmessages: + msg66429
2008-05-08 17:31:30Rhamphoryncussetmessages: + msg66428
2008-05-08 17:29:14Rhamphoryncussetmessages: + msg66427
2008-05-08 17:21:09rhettingersetstatus: open -> closed
resolution: fixed
messages: + msg66426
2008-05-08 04:52:14rhettingersetmessages: + msg66406
2008-05-08 04:29:53Rhamphoryncussetmessages: + msg66404
2008-05-08 04:06:41rhettingersetmessages: + msg66401
2008-05-08 03:10:13Rhamphoryncussetfiles: + python-setswap.diff
keywords: + patch
messages: + msg66395
2008-05-07 20:11:48Rhamphoryncussetmessages: + msg66371
2008-05-07 12:07:43rhettingersetpriority: low
messages: + msg66358
2008-05-07 02:36:54benjamin.petersonsetassignee: rhettinger
nosy: + rhettinger
2008-05-07 02:19:30Rhamphoryncuscreate