classification
Title: set() operators don't work with collections.Set instances
Type: behavior Stage: patch review
Components: Interpreter Core Versions: Python 3.4, Python 3.3, Python 3.2, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Mark.Shannon, asvetlov, daniel.urban, dstanek, jcea, ncoghlan, rhettinger, serhiy.storchaka, stutzbach, terry.reedy, ysj.ray
Priority: high Keywords: patch

Created on 2010-05-17 18:50 by stutzbach, last changed 2013-05-21 13:35 by ncoghlan.

Files
File name Uploaded Description
set-vs-set.py stutzbach, 2010-05-17 18:50 Minimal example
prelim.patch rhettinger, 2010-09-02 09:04 First draft
set-with-Set.patch stutzbach, 2010-12-15 01:53
Messages (17)
msg105929 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-17 18:50
The set() operators (__or__, __and__, __sub__, __xor__, and their in-place counterparts) require that the parameter also be an instance of set().

They're documented that way:  "This precludes error-prone constructions like set('abc') &  'cbs' in favor of the more readable set('abc').intersection('cbs')."

However, an unintended consequence of this behavior is that they don't inter-operate with user-created types that derive from collections.Set.

That leads to oddities like this:

MySimpleSet() | set()  # This works
set() | MySimpleSet()  # Raises TypeError

(MySimpleSet is a minimal class derived from collections.Set for illustrative purposes -- see attached file)
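
The attached set-vs-set.py is not reproduced in this thread. What follows is only a sketch of what such a class might look like: the body of MySimpleSet and the try_union helper are assumptions, and the ABC is imported from collections.abc because that is where it lives in current Python 3 (it is collections.Set on the versions listed for this issue). The second call is wrapped in try/except because its outcome depends on the interpreter version.

```python
from collections.abc import Set  # spelled collections.Set on Python 2.7


class MySimpleSet(Set):
    """Hypothetical minimal Set subclass backed by a list."""

    def __init__(self, iterable=()):
        self._items = []
        for item in iterable:
            if item not in self._items:
                self._items.append(item)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


def try_union(left, right):
    """Return the type name of left | right, or 'TypeError' if it fails."""
    try:
        return type(left | right).__name__
    except TypeError:
        return "TypeError"


# Handled by the __or__ mixin method that the Set ABC supplies.
abc_on_left = try_union(MySimpleSet([1, 2]), set([3]))
# Handled first by set.__or__; this is the case reported to raise TypeError.
set_on_left = try_union(set([3]), MySimpleSet([1, 2]))
print(abc_on_left, set_on_left)
```

Only the three abstract methods are written by hand; the operators come from the ABC, which is why the left-hand operand decides which implementation runs.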

collections.Set's operators accept any iterable.

I'm not 100% certain what the correct behavior should be.  Perhaps set's operators should be a bit more liberal and accept any collections.Set instance, while collections.Set's operators should be a bit more conservative.  Perhaps not.  It's a little subjective.

It seems to me that at minimum set() and collections.Set() should inter-operate and have the same behavior.
msg105930 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-17 19:01
I should add:

I discovered the inconsistency while working on my sortedset class, which provides the same interface as set() but is also indexable like a list (e.g., S[0] always returns the minimum element, S[-1] returns the maximum element, etc.).

sortedset derives from collections.MutableSet, but it's challenging to precisely emulate set() when collections.MutableSet and set() don't work the same way. ;-)
msg112359 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-08-01 17:04
Guido, do you have a recommendation?
msg112404 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2010-08-01 23:33
No idea, I don't even know what collections.Set is. :-(
msg112480 - (view) Author: ysj.ray (ysj.ray) Date: 2010-08-02 14:33
In my opinion, set's operators should be a bit more liberal and accept any collections.Set instance. Given that collections.Set is an ABC and issubclass(set, collections.Set) is True, the set methods should (as strongly recommended) follow the generalized abstract semantic definitions in the ABC. This is according to PEP 3119:
"""
In addition, the ABCs define a minimal set of methods that establish the characteristic behavior of the type. Code that discriminates objects based on their ABC type can trust that those methods will always be present. Each of these methods are accompanied by an generalized abstract semantic definition that is described in the documentation for the ABC. These standard semantic definitions are not enforced, but are strongly recommended.
"""

The collections.Set defines __or__() as this (for example):
"""
    def __or__(self, other):
        if not isinstance(other, Iterable):
            return NotImplemented
        chain = (e for s in (self, other) for e in s)
        return self._from_iterable(chain)
"""
which means the "|" operator should accept any iterable. So I think it's better to make set's methods more liberal.
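
To make the difference concrete, here is a small sketch that applies "|" to a plain list, once with a Set-derived class and once with the built-in set. ListBasedSet is a hypothetical stand-in written for this example, and the import assumes a current Python 3 where the ABC is collections.abc.Set.

```python
from collections.abc import Set


class ListBasedSet(Set):
    """Hypothetical Set subclass that stores its elements in a list."""

    def __init__(self, iterable=()):
        self._items = []
        for item in iterable:
            if item not in self._items:
                self._items.append(item)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


# The ABC's __or__ only checks that the operand is iterable.
abc_result = ListBasedSet("ab") | ["c"]

# The built-in set refuses a list operand for the same operator.
try:
    set("ab") | ["c"]
    builtin_result = "accepted"
except TypeError:
    builtin_result = "TypeError"

print(sorted(abc_result), builtin_result)
```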
msg115291 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-09-01 12:27
Raymond, do you agree with Ray's analysis?
msg115344 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-09-02 00:07
The operator methods in setobject.c should be liberalized to accept instances of collections.Set as arguments.  For speed, they should continue to check PyAnySet_Check(other) first and then if that fails, fall back to testing PyObject_IsInstance(other, collections.Set).  

Internally, the set methods will still need to process "other" as just an iterable container because they cannot rely on the elements in "other" being hashable (for example, the ListBasedSet in the docs does not require hashability) or unique as perceived by setobject.c (for example, a set implementing a key-function for an equivalence class may not work, because its key-function would be unknown to setobject.c, which relies on __hash__ and __eq__).

To implement PyObject_IsInstance(other, collections.Set), there may be a bootstrap issue (with the C code being compiled and runnable before _abcoll.py is able to create the Set ABC).  If so, it may be necessary to create an internal _BaseSet object in setobject.c that can be used in collections.Set.  Alternatively, the code in setobject.c can lazily (at runtime) lookup collections.Set by name and cache it so that we only do one successful lookup per session.

Whatever approach is taken, it should be done with an eye towards the larger problem that Python is filled with concrete isinstance() checks that pre-date ABCs and many of those need to be liberalized (accepting a registered ABC and providing different execution paths for known and unknown concrete types).
msg115357 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-09-02 02:51
> The operator methods in setobject.c should be liberalized to accept
> instances of collections.Set as arguments.

Under this plan, set() and collections.Set will still have slightly different behavior.  collections.Set will be more liberal and accept any iterable.  Are you okay with that?  I don't feel strongly about this point; I just want to make sure it's a conscious decision.

I do feel strongly that set and collections.Set should be able to inter-operate nicely and the proposal satisfies that requirement so I would be happy with it.

> To implement PyObject_IsInstance(other, collections.Set), there may
> be a bootstrap issue (with the C code being compiled and runnable
> before _abcoll.py is able to create the Set ABC). Alternatively, 
> the code in setobject.c can lazily (at runtime) lookup 
> collections.Set by name and cache it so that we only do one
> successful lookup per session.

I favor the lazy lookup approach.

> Whatever approach is taken, it should be done with an eye towards 
> the larger problem that Python is filled with concrete isinstance()
> checks that pre-date ABCs and many of those need to be liberalized
> (accepting a registered ABC and providing different execution paths
> for known and unknown concrete types).

Agreed.  Ideally, the "PyObject_IsInstance(other, collections.Set)" logic would be abstracted out as much as possible so other parts of Python can make similar checks without needing tons of boilerplate code in every spot.

For what it's worth, I don't think we will find as many inconsistency issues with ABCs other than Set.  Set has methods that take another Set and return a third Set.  That forces different concrete implementations of the Set ABC to interact in a way that won't come up for a Sequence or Mapping.

(I suppose that Sequence.extend or MutableMapping.update are somewhat similar, but list.extend and dict.update are already very liberal in what they accept as a parameter.)
msg115363 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-09-02 09:04
Rough cut at a first patch is attached.

Still thinking about whether Set operations should be accepting any iterable or whether they should be tightened to expect other Set instances.  The API for set() came from set.py which was broadly discussed and widely exercised.  Guido was insistent that non-sets be excluded from the operator interactions (list.__iadd__ being on his list of regrets).   That was probably a good decision, but the Set API violated this norm and it did not include named methods like difference(), update(), and intersection() to handle the iterable cases.

Also, still thinking about whether the comparison operators should be making tight or loose checks.
msg122959 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-12-01 00:01
Daniel, do you have time to work on this one?

If so, go ahead and make setobject.c accept any instance of collections.Set and make the corresponding change to the ABCs:

    def __or__(self, other):
        if not isinstance(other, Set):
            return NotImplemented
        chain = (e for s in (self, other) for e in s)
        return self._from_iterable(chain)

The code in the attached prelim.patch has working C code for isinstance(x, collections.Set), but the rest of the patch has not been tested.  It needs to be applied very carefully and thoughtfully because:
* internally, the self and other can get swapped on a binary call
* we can't make *any* assumptions about "other" (that duplicates have actually been eliminated or that the elements are even hashable).

The most reliable thing to do for the case where PyAnySet_Check(obj) is false but isinstance(obj, collections.Set) is true is to call the named method such as s.union(other) instead of continuing with s.__or__ which was designed only with real sets in mind.
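
A Python-level sketch of that dispatch, for illustration only: liberal_or is a hypothetical helper standing in for the C operator slot, and the ABC is imported from collections.abc as on current Python 3.

```python
from collections.abc import Set


def liberal_or(s, other):
    """Sketch of set.__or__ extended to accept any Set instance."""
    if isinstance(other, (set, frozenset)):
        # Real sets: the existing fast operator path applies.
        return s | other
    if isinstance(other, Set):
        # Set-like but not a real set: treat it as a plain iterable
        # by going through the named method.
        return s.union(other)
    # Anything else is still rejected by the operator.
    return NotImplemented


keys_view = {2: "x"}.keys()          # a Set instance that is not a set
print(liberal_or({1}, {2}), liberal_or({1}, keys_view))
```

Going through union() means the elements of "other" are hashed and de-duplicated again, so nothing is assumed about how the other container stores them.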
msg122961 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-12-01 00:03
Yes, I can take a stab at it.
msg122962 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-12-01 00:08
No need to rush this for the beta.  It's a bug fix and can go in at any time.  The important thing is that we don't break the C code.  The __ror__ magic method would still need to do the right thing and the C code needs to defend against the interpreter swapping self and other.
msg124000 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-12-15 01:53
Would it be sufficient to:

1) Restrict collections.Set()'s operators to accept collections.Set but not arbitrary iterables, and
2) Fix Issue2226 and let set() | MySimpleSet() work via collections.Set.__ror__

Attached is a patch that implements this approach, nominally fixing both this and Issue2226.
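
The patch itself is not shown in this thread. As a rough sketch of the two points, the hypothetical StrictSet class below overrides __or__ to accept only Set instances and supplies __ror__ so that the reflected call has somewhere to land; the class name and its storage are assumptions made for this example.

```python
from collections.abc import Set


class StrictSet(Set):
    """Hypothetical Set subclass whose "|" accepts only other Sets."""

    def __init__(self, iterable=()):
        self._items = []
        for item in iterable:
            if item not in self._items:
                self._items.append(item)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __or__(self, other):
        # Point 1: refuse arbitrary iterables.
        if not isinstance(other, Set):
            return NotImplemented
        return self._from_iterable(e for s in (self, other) for e in s)

    # Point 2: union is symmetric, so the reflected form can reuse __or__.
    __ror__ = __or__


# set.__or__ declines the operand, so Python falls back to __ror__.
result = set([1]) | StrictSet([2])

try:
    StrictSet([1]) | [2]
    rejected = False
except TypeError:
    rejected = True

print(type(result).__name__, sorted(result), rejected)
```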

This solution seems much too simple in light of how long I've been thinking about these bugs.  I suspect there are code hobgoblins waiting to ambush me. ;)
msg138222 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-06-12 22:19
If the code were acting exactly as documented, I would consider this a feature request. But "require that the parameter also be an instance of set()" (from original message) is too limited.

>>> set() | frozenset()
set()

So 'set' in "their operator based counterparts require their arguments to be sets." (doc) seems to be meant to be more generic, in which case 'instance of collections.Set' seems reasonable. To be clear, the doc could be updated to "... sets, frozensets, and other instances of collections.Set."

"Both set and frozenset support set to set comparisons. " This includes comparisons between the two classes.

>>> set() == frozenset()
True

so perhaps comparisons should be extended also.
msg155760 - (view) Author: Mark Shannon (Mark.Shannon) * Date: 2012-03-14 16:09
Review of set-with-Set.patch:

Looks good overall. 
I agree that restricting operations to instances of Set rather than Iterable is correct.

Implementing "__rsub__" in terms of - (subtraction) means that infinite recursion is a possibility. It also creates an unnecessary temporary.
Could you just reverse the expression used in __sub__?

Would you add tests for comparisons (Set() == set(), etc.)?
They are probably tested implicitly in the rest of the test suite, but explicit tests would be good.
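
To illustrate both review points, here is a sketch with a hypothetical DemoSet class (not the code from the patch): __rsub__ repeats the expression from __sub__ with the operands swapped instead of calling "-" again, and a few explicit comparison checks against the built-in set follow.

```python
from collections.abc import Set


class DemoSet(Set):
    """Hypothetical Set subclass used only to illustrate the review."""

    def __init__(self, iterable=()):
        self._items = []
        for item in iterable:
            if item not in self._items:
                self._items.append(item)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __sub__(self, other):
        if not isinstance(other, Set):
            return NotImplemented
        return self._from_iterable(v for v in self if v not in other)

    def __rsub__(self, other):
        if not isinstance(other, Set):
            return NotImplemented
        # Same expression as __sub__ with the roles reversed: no call
        # back into "-" and no intermediate container.
        return self._from_iterable(v for v in other if v not in self)


difference = {1, 2, 3} - DemoSet([2])       # dispatched to __rsub__

# Explicit comparison checks in both operand orders.
eq_abc_left = DemoSet([1, 2]) == {1, 2}
eq_set_left = {1, 2} == DemoSet([1, 2])
subset_check = DemoSet([1]) <= {1, 2}

print(sorted(difference), eq_abc_left, eq_set_left, subset_check)
```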
msg174333 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012-10-31 17:07
Heads up, Issue #16373.
msg189753 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-05-21 13:35
Armin pointed out in http://lucumr.pocoo.org/2013/5/21/porting-to-python-3-redux/ that one nasty consequence of the remaining part of issue 2226 and this bug is making it much harder than it should be to use the ItemsView, KeysView and ValuesView from collections.abc to implement third party mappings that behave like the builtin dict.
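
A small sketch of the situation described: TinyMapping is a hypothetical third-party mapping built on the Mapping ABC, so its keys() returns a KeysView from collections.abc rather than the built-in dict view. The set-on-the-left case is wrapped in try/except because its outcome depends on the interpreter version.

```python
from collections.abc import Mapping


class TinyMapping(Mapping):
    """Hypothetical read-only mapping built on the Mapping ABC."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


m = TinyMapping({"a": 1, "b": 2})
keys = m.keys()                      # a collections.abc KeysView

# View on the left: handled by the Set ABC's __and__.
view_on_left = keys & {"a", "z"}

# Set on the left: handled first by set.__and__.
try:
    set_on_left = {"a", "z"} & keys
except TypeError:
    set_on_left = None

print(view_on_left, set_on_left)
```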
History
Date User Action Args
2013-05-21 13:35:47 ncoghlan set messages: + msg189753
2013-05-20 14:28:43 ncoghlan set nosy: + ncoghlan
2012-11-01 05:08:15 rhettinger set assignee: stutzbach -> rhettinger
2012-10-31 21:50:35 asvetlov set nosy: + asvetlov
2012-10-31 17:07:49 jcea set messages: + msg174333
2012-10-31 16:50:50 jcea set nosy: + jcea
2012-10-31 15:57:11 serhiy.storchaka set nosy: + serhiy.storchaka; versions: + Python 3.4
2012-03-14 16:09:30 Mark.Shannon set nosy: + Mark.Shannon; messages: + msg155760
2011-08-19 01:33:19 meador.inge set resolution: accepted ->; stage: needs patch -> patch review
2011-06-26 18:51:34 terry.reedy set versions: + Python 3.3, - Python 3.1
2011-06-12 22:19:52 terry.reedy set nosy: + terry.reedy; messages: + msg138222
2011-01-26 17:56:01 dstanek set nosy: + dstanek
2010-12-15 01:53:13 stutzbach set files: + set-with-Set.patch; nosy: rhettinger, stutzbach, daniel.urban, ysj.ray; messages: + msg124000
2010-12-01 17:01:32 daniel.urban set nosy: + daniel.urban
2010-12-01 00:08:19 rhettinger set messages: + msg122962
2010-12-01 00:03:59 stutzbach set messages: + msg122961
2010-12-01 00:01:19 rhettinger set assignee: rhettinger -> stutzbach; messages: + msg122959
2010-11-29 21:00:32 rhettinger set messages: - msg117005
2010-09-20 23:36:59 rhettinger set messages: + msg117005
2010-09-20 23:34:23 rhettinger set messages: - msg116998
2010-09-20 23:02:58 rhettinger set messages: + msg116998
2010-09-02 09:04:20 rhettinger set files: + prelim.patch; keywords: + patch; messages: + msg115363
2010-09-02 02:51:48 stutzbach set resolution: accepted; messages: + msg115357; versions: + Python 3.1
2010-09-02 00:07:28 rhettinger set messages: + msg115344; stage: test needed -> needs patch
2010-09-01 12:27:56 stutzbach set messages: + msg115291
2010-08-22 16:57:05 gvanrossum set nosy: - gvanrossum
2010-08-22 08:13:44 rhettinger set priority: normal -> high
2010-08-02 14:33:21 ysj.ray set nosy: + ysj.ray; messages: + msg112480
2010-08-02 00:44:41 rhettinger set assignee: rhettinger
2010-08-01 23:33:30 gvanrossum set assignee: gvanrossum -> (no value); messages: + msg112404
2010-08-01 17:04:37 rhettinger set assignee: rhettinger -> gvanrossum; messages: + msg112359; nosy: + gvanrossum
2010-08-01 06:22:16 georg.brandl set assignee: rhettinger; nosy: + rhettinger
2010-05-17 19:01:17 stutzbach set messages: + msg105930
2010-05-17 18:50:44 stutzbach create