classification
Title: set() operators don't work with collections.Set instances
Type: behavior Stage: patch review
Components: Interpreter Core Versions: Python 3.5, Python 3.4, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Mark.Shannon, asvetlov, daniel.urban, dstanek, jcea, mdengler, ncoghlan, python-dev, rhettinger, serhiy.storchaka, stutzbach, terry.reedy, yselivanov, ysj.ray
Priority: normal Keywords: patch

Created on 2010-05-17 18:50 by stutzbach, last changed 2014-05-26 08:02 by rhettinger. This issue is now closed.

Files
File name Uploaded Description
set-vs-set.py stutzbach, 2010-05-17 18:50 Minimal example
prelim.patch rhettinger, 2010-09-02 09:04 First draft
set-with-Set.patch stutzbach, 2010-12-15 01:53
issue8743-set-ABC-interoperability.diff ncoghlan, 2014-02-02 08:01 Additional tests and updated for 3.4
issue8743-set-ABC-interoperability_v2.diff ncoghlan, 2014-02-02 08:12 Also avoid infinite recursion risk in Set.__rsub__
fix_set_abc.diff rhettinger, 2014-05-25 08:39 Fix set abc and set for cross-type comparisons
fix_set_abc2.diff rhettinger, 2014-05-25 17:51 Add non-set iterables tests
fix_set_abc3.diff rhettinger, 2014-05-25 22:58 Add tests for sets.Set
Messages (29)
msg105929 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-17 18:50
The set() operators (__or__, __and__, __sub__, __xor__, and their in-place counterparts) require that the parameter also be an instance of set().

They're documented that way:  "This precludes error-prone constructions like set('abc') &  'cbs' in favor of the more readable set('abc').intersection('cbs')."

However, an unintended consequence of this behavior is that they don't inter-operate with user-created types that derive from collections.Set.

That leads to oddities like this:

MySimpleSet() | set()  # This works
set() | MySimpleSet()  # Raises TypeError

(MySimpleSet is a minimal class derived from collections.Set for illustrative purposes; see the attached file.)

collections.Set's operators accept any iterable.
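
For readers without the attachment, the following is a minimal sketch of the kind of class described above. It is not the attached set-vs-set.py, whose contents are not reproduced here: the list-backed storage, the constructor signature, and the helper try_union are assumptions made for illustration, and collections.abc.Set is used because that is where the ABC lives in current Python 3. On an interpreter that includes the fix for this issue, both operand orders should succeed; on the affected versions the second one raised TypeError.

```python
from collections.abc import Set


class MySimpleSet(Set):
    """Hypothetical minimal Set subclass backed by a list."""

    def __init__(self, iterable=()):
        self._items = []
        for item in iterable:
            if item not in self._items:
                self._items.append(item)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


def try_union(left, right):
    """Return left | right, or the TypeError raised while computing it."""
    try:
        return left | right
    except TypeError as exc:
        return exc


abc_first = try_union(MySimpleSet([1, 2]), {2, 3})  # ABC operand on the left
set_first = try_union({2, 3}, MySimpleSet([1, 2]))  # built-in set on the left
```

Returning the exception instead of letting it propagate keeps the two cases easy to compare side by side on any interpreter version.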

I'm not 100% certain what the correct behavior should be.  Perhaps set's operators should be a bit more liberal and accept any collections.Set instance, while collections.Set's operators should be a bit more conservative.  Perhaps not.  It's a little subjective.

It seems to me that at minimum set() and collections.Set() should inter-operate and have the same behavior.
msg105930 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-05-17 19:01
I should add:

I discovered the inconsistency while working on my sortedset class, which provides the same interface as set() but is also indexable like a list (e.g., S[0] always returns the minimum element, S[-1] returns the maximum element, etc.).

sortedset derives from collections.MutableSet, but it's challenging to precisely emulate set() when collections.MutableSet and set() don't work the same way. ;-)
msg112359 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-08-01 17:04
Guido, do you have a recommendation?
msg112404 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2010-08-01 23:33
No idea, I don't even know what collections.Set is. :-(
msg112480 - (view) Author: ysj.ray (ysj.ray) Date: 2010-08-02 14:33
In my opinion, set's operators should be a bit more liberal and accept any collections.Set instance. Given that collections.Set is an ABC and issubclass(set, collections.Set) is True, the set methods should (as strongly recommended) follow the generalized abstract semantic definitions in the ABC. This is according to PEP 3119:
"""
In addition, the ABCs define a minimal set of methods that establish the characteristic behavior of the type. Code that discriminates objects based on their ABC type can trust that those methods will always be present. Each of these methods are accompanied by an generalized abstract semantic definition that is described in the documentation for the ABC. These standard semantic definitions are not enforced, but are strongly recommended.
"""

The collections.Set defines __or__() as this (for example):
"""
    def __or__(self, other):
        if not isinstance(other, Iterable):
            return NotImplemented
        chain = (e for s in (self, other) for e in s)
        return self._from_iterable(chain)
"""
which means the "|" operator should accept any iterable. So I think it's better to make set's methods more liberal.
msg115291 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-09-01 12:27
Raymond, do you agree with Ray's analysis?
msg115344 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-09-02 00:07
The operator methods in setobject.c should be liberalized to accept instances of collections.Set as arguments.  For speed, they should continue to check PyAnySet_Check(other) first and then if that fails, fall back to testing PyObject_IsInstance(other, collections.Set).  

Internally, the set methods will still need to process "other" as just an iterable container, because they cannot rely on the elements of "other" being hashable (for example, the ListBasedSet in the docs does not require hashability) or unique as perceived by setobject.c. For instance, "other" may be a set that implements a key-function for an equivalence class; that key-function would be unknown to setobject.c, which relies on __hash__ and __eq__.
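
To illustrate why "other" cannot be treated as a real set, here is a hedged sketch modeled on the ListBasedSet recipe from the collections ABC docs. The class body is a paraphrase written for this example, and the helper name can_build_builtin_set is hypothetical. The instance holds unhashable lists as elements, so code that tried to copy it into a built-in set would fail even though it is a legitimate Set.

```python
from collections.abc import Set


class ListBasedSet(Set):
    """Set implementation that only needs __eq__ on its elements."""

    def __init__(self, iterable):
        self.elements = []
        for value in iterable:
            if value not in self.elements:
                self.elements.append(value)

    def __iter__(self):
        return iter(self.elements)

    def __contains__(self, value):
        return value in self.elements

    def __len__(self):
        return len(self.elements)


def can_build_builtin_set(candidate):
    """Hypothetical helper: report whether set(candidate) succeeds."""
    try:
        set(candidate)
    except TypeError:
        return False
    return True


# Lists are unhashable, yet this is still a valid Set instance.
unhashable_members = ListBasedSet([[1], [2], [1]])
hashable_members = ListBasedSet("ab")
```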

To implement PyObject_IsInstance(other, collections.Set), there may be a bootstrap issue (with the C code being compiled and runnable before _abcoll.py is able to create the Set ABC).  If so, it may be necessary to create an internal _BaseSet object in setobject.c that can be used in collections.Set.  Alternatively, the code in setobject.c can lazily (at runtime) lookup collections.Set by name and cache it so that we only do one successful lookup per session.

Whatever approach is taken, it should be done with an eye towards the larger problem that Python is filled with concrete isinstance() checks that pre-date ABCs and many of those need to be liberalized (accepting a registered ABC and providing different execution paths for known and unknown concrete types).
msg115357 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-09-02 02:51
> The operator methods in setobject.c should be liberalized to accept
> instances of collections.Set as arguments.

Under this plan, set() and collections.Set will still have slightly different behavior.  collections.Set will be more liberal and accept any iterable.  Are you okay with that?  I don't feel strongly about this point; I just want to make sure it's a conscious decision.

I do feel strongly that set and collections.Set should be able to inter-operate nicely and the proposal satisfies that requirement so I would be happy with it.

> To implement PyObject_IsInstance(other, collections.Set), there may
> be a bootstrap issue (with the C code being compiled and runnable
> before _abcoll.py is able to create the Set ABC). Alternatively, 
> the code in setobject.c can lazily (at runtime) lookup 
> collections.Set by name and cache it so that we only do one
> successful lookup per session.

I favor the lazy lookup approach.

> Whatever approach is taken, it should be done with an eye towards 
> the larger problem that Python is filled with concrete isinstance()
> checks that pre-date ABCs and many of those need to be liberalized
> (accepting a registered ABC and providing different execution paths
> for known and unknown concrete types).

Agreed.  Ideally, the "PyObject_IsInstance(other, collections.Set)" logic would be abstracted out as much as possible so other parts of Python can make similar checks without needing tons of boilerplate code in every spot.

For what it's worth, I don't think we will find as many inconsistency issues with ABCs other than Set.  Set has methods that take another Set and return a third Set.  That forces different concrete implementations of the Set ABC to interact in a way that won't come up for a Sequence or Mapping.

(I suppose that MutableSequence.extend or MutableMapping.update are somewhat similar, but list.extend and dict.update are already very liberal in what they accept as a parameter.)
msg115363 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-09-02 09:04
Rough cut at a first patch is attached.

Still thinking about whether Set operations should accept any iterable or whether they should be tightened to expect other Set instances.  The API for set() came from sets.py, which was broadly discussed and widely exercised.  Guido was insistent that non-sets be excluded from the operator interactions (list.__iadd__ being on his list of regrets).  That was probably a good decision, but the Set API violated this norm, and it did not include named methods like difference(), update(), and intersection() to handle the iterable cases.

Also, still thinking about whether the comparison operators should be making tight or loose checks.
msg122959 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-12-01 00:01
Daniel, do you have time to work on this one?

If so, go ahead and make setobject.c accept any instance of collections.Set and make the corresponding change to the ABCs:

    def __or__(self, other):
        if not isinstance(other, Set):
            return NotImplemented
        chain = (e for s in (self, other) for e in s)
        return self._from_iterable(chain)

The code in the attached prelim.patch has working C code for isinstance(x, collections.Set), but the rest of the patch has not been tested.  It needs to be applied very carefully and thoughtfully because:
* internally, self and other can get swapped on a binary call
* we can't make *any* assumptions about "other" (that duplicates have actually been eliminated or that the elements are even hashable).

The most reliable thing to do for the case where PyAnySet_Check(obj) is false but isinstance(obj, collections.Set) is true is to call the named method, such as s.union(other), instead of continuing with s.__or__, which was designed only with real sets in mind.
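
A rough Python rendering of that dispatch, offered as a sketch rather than as the code in any patch: the function name union_operand is hypothetical, and the isinstance check against (set, frozenset) stands in for the C-level PyAnySet_Check.

```python
from collections.abc import Set


def union_operand(s, other):
    """Sketch of an operator body that falls back to the named method."""
    if isinstance(other, (set, frozenset)):
        # Real sets: the existing fast path is safe.
        return s | other
    if isinstance(other, Set):
        # Set ABC instance: treat it as a plain iterable via union(),
        # which makes no assumption beyond iterability of "other".
        return s.union(other)
    # Anything else is left for the interpreter to reject.
    return NotImplemented
```

For example, union_operand({1}, {2: "x"}.keys()) goes through the union() branch, because a dict keys view is a Set instance but not a built-in set.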
msg122961 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-12-01 00:03
Yes, I can take a stab at it.
msg122962 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-12-01 00:08
No need to rush this for the beta.  It's a bug fix and can go in at any time.  The important thing is that we don't break the C code.  The __ror__ magic method would still need to do the right thing and the C code needs to defend against the interpreter swapping self and other.
msg124000 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-12-15 01:53
Would it be sufficient to:

1) Restrict collections.Set()'s operators to accept collections.Set but not arbitrary iterables, and
2) Fix Issue2226 and let set() | MySimpleSet() work via collections.Set.__ror__

Attached is a patch that implements this approach, nominally fixing both this and Issue2226.
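
The two-part approach can be sketched on a single subclass. This is an illustration under stated assumptions, not the contents of set-with-Set.patch: the class name StrictSet and its frozenset storage are hypothetical, and the overrides are written on the subclass so the example behaves the same regardless of what the installed ABC does.

```python
from collections.abc import Set


class StrictSet(Set):
    """Hypothetical Set whose "|" accepts only other Set instances."""

    def __init__(self, iterable=()):
        self._items = frozenset(iterable)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __or__(self, other):
        # Part 1: reject arbitrary iterables, accept only Set instances.
        if not isinstance(other, Set):
            return NotImplemented
        return self._from_iterable(e for s in (self, other) for e in s)

    # Part 2: union is symmetric, so the reflected operator can reuse
    # __or__; this is what lets "set() | StrictSet()" succeed.
    __ror__ = __or__


reflected = {1} | StrictSet([2])

try:
    StrictSet([1]) | [2]
except TypeError:
    rejected_list = True
else:
    rejected_list = False
```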

This solution seems much too simple in light of how long I've been thinking about these bugs.  I suspect there are code hobgoblins waiting to ambush me. ;)
msg138222 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-06-12 22:19
If the code were acting exactly as documented, I would consider this a feature request. But "require that the parameter also be an instance of set()" (from original message) is too limited.

>>> set() | frozenset()
set()

So 'set' in "their operator based counterparts require their arguments to be sets." (doc) seems to be meant to be more generic, in which case 'instance of collections.Set' seems reasonable. To be clear, the doc could be updated to "... sets, frozensets, and other instances of collections.Set."

"Both set and frozenset support set to set comparisons. " This includes comparisons between the two classes.

>>> set() == frozenset()
True

so perhaps comparisons should be extended also.
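
As a hedged illustration of what extended comparisons look like from the user's side, the sketch below compares built-in sets with a Set ABC instance on current Python 3. The class TupleSet is hypothetical; the results come from the ABC's own comparison mixins being tried after the built-in type declines the operand.

```python
from collections.abc import Set


class TupleSet(Set):
    """Hypothetical Set subclass storing unique items in a tuple."""

    def __init__(self, iterable=()):
        # dict.fromkeys drops duplicates while keeping first-seen order.
        self._items = tuple(dict.fromkeys(iterable))

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


small = TupleSet([1, 2])

equal_to_set = ({1, 2} == small)
equal_to_frozenset = (frozenset({1, 2}) == small)
abc_subset_of_set = (small <= {1, 2, 3})
set_subset_of_abc = ({1} <= small)
```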
msg155760 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2012-03-14 16:09
Review of set-with-Set.patch:

Looks good overall. 
I agree that restricting operations to instances of Set rather than Iterable is correct.

Implementing "__rsub__" in terms of - (subtraction) means that infinite recursion is a possibility. It also creates an unnecessary temporary.
Could you just reverse the expression used in __sub__?
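
One way to read that suggestion, sketched under assumptions: the class name FrozenBacked is hypothetical and this is not the code from either patch. __rsub__ mirrors the generator used in __sub__ with the operands swapped, so it never re-enters the "-" operator and never builds an intermediate set.

```python
from collections.abc import Set


class FrozenBacked(Set):
    """Hypothetical Set subclass used to show a non-recursive __rsub__."""

    def __init__(self, iterable=()):
        self._items = frozenset(iterable)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __sub__(self, other):
        if not isinstance(other, Set):
            return NotImplemented
        return self._from_iterable(v for v in self if v not in other)

    def __rsub__(self, other):
        if not isinstance(other, Set):
            return NotImplemented
        # Same expression as __sub__ with the roles reversed: walk "other"
        # and keep what self does not contain. No "-" is evaluated here,
        # so there is nothing to recurse into.
        return self._from_iterable(v for v in other if v not in self)


difference = {1, 2, 3} - FrozenBacked([2])
```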

Would you add tests for comparisons (Set() == set(), etc.)?
They are probably tested implicitly in the rest of the test suite, but explicit tests would be good.
msg174333 - (view) Author: Jesús Cea Avión (jcea) * (Python committer) Date: 2012-10-31 17:07
Heads up, Issue #16373.
msg189753 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-05-21 13:35
Armin pointed out in http://lucumr.pocoo.org/2013/5/21/porting-to-python-3-redux/ that one nasty consequence of the remaining part of issue 2226 and this bug is making it much harder than it should be to use the ItemsView, KeysView and ValuesView from collections.abc to implement third party mappings that behave like the builtin dict.
msg207284 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-01-04 14:03
Raymond, will you have a chance to look at this before 3.4rc1? Otherwise I'd like to take it.
msg209957 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-02-02 08:01
I updated the patch to apply cleanly to the default branch. I also added several new test cases which uncovered issues with Daniel's previous patch.

Specifically:

- the reverse functions were not being tested properly (added a separate test to ensure they all return NotImplemented when appropriate)

- the checks in the in-place operators were not being tested, and were also too strict (added tests for their input checking, and also ensured they still accepted arbitrary iterables as input)
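
The in-place point can be seen with a small MutableSet subclass. This is a sketch, not one of the added tests: ListBackedSet is a hypothetical class, and the behavior shown relies on the MutableSet in-place mixins in current Python 3 looping over their argument, whereas the built-in set's in-place operators insist on a set operand.

```python
from collections.abc import MutableSet


class ListBackedSet(MutableSet):
    """Hypothetical MutableSet subclass backed by a list."""

    def __init__(self, iterable=()):
        self._items = []
        for value in iterable:
            self.add(value)

    def __contains__(self, value):
        return value in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, value):
        if value not in self._items:
            self._items.append(value)

    def discard(self, value):
        if value in self._items:
            self._items.remove(value)


abc_set = ListBackedSet([1])
abc_set |= [2, 3]   # the inherited __ior__ adds each element of the list
abc_set -= (1,)     # the inherited __isub__ discards each element

builtin = {1}
try:
    builtin |= [2, 3]
except TypeError:
    builtin_accepts_list = False
else:
    builtin_accepts_list = True
```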

I've also reduced the target versions to just 3.4 - this will require a porting note in the What's New, since the inappropriate handling of arbitrary iterables in the ABC methods has been removed, which means that things that previously worked when they shouldn't (like accepting a list as the RHS of a binary set operator) will now throw TypeError.

Python 3.3:
>>> set() | list()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for |: 'set' and 'list'
>>> from test.test_collections import WithSet
>>> WithSet() | list()
<test.test_collections.WithSet object at 0x7f71ff2f6210>

After applying the attached patch:

>>> set() | list()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for |: 'set' and 'list'
>>> from test.test_collections import WithSet
>>> WithSet() | list()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for |: 'WithSet' and 'list'
msg209961 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-02-02 08:12
I initially missed Mark's suggestion above to avoid the recursive subtraction operation in __rsub__. v2 of my patch includes that tweak.
msg209989 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-02 14:24
I think set operations with iterable (but not set) should be tested.
msg214088 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-03-19 12:44
This didn't make it into 3.4, and the comment about needing a porting note above still applies, so to 3.5 it goes.
msg214206 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-03-20 11:40
Thanks for the patch update.  I will look at it shortly.
msg219075 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-05-25 08:39
Attaching a draft patch with tests.
msg219091 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-05-25 13:45
Ah, interesting - I completely missed the comparison operators in my patch and tests. Your version looks good to me, though.

That looks like a patch against 2.7 - do you want to add 2.7 & 3.4 back to the list of target versions for the fix?
msg219098 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-05-25 17:51
Adding tests for non-set iterables as suggested by Serhiy.
msg219110 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-05-25 22:58
Added tests that include the pure python sets.Set().  Only the binary or/and/sub/xor methods are tested.   

The comparison operators were designed to only interact with their own kind.  A comment from Tim Peters explains the decision to raise a TypeError instead of returning NotImplemented (it has unfortunate interactions with cmp()).  At any rate, nothing good would come from changing that design decision now, so I'm leaving it alone to fade peacefully into oblivion.
msg219130 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-05-26 05:13
New changeset 3615cdb3b86d by Raymond Hettinger in branch '2.7':
Issue 8743:  Improve interoperability between sets and the collections.Set abstract base class.
http://hg.python.org/cpython/rev/3615cdb3b86d
msg219139 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-05-26 07:14
New changeset cd8b5b5b6356 by Raymond Hettinger in branch '3.4':
Issue 8743: Improve interoperability between sets and the collections.Set abstract base class.
http://hg.python.org/cpython/rev/cd8b5b5b6356
History
Date User Action Args
2014-05-31 16:04:05 serhiy.storchaka link issue21620 superseder
2014-05-26 08:02:33 rhettinger set status: open -> closed
    resolution: fixed
2014-05-26 07:14:22 python-dev set messages: + msg219139
2014-05-26 05:13:49 python-dev set nosy: + python-dev
    messages: + msg219130
2014-05-25 22:58:32 rhettinger set files: + fix_set_abc3.diff
    messages: + msg219110
2014-05-25 17:51:59 rhettinger set files: + fix_set_abc2.diff
    messages: + msg219098
2014-05-25 17:29:12 rhettinger set versions: + Python 2.7, Python 3.4
2014-05-25 13:45:08 ncoghlan set messages: + msg219091
2014-05-25 08:39:16 rhettinger set files: + fix_set_abc.diff
    messages: + msg219075
2014-03-20 11:40:21 rhettinger set messages: + msg214206
2014-03-19 12:44:39 ncoghlan set priority: high -> normal
    messages: + msg214088
    versions: + Python 3.5, - Python 3.4
2014-02-02 14:24:06 serhiy.storchaka set messages: + msg209989
2014-02-02 08:12:57 ncoghlan set files: + issue8743-set-ABC-interoperability_v2.diff
    messages: + msg209961
2014-02-02 08:06:49 ncoghlan link issue2226 superseder
2014-02-02 08:01:31 ncoghlan set files: + issue8743-set-ABC-interoperability.diff
    messages: + msg209957
    versions: - Python 2.7, Python 3.2, Python 3.3
2014-01-31 22:58:23 yselivanov set nosy: + yselivanov
2014-01-04 14:03:23 ncoghlan set messages: + msg207284
2013-11-27 09:10:22 mdengler set nosy: + mdengler
2013-05-21 13:35:47 ncoghlan set messages: + msg189753
2013-05-20 14:28:43 ncoghlan set nosy: + ncoghlan
2012-11-01 05:08:15 rhettinger set assignee: stutzbach -> rhettinger
2012-10-31 21:50:35 asvetlov set nosy: + asvetlov
2012-10-31 17:07:49 jcea set messages: + msg174333
2012-10-31 16:50:50 jcea set nosy: + jcea
2012-10-31 15:57:11 serhiy.storchaka set nosy: + serhiy.storchaka
    versions: + Python 3.4
2012-03-14 16:09:30 Mark.Shannon set nosy: + Mark.Shannon
    messages: + msg155760
2011-08-19 01:33:19 meador.inge set resolution: accepted -> (no value)
    stage: needs patch -> patch review
2011-06-26 18:51:34 terry.reedy set versions: + Python 3.3, - Python 3.1
2011-06-12 22:19:52 terry.reedy set nosy: + terry.reedy
    messages: + msg138222
2011-01-26 17:56:01 dstanek set nosy: + dstanek
2010-12-15 01:53:13 stutzbach set files: + set-with-Set.patch
    nosy: rhettinger, stutzbach, daniel.urban, ysj.ray
    messages: + msg124000
2010-12-01 17:01:32 daniel.urban set nosy: + daniel.urban
2010-12-01 00:08:19 rhettinger set messages: + msg122962
2010-12-01 00:03:59 stutzbach set messages: + msg122961
2010-12-01 00:01:19 rhettinger set assignee: rhettinger -> stutzbach
    messages: + msg122959
2010-11-29 21:00:32 rhettinger set messages: - msg117005
2010-09-20 23:36:59 rhettinger set messages: + msg117005
2010-09-20 23:34:23 rhettinger set messages: - msg116998
2010-09-20 23:02:58 rhettinger set messages: + msg116998
2010-09-02 09:04:20 rhettinger set files: + prelim.patch
    keywords: + patch
    messages: + msg115363
2010-09-02 02:51:48 stutzbach set resolution: accepted
    messages: + msg115357
    versions: + Python 3.1
2010-09-02 00:07:28 rhettinger set messages: + msg115344
    stage: test needed -> needs patch
2010-09-01 12:27:56 stutzbach set messages: + msg115291
2010-08-22 16:57:05 gvanrossum set nosy: - gvanrossum
2010-08-22 08:13:44 rhettinger set priority: normal -> high
2010-08-02 14:33:21 ysj.ray set nosy: + ysj.ray
    messages: + msg112480
2010-08-02 00:44:41 rhettinger set assignee: rhettinger
2010-08-01 23:33:30 gvanrossum set assignee: gvanrossum -> (no value)
    messages: + msg112404
2010-08-01 17:04:37 rhettinger set assignee: rhettinger -> gvanrossum
    messages: + msg112359
    nosy: + gvanrossum
2010-08-01 06:22:16 georg.brandl set assignee: rhettinger
    nosy: + rhettinger
2010-05-17 19:01:17 stutzbach set messages: + msg105930
2010-05-17 18:50:44 stutzbach create