set() operators don't work with collections.Set instances #52989
The set() operators (__or__, __and__, __sub__, __xor__, and their in-place counterparts) require that the parameter also be an instance of set().

They're documented that way: "This precludes error-prone constructions like set('abc') & 'cbs' in favor of the more readable set('abc').intersection('cbs')."

However, an unintended consequence of this behavior is that they don't inter-operate with user-created types that derive from collections.Set. That leads to oddities like this:

MySimpleSet() | set()  # This works
set() | MySimpleSet()  # Raises TypeError

(MySimpleSet is a minimal class derived from collections.Set for illustrative purposes; see attached file.)

collections.Set's operators accept any iterable.

I'm not 100% certain what the correct behavior should be. Perhaps set's operators should be a bit more liberal and accept any collections.Set instance, while collections.Set's operators should be a bit more conservative. Perhaps not. It's a little subjective.

It seems to me that at minimum set() and collections.Set() should inter-operate and have the same behavior.
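The attached file is not reproduced here, so the following is a hypothetical stand-in for MySimpleSet: a minimal class written against collections.abc.Set (the later home of collections.Set) that implements only the three abstract methods and inherits every operator from the ABC. On a current Python 3 both expressions succeed: the first through Set.__or__, the second because set.__or__ returns NotImplemented for a non-set operand and Python then tries the ABC's reflected __ror__.

```python
from collections.abc import Set


class MySimpleSet(Set):
    """Hypothetical minimal Set ABC implementation (not the attached file).

    Only the three abstract methods are written here; the ABC supplies
    __or__/__ror__, __and__, __sub__, __xor__ and the comparisons.
    """

    def __init__(self, iterable=()):
        # A list keeps the example minimal; duplicates are skipped.
        self._items = []
        for item in iterable:
            if item not in self._items:
                self._items.append(item)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


left = MySimpleSet([1, 2]) | {3}   # Set.__or__ accepts any iterable
right = {3} | MySimpleSet([1, 2])  # set.__or__ -> NotImplemented, then MySimpleSet.__ror__
```

Both results are built by MySimpleSet._from_iterable, so each is a MySimpleSet rather than a built-in set.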
I should add: I discovered the inconsistency while working on my sortedset class, which provides the same interface as set() but is also indexable like a list (e.g., S[0] always returns the minimum element, S[-1] returns the maximum element, etc.). sortedset derives from collections.MutableSet, but it's challenging to precisely emulate set() when collections.MutableSet and set() don't work the same way. ;-)
Guido, do you have a recommendation?
No idea, I don't even know what collections.Set is. :-(
In my opinion, set's operators should be a bit more liberal and accept any collections.Set instance. Given that collections.Set is an ABC and issubclass(set, collections.Set) is True, the set methods should (strongly recommended) follow all of the generalized abstract semantics defined in the ABC. This is according to PEP 3119. The collections.Set defines __or__() as this (for example):
Raymond, do you agree with Ray's analysis?
The operator methods in setobject.c should be liberalized to accept instances of collections.Set as arguments. For speed, they should continue to check PyAnySet_Check(other) first and, only if that fails, fall back to testing PyObject_IsInstance(other, collections.Set).

Internally, the set methods will still need to process "other" as just an iterable container, because they cannot rely on the elements in "other" being hashable (for example, the ListBasedSet in the docs does not require hashability) or unique as perceived by setobject.c (for example, a set implementation that defines equivalence classes through a key function: setobject.c relies on __hash__ and __eq__ and knows nothing about that key function).

To implement PyObject_IsInstance(other, collections.Set), there may be a bootstrap issue (the C code is compiled and runnable before _abcoll.py is able to create the Set ABC). If so, it may be necessary to create an internal _BaseSet object in setobject.c that can be used in collections.Set. Alternatively, the code in setobject.c can lazily (at runtime) look up collections.Set by name and cache it, so that we only do one successful lookup per session.

Whatever approach is taken, it should be done with an eye towards the larger problem: Python is filled with concrete isinstance() checks that pre-date ABCs, and many of those need to be liberalized (accepting a registered ABC and providing different execution paths for known and unknown concrete types).
Under this plan, set() and collections.Set will still have slightly different behavior. collections.Set will be more liberal and accept any iterable. Are you okay with that? I don't feel strongly about this point; I just want to make sure it's a conscious decision. I do feel strongly that set and collections.Set should be able to inter-operate nicely, and the proposal satisfies that requirement, so I would be happy with it.
I favor the lazy lookup approach.
Agreed. Ideally, the "PyObject_IsInstance(other, collections.Set)" logic would be abstracted out as much as possible so other parts of Python can make similar checks without needing tons of boilerplate code in every spot. For what it's worth, I don't think we will find as many inconsistency issues with ABCs other than Set. Set has methods that take another Set and return a third Set. That forces different concrete implementations of the Set ABC to interact in a way that won't come up for a Sequence or Mapping. (I suppose that Sequence.extend or MutableMapping.update are somewhat similar, but list.extend and dict.update are already very liberal in what they accept as a parameter.)
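To illustrate why Set is the awkward case, here are two hypothetical, independently written Set implementations combined with &. The ABC builds the result with the left operand's _from_iterable, so the type of the answer depends on operand order, which is exactly the kind of cross-implementation interaction described above.

```python
from collections.abc import Set


class ListSet(Set):
    """Hypothetical list-backed Set (in the spirit of the docs' ListBasedSet)."""

    def __init__(self, iterable=()):
        self.elements = []
        for value in iterable:
            if value not in self.elements:
                self.elements.append(value)

    def __iter__(self):
        return iter(self.elements)

    def __contains__(self, value):
        return value in self.elements

    def __len__(self):
        return len(self.elements)


class FrozenWrapper(Set):
    """Hypothetical Set backed by a frozenset."""

    def __init__(self, iterable=()):
        self._data = frozenset(iterable)

    def __iter__(self):
        return iter(self._data)

    def __contains__(self, value):
        return value in self._data

    def __len__(self):
        return len(self._data)


a = ListSet("abc") & FrozenWrapper("bcd")   # result built by ListSet._from_iterable
b = FrozenWrapper("bcd") & ListSet("abc")   # result built by FrozenWrapper._from_iterable
```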
Rough cut at a first patch is attached.

Still thinking about whether Set operations should accept any iterable or whether they should be tightened to expect other Set instances. The API for set() came from sets.py, which was broadly discussed and widely exercised. Guido was insistent that non-sets be excluded from the operator interactions (list.__iadd__ being on his list of regrets). That was probably a good decision, but the Set API violated this norm, and it did not include named methods like difference(), update(), and intersection() to handle the iterable cases.

Also, still thinking about whether the comparison operators should be making tight or loose checks.
Daniel, do you have time to work on this one? If so, go ahead and make setobject.c accept any instance of collections.Set, and make the corresponding change to the ABCs:

    def __or__(self, other):
        if not isinstance(other, Set):
            return NotImplemented
        chain = (e for s in (self, other) for e in s)
        return self._from_iterable(chain)

The code in the attached prelim.patch has working C code for isinstance(x, collections.Set), but the rest of the patch that applies it has not been tested. It needs to be applied very carefully and thoughtfully because:
When PyAnySet_Check(obj) is false but isinstance(obj, collections.Set) is true, the most reliable thing to do is to call the named method, such as s.union(other), instead of continuing with s.__or__, which was designed only with real sets in mind.
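A hypothetical Python model of that dispatch (set_or is an illustrative name, not a CPython function): real sets keep the fast operator path, other collections.Set instances are routed through the named union() method, which only needs to iterate over its argument, and everything else still returns NotImplemented.

```python
from collections.abc import Set


def set_or(s, other):
    """Hypothetical model of the proposed dispatch inside set.__or__.

    `s` is a built-in set or frozenset; `other` may be anything.
    """
    if isinstance(other, (set, frozenset)):
        # Known concrete type: the existing fast path is safe.
        return s | other
    if isinstance(other, Set):
        # Unknown Set implementation: defer to the named method, which
        # treats `other` purely as an iterable of elements.
        return s.union(other)
    # Anything else keeps the strict operator semantics.
    return NotImplemented
```

Because frozenset.union() returns a frozenset, the result type still follows the left operand on both paths.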
Yes, I can take a stab at it.
No need to rush this for the beta. It's a bug fix and can go in at any time. The important thing is that we don't break the C code. The __ror__ magic method would still need to do the right thing, and the C code needs to defend against the interpreter swapping self and other.
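What "swapping self and other" means can be seen from Python code: when the left operand's method returns NotImplemented, Python calls the right operand's reflected method and passes the left operand as the argument. The hypothetical Tracer class below records those calls. In C there is no separate reflected function: the same number slot is called with the operands in their original left/right order whichever side the set is on, which is why setobject.c has to check which argument is actually the set.

```python
class Tracer:
    """Hypothetical right-hand operand that records reflected calls."""

    def __init__(self):
        self.calls = []

    def __ror__(self, other):
        # For `left | self`, Python passes the LEFT operand as `other`.
        self.calls.append(("__ror__", other))
        return "ror"

    def __rsub__(self, other):
        # For `left - self`, the result must mean `other - self`,
        # not `self - other`: subtraction is not commutative.
        self.calls.append(("__rsub__", other))
        return "rsub"


t = Tracer()
r1 = {1} | t   # set.__or__ returns NotImplemented, then t.__ror__({1})
r2 = {1} - t   # set.__sub__ returns NotImplemented, then t.__rsub__({1})
```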
Would it be sufficient to:
Attached is a patch that implements this approach, nominally fixing both this and bpo-2226. This solution seems much too simple in light of how long I've been thinking about these bugs. I suspect there are code hobgoblins waiting to ambush me. ;)
If the code were acting exactly as documented, I would consider this a feature request. But "require that the parameter also be an instance of set()" (from the original message) is too limited:

>>> set() | frozenset()
set()

So 'set' in "their operator based counterparts require their arguments to be sets." (doc) seems to be meant more generically, in which case 'instance of collections.Set' seems reasonable. To be clear, the doc could be updated to "... sets, frozensets, and other instances of collections.Set."

"Both set and frozenset support set to set comparisons." This includes comparisons between the two classes:

>>> set() == frozenset()
True

so perhaps comparisons should be extended also.
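A quick interpreter check of how the two built-in types already mix, which is the behavior the comment suggests documenting. In CPython the type of an operator's result follows the left operand, while equality compares contents only.

```python
# set and frozenset already accept each other as operator arguments.
mixed_left = set("ab") | frozenset("bc")    # left operand is a set
mixed_right = frozenset("bc") | set("ab")   # left operand is a frozenset

# Comparisons look only at the elements, not the concrete type.
same_contents = mixed_left == mixed_right
```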
Review of set-with-Set.patch: Looks good overall. Implementing "__rsub__" in terms of - (subtraction) means that infinite recursion is a possibility. It also creates an unnecessary temporary. Could you add tests for comparisons (Set() == set(), etc.)?
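One way to address the review comment, sketched on a hypothetical list-backed Set: compute other - self directly by iterating over other, rather than delegating back to the - operator. If __rsub__ simply returned other - self and the set's own __sub__ returned NotImplemented, Python would call __rsub__ again, ending in a RecursionError.

```python
from collections.abc import Set


class RsubListSet(Set):
    """Hypothetical list-backed Set with a hand-written __rsub__."""

    def __init__(self, iterable=()):
        self.elements = []
        for value in iterable:
            if value not in self.elements:
                self.elements.append(value)

    def __iter__(self):
        return iter(self.elements)

    def __contains__(self, value):
        return value in self.elements

    def __len__(self):
        return len(self.elements)

    def __rsub__(self, other):
        # Called for `other - self` after other.__sub__ gave up.
        # Build the result straight from `other`; returning `other - self`
        # here would re-enter this method and end in a RecursionError.
        # (This sketch only accepts Set operands; the stdlib ABC is laxer.)
        if not isinstance(other, Set):
            return NotImplemented
        return self._from_iterable(v for v in other if v not in self)


diff = {1, 2, 3} - RsubListSet([2])
```

Iterating over other once also avoids building an intermediate copy of it.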
Heads up, Issue bpo-16373.
Armin pointed out in http://lucumr.pocoo.org/2013/5/21/porting-to-python-3-redux/ that one nasty consequence of the remaining part of bpo-2226 and this bug is that it is much harder than it should be to use the ItemsView, KeysView and ValuesView from collections.abc to implement third-party mappings that behave like the builtin dict.
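A sketch of the third-party mapping case, using a hypothetical read-only Mapping subclass whose keys() view comes from collections.abc. On a current Python 3 the view combines with a plain set in either order; the reflected form works only because the view inherits a reflected operator (__rand__ here) from the Set ABC, since set.__and__ returns NotImplemented for it.

```python
from collections.abc import Mapping


class ReadOnlyMap(Mapping):
    """Hypothetical third-party mapping; keys() is inherited from Mapping."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


m = ReadOnlyMap({"a": 1, "b": 2})
forward = m.keys() & {"a", "z"}     # KeysView.__and__
reflected = {"a", "z"} & m.keys()   # set.__and__ -> NotImplemented -> KeysView.__rand__
```

KeysView._from_iterable builds a plain set, so both results are built-in sets, just as with dict's own views.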
Raymond, will you have a chance to look at this before 3.4rc1? Otherwise I'd like to take it.
I updated the patch to apply cleanly to the default branch. I also added several new test cases which uncovered issues with Daniel's previous patch. Specifically:
I've also reduced the target versions to just 3.4. This will require a porting note in What's New: the inappropriate handling of arbitrary iterables in the ABC methods has been removed, which means that things that previously worked when they shouldn't have (like accepting a list as the RHS of a binary set operator) will now raise TypeError.

Python 3.3:
>>> set() | list()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for |: 'set' and 'list'
>>> from test.test_collections import WithSet
>>> WithSet() | list()
<test.test_collections.WithSet object at 0x7f71ff2f6210>

After applying the attached patch:

>>> set() | list()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for |: 'set' and 'list'
>>> from test.test_collections import WithSet
>>> WithSet() | list()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for |: 'WithSet' and 'list'
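The stricter behavior shown in the session above can be approximated by overriding the ABC's operators. This is a hypothetical opt-in variant, not the standard library's behavior: current releases of collections.abc.Set still accept arbitrary iterables in __or__.

```python
from collections.abc import Set


class StrictListSet(Set):
    """Hypothetical Set whose | only combines with other Set instances."""

    def __init__(self, iterable=()):
        self.elements = []
        for value in iterable:
            if value not in self.elements:
                self.elements.append(value)

    def __iter__(self):
        return iter(self.elements)

    def __contains__(self, value):
        return value in self.elements

    def __len__(self):
        return len(self.elements)

    def __or__(self, other):
        # Reject plain iterables such as lists, as set.__or__ does;
        # returning NotImplemented lets Python raise the TypeError.
        if not isinstance(other, Set):
            return NotImplemented
        chain = (e for s in (self, other) for e in s)
        return self._from_iterable(chain)

    __ror__ = __or__
```

With this override, StrictListSet() | list() raises the same kind of TypeError as set() | list(), while combining with a real set still works in either order.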
I initially missed Mark's suggestion above to avoid the recursive subtraction operation in __rsub__. v2 of my patch includes that tweak.
I think set operations with iterables that are not sets should be tested.
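A sketch of such tests, written with unittest against the built-in set only; the test class and the choice of sample iterables are illustrative, not taken from the actual patch. The operators should raise TypeError for lists, tuples, strings and iterators, while the named methods accept them.

```python
import operator
import unittest

# Hypothetical sample inputs: factories for iterables that are not sets.
NON_SET_ITERABLES = (list, tuple, str, iter)


class NonSetIterableTests(unittest.TestCase):
    """Sketch of tests for set operations with non-set iterables."""

    def test_operators_reject_non_sets(self):
        for make in NON_SET_ITERABLES:
            for op in (operator.or_, operator.and_, operator.sub, operator.xor):
                with self.subTest(make=make, op=op):
                    with self.assertRaises(TypeError):
                        op(set("abc"), make("bcd"))

    def test_named_methods_accept_non_sets(self):
        s = set("abc")
        for make in NON_SET_ITERABLES:
            with self.subTest(make=make):
                # A fresh iterable per call, since iter() is single-use.
                self.assertEqual(s.union(make("bcd")), set("abcd"))
                self.assertEqual(s.intersection(make("bcd")), set("bc"))
                self.assertEqual(s.difference(make("bcd")), set("a"))
                self.assertEqual(s.symmetric_difference(make("bcd")), set("ad"))
```

Saved as a module, this runs under `python -m unittest`.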
This didn't make it into 3.4, and the comment about needing a porting note above still applies, so to 3.5 it goes. |
Thanks for the patch update. I will look at it shortly.
Attaching a draft patch with tests.
Ah, interesting: I completely missed the comparison operators in my patch and tests. Your version looks good to me, though. That looks like a patch against 2.7; do you want to add 2.7 and 3.4 back to the list of target versions for the fix?
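A small check of mixed comparisons between built-in sets and a hypothetical Set ABC implementation, in the spirit of the requested Set() == set() tests. On a current Python 3 the built-in set returns NotImplemented for the foreign operand, so Python falls back to the ABC's reflected comparison (== for ==, > for <, and so on).

```python
from collections.abc import Set


class ListSet(Set):
    """Hypothetical list-backed Set used only for comparison checks."""

    def __init__(self, iterable=()):
        self.elements = []
        for value in iterable:
            if value not in self.elements:
                self.elements.append(value)

    def __iter__(self):
        return iter(self.elements)

    def __contains__(self, value):
        return value in self.elements

    def __len__(self):
        return len(self.elements)


small, big = ListSet([1]), ListSet([1, 2])

# Each entry should evaluate to True.
checks = {
    "set == Set": {1} == small,
    "Set == set": small == {1},
    "set < Set": {1} < big,
    "Set > set": big > {1},
    "set <= Set": {1, 2} <= big,
    "frozenset >= Set": frozenset({1, 2}) >= small,
}
```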
Adding tests for non-set iterables as suggested by Serhiy.
Added tests that include the pure Python sets.Set(). Only the binary or/and/sub/xor methods are tested. The comparison operators were designed to only interact with their own kind. A comment from Tim Peters explains the decision to raise a TypeError instead of returning NotImplemented (it has unfortunate interactions with cmp()). At any rate, nothing good would come from changing that design decision now, so I'm leaving it alone to fade peacefully into oblivion.
New changeset 3615cdb3b86d by Raymond Hettinger in branch '2.7':
New changeset cd8b5b5b6356 by Raymond Hettinger in branch '3.4':
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.