|
msg105929 - (view) |
Author: Daniel Stutzbach (stutzbach)  |
Date: 2010-05-17 18:50 |
The set() operators (__or__, __and__, __sub__, __xor__, and their in-place counterparts) require that the parameter also be an instance of set().
They're documented that way: "This precludes error-prone constructions like set('abc') & 'cbs' in favor of the more readable set('abc').intersection('cbs')."
However, an unintended consequence of this behavior is that they don't inter-operate with user-created types that derive from collections.Set.
That leads to oddities like this:
MySimpleSet() | set() # This works
set() | MySimpleSet() # Raises TypeError
(MySimpleSet is a minimal class derived from collections.Set for illustrative purposes -- see attached file)
collections.Set's operators accept any iterable.
I'm not 100% certain what the correct behavior should be. Perhaps set's operators should be a bit more liberal and accept any collections.Set instance, while collections.Set's operators should be a bit more conservative. Perhaps not. It's a little subjective.
It seems to me that at minimum set() and collections.Set() should inter-operate and have the same behavior.
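A minimal sketch of the asymmetry described above. The class below is a hypothetical stand-in, not the attached file, and it uses collections.abc.Set, which is where current Python versions keep the ABC this thread calls collections.Set. It only shows the two halves that can be stated with certainty: the ABC's mixin operator accepts a builtin set, while the builtin operator declines a non-set operand.

```python
from collections.abc import Set


class MySimpleSet(Set):
    """Hypothetical minimal Set subclass; not the attached file from this issue."""

    def __init__(self, iterable=()):
        # Keep unique elements in a list to stress that no builtin set is involved.
        self._items = []
        for item in iterable:
            if item not in self._items:
                self._items.append(item)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


# The ABC's mixin operator handles a builtin set on the right-hand side.
left = MySimpleSet([1, 2]) | {3}

# The builtin operator declines a non-set operand by returning NotImplemented.
declined = set.__or__({3}, MySimpleSet([1, 2]))
```

Whether the full expression set() | MySimpleSet() then raises TypeError depends on whether the right-hand operand supplies a reflected __ror__, which is the subject of Issue2226.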
|
|
msg105930 - (view) |
Author: Daniel Stutzbach (stutzbach)  |
Date: 2010-05-17 19:01 |
I should add:
I discovered the inconsistency while working on my sortedset class, which provides the same interface as set() but is also indexable like a list (e.g., S[0] always returns the minimum element, S[-1] returns the maximum element, etc.).
sortedset derives from collections.MutableSet, but it's challenging to precisely emulate set() when collections.MutableSet and set() don't work the same way. ;-)
|
|
msg112359 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2010-08-01 17:04 |
Guido, do you have a recommendation?
|
|
msg112404 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2010-08-01 23:33 |
No idea, I don't even know what collections.Set is. :-(
|
|
msg112480 - (view) |
Author: ysj.ray (ysj.ray) |
Date: 2010-08-02 14:33 |
In my opinion, set's operators should be a bit more liberal and accept any collections.Set instance. Given that collections.Set is an ABC and issubclass(set, collections.Set) is True, the set methods should (as strongly recommended) follow the generalized abstract semantic definitions in the ABC. This is according to PEP 3119:
"""
In addition, the ABCs define a minimal set of methods that establish the characteristic behavior of the type. Code that discriminates objects based on their ABC type can trust that those methods will always be present. Each of these methods are accompanied by an generalized abstract semantic definition that is described in the documentation for the ABC. These standard semantic definitions are not enforced, but are strongly recommended.
"""
The collections.Set defines __or__() as this (for example):
"""
def __or__(self, other):
    if not isinstance(other, Iterable):
        return NotImplemented
    chain = (e for s in (self, other) for e in s)
    return self._from_iterable(chain)
"""
which means the "|" operator should accept any iterable. So I think it would be better to make set's methods more liberal.
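To make the contrast concrete, here is a small sketch that reuses the __or__ logic quoted above in a standalone class. IterableFriendlySet is a hypothetical name introduced only for this illustration; it is not part of the standard library.

```python
from collections.abc import Iterable


class IterableFriendlySet:
    """Hypothetical stand-in that reuses the __or__ logic quoted above."""

    def __init__(self, iterable=()):
        self._items = frozenset(iterable)

    def __iter__(self):
        return iter(self._items)

    @classmethod
    def _from_iterable(cls, it):
        return cls(it)

    def __or__(self, other):
        if not isinstance(other, Iterable):
            return NotImplemented
        chain = (e for s in (self, other) for e in s)
        return self._from_iterable(chain)


# A plain string is iterable, so the ABC-style operator accepts it.
merged = IterableFriendlySet("abc") | "cbs"

# The builtin operator rejects the same operand; the named method accepts it.
try:
    set("abc") | "cbs"
    builtin_operator_accepts_str = True
except TypeError:
    builtin_operator_accepts_str = False

via_method = set("abc").union("cbs")
```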
|
|
msg115291 - (view) |
Author: Daniel Stutzbach (stutzbach)  |
Date: 2010-09-01 12:27 |
Raymond, do you agree with Ray's analysis?
|
|
msg115344 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2010-09-02 00:07 |
The operator methods in setobject.c should be liberalized to accept instances of collections.Set as arguments. For speed, they should continue to check PyAnySet_Check(other) first and then if that fails, fall back to testing PyObject_IsInstance(other, collections.Set).
Internally, the set methods will still need to process "other" as just an iterable container, because they cannot rely on the elements in "other" being hashable (for example, the ListBasedSet in the docs does not require hashability) or unique as perceived by setobject.c (for example, a set implementing a key-function for an equivalence class may not work, because its key-function would be unknown to setobject.c, which relies on __hash__ and __eq__).
To implement PyObject_IsInstance(other, collections.Set), there may be a bootstrap issue (with the C code being compiled and runnable before _abcoll.py is able to create the Set ABC). If so, it may be necessary to create an internal _BaseSet object in setobject.c that can be used in collections.Set. Alternatively, the code in setobject.c can lazily (at runtime) lookup collections.Set by name and cache it so that we only do one successful lookup per session.
Whatever approach is taken, it should be done with an eye towards the larger problem that Python is filled with concrete isinstance() checks that pre-date ABCs and many of those need to be liberalized (accepting a registered ABC and providing different execution paths for known and unknown concrete types).
|
|
msg115357 - (view) |
Author: Daniel Stutzbach (stutzbach)  |
Date: 2010-09-02 02:51 |
> The operator methods in setobject.c should be liberalized to accept
> instances of collections.Set as arguments.
Under this plan, set() and collections.Set will still have slightly different behavior. collections.Set will be more liberal and accept any iterable. Are you okay with that? I don't feel strongly about this point; I just want to make sure it's a conscious decision.
I do feel strongly that set and collections.Set should be able to inter-operate nicely and the proposal satisfies that requirement so I would be happy with it.
> To implement PyObject_IsInstance(other, collections.Set), there may
> be a bootstrap issue (with the C code being compiled and runnable
> before _abcoll.py is able to create the Set ABC). Alternatively,
> the code in setobject.c can lazily (at runtime) lookup
> collections.Set by name and cache it so that we only do one
> successful lookup per session.
I favor the lazy lookup approach.
> Whatever approach is taken, it should be done with an eye towards
> the larger problem that Python is filled with concrete isinstance()
> checks that pre-date ABCs and many of those need to be liberalized
> (accepting a registered ABC and providing different execution paths
> for known and unknown concrete types).
Agreed. Ideally, the "PyObject_IsInstance(other, collections.Set)" logic would be abstracted out as much as possible so other parts of Python can make similar checks without needing tons of boilerplate code in every spot.
For what it's worth, I don't think we will find as many inconsistency issues with ABCs other than Set. Set has methods that take another Set and return a third Set. That forces different concrete implementations of the Set ABC to interact in a way that won't come up for a Sequence or Mapping.
(I suppose that Sequence.extend or MutableMapping.update are somewhat similar, but list.extend and dict.update are already very liberal in what they accept as a parameter.)
|
|
msg115363 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2010-09-02 09:04 |
Rough cut at a first patch is attached.
Still thinking about whether Set operations should be accepting any iterable or whether they should be tightened to expect other Set instances. The API for set() came from set.py which was broadly discussed and widely exercised. Guido was insistent that non-sets be excluded from the operator interactions (list.__iadd__ being on his list of regrets). That was probably a good decision, but the Set API violated this norm and it did not include named methods like difference(), update(), and intersection() to handle the iterable cases.
Also, still thinking about whether the comparison operators should be making tight or loose checks.
|
|
msg122959 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2010-12-01 00:01 |
Daniel, do you have time to work on this one?
If so, go ahead and make setobject.c accept any instance of collections.Set and make the corresponding change to the ABCs:
def __or__(self, other):
    if not isinstance(other, Set):
        return NotImplemented
    chain = (e for s in (self, other) for e in s)
    return self._from_iterable(chain)
The code in the attached prelim.patch has working C code for isinstance(x, collections.Set), but the rest of the patch has not been tested. It needs to be applied very carefully and thoughtfully because:
* internally, the self and other can get swapped on a binary call
* we can't make *any* assumptions about "other" (that duplicates have actually been eliminated or that the elements are even hashable).
The most reliable thing to do for the case where PyAnySet_Check(obj) is false but isinstance(obj, collections.Set) is true is to call the named method such as s.union(other) instead of continuing with s.__or__ which was designed only with real sets in mind.
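The dispatch suggested above could look roughly like this when modelled in Python. union_operator is a hypothetical helper standing in for the C implementation of set.__or__; it is not an existing function.

```python
from collections.abc import Set


def union_operator(s, other):
    """Hypothetical model of how set.__or__ could treat its right operand."""
    if isinstance(other, (set, frozenset)):
        # Real sets: elements are known to be hashable and unique.
        return s | other
    if isinstance(other, Set):
        # Set ABC instance: treat it as a plain iterable via the named method.
        return s.union(other)
    # Anything else is still refused, as the operators do today.
    return NotImplemented
```

Going through the named method means the operand is consumed element by element, so no assumption is made about how it stores or deduplicates its contents.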
|
|
msg122961 - (view) |
Author: Daniel Stutzbach (stutzbach)  |
Date: 2010-12-01 00:03 |
Yes, I can take a stab at it.
|
|
msg122962 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2010-12-01 00:08 |
No need to rush this for the beta. It's a bug fix and can go in at any time. The important thing is that we don't break the C code. The __ror__ magic method would still need to do the right thing and the C code needs to defend against the interpreter swapping self and other.
|
|
msg124000 - (view) |
Author: Daniel Stutzbach (stutzbach)  |
Date: 2010-12-15 01:53 |
Would it be sufficient to:
1) Restrict collections.Set()'s operators to accept collections.Set but not arbitrary iterables, and
2) Fix Issue2226 and let set() | MySimpleSet() work via collections.Set.__ror__
Attached is a patch that implements this approach, nominally fixing both this and Issue2226.
This solution seems much too simple in light of how long I've been thinking about these bugs. I suspect there are code hobgoblins waiting to ambush me. ;)
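A rough sketch of the two-part approach, written as an explicit override rather than as the actual patch. StrictSet is a hypothetical class; the point is only that a Set-restricted __or__ plus a reflected __ror__ lets the builtin set appear on the left.

```python
from collections.abc import Set


class StrictSet(Set):
    """Hypothetical Set subclass whose operators only accept other Sets."""

    def __init__(self, iterable=()):
        self._items = frozenset(iterable)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __or__(self, other):
        if not isinstance(other, Set):
            return NotImplemented
        return self._from_iterable(e for s in (self, other) for e in s)

    # Reflected form, so `set() | StrictSet()` has somewhere to land.
    __ror__ = __or__


# set.__or__ declines, then Python falls back to StrictSet.__ror__.
result = {1, 2} | StrictSet([3])

# Arbitrary iterables are no longer accepted by the operator.
try:
    StrictSet([1]) | [2]
    accepts_list = True
except TypeError:
    accepts_list = False
```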
|
|
msg138222 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2011-06-12 22:19 |
If the code were acting exactly as documented, I would consider this a feature request. But "require that the parameter also be an instance of set()" (from original message) is too limited.
>>> set() | frozenset()
set()
So 'set' in "their operator based counterparts require their arguments to be sets." (doc) seems to be meant to be more generic, in which case 'instance of collections.Set' seems reasonable. To be clear, the doc could be updated to "... sets, frozensets, and other instances of collections.Set."
"Both set and frozenset support set to set comparisons. " This includes comparisons between the two classes.
>>> set() == frozenset()
True
so perhaps comparisons should be extended also.
|
|
msg155760 - (view) |
Author: Mark Shannon (Mark.Shannon) * |
Date: 2012-03-14 16:09 |
Review of set-with-Set.patch:
Looks good overall.
I agree that restricting operations to instances of Set rather than Iterable is correct.
Implementing "__rsub__" in terms of - (subtraction) means that infinite recursion is a possibility. It also creates an unnecessary temporary.
Could you just reverse the expression used in __sub__?
Would you add tests for comparisons, e.g. Set() == set()?
These are probably tested implicitly in the rest of the test suite, but explicit tests would be good.
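The __rsub__ suggestion can be sketched as follows. DiffSet is a hypothetical class, and the method bodies are an illustration of "reverse the expression used in __sub__", not the contents of set-with-Set.patch.

```python
from collections.abc import Set


class DiffSet(Set):
    """Hypothetical Set subclass illustrating a non-recursive __rsub__."""

    def __init__(self, iterable=()):
        self._items = frozenset(iterable)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __sub__(self, other):
        if not isinstance(other, Set):
            return NotImplemented
        return self._from_iterable(v for v in self if v not in other)

    def __rsub__(self, other):
        # Same expression as __sub__ with the operands swapped; the `-`
        # operator is never used, so it cannot recurse and builds no temporary.
        if not isinstance(other, Set):
            return NotImplemented
        return self._from_iterable(v for v in other if v not in self)


# set.__sub__ declines the DiffSet operand, so DiffSet.__rsub__ runs.
reflected = {1, 2, 3} - DiffSet([2])
```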
|
|
msg174333 - (view) |
Author: Jesús Cea Avión (jcea) *  |
Date: 2012-10-31 17:07 |
Heads up, Issue #16373.
|
|
msg189753 - (view) |
Author: Nick Coghlan (ncoghlan) *  |
Date: 2013-05-21 13:35 |
Armin pointed out in http://lucumr.pocoo.org/2013/5/21/porting-to-python-3-redux/ that one nasty consequence of the remaining part of issue 2226 and this bug is making it much harder than it should be to use the ItemsView, KeysView and ValuesView from collections.abc to implement third party mappings that behave like the builtin dict.
|
|
| Date |
User |
Action |
Args |
| 2013-05-21 13:35:47 | ncoghlan | set | messages:
+ msg189753 |
| 2013-05-20 14:28:43 | ncoghlan | set | nosy:
+ ncoghlan
|
| 2012-11-01 05:08:15 | rhettinger | set | assignee: stutzbach -> rhettinger |
| 2012-10-31 21:50:35 | asvetlov | set | nosy:
+ asvetlov
|
| 2012-10-31 17:07:49 | jcea | set | messages:
+ msg174333 |
| 2012-10-31 16:50:50 | jcea | set | nosy:
+ jcea
|
| 2012-10-31 15:57:11 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka
versions:
+ Python 3.4 |
| 2012-03-14 16:09:30 | Mark.Shannon | set | nosy:
+ Mark.Shannon messages:
+ msg155760
|
| 2011-08-19 01:33:19 | meador.inge | set | resolution: accepted -> stage: needs patch -> patch review |
| 2011-06-26 18:51:34 | terry.reedy | set | versions:
+ Python 3.3, - Python 3.1 |
| 2011-06-12 22:19:52 | terry.reedy | set | nosy:
+ terry.reedy messages:
+ msg138222
|
| 2011-01-26 17:56:01 | dstanek | set | nosy:
+ dstanek
|
| 2010-12-15 01:53:13 | stutzbach | set | files:
+ set-with-Set.patch nosy:
rhettinger, stutzbach, daniel.urban, ysj.ray messages:
+ msg124000
|
| 2010-12-01 17:01:32 | daniel.urban | set | nosy:
+ daniel.urban
|
| 2010-12-01 00:08:19 | rhettinger | set | messages:
+ msg122962 |
| 2010-12-01 00:03:59 | stutzbach | set | messages:
+ msg122961 |
| 2010-12-01 00:01:19 | rhettinger | set | assignee: rhettinger -> stutzbach messages:
+ msg122959 |
| 2010-11-29 21:00:32 | rhettinger | set | messages:
- msg117005 |
| 2010-09-20 23:36:59 | rhettinger | set | messages:
+ msg117005 |
| 2010-09-20 23:34:23 | rhettinger | set | messages:
- msg116998 |
| 2010-09-20 23:02:58 | rhettinger | set | messages:
+ msg116998 |
| 2010-09-02 09:04:20 | rhettinger | set | files:
+ prelim.patch keywords:
+ patch messages:
+ msg115363
|
| 2010-09-02 02:51:48 | stutzbach | set | resolution: accepted messages:
+ msg115357 versions:
+ Python 3.1 |
| 2010-09-02 00:07:28 | rhettinger | set | messages:
+ msg115344 stage: test needed -> needs patch |
| 2010-09-01 12:27:56 | stutzbach | set | messages:
+ msg115291 |
| 2010-08-22 16:57:05 | gvanrossum | set | nosy:
- gvanrossum
|
| 2010-08-22 08:13:44 | rhettinger | set | priority: normal -> high |
| 2010-08-02 14:33:21 | ysj.ray | set | nosy:
+ ysj.ray messages:
+ msg112480
|
| 2010-08-02 00:44:41 | rhettinger | set | assignee: rhettinger |
| 2010-08-01 23:33:30 | gvanrossum | set | assignee: gvanrossum -> (no value) messages:
+ msg112404 |
| 2010-08-01 17:04:37 | rhettinger | set | assignee: rhettinger -> gvanrossum
messages:
+ msg112359 nosy:
+ gvanrossum |
| 2010-08-01 06:22:16 | georg.brandl | set | assignee: rhettinger
nosy:
+ rhettinger |
| 2010-05-17 19:01:17 | stutzbach | set | messages:
+ msg105930 |
| 2010-05-17 18:50:44 | stutzbach | create | |