set() operators don't work with collections.Set instances #52989

Closed
stutzbach mannequin opened this issue May 17, 2010 · 29 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@stutzbach
Mannequin

stutzbach mannequin commented May 17, 2010

BPO 8743
Nosy @rhettinger, @terryjreedy, @jcea, @ncoghlan, @asvetlov, @durban, @markshannon, @serhiy-storchaka, @1st1
Files
  • set-vs-set.py: Minimal example
  • prelim.patch: First draft
  • set-with-Set.patch
  • issue8743-set-ABC-interoperability.diff: Additional tests and updated for 3.4
  • issue8743-set-ABC-interoperability_v2.diff: Also avoid infinite recursion risk in Set.rsub
  • fix_set_abc.diff: Fix set abc and set for cross-type comparisons
  • fix_set_abc2.diff: Add non-set iterables tests
  • fix_set_abc3.diff: Add tests for sets.Set
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2014-05-26.08:02:33.540>
    created_at = <Date 2010-05-17.18:50:44.562>
    labels = ['interpreter-core', 'type-bug']
    title = "set() operators don't work with collections.Set instances"
    updated_at = <Date 2014-05-26.08:02:33.539>
    user = 'https://bugs.python.org/stutzbach'

    bugs.python.org fields:

    activity = <Date 2014-05-26.08:02:33.539>
    actor = 'rhettinger'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2014-05-26.08:02:33.540>
    closer = 'rhettinger'
    components = ['Interpreter Core']
    creation = <Date 2010-05-17.18:50:44.562>
    creator = 'stutzbach'
    dependencies = []
    files = ['17383', '18708', '20045', '33863', '33864', '35346', '35353', '35357']
    hgrepos = []
    issue_num = 8743
    keywords = ['patch']
    message_count = 29.0
    messages = ['105929', '105930', '112359', '112404', '112480', '115291', '115344', '115357', '115363', '122959', '122961', '122962', '124000', '138222', '155760', '174333', '189753', '207284', '209957', '209961', '209989', '214088', '214206', '219075', '219091', '219098', '219110', '219130', '219139']
    nosy_count = 14.0
    nosy_names = ['rhettinger', 'terry.reedy', 'jcea', 'ncoghlan', 'dstanek', 'stutzbach', 'asvetlov', 'daniel.urban', 'ysj.ray', 'Mark.Shannon', 'python-dev', 'serhiy.storchaka', 'mdengler', 'yselivanov']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue8743'
    versions = ['Python 2.7', 'Python 3.4', 'Python 3.5']

    @stutzbach
    Mannequin Author

    stutzbach mannequin commented May 17, 2010

    The set() operators (__or__, __and__, __sub__, __xor__, and their in-place counterparts) require that the parameter also be an instance of set().

    They're documented that way: "This precludes error-prone constructions like set('abc') & 'cbs' in favor of the more readable set('abc').intersection('cbs')."

    However, an unintended consequence of this behavior is that they don't inter-operate with user-created types that derive from collections.Set.

    That leads to oddities like this:

    MySimpleSet() | set() # This works
    set() | MySimpleSet() # Raises TypeError

    (MySimpleSet is a minimal class derived from collections.Set for illustrative purposes -- see attached file)

    collections.Set's operators accept any iterable.

    I'm not 100% certain what the correct behavior should be. Perhaps set's operators should be a bit more liberal and accept any collections.Set instance, while collections.Set's operators should be a bit more conservative. Perhaps not. It's a little subjective.

    It seems to me that at minimum set() and collections.Set() should inter-operate and have the same behavior.
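The attached set-vs-set.py is not reproduced in this thread. A minimal sketch of the kind of class it describes might look like the following; the class body and the `try_union` helper are assumptions made for illustration, and the import uses the Python 3 spelling `collections.abc.Set`. On interpreters that include the eventual fix both orderings succeed, while on the versions discussed here the second one raised TypeError.

```python
from collections.abc import Set  # spelled collections.Set on Python 2


class MySimpleSet(Set):
    """Hypothetical minimal Set subclass backed by a list."""

    def __init__(self, iterable=()):
        self._items = []
        for item in iterable:
            if item not in self._items:  # drop duplicates
                self._items.append(item)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


def try_union(left, right):
    """Return the type name of left | right, or 'TypeError' if it fails."""
    try:
        return type(left | right).__name__
    except TypeError:
        return "TypeError"


print(try_union(MySimpleSet(), set()))  # handled by Set.__or__
print(try_union(set(), MySimpleSet()))  # requires set.__or__ to defer to Set.__ror__
```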

    @stutzbach stutzbach mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels May 17, 2010
    @stutzbach
    Mannequin Author

    stutzbach mannequin commented May 17, 2010

    I should add:

    I discovered the inconsistency while working on my sortedset class, which provides the same interface as set() but is also indexable like a list (e.g., S[0] always returns the minimum element, S[-1] returns the maximum element, etc.).

    sortedset derives from collections.MutableSet, but it's challenging to precisely emulate set() when collections.MutableSet and set() don't work the same way. ;-)
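To make the description concrete, here is a rough sketch of an indexable sorted set built on `collections.abc.MutableSet`. It is not the sortedset implementation mentioned above; the class name `SortedSet` and the use of `bisect` on a plain list are assumptions chosen to keep the example short.

```python
import bisect
from collections.abc import MutableSet


class SortedSet(MutableSet):
    """Hypothetical sorted set: set interface plus list-style indexing."""

    def __init__(self, iterable=()):
        self._items = []  # kept sorted, no duplicates
        for item in iterable:
            self.add(item)

    def _find(self, item):
        """Return (index, found) for item in the sorted list."""
        i = bisect.bisect_left(self._items, item)
        return i, i < len(self._items) and self._items[i] == item

    def __contains__(self, item):
        return self._find(item)[1]

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def add(self, item):
        i, found = self._find(item)
        if not found:
            self._items.insert(i, item)

    def discard(self, item):
        i, found = self._find(item)
        if found:
            del self._items[i]


s = SortedSet([3, 1, 2, 3])
print(s[0], s[-1], len(s))
```

The binary operators (`|`, `&`, `-`, `^`) come from the ABC mixins, which is exactly where the mismatch with the built-in set() shows up.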

    @rhettinger
    Contributor

    Guido, do you have a recommendation?

    @rhettinger rhettinger assigned gvanrossum and unassigned rhettinger Aug 1, 2010
    @gvanrossum
    Member

    No idea, I don't even know what collections.Set is. :-(

    @gvanrossum gvanrossum removed their assignment Aug 1, 2010
    @rhettinger rhettinger self-assigned this Aug 2, 2010
    @ysjray
    Mannequin

    ysjray mannequin commented Aug 2, 2010

    In my opinion, set's operators should be a bit more liberal and accept any collections.Set instance. Given that collections.Set is an ABC and isinstance(set(), collections.Set) is True, the set methods should (as is strongly recommended) follow the generalized abstract semantic definitions in the ABC. This is according to PEP 3119:
    """
    In addition, the ABCs define a minimal set of methods that establish the characteristic behavior of the type. Code that discriminates objects based on their ABC type can trust that those methods will always be present. Each of these methods are accompanied by an generalized abstract semantic definition that is described in the documentation for the ABC. These standard semantic definitions are not enforced, but are strongly recommended.
    """

    The collections.Set defines __or__() as this (for example):
    """
    def __or__(self, other):
        if not isinstance(other, Iterable):
            return NotImplemented
        chain = (e for s in (self, other) for e in s)
        return self._from_iterable(chain)
    """
    which means the "|" operator should accept any iterable. So I think it's better to make set's methods more liberal.

    @stutzbach
    Mannequin Author

    stutzbach mannequin commented Sep 1, 2010

    Raymond, do you agree with Ray's analysis?

    @rhettinger
    Contributor

    The operator methods in setobject.c should be liberalized to accept instances of collections.Set as arguments. For speed, they should continue to check PyAnySet_Check(other) first and then if that fails, fall back to testing PyObject_IsInstance(other, collections.Set).

    Internally, the set methods will still need to process "other" as just an iterable container because they cannot rely on the elements in "other" being hashable (for example, the ListBasedSet in the docs does not require hashability) or unique as perceived by setobject.c (it may not work with a set implementing a key-function for an equivalence class, whose key-function would be unknown to setobject.c, which relies on __hash__ and __eq__).

    To implement PyObject_IsInstance(other, collections.Set), there may be a bootstrap issue (with the C code being compiled and runnable before _abcoll.py is able to create the Set ABC). If so, it may be necessary to create an internal _BaseSet object in setobject.c that can be used in collections.Set. Alternatively, the code in setobject.c can lazily (at runtime) look up collections.Set by name and cache it so that we only do one successful lookup per session.

    Whatever approach is taken, it should be done with an eye towards the larger problem that Python is filled with concrete isinstance() checks that pre-date ABCs and many of those need to be liberalized (accepting a registered ABC and providing different execution paths for known and unknown concrete types).

    @stutzbach
    Mannequin Author

    stutzbach mannequin commented Sep 2, 2010

    > The operator methods in setobject.c should be liberalized to accept
    > instances of collections.Set as arguments.

    Under this plan, set() and collections.Set will still have slightly different behavior. collections.Set will be more liberal and accept any iterable. Are you okay with that? I don't feel strongly about this point; I just want to make sure it's a conscious decision.

    I do feel strongly that set and collections.Set should be able to inter-operate nicely and the proposal satisfies that requirement so I would be happy with it.

    > To implement PyObject_IsInstance(other, collections.Set), there may
    > be a bootstrap issue (with the C code being compiled and runnable
    > before _abcoll.py is able to create the Set ABC). Alternatively,
    > the code in setobject.c can lazily (at runtime) lookup
    > collections.Set by name and cache it so that we only do one
    > successful lookup per session.

    I favor the lazy lookup approach.

    > Whatever approach is taken, it should be done with an eye towards
    > the larger problem that Python is filled with concrete isinstance()
    > checks that pre-date ABCs and many of those need to be liberalized
    > (accepting a registered ABC and providing different execution paths
    > for known and unknown concrete types).

    Agreed. Ideally, the "PyObject_IsInstance(other, collections.Set)" logic would be abstracted out as much as possible so other parts of Python can make similar checks without needing tons of boilerplate code in every spot.

    For what it's worth, I don't think we will find as many inconsistency issues with ABCs other than Set. Set has methods that take another Set and return a third Set. That forces different concrete implementations of the Set ABC to interact in a way that won't come up for a Sequence or Mapping.

    (I suppose that Sequence.extend or MutableMapping.update are somewhat similar, but list.extend and dict.update are already very liberal in what they accept as a parameter.)
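For comparison, a short illustration of how liberal the named methods on the built-in containers already are, next to the strict set operator. Nothing here is specific to the patch under discussion; it only exercises long-standing built-in behavior.

```python
items = [1]
items.extend(n for n in (2, 3))      # list.extend takes any iterable

mapping = {}
mapping.update([("a", 1)])           # dict.update takes an iterable of pairs
mapping.update({"b": 2})             # ... or another mapping

merged = {1}.union([2], (3,))        # the named set method takes any iterables

try:
    {1} | [2]                        # the operator insists on a set operand
    operator_ok = True
except TypeError:
    operator_ok = False

print(items, mapping, merged, operator_ok)
```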

    @rhettinger
    Contributor

    Rough cut at a first patch is attached.

    Still thinking about whether Set operations should be accepting any iterable or whether they should be tightened to expect other Set instances. The API for set() came from sets.py, which was broadly discussed and widely exercised. Guido was insistent that non-sets be excluded from the operator interactions (list.__iadd__ being on his list of regrets). That was probably a good decision, but the Set API violated this norm and it did not include named methods like difference(), update(), and intersection() to handle the iterable cases.

    Also, still thinking about whether the comparison operators should be making tight or loose checks.

    @rhettinger
    Contributor

    Daniel, do you have time to work on this one?

    If so, go ahead and make setobject.c accept any instance of collections.Set and make the corresponding change to the ABCs:

        def __or__(self, other):
            if not isinstance(other, Set):
                return NotImplemented
            chain = (e for s in (self, other) for e in s)
            return self._from_iterable(chain)

    The code in the attached prelim.patch has working C code for isinstance(x, collections.Set), but the rest of the patch, which applies it, has not been tested. It needs to be applied very carefully and thoughtfully because:

    • internally, the self and other can get swapped on a binary call
    • we can't make *any* assumptions about "other" (that duplicates have actually been eliminated or that the elements are even hashable).

    The most reliable thing to do for the case where PyAnySet_Check(obj) is False but isinstance(obj, collections.Set) is true is to call the named method such as s.union(other) instead of continuing with s.__or__ which was designed only with real sets in mind.
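A Python rendering of that dispatch, as a sketch only: the function name `union_with` is hypothetical, and the concrete `isinstance` check stands in for the C-level PyAnySet_Check.

```python
from collections.abc import Set


def union_with(s, other):
    """Hypothetical model of the proposed dispatch for set | other."""
    if isinstance(other, (set, frozenset)):  # stand-in for PyAnySet_Check
        return s | other                     # fast path: both are real sets
    if isinstance(other, Set):               # derived from or registered with Set
        return s.union(other)                # treats other as a plain iterable
    return NotImplemented                    # let the other operand have a try


print(union_with({1}, {2}))
print(union_with({1}, {2: None}.keys()))
print(union_with({1}, [2]))
```

Going through union() means duplicates in "other" are absorbed naturally, and an unhashable element surfaces as an ordinary TypeError instead of corrupting the result.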

    @rhettinger rhettinger assigned stutzbach and unassigned rhettinger Dec 1, 2010
    @stutzbach
    Mannequin Author

    stutzbach mannequin commented Dec 1, 2010

    Yes, I can take a stab at it.

    @rhettinger
    Contributor

    No need to rush this for the beta. It's a bug fix and can go in at any time. The important thing is that we don't break the C code. The __ror__ magic method would still need to do the right thing and the C code needs to defend against the interpreter swapping self and other.
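The swapping of operands is easiest to see at the Python level. In this small illustration (the classes `Left` and `Right` are made up for the purpose), the left operand declines and the interpreter retries with the reflected method on the right operand:

```python
class Left:
    def __or__(self, other):
        return NotImplemented  # decline, so Python tries the right operand


class Right:
    def __ror__(self, other):
        # Here self is the RIGHT operand and other is the LEFT one.
        return ("__ror__", type(self).__name__, type(other).__name__)


result = Left() | Right()
print(result)  # ('__ror__', 'Right', 'Left')
```

The C slot for a binary operator serves both directions, which is why the code in setobject.c cannot assume that its first argument is the set.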

    @stutzbach
    Mannequin Author

    stutzbach mannequin commented Dec 15, 2010

    Would it be sufficient to:

    1. Restrict collections.Set()'s operators to accept collections.Set but not arbitrary iterables, and
    2. Fix bpo-2226 and let set() | MySimpleSet() work via collections.Set.__ror__

    Attached is a patch that implements this approach, nominally fixing both this and bpo-2226.
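The two-step approach can be modeled on a user-defined class without touching the standard library. The sketch below is not the attached patch; `StrictSet` is a hypothetical class that overrides `__or__` with the Set-only check and reuses it for `__ror__`.

```python
from collections.abc import Set


class StrictSet(Set):
    """Hypothetical Set whose | operator accepts only other Set instances."""

    def __init__(self, iterable=()):
        self._items = frozenset(iterable)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __or__(self, other):
        if not isinstance(other, Set):  # step 1: reject arbitrary iterables
            return NotImplemented
        chain = (e for s in (self, other) for e in s)
        return self._from_iterable(chain)

    __ror__ = __or__  # step 2: union is symmetric, so reuse it when reflected


union = {2} | StrictSet([1])  # set.__or__ declines, StrictSet.__ror__ answers
print(type(union).__name__, sorted(union))

try:
    StrictSet([1]) | [2]
    list_accepted = True
except TypeError:
    list_accepted = False
print(list_accepted)
```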

    This solution seems much too simple in light of how long I've been thinking about these bugs. I suspect there are code hobgoblins waiting to ambush me. ;)

    @terryjreedy
    Member

    If the code were acting exactly as documented, I would consider this a feature request. But "require that the parameter also be an instance of set()" (from original message) is too limited.

    >>> set() | frozenset()
    set()

    So 'set' in "their operator based counterparts require their arguments to be sets." (doc) seems to be meant to be more generic, in which case 'instance of collections.Set' seems reasonable. To be clear, the doc could be updated to "... sets, frozensets, and other instances of collections.Set."

    "Both set and frozenset support set to set comparisons. " This includes comparisons between the two classes.

    >>> set() == frozenset()
    True

    so perhaps comparisons should be extended also.

    @markshannon
    Member

    Review of set-with-Set.patch:

    Looks good overall.
    I agree that restricting operations to instances of Set rather than Iterable is correct.

    Implementing "__rsub__" in terms of - (subtraction) means that infinite recursion is a possibility. It also creates an unnecessary temporary.
    Could you just reverse the expression used in __sub__?
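One way to read this suggestion, sketched on a hypothetical `TinySet` class and not taken from the patch under review: build the reflected difference directly with a generator, so that `__rsub__` never invokes `-` again and no intermediate set is created.

```python
from collections.abc import Set


class TinySet(Set):
    """Hypothetical Set subclass used only to illustrate __rsub__."""

    def __init__(self, iterable=()):
        self._items = frozenset(iterable)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __rsub__(self, other):
        # Computes other - self without using the - operator, so there is
        # no way to bounce back into __sub__/__rsub__ recursively.
        if not isinstance(other, Set):
            return NotImplemented
        return self._from_iterable(v for v in other if v not in self)


difference = {1, 2, 3} - TinySet([2])  # set.__sub__ declines, __rsub__ answers
print(type(difference).__name__, sorted(difference))
```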

    Would you add tests for comparisons: Set() == set(), etc.?
    They are probably tested implicitly in the rest of the test suite, but explicit tests would be good.
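Explicit comparison checks of the kind requested might look like the following. The `TinySet` class and the `results` dictionary are assumptions for illustration and are not taken from the test suite; the checks rely on the built-in set deferring to the ABC's reflected comparison methods.

```python
from collections.abc import Set


class TinySet(Set):
    """Hypothetical minimal Set subclass for comparison checks."""

    def __init__(self, iterable=()):
        self._items = frozenset(iterable)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


results = {
    "Set == set": TinySet([1, 2]) == {1, 2},
    "set == Set": {1, 2} == TinySet([1, 2]),
    "set != Set": {1, 2} != TinySet([1]),
    "set <= Set": {1} <= TinySet([1, 2]),
    "set <  Set": {1} < TinySet([1, 2]),
    "Set >= set": TinySet([1, 2]) >= {1},
}
for name, outcome in results.items():
    print(name, outcome)
```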

    @jcea
    Member

    jcea commented Oct 31, 2012

    Heads up, Issue bpo-16373.

    @rhettinger rhettinger assigned rhettinger and unassigned stutzbach Nov 1, 2012
    @ncoghlan
    Contributor

    Armin pointed out in http://lucumr.pocoo.org/2013/5/21/porting-to-python-3-redux/ that one nasty consequence of the remaining part of bpo-2226 and this bug is making it much harder than it should be to use the ItemsView, KeysView and ValuesView from collections.abc to implement third party mappings that behave like the builtin dict.
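The practical impact shows up as soon as a third-party mapping's key view meets a built-in set. The sketch below uses a made-up `LowerKeyMapping` class; its views come from the `collections.abc.Mapping` mixins. On interpreters with the fix, the view works on either side of the operator.

```python
from collections.abc import Mapping


class LowerKeyMapping(Mapping):
    """Hypothetical read-only mapping that lower-cases its keys."""

    def __init__(self, data):
        self._data = {key.lower(): value for key, value in dict(data).items()}

    def __getitem__(self, key):
        return self._data[key.lower()]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


m = LowerKeyMapping({"A": 1, "B": 2})

view_on_left = m.keys() & {"a", "z"}    # KeysView.__and__
view_on_right = {"a", "z"} & m.keys()   # set.__and__ defers to KeysView.__rand__
items_match = {("a", 1), ("b", 9)} & m.items()

print(view_on_left, view_on_right, items_match)
```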

    @ncoghlan
    Contributor

    ncoghlan commented Jan 4, 2014

    Raymond, will you have a chance to look at this before 3.4rc1? Otherwise I'd like to take it.

    @ncoghlan
    Contributor

    ncoghlan commented Feb 2, 2014

    I updated the patch to apply cleanly to the default branch. I also added several new test cases which uncovered issues with Daniel's previous patch.

    Specifically:

    • the reverse functions were not being tested properly (added a separate test to ensure they all return NotImplemented when appropriate)

    • the checks in the in-place operators were not being tested, and were also too strict (added tests for their input checking, and also ensured they still accepted arbitrary iterables as input)

    I've also reduced the target versions to just 3.4 - this will require a porting note in the What's New, since the inappropriate handling of arbitrary iterables in the ABC methods has been removed, which means that things that previously worked when they shouldn't (like accepting a list as the RHS of a binary set operator) will now throw TypeError.

    Python 3.3:
    >>> set() | list()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unsupported operand type(s) for |: 'set' and 'list'
    >>> from test.test_collections import WithSet
    >>> WithSet() | list()
    <test.test_collections.WithSet object at 0x7f71ff2f6210>

    After applying the attached patch:

    >>> set() | list()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unsupported operand type(s) for |: 'set' and 'list'
    >>> from test.test_collections import WithSet
    >>> WithSet() | list()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unsupported operand type(s) for |: 'WithSet' and 'list'
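The point about in-place operators can be illustrated separately. In this sketch, `ListSet` is a hypothetical MutableSet subclass; the in-place union inherited from the ABC simply iterates over its argument, whereas the built-in set rejects a list even for `|=`.

```python
from collections.abc import MutableSet


class ListSet(MutableSet):
    """Hypothetical list-backed mutable set."""

    def __init__(self, iterable=()):
        self._items = []
        for item in iterable:
            self.add(item)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, item):
        if item not in self._items:
            self._items.append(item)

    def discard(self, item):
        if item in self._items:
            self._items.remove(item)


abc_set = ListSet([1])
abc_set |= [2, 2, 3]          # MutableSet.__ior__ adds each element in turn
print(list(abc_set))

builtin = {1}
try:
    builtin |= [2]            # the built-in in-place operator wants a set
    builtin_accepts_list = True
except TypeError:
    builtin_accepts_list = False
print(builtin_accepts_list)
```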

    @ncoghlan
    Contributor

    ncoghlan commented Feb 2, 2014

    I initially missed Mark's suggestion above to avoid the recursive subtraction operation in __rsub__. v2 of my patch includes that tweak.

    @serhiy-storchaka
    Member

    I think set operations with iterables that are not sets should be tested.

    @ncoghlan
    Contributor

    This didn't make it into 3.4, and the comment about needing a porting note above still applies, so to 3.5 it goes.

    @rhettinger
    Contributor

    Thanks for the patch update. I will look at it shortly.

    @rhettinger
    Contributor

    Attaching a draft patch with tests.

    @ncoghlan
    Contributor

    Ah, interesting - I completely missed the comparison operators in my patch and tests. Your version looks good to me, though.

    That looks like a patch against 2.7 - do you want to add 2.7 & 3.4 back to the list of target versions for the fix?

    @rhettinger
    Contributor

    Adding tests for non-set iterables as suggested by Serhiy.

    @rhettinger
    Contributor

    Added tests that include the pure python sets.Set(). Only the binary or/and/sub/xor methods are tested.

    The comparison operators were designed to only interact with their own kind. A comment from Tim Peters explains the decision to raise a TypeError instead of returning NotImplemented (it has unfortunate interactions with cmp()). At any rate, nothing good would come from changing that design decision now, so I'm leaving it alone to fade peacefully into oblivion.

    @python-dev
    Mannequin

    python-dev mannequin commented May 26, 2014

    New changeset 3615cdb3b86d by Raymond Hettinger in branch '2.7':
    bpo-8743: Improve interoperability between sets and the collections.Set abstract base class.
    http://hg.python.org/cpython/rev/3615cdb3b86d

    @python-dev
    Mannequin

    python-dev mannequin commented May 26, 2014

    New changeset cd8b5b5b6356 by Raymond Hettinger in branch '3.4':
    bpo-8743: Improve interoperability between sets and the collections.Set abstract base class.
    http://hg.python.org/cpython/rev/cd8b5b5b6356

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022

    7 participants