classification
Title: dict_keys purports to implement the Set ABC, but is missing the isdisjoint method
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.2
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: stutzbach Nosy List: daniel.urban, eric.araujo, rhettinger, stutzbach, terry.reedy
Priority: high Keywords: patch

Created on 2010-07-09 19:03 by stutzbach, last changed 2010-09-14 16:54 by stutzbach. This issue is now closed.

Files
File name Uploaded Description Edit
issue9212.diff daniel.urban, 2010-07-23 16:47 Patch (py3k branch)
issue9212a.diff daniel.urban, 2010-07-24 08:03 Patch (iterate over the smaller) (py3k branch)
issue9212b.diff daniel.urban, 2010-09-01 13:48 Patch (with corrections)
Messages (14)
msg109783 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-07-09 19:03
>>> isinstance({}.keys(), collections.Set)
True
>>> [method for method in set(dir(collections.Set)) - set(dir({}.keys()))
...  if not method.startswith('_')]
['isdisjoint']

(in Python 2.7, use "viewkeys()" instead of "keys")

dict_items has the same problem.
msg109941 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-07-10 22:08
Concrete classes are allowed to have more features than the corresponding ABC.  The ABCs are not intended to be full-featured; instead, they specify the part of the API that will be guaranteed.  For example, the union() method for built-in sets allows two or more set arguments, but the corresponding method for the ABC is limited to two arguments.  This was an intentional difference, designed to make it easier for people to implement a Set class.

That being said, the omission of isdisjoint() was an oversight and I see no reason that this shouldn't be fixed.
msg109944 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-07-10 23:42
In this case, the concrete class is the one missing a method.  

Concrete classes are allowed to provide more features than the corresponding ABC, but the converse is not true to the best of my knowledge.

dict_keys .register()s as supporting the Set ABC, so it does not automatically pick up the method through inheritance.  Put another way:

>>> # dict_keys provides the Set ABC API
>>> isinstance({}.keys(), collections.Set)
True

>>> # The Set ABC provides isdisjoint
>>> hasattr(collections.Set, 'isdisjoint') 
True

>>> # Ergo, dict_keys should provide isdisjoint ... but it doesn't
>>> hasattr({}.keys(), 'isdisjoint')       
False

See also Issue9213 for another case where a concrete class is missing a method provided by an ABC it claims to support.

I sort of wonder if .register() should verify that the concrete class provides all of the methods of the ABC.
msg109952 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-07-11 00:46
I had misread you original post.  Thought you we saying that the Set ABC was missing disjoint.   Disregard my last post.
msg111362 - (view) Author: Daniel Urban (daniel.urban) * (Python triager) Date: 2010-07-23 16:47
The attached patch adds the isdisjoint method to dict_keys and dict_items.
Pseudocode for the method:

def isdisjoint(self, other):
    if self is other:
        if len(self) == 0:
            return True
        else:
            return False
    else:
        for item in other:
            if item in self:
                return False
        return True
msg111376 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-07-23 19:36
Titles that fit in the box, like 'Dict.keys lacks .isjoint method.' are easier to read and keep track of.

Unless there is a reason I have missed, I would iterate through the smaller set, which might even be empty or nearly so, rather than either in particular.
msg111431 - (view) Author: Daniel Urban (daniel.urban) * (Python triager) Date: 2010-07-24 08:03
> Unless there is a reason I have missed, I would iterate through the
> smaller set, which might even be empty or nearly so, rather than
> either in particular.

You're right, here is a new patch. Pseudocode:

def isdisjoint(self, other):
    if self is other:
        if len(self) == 0:
            return True
        else:
            return False
    else:
        if len(other) > len(self):
            self, other = other, self
        for item in other:
            if item in self:
                return False
        return True
msg114458 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-08-20 21:35
Thank you for the patch.

We should only iterate over the shorter set if the longer set is really a set and not just a sequence.  PySequence_Contains may take O(n) time on a list, making the algorithm an expensive O(n**2) overall.  I note that set_isdisjoint doesn't try to examine the lengths.

Also, since PyIter_Next returns NULL to indicate the end of the iteration OR to indicate an exception, the end of the function should look like this:

    Py_DECREF(it);
    if (PyErr_Occurred())
        return NULL;
    Py_RETURN_TRUE;

Other than those two issues, the patch looks good to me.
msg115301 - (view) Author: Daniel Urban (daniel.urban) * (Python triager) Date: 2010-09-01 13:48
Thanks for the corrections.  I'm attaching the new patch as issue9212b.diff.  I'm using PyAnySet_Check to determine if the other argument is really a set, but I'm not entirely sure, that this is correct. Please let me know if other corrections are needed.
msg115320 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-09-01 18:00
It would be nice to get this fixed before the next release.
msg115322 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-09-01 18:14
I will aim to spend some time with this (and the similar Issue #9213) today and/or tomorrow, so that it can be committed in time for 3.2a2.
msg115384 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-09-02 15:06
Committed to py3k in r84435.  Raymond, do you want to look the commit over before I merge it into 3.1 and 2.7?
msg115385 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-09-02 15:12
Meant to add:

I made some relatively minor changes to Daniel Urban's patch.  Mostly, I rearranged the order of a few things to avoid unnecessary work (e.g., only compute "len_other" if we've already checked that "other" is a set).

Thank you Daniel for the patch! :-)
msg115386 - (view) Author: Daniel Stutzbach (stutzbach) (Python committer) Date: 2010-09-02 15:14
Also, credited Daniel Urban for the patch in r84436 (forgot that the first time around -- sorry!).
History
Date User Action Args
2010-09-14 16:54:23stutzbachsetstatus: open -> closed
stage: patch review -> resolved
versions: - Python 3.1, Python 2.7
2010-09-02 15:14:24stutzbachsetmessages: + msg115386
2010-09-02 15:12:01stutzbachsetmessages: + msg115385
2010-09-02 15:06:50stutzbachsetmessages: + msg115384
2010-09-01 18:14:24stutzbachsetresolution: accepted
messages: + msg115322
stage: test needed -> patch review
2010-09-01 18:00:44rhettingersetmessages: + msg115320
2010-09-01 13:48:35daniel.urbansetfiles: + issue9212b.diff

messages: + msg115301
2010-08-20 21:35:26stutzbachsetmessages: + msg114458
2010-07-24 08:04:00daniel.urbansetfiles: + issue9212a.diff

messages: + msg111431
2010-07-23 19:36:50terry.reedysetnosy: + terry.reedy
messages: + msg111376
2010-07-23 16:47:34daniel.urbansetfiles: + issue9212.diff

nosy: + daniel.urban
messages: + msg111362

keywords: + patch
2010-07-11 04:36:22rhettingersetpriority: normal -> high
2010-07-11 00:46:32rhettingersetmessages: + msg109952
2010-07-10 23:42:24stutzbachsetmessages: + msg109944
2010-07-10 22:08:11rhettingersetnosy: + rhettinger
messages: + msg109941
2010-07-10 21:21:03eric.araujosetnosy: + eric.araujo
2010-07-09 19:03:48stutzbachsettype: behavior
2010-07-09 19:03:35stutzbachcreate