classification
Title: Remove needless set operator restriction
Type: enhancement
Stage: resolved
Components: Interpreter Core, Library (Lib)
Versions: Python 3.8, Python 3.7
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To: rhettinger
Nosy List: bup, rhettinger
Priority: normal
Keywords: patch

Created on 2018-08-25 11:37 by bup, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
PR 8921 (status: closed), linked by bup on 2018-08-25 11:39
Messages (2)
msg324060 - Author: Dan Snider (bup) Date: 2018-08-25 11:37
I only just realized that `dict.fromkeys('abc').keys() - 'bc'` returns {'a'} instead of raising an error as `{*'abc'} - 'bc'` would, which is really quite handy.

The __xor__, __and__, and __sub__ methods of dict_keys (and dict_items, assuming there are no unhashable values) work just as set.symmetric_difference, set.intersection, and set.difference do, respectively.

>>> a, b, c, d = [*map(dict.keys, map(dict.fromkeys, 'abcd'))]
>>> ((a | 'a') | (b & 'b') | (c ^ 'c')) - d
{'b', 'a'}
>>> a, b, c, d = [*map(dict.items, map(dict.fromkeys, 'abcd'))]
>>> ((a | 'a') | (b & 'b') | (c ^ 'c')) - d
{'c', ('a', None), 'a', ('c', None)}

However, set objects are arbitrarily restricted to taking a set object as the second argument of these operators. For the union case shown first below, there is even code specifically to handle a dictionary as the second argument, but it is unreachable when called through the dunder version.

>>> {list, set}.union(dict.fromkeys((dict, set)))
{<class 'list'>, <class 'dict'>, <class 'set'>}
>>> {list, set} | dict.fromkeys((dict, set))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for |: 'set' and 'dict'

>>> {*'abc'}.difference('cde')
{'b', 'a'}
>>> {*'abc'} - set('cde')
{'b', 'a'}
>>> {*'abc'} - 'cde'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for -: 'set' and 'str'

>>> {1,2,3}.symmetric_difference(b'\x00')
{0, 1, 2, 3}
>>> {1,2,3} ^ b'\x00'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for ^: 'set' and 'bytes'

The sources of set_and, set_sub, and set_xor all look like this. All they do is check that the second argument is a set and then call the same function that their respective non-dunder method uses. They're so similar, in fact, that set_xor calls the exact same C function used in the PyMethodDef for set.symmetric_difference:

static PyObject *
set_xor(PySetObject *so, PyObject *other)
{
    if (!PyAnySet_Check(so) || !PyAnySet_Check(other))
        Py_RETURN_NOTIMPLEMENTED;
    return set_symmetric_difference(so, other);
}
static PyMethodDef set_methods[] = {
/* ... */
{"symmetric_difference",(PyCFunction)set_symmetric_difference, 
 METH_O, symmetric_difference_doc},
/* ... */
};

All that's needed to fix this is to remove a total of 106 characters from setobject.c (4 x " || !PyAnySet_Check(other)").
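
For comparison, the requested behavior can be approximated in pure Python without touching setobject.c. This is only a sketch: `LenientSet` is a hypothetical name, not part of the standard library or of the linked PR, and it simply forwards each operator to the corresponding named method, which already accepts any iterable.

```python
class LenientSet(set):
    """Hypothetical set subclass whose binary operators accept any iterable."""

    def __or__(self, other):
        # set.union() already accepts arbitrary iterables, including dicts.
        return type(self)(self.union(other))

    def __and__(self, other):
        return type(self)(self.intersection(other))

    def __sub__(self, other):
        return type(self)(self.difference(other))

    def __xor__(self, other):
        return type(self)(self.symmetric_difference(other))


# The operands that raise TypeError with a plain set are accepted here.
diff = LenientSet('abc') - 'cde'
sym = LenientSet({1, 2, 3}) ^ b'\x00'
merged = LenientSet({list, set}) | dict.fromkeys((dict, set))
```

Note that in this sketch a non-iterable right operand raises TypeError from the named method instead of returning NotImplemented, and reflected operations (an arbitrary iterable on the left) are not handled.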
msg324133 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2018-08-26 18:10
Sorry Dan, we're going to pass on this one.  The current behavior was an intentional design choice by Guido and reflects a careful balance between some difficult trade-offs.

An early and permanent mistake in Python's design is that list.__iadd__() and list.extend() both accept any input iterable.  For extend(), this proved to be useful.  In contrast, __iadd__() was a recurring bug magnet.  People would routinely type "s=['abc']; s+='def'" expecting to get ['abc', 'def'] rather than ['abc', 'd', 'e', 'f'].  Based on this experience, Guido wisely opined that math operators on other concrete collection classes should be restricted to working with members of their own class.
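
A short illustration of the list behavior described above (the variable names are chosen here for the example only): in-place addition iterates over a string operand, plain concatenation rejects it, and append() is what the user typically meant.

```python
# In-place addition accepts any iterable, so the string is split into characters.
s = ['abc']
s += 'def'

# Plain concatenation is restricted to lists and rejects the string.
t = ['abc']
try:
    t = t + 'def'
except TypeError:
    concat_rejected = True
else:
    concat_rejected = False

# What was usually intended: add the string as a single element.
u = ['abc']
u.append('def')
```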

When abstract base classes were introduced, a seemingly inconsistent decision was made.  The Set ABCs allowed the math operators to accept any input iterable and did not provide the spelled-out method names (union, intersection, difference, etc).

IIRC, there were several reasons for this.  It kept the total number of methods to a manageable size (important so as to not unduly burden implementers of concrete classes).  Also, having a same-type restriction is at odds with some of the design goals and use cases for the collections ABCs.  Additionally, the code for the mixin methods is simpler without the restrictions.

When dict views were implemented, they followed the Set ABCs.  This gave them fewer methods than sets but also gave them fewer restrictions.  For the most part, these design trade-offs have worked out well in practice.  The existing behavior is neither "needless" nor "arbitrary".  It was the result of careful consideration by GvR on what works best for most people, most of the time.
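
The trade-off described above can be seen with a minimal class built on collections.abc.Set. This is a sketch for illustration: `ListBasedSet` is a name assumed here, not something defined in this issue. The class supplies only __contains__, __iter__, and __len__; the operators come from the ABC's mixin methods and accept any iterable, while the spelled-out methods such as union() are absent, just as with dict views.

```python
from collections.abc import Set


class ListBasedSet(Set):
    """Illustrative Set implementation backed by a list (assumed name)."""

    def __init__(self, iterable):
        self.elements = []
        for value in iterable:
            if value not in self.elements:
                self.elements.append(value)

    def __contains__(self, value):
        return value in self.elements

    def __iter__(self):
        return iter(self.elements)

    def __len__(self):
        return len(self.elements)


s = ListBasedSet('abc')

# The mixin operators accept a plain string on the right-hand side.
difference = s - 'bc'
union = s | 'cd'

# Dict key views follow the same ABC-style behavior.
keys_difference = dict.fromkeys('abc').keys() - 'bc'

# Neither the ABC-based class nor the dict view has the named methods.
has_union_method = hasattr(s, 'union') or hasattr({}.keys(), 'union')
```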
History
Date                 User         Action  Args
2022-04-11 14:59:05  admin        set     github: 78678
2018-08-26 18:10:15  rhettinger   set     status: open -> closed; resolution: rejected; messages: + msg324133; stage: patch review -> resolved
2018-08-26 17:27:55  rhettinger   set     assignee: rhettinger
2018-08-25 12:23:52  xiang.zhang  set     nosy: + rhettinger
2018-08-25 11:39:17  bup          set     keywords: + patch; stage: patch review; pull_requests: + pull_request8394
2018-08-25 11:37:26  bup          create