This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Title: Inconsistent behavior between set and dict_keys/dict_items: for non-iterable object x, set().__or__(x) returns NotImplemented, but {}.keys().__or__(x) raises TypeError
Type: behavior Stage: resolved
Components: Versions: Python 3.9
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: canjo, josh.r, rhettinger, serhiy.storchaka
Priority: low Keywords:

Created on 2015-06-08 21:04 by canjo, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (5)
msg245044 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-06-09 05:32
The dictviews_or() function in Objects/dictobject.c is converting the keys to a set and calling the set.update() method with the given argument.  The set.update() method doesn't return NotImplemented because it has no reflected operation.

It looks like dictviews_or() should instead call set.__or__() to allow it a chance to return NotImplemented.  The other dictview set operations are similarly afflicted.
msg245046 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-06-09 06:06
One other thought:  Returning NotImplemented may be an easy change but it isn't clear that it would provide a sensible capability for mapping views.  The non-iterable *other* argument would need to have a __ror__() method than could do something useful with a KeysView or ItemsView argument.  That seems like an improbable use case (which is why I presume this has never come up before).
msg349152 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-08-07 07:16
set.update() accepts arbitrary iterable, but set.__or__() requires a set or frozenset.
msg349195 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2019-08-07 19:51
To be clear, set().__or__(x) returns NotImplemented, it doesn't raise NotImplementedError. I've edited the title to match.

One major problem that gets in the way of a fix is that the interface of set and dict views doesn't match, because dict views only provide the interface of, and *only* requires the operator overloads, not the named equivalent methods. And because the named equivalents don't exist, the operator overloads were made more liberal, to accept any iterable, not just set or set-like things.

set's rules are:

1. Named methods take varargs, and accept any iterable(s)
2. Operator overloads take only one argument, which must be a set/frozenset

dict view's rules are:

1. There are no named methods (aside from isdisjoint, which has no operator equivalent)
2. Operator overloads take only one argument, which can be any iterable

So fixing this is problematic so long as we want to keep a simple implementation; right now, all the methods simplify to:

1. Convert to set
2. Call named, in-place method on resulting set to get liberal behavior

Fixing this would involve either:

1. Rewriting all the dict views operator overloads to do the work themselves, including the logic to reject non-iterables by returning NotImplemented like set's operators do.

    Pro: Not a hack. Resulting code would be more efficient in many cases (e.g. dict view's __and__ could make a new empty set [currently makes set with a copy of the view's values], and iterate the right hand side while performing membership tests in the underlying dict before inserting into the result set, avoiding the need to make a large set that will almost certainly shrink, possibly requiring rehashes)

    Con: Lots of new code to maintain. Would not benefit from any algorithmic improvements to equivalent set code without mirroring them manually.

2. Checking for TypeError from the set method and explicitly returning NotImplemented.

    Pro: Easy, code remains maintainable, benefits from any future optimizations to set

    Cons: Can't distinguish fundamental problems that should be TypeErrors (right hand side contains unhashable objects) from simple case that should return NotImplemented (right hand side isn't iterable). Potentially fixable by explicitly performing the conversion to iterator on the dict view's side (though you'd only do it to confirm it's possible; you'd want to pass the original object unmodified for the common case where it's a set/frozenset and could be processed more efficiently). Also makes failure case even slower than it already is (exception is raised, then ignored, then NotImplemented is returned, then right-hand side is checked, and odds are, the right-hand side won't know what to do either, so it'll return NotImplemented anyway, and a TypeError gets thrown after more work).

3. Refactor the set API to expose (private, C level) helpers for the operator overloads (specifically, the *in-place* versions like __ior__, since we're always making a new true set to call it on anyway, and it seems silly to use __or__ which would make yet another set, and force us to throw away the first set we made) that are less strict about their arguments.

    Pros: Roughly the same as #2, plus a minor per-call (not per item) performance boost: C helpers could be called directly, avoiding overhead of calling Python level methods via _PyObject_CallMethodIdOneArg(result, &PyId_update, other); invoking dict lookups and the like when we know we just made a true set object, so there is no possibility of update being overridden, so a direct C level call would save time. Continues to benefit from any optimizations to set's code (since 95% of the code remains the same).

    Con: More code churn than #2; unfortunately, unlike list (which has type loose in-place operator overloads), set's in-place operator overloads are still type strict; alist += {1,2,3} works, but aset |= [1,2,3] does not, so we can't just call set's __ixor__ unmodified, we'd need a _PySet_IXorIterable separate from the existing set_ior, where both of them might be implemented in terms of set_update_internal, but _PySet_IXorIterable only checks that the argument is iterable to return NotImplemented.

#1 and #3 seem to be the only fully correct options; #1 could squeeze out more performance, but requires a *lot* more new code, #3 requires a lot less new code, and gets a tiny performance boost.

I don't see explicit calls to __or__ being all that common though, and I have a hard time envisioning a class that would hit this case without explicitly calling __or__; for `{}.keys() | x` the only time you'd see a behavioral difference from `set() | x` is when:

1. x is not iterable, and
2. x defines a __ror__ that can do something meaningful with a set

I'd personally assume anything that meets criterion #2 would be iterable; anyone have a counter-example? If this is only a theoretical problem, might it make sense to just leave it as is?
msg350212 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-08-22 16:39
> If this is only a theoretical problem, might it make sense 
> to just leave it as is?

I'm fine with that.  In the last four years, I'm the only one who ever noticed, so it doesn't appear to be a problem in practice.
Date User Action Args
2022-04-11 14:58:17adminsetgithub: 68601
2019-08-22 16:39:25rhettingersetstatus: open -> closed
resolution: wont fix
messages: + msg350212

stage: resolved
2019-08-07 20:04:24pitrousetnosy: - pitrou
2019-08-07 20:03:59rhettingersetassignee: rhettinger
versions: + Python 3.9, - Python 2.7, Python 3.4, Python 3.5, Python 3.6
2019-08-07 19:51:19josh.rsetnosy: + josh.r

messages: + msg349195
title: Inconsistent behavior between set and dict_keys/dict_items: for non-iterable object x, set().__or__(x) raises NotImplementedError, but {}.keys().__or__(x) raises TypeError -> Inconsistent behavior between set and dict_keys/dict_items: for non-iterable object x, set().__or__(x) returns NotImplemented, but {}.keys().__or__(x) raises TypeError
2019-08-07 07:16:04serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg349152
2015-06-10 05:08:23rhettingersetnosy: + pitrou
2015-06-09 06:06:16rhettingersetmessages: + msg245046
2015-06-09 05:32:56rhettingersetpriority: normal -> low

messages: + msg245044
versions: + Python 2.7, Python 3.5, Python 3.6
2015-06-08 23:00:43ned.deilysetnosy: + rhettinger
2015-06-08 21:04:25canjocreate