classification
Title: collections.Counter's in-place operators should return NotImplemented for unsupported types
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Jim.Jewett, Joshua.Chin, ethan.furman, pitrou, r.david.murray, rhettinger, serhiy.storchaka
Priority: low Keywords: patch

Created on 2014-10-30 15:06 by Joshua.Chin, last changed 2014-11-07 14:34 by ethan.furman. This issue is now closed.

Files
File name Uploaded Description Edit
counter.patch Joshua.Chin, 2014-10-30 15:06 review
Messages (28)
msg230271 - (view) Author: Joshua Chin (Joshua.Chin) * Date: 2014-10-30 15:06
Currently, in-place operations on 'collections.Counter' with unsupported types raises an 'AttributeError'.

Example:
>>> import collections
>>> counter = collections.Counter()
>>> counter += 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/collections/__init__.py", line 709, in __iadd__
    for elem, count in other.items():
AttributeError: 'int' object has no attribute 'items'

Instead, it should return 'NotImplemented' if 'other' is not a 'collections.Counter'
msg230272 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-10-30 15:56
That would prevent it from working with "work alike" (duck type) classes, though.
msg230273 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-10-30 16:00
As I noted in my review, the docstring specifically says "other Counter".

If we want to relax that we could check for an 'items' attribute and 'return NotImplemented' if it isn't there, but one or the other should definitely happen.
msg230277 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-10-30 17:50
'counter' in the docstrings is in lower case, so that says nothing dispositive.  However, __add__ does an ininstance check, so it is hard to see why __iadd__ does not.

Personally I'd drop the isinstance checks and let the errors bubble up as they may.  Why should subtract explicitly support other data types, but not __sub__/__isub__?  But Raymond's opinion should hold the most weight here.
msg230278 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-10-30 18:00
Indeed -- we mostly discuss with each other to try and sway his opinion.  :)

stdlib types should not let every error bubble up.  Consider a dict:
----------------------------------------------------------------
--> d = {}
--> d += 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +=: 'dict' and 'int'
----------------------------------------------------------------

now look at Counter
----------------------------------------------------------------
--> from collections import Counter
--> c = Counter()
--> c += 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ethan/source/python/cpython/Lib/collections/__init__.py", line 709, in __iadd__
    for elem, count in other.items():
AttributeError: 'int' object has no attribute 'items'
----------------------------------------------------------------

Counter is not user-friendly in this case.

There are other areas of Counter that accept arbitrary mappings, so I would be fine the __ixxx__ methods also accepting arbitrary mappings, but if the thing passed in *will not* work with Counter, then returning NotImplemented is the appropriate course of action.
msg230280 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-10-30 18:04
I also think returning NotImplemented would be the right thing here.
msg230389 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-10-31 21:12
Here is what other mutable types do:

>>> s = []
>>> s.__iadd__(1)
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    s.__iadd__(1)
TypeError: 'int' object is not iterable

>>> s = set()
>>> s.__ior__(1)
NotImplemented

>>> from collections import deque
>>> s = deque()
>>> s.__iadd__(1)
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    s.__iadd__(1)
TypeError: 'int' object is not iterable

>>> from array import array
>>> s = array('i')
>>> s.__iadd__(1)
Traceback (most recent call last):
  File "<pyshell#10>", line 1, in <module>
    s.__iadd__(1)
TypeError: can only extend array with array (not "int")

>>> s = bytearray()
>>> s.__iadd__(1)
Traceback (most recent call last):
  File "<pyshell#12>", line 1, in <module>
    s.__iadd__(1)
TypeError: can't concat int to bytearray
msg230399 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-10-31 22:18
The behavior exhibited by set() is correct, and the behavior for array and bytearray are fine, as the error message is even more helpful.

Issues opened for list (issue22778) and deque (issue22779).
msg230400 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-10-31 22:18
Note, the += operation is conceptually similar to update() methods with are usually duck typeable:

>>> d = {}
>>> d.update(1)
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in <module>
    d.update(1)
TypeError: 'int' object is not iterable

>>> s = set()
>>> s.update(1)
Traceback (most recent call last):
  File "<pyshell#18>", line 1, in <module>
    s.update(1)
TypeError: 'int' object is not iterable

>>> dict(1)
Traceback (most recent call last):
  File "<pyshell#19>", line 1, in <module>
    dict(1)
TypeError: 'int' object is not iterable
msg230402 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-10-31 22:23
I think I've changed my mind on this. The important distinction is not between "ducktyping" or "cooperating". It's that this is an in-place mutating operation. A mutating operation should always be the responsibility of the receiver, and so it sounds wrong to be able to delegate it to the read-only operand.

For example, when calling list.extend(op), the list object doesn't allow `op` to take over the operation's semantics. So I think TypeError is the right outcome here.
msg230405 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-10-31 22:31
https://docs.python.org/2/reference/datamodel.html#object.__iadd__
------------------------------------------------------------------
[...] These methods should attempt to do the operation in-place (modifying self) and return the result (which could be, but does not have to be, self). If a specific method is not defined, the augmented assignment falls back to the normal methods. For instance, to execute the statement x += y, where x is an instance of a class that has an __iadd__() method, x.__iadd__(y) is called. If x is an instance of a class that does not define a __iadd__() method, x.__add__(y) and y.__radd__(x) are considered, as with the evaluation of x + y.

-------------------------------------------------------------------

Returning NotImplemented still allows a TypeError to be raised, is subclass friendly, and is the way Python is designed.
msg230406 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-10-31 22:35
You misread that paragraph:

""" For instance, to execute the statement x += y, where x is an instance of a class that has an __iadd__() method, x.__iadd__(y) is called."""

This is the present case, and the case of most mutable containers.

"""If x is an instance of a class that does not define a __iadd__() method, x.__add__(y) and y.__radd__(x) are considered, as with the evaluation of x + y."""

This is not the present case.
msg230410 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-10-31 23:01
Nevertheless, that is the behavior if NotImplemented is returned by a method.  Here's some code to demonstrate:

--8<---------------------------------------
from collections import Counter

class Spam(int):
    "for sake of example"
    def __radd__(self, other):
        other[self] = other[self] + 1
        return other

s = Spam(5)
c = Counter()
print(c)
c += s
print(c)
c += 9
--8<---------------------------------------

before the patch
-------------------------------------------
Counter()
Traceback (most recent call last):
  File "blah.py", line 13, in <module>
    c += s
  File "/home/ethan/source/python/issue20284/Lib/collections/__init__.py", line 738, in __iadd__
    for elem, count in other.items():
AttributeError: 'Spam' object has no attribute 'items'
-------------------------------------------

after the patch
-------------------------------------------
Counter()
Counter({5: 1})
Traceback (most recent call last):
  File "blah.py", line 13, in <module>
    c += 9
TypeError: unsupported operand type(s) for +=: 'Counter' and 'int'
-------------------------------------------

As you can see, we get better support for other objects that know how to add themselves to the container in question, and a nicer and more correct error message for objects that do not.

As I said earlier, the only decision we should have to make here is whether to check for a Counter, or just something with a .items attribute, but either way we should be returning NotImplemented.
msg230411 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-10-31 23:06
Yes, it's true that the message is better. If we don't want to change behaviour, we should simply catch the AttributeError and return NotImplemented from there.
msg230429 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-11-01 06:42
Sorry, I don't want to change the kind of exception being raised (an API change from AttributeError to TypeError) without a really good reason.

In general, in-place methods are not required to return NotImplemented (for example, (3).__iadd__(4.5) raises an AttributeError).

Also, I prefer the current AttributeError with its clear indication that an items() method is needed for duck-typing.  These kind of error messages are very helpful when you're trying to figure-out how to duck-type on-purpose (for example, {}.update(1) and {}.update([[]]) both provide the information about what you would need to do to get update() to work).

The current duck-typeable behavior was an intended part of the design and is no different from a number of other methods that update in-place.

FWIW, I do recognize that in the specific case of "counter += 1" a TypeError is clearer than an AttributeError.  However, for duck-typeable methods, an AttributeError can sometimes be useful to indicate what is actually being required of the "other" argument (for example, Python 2 old-style classes raise an AttributeError with an informative message when calling an instance without a __call__ method or indexing an instance without a __getitem__ method).

At any rate, I want to avoid unnecessary API churn (and avoid contributing to what Guido has called "a death by a thousand cuts" for the growing list of tiny semantic differences between Python 2 and Python 3).
msg230446 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-11-01 13:08
I agree with Raymond about the value of the more specific error message.  And the point about the in-place operators being qualitatively different from the non-inplace operators is a good one.
msg230533 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-11-03 10:53
> I don't want to change the kind of exception being raised (an API change from
> AttributeError to TypeError) without a really good reason.

Subclasses cannot work with the current implementation.

> In general, in-place methods are not required to return NotImplemented (for
> example, (3).__iadd__(4.5) raises an AttributeError).

AttributeError is being raised because int does not have an __iadd__ method, not because of a problem with the float operand.

> Also, I prefer the current AttributeError with its clear indication that an items()
> method is needed for duck-typing.  These kind of error messages are very helpful
> when you're trying to figure-out how to duck-type on-purpose

That's what the docs are for.

> (for example, {}.update(1) and {}.update([[]]) both provide the information about
> what you would need to do to get update() to work).

update is not a binary operation, so has more leeway in how it handles problems.

> The current duck-typeable behavior was an intended part of the design and is no
> different from a number of other methods that update in-place.

The only problem with the design is that it does not play well with others.  For duck-typeable just do a check on 'other' to see if it has an .items() method, and return NotImplemented if it does not.  Oh, and check that 'self' is Counter, and also return NotImplemented if that fails.

> At any rate, I want to avoid unnecessary API churn (and avoid contributing to
> what Guido has called "a death by a thousand cuts" for the growing list of tiny
> semantic differences between Python 2 and Python 3).

Not an issue in this case, as the 2.7 Counter does not have the in-place methods.

Here's proof of subclass trouble -- the idea is to make an immutable Counter:
---------------------------------------------
from collections import Counter

class FrozenCounter(Counter):
    """
    immutable after definition
    """
    def __add__(self, other):
        new_counter = self.__class__()
        for elem, count in self.items():
            new_counter[elem] = count
        for elem, count in other.items():
            new_counter[elem] += count
        return new_counter._keep_positive()
        
fc = FrozenCounter("abbc")
sc = FrozenCounter("bubba")
original = fc
fc += sc
assert fc == FrozenCounter("aabbbbbcu")
assert fc is not original
---------------------------------------------

and the results:
---------------------------------------------
Traceback (most recent call last):
  File "blah.py", line 20, in <module>
    assert fc is not original
AssertionError
---------------------------------------------

For subclassing to work, the fix is:

        if not hasattr(other, 'items') or type(self) is not Counter:
            return NotImplemented
msg230536 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-11-03 13:49
On Mon, 03 Nov 2014 10:53:09 +0000, Ethan Furman <report@bugs.python.org> wrote:
> 
> Ethan Furman added the comment:
> 
> > I don't want to change the kind of exception being raised (an API change from
> > AttributeError to TypeError) without a really good reason.
> 
> Subclasses cannot work with the current implementation.

I don't understand this statement.

> > Also, I prefer the current AttributeError with its clear indication that an items()
> > method is needed for duck-typing.  These kind of error messages are very helpful
> > when you're trying to figure-out how to duck-type on-purpose
> 
> That's what the docs are for.

That's not the way it works in real life.  You implement something, it
doesn't work, and if the solution isn't obvious from the error message
*then* you look at the docs or google.  But if the error message says
"you can't do this", then like as not you don't even look at the docs.

> > (for example, {}.update(1) and {}.update([[]]) both provide the information about
> > what you would need to do to get update() to work).
> 
> update is not a binary operation, so has more leeway in how it handles problems.

As Raymond pointed out, the augmented assignment operator is a
*shortcut* for update.

> > The current duck-typeable behavior was an intended part of the design and is no
> > different from a number of other methods that update in-place.
> 
> The only problem with the design is that it does not play well with others.  For duck-typeable just do a check on 'other' to see if it has an .items() method, and return NotImplemented if it does not.  Oh, and check that 'self' is Counter, and also return NotImplemented if that fails.

No, that doesn't support duck typing.  Duck typing is not putting
arbitrary restrictions on what objects you accept, but only essential
restrictions.  And as noted above, if you return NotImplemented, then
you get an error message that essentially says "this is impossible"
instead of "this is what didn't work".

> Here's proof of subclass trouble -- the idea is to make an immutable Counter:

Your subclass doesn't override __iadd__ or any of the other mutation
methods.  Why would you expect it to be immutable if it doesn't?

> For subclassing to work, the fix is:
> 
>         if not hasattr(other, 'items') or type(self) is not Counter:
>             return NotImplemented

I've never seen a type check like that in a Python class, and I hope
I never do.  *That* breaks subclassing.
msg230544 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2014-11-03 15:56
If I know that a Counter (or any class X) can be updated in place, I will be surprised to find out that I'm using a different instance after a successful in-place operation.

I would even consider that (replacement of the original instance) a security risk, though it is mitigated by the fact that I don't see how to exploit it without already being able to create arbitrary subclasses.

> For subclassing to work, the fix is:
> 
>         if not hasattr(other, 'items') or type(self) is not Counter:
>             return NotImplemented

That breaks for Counter subclasses on the left hand side -- even a trivial subclass to change the repr would fail unless the _i* methods were all re-implemented, either by the otherwise-trivial Counter subclass or by the right-hand objects.  If you start making it work for the obvious cases, you just make the failure conditions more obscure.
msg230547 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-11-03 17:38
Ethan stated:
>> The only problem with the design is that it does not play well with others.  For
>> duck-typeable just do a check on 'other' to see if it has an .items() method, and
>> return NotImplemented if it does not.  Oh, and check that 'self' is Counter, and
>> also return NotImplemented if that fails.

R. David Murray replied:
> No, that doesn't support duck typing.  Duck typing is not putting arbitrary
> restrictions on what objects you accept, but only essential restrictions.

__iadd__ will raise an AttributeError if 'other' doesn't have an 'items' method -- how is that an arbitrary restriction?

If 'other' doesn't have an 'items' but does know how to add itself to Counter, it won't matter because Counter is raising an exception instead of returning NotImplemented.  

What other types return NotImplemented when 'other' is not the expected (sub)type?  bytes, bytearray, int, float, string, dict, list, tuple, Enum, deque, defaultdict, OrderedDict, Decimal, Fraction, and Counter itself for every binary method /except/ the four inplace ops it supplies itself.

> And as noted above, if you return NotImplemented, then you get an error message
> that essentially says "this is impossible" instead of "this is what didn't work".

Correct implementation should not be sacrificed for a nicer error message.

>> Here's proof of subclass trouble -- the idea is to make an immutable Counter:
>> [snip example]

> Your subclass doesn't override __iadd__ or any of the other mutation methods.  Why
> would you expect it to be immutable if it doesn't?

You're right, that was a bad example.  A subclass would need to override any method whose behavior it wants to change.

What does not seem correct to me is raising an exception because the type of 'other' is not correct, instead of returning NotImplemented, and in 14 other core datatypes it does work that way, and even in Counter it works that way for any in-place method that Counter itself doesn't override, and it does work that way /in/ Counter for the normal binary ops that it /does/ override.

> For subclassing to work, the fix is:
> 
>         if not hasattr(other, 'items') or type(self) is not Counter:
>             return NotImplemented

>> I've never seen a type check like that in a Python class, and I hope I never do.  *That*
>> breaks subclassing.

Explanations and examples go a lot further than bare statements.

[Jim Jewitt posts an explanation.]

Ah, I see.  Thank you, Jim.

Okay, I agree that the check against the base class is ill-advised (and I modified some of the above reply to match that understanding).

However, I am still unconvinced that 'other' should not be checked.  Hopefully the python-dev thread will shed more light on this.
msg230559 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-11-03 20:21
Okay, the python-dev ruling is in, and raising an exception in the __ixxx__ methods is allowed, and even a good idea in the cases of mutable containers.

However, doing the check on 'other' and raising a TypeError with an appropriate message would still be better for a couple reasons:

  - all the non-inplace operations raise TypeError when 'other' is not a correct type
  - __iand__ is currently buggy because it does not do the check

If backwards compatibility is a concern in this case, a custom "TypeError" that was a subclass of both TypeError and AttributeError could address that.
msg230665 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-11-05 08:17
> However, doing the check on 'other' and raising a TypeError
> with an appropriate message would still be better 

Let's be clear.  These are duck-typed methods.  A type check is inappropriate.  Anything with o.items() is allowed regardless of type.

Also, I generally won't approve changes to existing APIs without compelling real-world use cases to motivate the design change.  Otherwise, you just create unnecessary churn and consternation.
msg230692 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-11-05 16:04
Raymond declared:
----------------
> Let's be clear.  These are duck-typed methods.  A type check is inappropriate.
> Anything with o.items() is allowed regardless of type.

Wikipedia explains (http://en.wikipedia.org/wiki/Duck_typing):
-------------------------------------------------------------
> In computer programming with object-oriented programming languages, duck typing
> is an alternative to typing. In duck typing, an object's suitability for some
> purpose is determined by the presence of certain methods and properties [...]

I did use an actual 'type' check in one of my exmaples, and that was wrong.

It is possible to do a "duck-type check" with a `hasattr(other, 'items')`.

I don't use Counter myself -- I'll try and find some real-world examples.
msg230694 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-11-05 16:12
There is no purpose served by changing the AttributeError into a TypeError.  It's just extra unneeded code.
msg230695 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-11-05 16:36
I've posted to python-list and python-dev.  I'll report back here the findings, if any.
msg230741 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-06 13:48
> I've posted to python-list and python-dev.  I'll report back here the findings, if any.

http://comments.gmane.org/gmane.comp.python.devel/150073
msg230749 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2014-11-06 15:21
I wish there were an APIMismatchError superclass to unify
(AttributeError, TypeError).  But the wart probably isn't enough to
justify the surgery.

On Thu, Nov 6, 2014 at 8:48 AM, Serhiy Storchaka <report@bugs.python.org> wrote:
>
> Serhiy Storchaka added the comment:
>
>> I've posted to python-list and python-dev.  I'll report back here the findings, if any.
>
> http://comments.gmane.org/gmane.comp.python.devel/150073
>
> ----------
> nosy: +serhiy.storchaka
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue22766>
> _______________________________________
msg230807 - (view) Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2014-11-07 14:34
No real-world use-cases have surfaced.  Many thanks to everyone's explanations and examples.
History
Date User Action Args
2014-11-07 14:34:32ethan.furmansetmessages: + msg230807
2014-11-06 15:21:38Jim.Jewettsetmessages: + msg230749
2014-11-06 13:48:44serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg230741
2014-11-05 16:36:06ethan.furmansetmessages: + msg230695
2014-11-05 16:12:11r.david.murraysetmessages: + msg230694
2014-11-05 16:04:33ethan.furmansetmessages: + msg230692
2014-11-05 08:17:29rhettingersetmessages: + msg230665
2014-11-03 20:21:46ethan.furmansetmessages: + msg230559
2014-11-03 17:38:25ethan.furmansetmessages: + msg230547
2014-11-03 15:56:36Jim.Jewettsetnosy: + Jim.Jewett
messages: + msg230544
2014-11-03 13:49:43r.david.murraysetmessages: + msg230536
2014-11-03 10:53:09ethan.furmansetmessages: + msg230533
2014-11-01 13:08:13r.david.murraysetmessages: + msg230446
2014-11-01 06:42:44rhettingersetstatus: open -> closed
resolution: not a bug
messages: + msg230429
2014-10-31 23:06:14pitrousetmessages: + msg230411
2014-10-31 23:01:19ethan.furmansetmessages: + msg230410
2014-10-31 22:35:52pitrousetmessages: + msg230406
2014-10-31 22:31:20ethan.furmansetmessages: + msg230405
2014-10-31 22:23:13pitrousetmessages: + msg230402
2014-10-31 22:18:47rhettingersetmessages: + msg230400
2014-10-31 22:18:07ethan.furmansetmessages: + msg230399
2014-10-31 21:12:48rhettingersetpriority: normal -> low

messages: + msg230389
2014-10-30 18:04:49pitrousetversions: - Python 2.7, Python 3.4
nosy: + pitrou

messages: + msg230280

type: behavior -> enhancement
2014-10-30 18:00:49ethan.furmansetmessages: + msg230278
2014-10-30 17:50:23r.david.murraysetmessages: + msg230277
2014-10-30 16:00:26ethan.furmansetmessages: + msg230273
2014-10-30 15:56:30r.david.murraysetnosy: + r.david.murray
messages: + msg230272
2014-10-30 15:25:02ethan.furmansetversions: + Python 2.7, Python 3.5
2014-10-30 15:15:59ethan.furmansetassignee: rhettinger

nosy: + rhettinger, ethan.furman
2014-10-30 15:06:37Joshua.Chincreate