This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Speed up Counter operators
Type: performance Stage: patch review
Components: Library (Lib) Versions: Python 3.6
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: joern, python-dev, rhettinger, serhiy.storchaka
Priority: normal Keywords: needs review, patch

Created on 2015-02-24 09:43 by joern, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
counter_faster.patch serhiy.storchaka, 2015-02-24 12:44 review
counter_bench.py serhiy.storchaka, 2015-02-24 12:45
counter_faster_2.patch serhiy.storchaka, 2015-03-27 08:26 review
counter_pos_neg.diff rhettinger, 2015-05-25 22:37 review
Messages (15)
msg236484 - (view) Author: Jörn Hees (joern) * Date: 2015-02-24 09:43
This caught me while I was working on a stats script that updated c = Counter() in a loop with c += Counter(temp_dict). After profiling, I found that my script spent 2 minutes in Counter.__add__, which seemed terribly slow (each invocation was in the ms rather than µs range).

Looking into this, I found that there is no implementation of Counter.__iadd__ (or __isub__), which I had just assumed would be based on Counter.update (Counter.subtract).

Here's some timing output:

In [1]: from collections import Counter

In [2]: a = range(1000)

In [3]: b = range(1000)

In [4]: ab = a+b

In [5]: len(ab)
Out[5]: 2000

In [6]: %timeit c=Counter(a)
1000 loops, best of 3: 361 µs per loop

In [7]: %timeit c=Counter(ab)
1000 loops, best of 3: 734 µs per loop

In [8]: %timeit c=Counter(a) ; c += Counter(b)
1000 loops, best of 3: 1.51 ms per loop

In [9]: %timeit c=Counter(a) ; c.update(b)
1000 loops, best of 3: 741 µs per loop

Counter.update is way faster than +=, even if you add back the overhead of creating a new Counter:

In [10]: %timeit c=Counter(a) ; c.update(Counter(b))
1000 loops, best of 3: 1.16 ms per loop

In [11]: %timeit c=Counter(a) ; c.update({x:1 for x in b})
1000 loops, best of 3: 844 µs per loop


Would it be welcome if I added __iadd__ and __isub__ methods that just invoke update and subtract?

One reason not to have this is the current __add__ and __sub__ behavior of keeping only values > 0, which makes sense from a set-theoretic standpoint, but has a certain potential to confuse developers who just use Counter as a simplistic stats module.
msg236486 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-24 10:32
Support of in-place math operators is already added in issue13121.
msg236489 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-24 12:44
However, it is possible to speed up the implementation. The proposed patch increases the performance of Counter operators by 20% to 1200%.

Unpatched        Patched

   422 (-3%)         411 c = Counter(a)
   260 (-4%)         251 c = Counter(b)
 15568 (-118%)      7155 c = Counter(a); c + b
 11536 (-56%)       7377 c = Counter(a); c - b
 15354 (-190%)      5291 c = Counter(a); c | b
 11291 (-60%)       7043 c = Counter(a); c & b
  8176 (-22%)       6712 c = Counter(a); c += b
 21976 (-162%)      8379 c = Counter(a); c -= b
  6090 (-24%)       4895 c = Counter(a); c |= b
 16346 (-34%)      12226 c = Counter(a); c &= b
 17125 (-1160%)     1359 +a
 10484 (-289%)      2693 +c  # c = Counter(); c.subtract(a)
  3325 (-234%)       997 -a
 10094 (-56%)       6480 -c  # c = Counter(); c.subtract(a)
msg236507 - (view) Author: Jörn Hees (joern) * Date: 2015-02-24 15:49
Cool.

Minor question:
- in the given patch, __add__ uses __iadd__, but __sub__ doesn't use __isub__, which seems a bit weird.

Maybe off-topic, but maybe not, because of _keep_positive(self):
- is there a place for a non-multiset-centric "Stats" object that is like Counter but with + and - actually behaving without the (in my use cases of Counter often counter-intuitive) "> 0" stuff? (pun intended ;) ) Counter feels like a subclass of Stats with _keep_positive(self) added.
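A minimal sketch of what such a hypothetical "Stats" class could look like (the name and implementation are illustrative, not an existing module):

```python
from collections import Counter

class Stats(Counter):
    # Hypothetical "Stats" class: like Counter, but + and - keep zero
    # and negative results instead of dropping them via _keep_positive().
    def __add__(self, other):
        result = Stats(self)
        for elem, count in other.items():
            result[elem] += count
        return result

    def __sub__(self, other):
        result = Stats(self)
        for elem, count in other.items():
            result[elem] -= count
        return result

print(Stats(a=1) - Stats(a=3))      # Stats({'a': -2})
print(Counter(a=1) - Counter(a=3))  # Counter() -- negative count dropped
```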
msg236544 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-24 21:05
> - in the given patch __add__ uses __iadd__, but __sub__ doesn't use
> __isub__, which seems a bit weird.

If Counters are positive (the common case), then the result of addition is not
less than either summand: a + b is a, plus possibly additional elements from b.
In the case of subtraction, a - b can be less than a, and possibly much less.
In that case it is cheaper to create an empty Counter and copy only those
elements from a that are not cancelled by b than to copy all of a and then
remove almost all elements.

The relative efficiency depends on the input data, and for some input data
implementing __sub__ via __isub__ can be more efficient.
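The cost argument can be sketched in pure Python: sub_build_fresh mirrors the build-from-empty strategy that Counter.__sub__ uses, while sub_in_place is the copy-then-strip alternative (both function names are illustrative):

```python
from collections import Counter

def sub_build_fresh(a, b):
    # Start from an empty Counter and store only positive results --
    # cheap when b cancels most of a (mirrors Counter.__sub__).
    result = Counter()
    for elem, count in a.items():
        newcount = count - b[elem]  # Counter returns 0 for missing keys
        if newcount > 0:
            result[elem] = newcount
    for elem, count in b.items():
        if elem not in a and count < 0:
            result[elem] = -count
    return result

def sub_in_place(a, b):
    # Copy all of a, subtract, then strip non-positive counts --
    # wasteful when almost all elements are removed again.
    result = Counter(a)
    result.subtract(b)
    return +result  # unary plus drops zero and negative counts
```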

> - is there place for a non multi-set centric "Stats" object which is like
> Counter but with + and - actually behaving without the (in my use cases of
> Counter often counter intuitive) "> 0" stuff? (pun intended ;) ) Counter
> feels like a sub-class of Stats with the added _keep_positive(self).

I'm sure there is such a class in third-party modules. Counter wouldn't benefit
much from inheriting from "Stats", because it would need to override almost
all methods.
msg236655 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-02-26 08:53
Applied optimizations:

1) Used a cached get() method instead of indexing. This optimization was already used in update() and subtract().
2) _keep_positive() is optimized for the case when most counts are not positive (the common case for subtraction and intersection).
3) __add__ and __or__ are defined via in-place operations, which are faster (due to fast copying and _keep_positive()).
4) Inlined and simplified the code for __pos__ and __neg__.

A possible follow-up optimization would be to implement _keep_positive() in C.
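Optimization 1 can be illustrated with a pure-Python sketch of an addition that keeps only positive counts (the function name is illustrative, and this is the idea rather than the patch itself):

```python
from collections import Counter

def add_positive(a, b):
    # Sketch of optimization 1: bind a.get once, so the loop avoids
    # both a repeated attribute lookup and the __missing__ fallback
    # that indexing with a[elem] would take for absent keys.
    result = Counter()
    get = a.get
    for elem, count in b.items():
        newcount = get(elem, 0) + count
        if newcount > 0:
            result[elem] = newcount
    for elem, count in a.items():
        if elem not in b and count > 0:
            result[elem] = count
    return result
```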
msg239380 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-27 08:26
Added explanation comments to address Victor's comment.
msg243018 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-12 21:23
Could you please look at the patch, Raymond? Only a few days are left until the feature freeze.
msg243854 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-05-22 22:00
Sorry, I don't want to make any of these changes (though it is a close call on a couple of them).

Before the particulars, here are some high-level thoughts (not definitive).  I would like to confine the optimizations and complexities to the more important parts of the API (actually counting, as opposed to counter-to-counter operations).  Also, I don't want to preclude some of the future possibilities under consideration (for example, I am leaning toward guaranteeing the order of updates so that the OrderedCounter recipe has guaranteed behavior).  Also, I'm considering removing the existing self.get(elem, 0) in update() and subtract() so that subclassers can usefully override/extend the __missing__ method to return other types (like decimals, fractions, etc.) or have other behaviors like logging missing entries.  And the self.get optimization doesn't seem to perform well under PyPy, in contrast to an inherited __getitem__.  The current code choices were biased towards simplicity, space-over-speed, and keeping a predictable operation order where possible.

Particulars:

1) get() bound method optimization:  This is a close call.  We already use this in update() and subtract(), though I'm thinking of removing those two cases.  Switching from c[k] to c.get(k, 0) is a semantic change that affects subclasses that define __getitem__(), get(), or __missing__().  Speed-wise, c.get() is faster than a fallback to __missing__() for missing keys; conversely, the inherited __getitem__() is faster than c.get(k, 0) when the keys are present.  There is some room for debate about which is the common case (it really depends on the application), and I would prefer at this point not to shift assumptions about which is more common.  Clarity-wise: the square brackets are clearer than the bound-method trick, which I would like to use only where it really matters.
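The semantic difference for subclasses can be seen with a small hypothetical subclass that extends __missing__: indexing goes through it, while get() bypasses it entirely.

```python
from collections import Counter

class TracingCounter(Counter):
    # Hypothetical subclass that extends __missing__, of the kind
    # mentioned above; c[k] and c.get(k, 0) are then no longer
    # equivalent, even though both return 0 for absent keys.
    missing_log = []

    def __missing__(self, key):
        self.missing_log.append(key)
        return 0

c = TracingCounter(spam=1)
c["eggs"]                           # falls back to __missing__
c.get("eggs", 0)                    # bypasses __missing__
print(TracingCounter.missing_log)   # ['eggs']
```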

2)  The current _keep_positive() is shorter, clearer, maintains order for the OrderedCounter use case, and is more space-efficient (never using more space than the original counter and intentionally choosing to remove elements rather than building a new one from scratch).  This is how it was done in setobject.c for the same reasons.

3) Other than brevity, I don't see any advantage to __add__ and __or__ being defined via in-place operations.  That is a semantic change that can affect subclassers, violating the open-closed principle (I want people to be able to override/extend the in-place methods without unintentionally breaking the add/or methods).  Also, the current approach has a space-saving bias (not storing negative counts in the first place, rather than using a follow-on _keep_positive() pass to eliminate the negatives after they have been stored).

4) The code expansion for __pos__ and __neg__ grows the code and is less clear (IMO).  The change for __pos__ scrambles the order, interfering with the OrderedCounter example.  Also, I want the meaning of +c to be the same as c plus an empty counter (at least, that is how I think of the operation).  FWIW, the unary plus operation was intended to be a trigger for _keep_positive.  It was modeled after the unary plus in Decimal which serves the purpose of triggering rounding.

I'm sure there is room for argument about any one of the points above; some are just judgment calls.  I'm closing this because the OP's original concern about wanting an in-place operation was already solved, and because the proposed optimizations slightly change semantics, aren't really the important part of the API, snarl the code a bit, and interfere with some future directions I want to keep open.  Also, I've already spent several hours reviewing this patch and need to return my attention to other matters.
msg243904 - (view) Author: Jörn Hees (joern) * Date: 2015-05-23 09:46
> I'm closing this because the OP's original concern about wanting an in-place operation was already solved

Was it? Are you referring to http://bugs.python.org/issue13121 ?

My main concern was that += is considerably slower than .update(), which kind of caught me off-guard. Since you closed this, I'd be very happy if you could add a note to the docs https://docs.python.org/3/_sources/library/collections.txt that points this behavior out. Maybe by changing this:

    * The multiset methods are designed only for use cases with positive values.
      The inputs may be negative or zero, but only outputs with positive values
      are created.  There are no type restrictions, but the value type needs to
      support addition, subtraction, and comparison.

    * The :meth:`elements` method requires integer counts.  It ignores zero and
      negative counts.


to this:

    * The multiset methods (``+``, ``-`` and ``+=``, ``-=``) are designed only
      for use cases with positive values.
      The inputs may be negative or zero, but only outputs with positive values
      are created.  There are no type restrictions, but the value type needs to
      support addition, subtraction, and comparison.

    * Because of the necessary additional checks for positive values, a ``c += d``
      can be considerably slower than a ``c.update(d)``.

    * The :meth:`elements` method requires integer counts.  It ignores zero and
      negative counts.
msg243935 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-23 18:44
The OrderedCounter recipe doesn't support multiset operations well, because the result type is hardcoded to Counter. The order is already scrambled. update() and subtract() don't work well with an overloaded __missing__().

The proposed implementations of __add__ and __or__ simplify the code. If you don't want an overloaded in-place operation to affect the non-in-place operations (I consider this rather a benefit), Counter.__iadd__(result, other) can be used instead of result += other.
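A sketch of that trick (the NoisyCounter subclass is illustrative): calling Counter.__iadd__ directly on the base class skips a subclass's override, while += dispatches to it.

```python
from collections import Counter

class NoisyCounter(Counter):
    # Illustrative subclass whose in-place operator is overridden.
    calls = 0

    def __iadd__(self, other):
        type(self).calls += 1
        return super().__iadd__(other)

result = NoisyCounter("aa")
result += Counter("ab")                          # dispatches to the override
result = Counter.__iadd__(result, Counter("b"))  # base method, override skipped
print(result.calls, sorted(result.items()))      # 1 [('a', 3), ('b', 2)]
```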

The optimized __neg__ just contains inlined subtraction, with no-op code removed (note that the current implementations of __neg__ and __pos__ violate the open-closed principle).

My second step would be to add a C implementation of _keep_positive(), since this function is used in a number of multiset methods. It could be more efficient and preserve the order.

In any case, thank you for spending your time on this, Raymond.
msg244052 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-05-25 22:37
The change to __neg__ looked like a nice improvement, and the same technique can be applied to __pos__.  Attaching a patch for those two.
msg244115 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-05-26 17:35
New changeset 5780ee2c18e1 by Raymond Hettinger in branch 'default':
Issue #23509: Speed up Counter operators
https://hg.python.org/cpython/rev/5780ee2c18e1
msg244129 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-26 19:20
Perhaps update __pos__ docstring?
msg244444 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-05-30 05:14
New changeset fe4efc0032b5 by Raymond Hettinger in branch '3.5':
Issue #23509: Speed up Counter operators
https://hg.python.org/cpython/rev/fe4efc0032b5
History
Date                 User              Action  Args
2022-04-11 14:58:13  admin             set     github: 67697
2015-05-30 05:14:20  python-dev        set     messages: + msg244444
2015-05-26 19:21:12  serhiy.storchaka  set     messages: - msg244128
2015-05-26 19:20:44  serhiy.storchaka  set     messages: + msg244129
2015-05-26 19:16:56  serhiy.storchaka  set     messages: + msg244128
2015-05-26 17:35:25  python-dev        set     nosy: + python-dev
                                               messages: + msg244115
2015-05-25 22:37:17  rhettinger        set     files: + counter_pos_neg.diff
                                               messages: + msg244052
                                               versions: + Python 3.6, - Python 3.5
2015-05-23 18:44:35  serhiy.storchaka  set     messages: + msg243935
2015-05-23 09:46:17  joern             set     messages: + msg243904
2015-05-22 22:00:49  rhettinger        set     status: open -> closed
                                               resolution: rejected
                                               messages: + msg243854
2015-05-12 21:23:57  serhiy.storchaka  set     keywords: + needs review
                                               messages: + msg243018
2015-03-27 08:26:23  serhiy.storchaka  set     files: + counter_faster_2.patch
                                               messages: + msg239380
2015-02-26 08:53:28  serhiy.storchaka  set     messages: + msg236655
2015-02-26 08:17:08  rhettinger        set     assignee: rhettinger
2015-02-24 21:05:37  serhiy.storchaka  set     messages: + msg236544
2015-02-24 15:49:19  joern             set     messages: + msg236507
2015-02-24 12:45:08  serhiy.storchaka  set     files: + counter_bench.py
2015-02-24 12:44:22  serhiy.storchaka  set     files: + counter_faster.patch
                                               components: + Library (Lib)
                                               title: Counter.__iadd__ falls back to __add__ with bad performance impact -> Speed up Counter operators
                                               keywords: + patch
                                               resolution: out of date -> (no value)
                                               versions: + Python 3.5, - Python 2.7, Python 3.4
                                               messages: + msg236489
                                               stage: patch review
2015-02-24 10:32:30  serhiy.storchaka  set     nosy: + rhettinger, serhiy.storchaka
                                               resolution: out of date
                                               messages: + msg236486
2015-02-24 09:43:42  joern             create