Author joern
Recipients joern
Date 2015-02-24.09:43:41
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1424771022.43.0.932025143765.issue23509@psf.upfronthosting.co.za>
In-reply-to
Content
This caught me when i worked on a stats script and updated c = Counter() in a loop with c += Counter(temp_dict). After profiling, i found out that my script spent 2 minutes in Counter.__add__ which seemed terribly slow (each invocation was in the ms not µs range).

Looking into this i found out that there is no implementation for Counter.__iadd__ (or __isub__) which i just assumed to be based on Counter.update (Counter.subtract).

Here's some timing output:

In [1]: from collections import Counter

In [2]: a = range(1000)

In [3]: b = range(1000)

In [4]: ab = a+b

In [5]: len(ab)
Out[5]: 2000

In [6]: %timeit c=Counter(a)
1000 loops, best of 3: 361 µs per loop

In [7]: %timeit c=Counter(ab)
1000 loops, best of 3: 734 µs per loop

In [8]: %timeit c=Counter(a) ; c += Counter(b)
1000 loops, best of 3: 1.51 ms per loop

In [9]: %timeit c=Counter(a) ; c.update(b)
1000 loops, best of 3: 741 µs per loop

Counter.update is way faster than +=, even if you re-add the overhead to create a new Counter:

In [10]: %timeit c=Counter(a) ; c.update(Counter(b))
1000 loops, best of 3: 1.16 ms per loop

In [11]: %timeit c=Counter(a) ; c.update({x:1 for x in b})
1000 loops, best of 3: 844 µs per loop


Would it be welcome if i add __iadd__ and __isub__ which just invoke update and subtract?

One reason not to have this is the current __add__ and __sub__ behavior of min values > 0, which from a set-theoretic standpoint makes sense, but has a certain potential of confusing developers who just use it as a simplistic Stats module.
History
Date User Action Args
2015-02-24 09:43:42joernsetrecipients: + joern
2015-02-24 09:43:42joernsetmessageid: <1424771022.43.0.932025143765.issue23509@psf.upfronthosting.co.za>
2015-02-24 09:43:42joernlinkissue23509 messages
2015-02-24 09:43:41joerncreate