Message236484
This caught me when i worked on a stats script and updated c = Counter() in a loop with c += Counter(temp_dict). After profiling, i found out that my script spent 2 minutes in Counter.__add__ which seemed terribly slow (each invocation was in the ms not µs range).
Looking into this i found out that there is no implementation for Counter.__iadd__ (or __isub__) which i just assumed to be based on Counter.update (Counter.subtract).
Here's some timing output:
In [1]: from collections import Counter
In [2]: a = range(1000)
In [3]: b = range(1000)
In [4]: ab = a+b
In [5]: len(ab)
Out[5]: 2000
In [6]: %timeit c=Counter(a)
1000 loops, best of 3: 361 µs per loop
In [7]: %timeit c=Counter(ab)
1000 loops, best of 3: 734 µs per loop
In [8]: %timeit c=Counter(a) ; c += Counter(b)
1000 loops, best of 3: 1.51 ms per loop
In [9]: %timeit c=Counter(a) ; c.update(b)
1000 loops, best of 3: 741 µs per loop
Counter.update is way faster than +=, even if you re-add the overhead to create a new Counter:
In [10]: %timeit c=Counter(a) ; c.update(Counter(b))
1000 loops, best of 3: 1.16 ms per loop
In [11]: %timeit c=Counter(a) ; c.update({x:1 for x in b})
1000 loops, best of 3: 844 µs per loop
Would it be welcome if i add __iadd__ and __isub__ which just invoke update and subtract?
One reason not to have this is the current __add__ and __sub__ behavior of min values > 0, which from a set-theoretic standpoint makes sense, but has a certain potential of confusing developers who just use it as a simplistic Stats module. |
|
Date |
User |
Action |
Args |
2015-02-24 09:43:42 | joern | set | recipients:
+ joern |
2015-02-24 09:43:42 | joern | set | messageid: <1424771022.43.0.932025143765.issue23509@psf.upfronthosting.co.za> |
2015-02-24 09:43:42 | joern | link | issue23509 messages |
2015-02-24 09:43:41 | joern | create | |
|