classification
Title: C accelerator for collections.Counter is slow
Type: performance Stage: patch review
Components: Library (Lib) Versions: Python 3.4, Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: Anoop.Thomas.Mathew, jcea, pconnell, python-dev, rhettinger, scoder, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2013-07-30 06:49 by scoder, last changed 2013-10-04 23:53 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
counterbench.py serhiy.storchaka, 2013-07-30 12:48 Benchmark different methods of counting
collections_Counter_without_C_accelerator_with_comprehension.patch Anoop.Thomas.Mathew, 2013-09-15 18:38 collections.Counter without C accelerator patch review
performance_comparision.csv Anoop.Thomas.Mathew, 2013-09-15 18:39 performance comparison with and without C accelerator
fix_counter.diff rhettinger, 2013-09-30 07:39 Repair the test for choosing the fast path review
fix_counter2.diff rhettinger, 2013-10-01 08:45 Make the slow path match the pure Python version. review
Messages (14)
msg193914 - (view) Author: Stefan Behnel (scoder) * Date: 2013-07-30 06:49
The C accelerator for the collections.Counter class (_count_elements() in _collections.c) is slower than the pure Python versions for data that has many unique entries. This is because the fast path for dicts is not taken (Counter is a subtype of dict) and the slower fallback path raises exceptions for each value that wasn't previously seen. This can apparently make it slower than calling get() on Python side.

My suggestion is to drop the fallback path from the accelerator completely and to only call the C function when it's safe to use it, e.g. when "type(self) is Counter" and not a subclass.
msg193928 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-07-30 12:48
Are there any cases where the counter class with the C accelerator is faster than the pure Python version? Here is a benchmarking script (modification of Roy Smith's script [1]) and looks as the pure Python version is faster even for data that has not many unique entries.

[1] http://permalink.gmane.org/gmane.comp.python.general/738820
msg193929 - (view) Author: Eli Bendersky (eli.bendersky) * (Python committer) Date: 2013-07-30 12:53
That sounds like a good idea, Stefan.
msg197806 - (view) Author: Anoop Thomas Mathew (Anoop.Thomas.Mathew) * Date: 2013-09-15 18:38
40% faster collections.Counter() . Removed C accelerator. Patch attached. Passes all tests. Results comparison follows.
msg197807 - (view) Author: Anoop Thomas Mathew (Anoop.Thomas.Mathew) * Date: 2013-09-15 18:39
Performance comparison with and without patch applied.
msg198675 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-09-30 04:28
A C-accelerator should ALWAYS be able to beat a pure python version if it does the same steps but without the overhead of the eval-loop.  And in special cases such as type(self)==Counter, it can do much better.

To resolve this report, the C accelerator needs to be fixed.  As Stefan pointed-out, the fast-path isn't being triggered because of PyDict_CheckExact test.  And, the fallback path can be sped-up as well (by doing the same steps in C as are being done with the pure python code).
msg198683 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-09-30 08:25
Repaired version
----------------
$ py -m timeit -s 'from random import seed, randrange; seed(8675309); data=[randrange(1000) for i in range(100000)]; from collections import Counter'  'Counter(data)'
100 loops, best of 3: 14.3 msec per loop
$ py -m timeit -s 'from random import seed, randrange; seed(8675309); data=[randrange(500000) for i in range(100000)]; from collections import Counter'  'Counter(data)'
10 loops, best of 3: 40.8 msec per loop

Current with accelerator
------------------------
$ py -m timeit -s 'from random import seed, randrange; seed(8675309); data=[randrange(1000) for i in range(100000)]; from collections import Counter'  'Counter(data)'
10 loops, best of 3: 61.7 msec per loop
$ py -m timeit -s 'from random import seed, randrange; seed(8675309); data=[randrange(500000) for i in range(100000)]; from collections import Counter'  'Counter(data)'
10 loops, best of 3: 118 msec per loop

Current without accelerator
---------------------------
$ py -m timeit -s 'from random import seed, randrange; seed(8675309); data=[randrange(1000) for i in range(100000)]; from collections import Counter'  'Counter(data)'
10 loops, best of 3: 54.9 msec per loop
$ py -m timeit -s 'from random import seed, randrange; seed(8675309); data=[randrange(500000) for i in range(100000)]; from collections import Counter'  'Counter(data)'
10 loops, best of 3: 80.8 msec per loop
msg198685 - (view) Author: Stefan Behnel (scoder) * Date: 2013-09-30 09:24
Patch LGTM and seems to work well, according to your numbers.

Only minor nitpick would be that the method references could be decref-ed earlier, but that would complicate the code a bit.
msg198712 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-09-30 18:07
Benchmarking results look great.

But isn't _PyObject_LookupSpecial() more appropriate function for special methods lookup than PyObject_GetAttrString()?
msg198753 - (view) Author: Roundup Robot (python-dev) Date: 2013-10-01 08:01
New changeset 6aef095fdb30 by Raymond Hettinger in branch '3.3':
Issue #18594: Fix the fast path for collections.Counter().
http://hg.python.org/cpython/rev/6aef095fdb30
msg198756 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-10-01 08:45
Attaching a patch for the slow path.  Makes the code exactly match the pure python version.  This kicks in whether someone has subclassed Counter and overridden either __getitem__ or __setitem__.
msg198757 - (view) Author: Stefan Behnel (scoder) * Date: 2013-10-01 08:53
Can you update the benchmark numbers to show what the difference is compared to pure Python (and to the fastpath) now?

One more thing: the fastpath depends on .__getitem__() and friends, whereas the fallback path depends on .get(). What if someone overrides .get() but not .__getitem__()?  (Might be a hypothetical case...)
msg198817 - (view) Author: Roundup Robot (python-dev) Date: 2013-10-02 04:38
New changeset 1ee6f8a96fb9 by Raymond Hettinger in branch '3.3':
Issue #18594:  Fix the fallback path in collections.Counter().
http://hg.python.org/cpython/rev/1ee6f8a96fb9
msg198971 - (view) Author: Roundup Robot (python-dev) Date: 2013-10-04 23:53
New changeset e4cec1116e5c by Raymond Hettinger in branch '3.3':
Issue #18594:  Make the C code more closely match the pure python code.
http://hg.python.org/cpython/rev/e4cec1116e5c
History
Date User Action Args
2013-10-04 23:53:36python-devsetmessages: + msg198971
2013-10-02 04:44:43rhettingersetstatus: open -> closed
resolution: fixed
2013-10-02 04:38:48python-devsetmessages: + msg198817
2013-10-01 08:53:23scodersetmessages: + msg198757
2013-10-01 08:45:57rhettingersetfiles: + fix_counter2.diff

messages: + msg198756
2013-10-01 08:01:14python-devsetnosy: + python-dev
messages: + msg198753
2013-09-30 18:07:16serhiy.storchakasetmessages: + msg198712
2013-09-30 09:24:25scodersetmessages: + msg198685
2013-09-30 08:25:26rhettingersetmessages: + msg198683
2013-09-30 07:39:44rhettingersetfiles: + fix_counter.diff
2013-09-30 07:39:24rhettingersetfiles: - fix_counter.diff
2013-09-30 07:30:18rhettingersetfiles: + fix_counter.diff
stage: needs patch -> patch review
2013-09-30 04:28:47rhettingersetstage: needs patch
messages: + msg198675
versions: + Python 3.3
2013-09-30 03:36:56rhettingersetpriority: high -> normal
assignee: rhettinger
2013-09-28 15:10:59eli.benderskysetnosy: - eli.bendersky
2013-09-15 18:39:19Anoop.Thomas.Mathewsetfiles: + performance_comparision.csv

messages: + msg197807
2013-09-15 18:38:29Anoop.Thomas.Mathewsetfiles: + collections_Counter_without_C_accelerator_with_comprehension.patch

nosy: + Anoop.Thomas.Mathew
messages: + msg197806

keywords: + patch
2013-08-03 09:34:45pconnellsetnosy: + pconnell
2013-08-03 03:41:36jceasetnosy: + jcea
2013-07-30 12:53:03eli.benderskysetnosy: + eli.bendersky
messages: + msg193929
2013-07-30 12:48:43serhiy.storchakasetfiles: + counterbench.py

messages: + msg193928
2013-07-30 10:06:30pitrousetpriority: normal -> high
nosy: + rhettinger
2013-07-30 06:49:20scodercreate