classification
Title: why is (default)dict so slow on tuples???
Type: performance Stage:
Components: Versions: Python 2.5
process
Status: closed Resolution: works for me
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, belopolsky, eisele, loewis, rhettinger
Priority: normal Keywords:

Created on 2008-04-10 16:21 by eisele, last changed 2008-04-12 04:20 by rhettinger. This issue is now closed.

Files
File name Uploaded Description
a.py loewis, 2008-04-10 19:15
Messages (10)
msg65293 - (view) Author: Andreas Eisele (eisele) Date: 2008-04-10 16:21
I need to count pairs of strings, and I use 
a defaultdict in a construct like

count[a,b] += 1

I am able to count 50K items per second on a very fast machine,
which is way too slow for my application.

If I count complete strings like

count[ab] += 1

it can count 500K items/second, which is more reasonable.

I don't see why there is a performance penalty of a factor
of 10 for such a simple construct.

Do I have to switch to Perl or C to get this done???

Thanks a lot for any insight on this.

Best regards,
Andreas

PS: The problem seems to exist for ordinary
dicts as well; it is not related to the fact that
I use a defaultdict.

PPS: I also tried nested defaultdicts
count[a][b] += 1
and got the same slow speed (and 50% more memory consumption).
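[Editorial sketch, not part of the original report: the counting pattern described above, in modern Python, with made-up illustrative data.]

```python
from collections import defaultdict

# Count pairs of strings; the dict key is the tuple (a, b).
count = defaultdict(int)
pairs = [("the", "cat"), ("the", "dog"), ("the", "cat")]
for a, b in pairs:
    count[a, b] += 1  # equivalent to count[(a, b)] += 1

print(count["the", "cat"])  # → 2
```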
msg65296 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-04-10 16:37
My guess is that this is due to the fact that strings cache their hash
values while tuples don't.  IIRC, caching tuple's hash values has been
proposed and rejected some time ago.  Furthermore, dict lookups are
heavily optimized for the case when all keys are strings.
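[Editorial sketch: the caching difference can be observed directly. This is an illustrative micro-benchmark in modern Python; the exact ratio varies by machine and interpreter version.]

```python
import timeit

s = "x" * 1000      # a str caches its hash after the first call
t = ("x",) * 1000   # a tuple recomputes its hash from its items each time

hash(s)  # warm up the string's cached hash
str_time = timeit.timeit(lambda: hash(s), number=10000)
tup_time = timeit.timeit(lambda: hash(t), number=10000)
print(tup_time > str_time)  # → True
```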
msg65309 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-04-10 19:15
I cannot reproduce this. For me, the attached script takes a total of
0.8s for 1M iterations on a 3.2GHz Pentium 4.
msg65313 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-04-10 19:49
Questions about relative performance should be directed to 
comp.lang.python.

Since no bug or suspect behavior was reported, closing as not-a-bug.
msg65332 - (view) Author: Andreas Eisele (eisele) Date: 2008-04-11 02:46
Sorry for not giving a good example in the first place.
The problem seems to appear only in the presence of
sufficiently many distinct tuples; then I see performance
that looks rather like O(n*n).
Here is an example that shows the problem:

>>> from time import clock
>>> d = {}
>>> t0 = clock()
>>> for i in range(5):
...     for j in range(i*1000000, (i+1)*1000000):
...         d[str(j), str(j)] = j
...     print clock() - t0
...
13.04
39.51
81.86
134.18
206.66
>>>

The same example with str(j)+str(j) works fine.

Sorry if this should be a non-issue. For me it is a
reason to implement functionality in C or Perl
that I would really love to do in Python.
I would call such a thing a performance bug, but
maybe I'm just too demanding...

Best regards,
Andreas
msg65341 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-04-11 08:44
The slowdown is because of the garbage collector, which has more and
more objects to traverse (the tuples).
If I add "import gc; gc.disable()" at the beginning of your script, it
runs much faster, and the timings look linear.

Martin's sample is not affected, because there are very few
deallocations, and the gc collection is not triggered.

Disabling the gc may not be a good idea in a real application; I suggest
you play with the gc.set_threshold function and set larger values, at
least while building the dictionary. (700, 1000, 10) seems to yield good
results.
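[Editorial sketch of the threshold suggestion: raise the collection thresholds while building the large dict, then restore the previous values. The (700, 1000, 10) tuple follows the suggestion above; note that recent CPython versions untrack tuples whose items cannot hold references, which already mitigates the original slowdown.]

```python
import gc

# Temporarily raise the gc thresholds during the build, restoring
# the previous values afterwards even if the build fails.
old = gc.get_threshold()
gc.set_threshold(700, 1000, 10)
try:
    d = {}
    for j in range(100000):
        d[str(j), str(j)] = j
finally:
    gc.set_threshold(*old)  # always restore the normal thresholds
```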
msg65343 - (view) Author: Andreas Eisele (eisele) Date: 2008-04-11 09:18
Great, that really solves my problem.

Thank you so much, Amaury!

As you say, the problem is unrelated to dicts,
and I observe it also when adding the tuples to
a set or keeping them in lists.
Perhaps your GC thresholds would be better default
values than the ones currently in force.
msg65374 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-04-11 21:27
> Perhaps your GC thresholds would be better default
> values than what is currently in force.

No, the defaults are correct for typical applications.
msg65388 - (view) Author: Andreas Eisele (eisele) Date: 2008-04-12 04:01
Even if they mean that creation
of a huge number N of objects
requires O(N*N) effort?
msg65390 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-04-12 04:20
This discussion ought to be moved to comp.lang.python.  The timing 
script needs work.  It doesn't isolate any one issue for discussion.  
It is affected by GC, dict resizing, details of creating and hashing 
string objects, the need to periodically call the system malloc, etc. 

You get different results if:
* a new dictionary is created on each pass
* if any of the objects get reused
* if range(5) gets replaced with [0]*5
* if you use longs instead of ints
* if the strings are of fixed length:  str(j)[:5]

This is the wrong forum to explain every nuance of what affects 
performance and whether you think Perl is better.
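[Editorial sketch of one such isolation step, using a hypothetical harness not from the original report: pre-building the keys keeps string creation and hashing out of the measured region, so the loop times only the dict insertions.]

```python
from time import perf_counter

# Build all the keys first, so that string creation and hashing
# are excluded from the timed region below.
keys = [(str(j), str(j)) for j in range(200000)]

d = {}
t0 = perf_counter()
for k in keys:
    d[k] = 0          # measure only the dict insertions
elapsed = perf_counter() - t0
print(len(d))  # → 200000
```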
History
Date User Action Args
2008-04-12 04:20:19rhettingersetmessages: + msg65390
2008-04-12 04:01:43eiselesetmessages: + msg65388
2008-04-11 21:27:28loewissetmessages: + msg65374
2008-04-11 09:18:05eiselesetmessages: + msg65343
2008-04-11 08:44:36amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg65341
2008-04-11 02:46:14eiselesetmessages: + msg65332
2008-04-10 19:49:49rhettingersetstatus: open -> closed
nosy: + rhettinger
messages: + msg65313
2008-04-10 19:15:47loewissetfiles: + a.py
2008-04-10 19:15:23loewissetresolution: works for me
messages: + msg65309
nosy: + loewis
2008-04-10 16:37:51belopolskysetnosy: + belopolsky
messages: + msg65296
2008-04-10 16:21:40eiselecreate