Author eisele
Recipients eisele
Date 2008-04-10.16:21:39
Message-id <1207844501.86.0.649423716831.issue2607@psf.upfronthosting.co.za>
Content
I need to count pairs of strings, and I use 
a defaultdict in a construct like

count[a,b] += 1
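Here is a minimal, self-contained sketch of the tuple-key pattern described above. It assumes the pairs arrive as an iterable of `(a, b)` string tuples. The input shape and the helper name `count_pairs_tuple` are hypothetical and only illustrate the construct:

```python
from collections import defaultdict

def count_pairs_tuple(pairs):
    """Count (a, b) pairs using the tuple (a, b) itself as the dict key."""
    count = defaultdict(int)  # missing keys start at 0
    for a, b in pairs:
        count[a, b] += 1  # the key is the tuple (a, b)
    return count

counts = count_pairs_tuple([("x", "y"), ("x", "y"), ("y", "x")])
print(counts["x", "y"])  # prints 2
```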

I am able to count 50K items per second on a very fast machine,
which is way too slow for my application.

If I instead key on a single concatenated string, like

count[ab] += 1

it can count 500K items/second, which is more reasonable.
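A sketch of the concatenated-string variant follows. Plain `a + b` can make different pairs collide (`"ab" + "c"` and `"a" + "bc"` both give `"abc"`), so this version joins with a separator. The separator `"\t"` and the name `count_pairs_joined` are hypothetical, and the approach assumes the separator never occurs inside `a` or `b`:

```python
from collections import defaultdict

SEP = "\t"  # hypothetical separator; assumes it never occurs inside a or b

def count_pairs_joined(pairs):
    """Count (a, b) pairs keyed by one concatenated string instead of a tuple."""
    count = defaultdict(int)
    for a, b in pairs:
        count[a + SEP + b] += 1  # single string key
    return count

counts = count_pairs_joined([("ab", "c"), ("a", "bc"), ("ab", "c")])
print(counts["ab" + SEP + "c"])  # prints 2
print(counts["a" + SEP + "bc"])  # prints 1
```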

I don't see why there is a performance penalty of a factor
of 10 for such a simple construct.
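To reproduce the comparison, here is a rough timing harness. It is written for modern Python 3, while the report predates Python 3.0, so absolute and relative numbers on a current interpreter may differ. It uses `time.perf_counter` rather than `timeit`, because `timeit` turns off garbage collection by default during timing and could hide a GC-related cost. The `disable_gc` switch tests one *assumed* hypothesis: that CPython's cyclic garbage collector, triggered by the many tuple allocations, contributes to the slowdown. The data generator and all function names are hypothetical, and the input is synthetic, not the reporter's data:

```python
import gc
import random
import string
import time
from collections import defaultdict

def make_pairs(n, seed=0):
    """Generate n random (a, b) pairs drawn from 1000 short lowercase words."""
    rng = random.Random(seed)
    words = ["".join(rng.choice(string.ascii_lowercase) for _ in range(6))
             for _ in range(1000)]
    return [(rng.choice(words), rng.choice(words)) for _ in range(n)]

def count_tuple(pairs):
    count = defaultdict(int)
    for a, b in pairs:
        count[a, b] += 1  # tuple key
    return count

def count_joined(pairs):
    count = defaultdict(int)
    for a, b in pairs:
        count[a + "\t" + b] += 1  # string key; words never contain a tab
    return count

def items_per_second(func, pairs, disable_gc=False):
    """Time func(pairs) with perf_counter, optionally with cyclic GC disabled."""
    was_enabled = gc.isenabled()
    if disable_gc:
        gc.disable()
    try:
        start = time.perf_counter()
        func(pairs)
        elapsed = time.perf_counter() - start
    finally:
        if was_enabled:
            gc.enable()  # restore the caller's GC state
    return len(pairs) / elapsed if elapsed > 0 else float("inf")

if __name__ == "__main__":
    pairs = make_pairs(200_000)
    for name, func in [("tuple key", count_tuple), ("joined key", count_joined)]:
        for disable_gc in (False, True):
            rate = items_per_second(func, pairs, disable_gc)
            print(f"{name:10s} gc_disabled={disable_gc!s:5s} {rate:,.0f} items/s")
```

If the tuple-key rate rises noticeably with `disable_gc=True`, that would support the GC hypothesis. If it does not, the difference lies elsewhere.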

Do I have to switch to Perl or C to get this done?

Thanks a lot for any insight on this.

Best regards,
Andreas

PS: The problem seems to exist for ordinary
dicts as well; it is not related to the fact that
I use a defaultdict.
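For checking that observation, here is the same tuple-key counting written against a plain `dict` with `dict.get`, so it can be timed side by side with the `defaultdict` version. The helper name `count_pairs_plain` is hypothetical:

```python
def count_pairs_plain(pairs):
    """Tuple-key pair counting with a plain dict instead of a defaultdict."""
    count = {}
    for a, b in pairs:
        key = (a, b)
        count[key] = count.get(key, 0) + 1  # default to 0 for unseen keys
    return count

counts = count_pairs_plain([("x", "y"), ("x", "y")])
print(counts[("x", "y")])  # prints 2
```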

PPS: I also tried nested defaultdicts,

count[a][b] += 1

and got the same slow speed (and 50% more memory consumption).
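Here is a sketch of the nested layout, using the hypothetical helper name `count_pairs_nested`. Each distinct first element `a` gets its own inner dict, and every dict object carries fixed overhead. That per-dict overhead is one plausible, unverified contributor to the higher memory use reported above:

```python
from collections import defaultdict

def count_pairs_nested(pairs):
    """Count pairs with one inner defaultdict(int) per distinct first element."""
    count = defaultdict(lambda: defaultdict(int))  # outer key a -> inner dict
    for a, b in pairs:
        count[a][b] += 1
    return count

counts = count_pairs_nested([("x", "y"), ("x", "y"), ("x", "z")])
print(counts["x"]["y"])  # prints 2
print(len(counts["x"]))  # prints 2
```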