Author lars
Recipients lars
Date 2011-10-07.08:50:32
Message-id <1317977434.46.0.626100406816.issue13121@psf.upfronthosting.co.za>
Content
I've found some counterintuitive behavior in collections.Counter while hacking on the scikit-learn project [1]. I wanted to use a bunch of Counters to do some simple term counting in a set of documents, roughly as follows:

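    from collections import Counter

    # documents and analyze() are defined elsewhere in the calling code
    count_total = Counter()
    count_per_doc = []
    for doc in documents:
        count_current = Counter(analyze(doc))
        count_total += count_current  # no __iadd__: falls back to __add__
        count_per_doc.append(count_current)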

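Performance was horrible. After some digging, I found that Counter [2] does not define __iadd__, so += falls back to __add__, which builds a new Counter holding a copy of the entire left-hand side. Each iteration therefore costs time proportional to the size of the running total, not just to the size of the current document. I've attached a patch that fixes the issue (for += only; I have not patched the test suite).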
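As a minimal sketch of what an in-place += could look like (this is not the attached patch), the hypothetical InPlaceCounter subclass below defines __iadd__ so that it touches only the keys of the right-hand operand. The hypothetical count_terms() wraps the loop above, taking documents and analyze as parameters. It assumes the running total holds only positive counts, which is true for term counting.

```python
from collections import Counter


class InPlaceCounter(Counter):
    """Hypothetical Counter subclass whose += mutates self in place.

    Illustration only; not the patch attached to this issue.
    """

    def __iadd__(self, other):
        # Add each of other's counts into self. The cost is proportional
        # to len(other), not to the size of the accumulated total.
        for elem, count in other.items():
            self[elem] += count  # missing keys start at 0 via Counter
            # Like Counter.__add__, drop elements whose total is not positive.
            # Assumes self held only positive counts before this call.
            if self[elem] <= 0:
                del self[elem]
        return self


def count_terms(documents, analyze):
    """Accumulate per-document and total term counts without copying the total."""
    count_total = InPlaceCounter()
    count_per_doc = []
    for doc in documents:
        count_current = Counter(analyze(doc))
        count_total += count_current  # calls InPlaceCounter.__iadd__
        count_per_doc.append(count_current)
    return count_total, count_per_doc
```

When every count is positive, the existing Counter.update() method gives the same in-place result without subclassing: count_total.update(count_current) adds counts instead of replacing them.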

[1] https://github.com/scikit-learn/scikit-learn/commit/de6e93094499e4d81b8e3b15fc66b6b9252945af
History
Date                 User  Action  Args
2011-10-07 08:50:34  lars  set     recipients: + lars
2011-10-07 08:50:34  lars  set     messageid: <1317977434.46.0.626100406816.issue13121@psf.upfronthosting.co.za>
2011-10-07 08:50:33  lars  link    issue13121 messages
2011-10-07 08:50:33  lars  create