Message145063
I've found some counterintuitive behavior in collections.Counter while hacking on the scikit-learn project [1]. I wanted to use a bunch of Counters to do some simple term counting in a set of documents, roughly as follows:
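from collections import Counter

count_total = Counter()
count_per_doc = []
for doc in documents:  # documents and analyze() come from the surrounding code
    count_current = Counter(analyze(doc))
    count_total += count_current
    count_per_doc.append(count_current)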
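For comparison, here is a self-contained version of the loop above that accumulates with Counter.update(), which adds counts into the existing Counter in place instead of building a new one. The analyze() function and the documents list are hypothetical stand-ins for the real scikit-learn analyzer and corpus. One caveat: update() keeps zero and negative counts, whereas + discards them. Term counts are always positive, so here the totals come out the same.

```python
from collections import Counter


def analyze(doc):
    # Hypothetical stand-in for the real analyzer: lowercase, split on whitespace.
    return doc.lower().split()


# Hypothetical toy corpus.
documents = ["The cat sat", "the dog sat on the mat"]

count_total = Counter()
count_per_doc = []
for doc in documents:
    count_current = Counter(analyze(doc))
    # update() adds count_current's counts into count_total in place,
    # so the running total is never copied.
    count_total.update(count_current)
    count_per_doc.append(count_current)
```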
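Performance was horrible. After some digging, I found out that Counter [2] does not define __iadd__, so += falls back to __add__, which builds a brand-new Counter by copying the entire left-hand side. Each iteration therefore costs time proportional to the size of the running total rather than to the size of the document being added. I've attached a patch that fixes the issue (it covers += only, and I have not patched the test suite).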
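The idea of an in-place += can be sketched as a Counter subclass that defines __iadd__ and mutates self. This is a minimal illustration, not the attached patch. The class name InplaceCounter is hypothetical, and the sketch assumes the left-hand side only ever holds positive counts (true when it is built purely by adding term counts), so only the keys touched by the right-hand side need to be checked for dropping.

```python
from collections import Counter


class InplaceCounter(Counter):
    """Counter whose += updates self instead of building a new Counter.

    Hypothetical illustration only; not the patch attached to this issue.
    """

    def __iadd__(self, other):
        # Add other's counts straight into self: the cost depends on
        # len(other), not on how large self has grown.
        for elem, count in other.items():
            self[elem] += count
            # Counter.__add__ keeps only positive counts; drop any key this
            # step pushed to zero or below. Assumes self held only positive
            # counts beforehand, so untouched keys need no check.
            if self[elem] <= 0:
                del self[elem]
        # Returning self makes `total += other` rebind total to the same object.
        return self
```

Used in the document loop, `total = InplaceCounter()` followed by `total += Counter(analyze(doc))` keeps updating one object rather than replacing it with a fresh copy on every iteration.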
[1] https://github.com/scikit-learn/scikit-learn/commit/de6e93094499e4d81b8e3b15fc66b6b9252945af

Date | User | Action | Args
2011-10-07 08:50:34 | lars | set | recipients: + lars
2011-10-07 08:50:34 | lars | set | messageid: <1317977434.46.0.626100406816.issue13121@psf.upfronthosting.co.za>
2011-10-07 08:50:33 | lars | link | issue13121 messages
2011-10-07 08:50:33 | lars | create |