Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

collections.Counter's += copies the entire object #57330

Closed
lars mannequin opened this issue Oct 7, 2011 · 5 comments
Closed

collections.Counter's += copies the entire object #57330

lars mannequin opened this issue Oct 7, 2011 · 5 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@lars
Copy link
Mannequin

lars mannequin commented Oct 7, 2011

BPO 13121
Nosy @rhettinger, @akheron
Files
  • cpython-counter-iadd.diff: Patch that adds iadd to collections.Counter
  • counter.diff: Patch for in-place multiset operations
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2011-10-19.20:41:20.641>
    created_at = <Date 2011-10-07.08:50:33.835>
    labels = ['type-feature', 'library']
    title = "collections.Counter's += copies the entire object"
    updated_at = <Date 2011-10-19.20:41:20.640>
    user = 'https://bugs.python.org/lars'

    bugs.python.org fields:

    activity = <Date 2011-10-19.20:41:20.640>
    actor = 'rhettinger'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2011-10-19.20:41:20.641>
    closer = 'rhettinger'
    components = ['Library (Lib)']
    creation = <Date 2011-10-07.08:50:33.835>
    creator = 'lars'
    dependencies = []
    files = ['23336', '23457']
    hgrepos = []
    issue_num = 13121
    keywords = ['patch']
    message_count = 5.0
    messages = ['145063', '145065', '145067', '145071', '145959']
    nosy_count = 4.0
    nosy_names = ['rhettinger', 'python-dev', 'petri.lehtinen', 'lars']
    pr_nums = []
    priority = 'low'
    resolution = 'fixed'
    stage = 'patch review'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue13121'
    versions = ['Python 3.3']

    @lars
    Copy link
    Mannequin Author

    lars mannequin commented Oct 7, 2011

    I've found some counterintuitive behavior in collections.Counter while hacking on the scikit-learn project [1]. I wanted to use a bunch of Counters to do some simple term counting in a set of documents, roughly as follows:

       count_total = Counter()
       for doc in documents:
           count_current = Counter(analyze(doc))
           count_total += count_current
           count_per_doc.append(count_current)

    Performance was horrible. After some digging, I found out that Counter [2] does not have __iadd__ and += copies the entire left-hand side in __add__. I've attached a patch that fixes the issue (for += only, and I've not patched the testsuite.)

    [1] scikit-learn/scikit-learn@de6e930

    @lars lars mannequin added type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir labels Oct 7, 2011
    @akheron
    Copy link
    Member

    akheron commented Oct 7, 2011

    This is slightly backwards incompatible, as some people may depend on the old behavior. Otherwise sounds like a good idea to me.

    @mdickinson mdickinson added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Oct 7, 2011
    @lars
    Copy link
    Mannequin Author

    lars mannequin commented Oct 7, 2011

    If this is not implemented because it is backwards incompat, then it might be useful to add a note to update's docstring explaining that it is much more efficient than +=. I was very surprised that it took *minutes* to add a few thousand moderate-sized Counters.

    @rhettinger
    Copy link
    Contributor

    I'll add the in-place methods including __iadd__, __isub__, __iand__, and __ior__.

    If speed is your issue, you should continue to use the update() method which will always be faster because it doesn't have a step to strip zeros and negative values from the existing Counter. Also, update() has a fast-path written in C.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 19, 2011

    New changeset 5cced40374df by Raymond Hettinger in branch 'default':
    Issue bpo-13121: Support in-place math operators for collections.Counter().
    http://hg.python.org/cpython/rev/5cced40374df

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants