This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: collections.Counter - Cast list of keys into set to remove iteration over duplicate elements for __le__,__ge__ and __eq__
Type: performance Stage: resolved
Components: Library (Lib) Versions: Python 3.9, Python 3.8
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: argoop1728, rhettinger
Priority: normal Keywords: patch

Created on 2021-12-09 01:52 by argoop1728, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 29999 closed argoop1728, 2021-12-09 01:53
Messages (3)
msg408063 - (view) Author: Rahul Gupta (argoop1728) * Date: 2021-12-09 01:52
On lines 725, 737 and 749 there is the following code:

'''for c in (self, other) for e in c''' which generates an iterable with all the keys in self and other - casting c to a set will remove duplicates and allow faster iteration - some minor benchmarks I ran seem to agree.
msg408064 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-12-09 02:19
> casting c to a set will remove duplicates and allow faster iteration

Sorry, but this doesn't make any sense.  The *c* is either *self* or *other*, both of which are instances of Counter which is itself a subclass of dict.  So, the input cannot have duplicates keys.

> some minor benchmarks I ran seem to agree.

I'm dubious about the minor benchmarks.  If in fact the effect is real, it is merely exploiting an implementation quirk which is tenuous and subject to change (the premise would be that converting to a set and looping over a set is faster than the native dict iterator for a dict subclass).  Conceptually, it is always worse to spend the time and space for first converting to a set.

Besides a speed consideration, there is also a space consideration.  The existing code does not use any auxiliary memory.  The proposed code unnecessarily builds two new sets and then throws them away.

Thanks for the suggestion, but I am going to decline.  The timings seem dubious.  Conceptually, the PR makes the methods do more work.  To the extent some timing difference can be observed, it is likely an implementation quirk. The PR does not make the code cleaner or clearer, and it loops over a dict subclass in an unconventional way.  Also, the PR would have a negative impact on memory usage.
msg408065 - (view) Author: Rahul Gupta (argoop1728) * Date: 2021-12-09 02:27
After looking at this again, I agree with you - the key duplication issue seems to have gone. Thank you for providing this feedback, it is very helpful.
History
Date User Action Args
2022-04-11 14:59:53adminsetgithub: 90177
2021-12-09 02:27:54argoop1728setmessages: + msg408065
2021-12-09 02:20:27rhettingersetstatus: open -> closed
resolution: not a bug
stage: patch review -> resolved
2021-12-09 02:19:44rhettingersetnosy: + rhettinger
messages: + msg408064
2021-12-09 01:53:27argoop1728setkeywords: + patch
stage: patch review
pull_requests: + pull_request28222
2021-12-09 01:52:20argoop1728create