This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: frozen collection.Counter
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.9
process
Status: closed Resolution: postponed
Dependencies: Superseder:
Assigned To: Nosy List: aeros, phr, rhettinger
Priority: normal Keywords:

Created on 2020-04-27 23:59 by phr, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (14)
msg367472 - (view) Author: paul rubin (phr) Date: 2020-04-27 23:59
It would be nice to have frozen Counters analogous to frozensets, so they are usable as dictionary keys.  One can of course create frozenset(counter.items()) but that means the set items are tuples rather than the original set elements, so it's no longer quick to check membership.  I can work around this in my immediate application but it seems like a shortcoming.  There are some other set operations that aren't supported by counters either, that would be nice if they are conceptually multisets.
msg367475 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2020-04-28 00:42
Rather than starting this out as a bpo issue, I would highly recommend bringing forth a proposal for this in python-ideas@python.org and explaining in as much detail as possible:

1) How a frozen variation of collection.Counter would benefit your specific use case
2) The current available workaround you mentioned, issues with it, and the comparative benefit your proposal would provide
3) Why it should be included in the standard library, compared to a 3rd party package on PyPI

The 3rd one can be a bit tricky, but it typically involves explaining how widespread the potential use case(s) would be and how tricky it would be for users to implement something like this on their own (from both a functionality and performance perspective).

From a brief glance of the archives, it doesn't look like this specific idea was proposed, other than part of a more generalized one to add frozen equivalents for the entire collections module, which was rejected on the basis of the use cases being too theoretical and overly broad.

Also, while not specific to your particular proposal, I would recommend reading over the following post from Raymond Hettinger in the python-ideas archive (from the previously mentioned thread): https://mail.python.org/archives/list/python-ideas@python.org/message/QVBVJU4RNJ5MDKBJ5CNGINYG24WZDZX7/. It explains why the collections module tries to be minimalist, and specifically the requirement of new features providing a tangible, real-world benefit.

It's of course up to you how you decide to format the proposal, but by answering the above points and demonstrating clear value in specific use cases, it has a much better chance of eventually making into the standard library (or at the least, a constructive discussion about it).
msg367476 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2020-04-28 00:45
Also, adding Raymond to the nosy list in case he has any specific comments about a frozen collections.Counter.
msg367482 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-04-28 02:17
Do you want to freeze a counter to prevent future count changes or bto make it to be hashable?  Essentially the only reason we have frozenset is to use sets as dict keys to model graphs, but that wouldn't make much sense for counters.

What multiset methods do you need? We already have addition, subtraction, union, and intersection.  A symmetric_difference doesn't seem to make sense or have a use case.   AFAICT, all that is missing are subset/superset tests which were omitted because their semantics conflict with the existing dict equality semantics and because the use cases are rare.
msg367488 - (view) Author: paul rubin (phr) Date: 2020-04-28 03:35
Yes, the idea was for them to be hashable, to be used as dict keys.  E.g. if you use frozensets to model graphs, you'd use frozen multisets for hypergraphs.  My immediate use case involved word puzzles, e.g. treating words as bags of scrabble tiles with letters on them, and finding out what other words you could form from the letters.  It is not a pressing need.  

I'm trying to remember what the other missing operation I ran into was, a few days ago.  It may have been subset.  I found a workaround but it seemed a bit ugly.  It would be nice if Counters could also be thought of as multisets which can do the same things sets can do with unneeded head scratching.  It seems like counters and multisets/bags really are different things conceptually.  Their operations and implementations overlap, but they are different.  E.g. it can make sense for a counter to have a negative count of something, but not for a bag.
msg367489 - (view) Author: paul rubin (phr) Date: 2020-04-28 03:50
Kyle, thanks, I saw your comment after posting my own, and I looked at Raymond's mailing list post that you linked.  I do think that "completing the grid" is a good thing in the cases where it's obvious how to do it (if there's one and one obvious way, then doing it that way should work), and having missing combinations in the grid is also a form of interface complexity (expect subset to work, find that it doesn't, and implement some hack).  Also it seems to me that sets, dicts, bags are basic data objects in many languages these days, without a lot of design space to move around in.  So we don't have to worry much about constraining future choices.  We could just look at Smalltalk or Ruby or whatever and just do what they did.  

However, that is philosophical.  I did learn from Raymond's message that Counter doesn't really attempt to be a multiset implementation.  In that case, it works for what it is made for, so it really then is a separate question of whether implementing multisets properly is worthwhile.  (That's as opposed to saying we already have a multiset implementation but some functionality is missing from it).

I dislike the concept of shovelling off basic functionality to 3rd party libraries.  The rejection of that philosophy (Python's old motto "Batteries included") is one of the things that attracted me to Python in the first place.  Though that value is mostly historical now, I'm sad about the loss.  It's impossible to write a large Ruby or Javascript (npm) application with 100s of third party modules, every one of which is an attack vector ("supply chain attack" is the current buzzword), and  whose implementations vary widely in quality and usability.  I looked at the docs for Ruby's multiset gem (https://maraigue.hhiro.net/multiset/) and they are partly in Japanese.  Python has until the past few years managed to keep away from that kind of thing.
msg367491 - (view) Author: paul rubin (phr) Date: 2020-04-28 03:54
Oops I meant "*without* 100s of third party modules" in the case of ruby gems or npm.  There are just a few pip modules that I really use all the time, most notably bs4.  I continue to use urllib/urllib2 instead of requests because I'm used to them and because they eliminate an unnecessary dependency.
msg367492 - (view) Author: Vedran Čačić (veky) * Date: 2020-04-28 04:12
Just another usecase: one of my students is writing an implementation of ordinal arithmetic in Python as a graduate thesis. FrozenCounters whose keys are FrozenCounters and whose values are natural numbers are _exactly_ isomorphic (via Cantor's normal form) to countable ordinals below epsilon_0 (and if a Counter can have itself as a key, we can go above epsilon_0 too).

Examples: zero = f{}
          one = f{zero: 1}
          seven = f{zero: 7}
          omega = f{one: 1}
          w^7*3+5 = f{seven: 3, zero: 5}
          w^w^w = f{f{omega: 1}: 1}
          epsilon0 = f{epsilon0: 1}
          and so on

I realize this is not something everybody does, but just to show that the need exists. We started with https://github.com/tamuhey/python-frozen-counter, afterwards we rolled our own implementation. But if it were in the stdlib, it would have saved us a lot of work.
msg367495 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2020-04-28 05:00
paul rubin wrote:
> In that case, it works for what it is made for, so it really then is a separate question of whether implementing multisets properly is worthwhile.

Indeed, that seems to be the primary question to address. Personally, I don't have any strong opposition to including a multiset implementation in collections, I'm just not yet convinced they have enough widespread practical utility (that can't already be provided with Counter or a slightly modified version of it) to justify including them in collections.
 
> I dislike the concept of shovelling off basic functionality to 3rd party libraries.  The rejection of that philosophy (Python's old motto "Batteries included") is one of the things that attracted me to Python in the first place.  Though that value is mostly historical now, I'm sad about the loss.  It's impossible to write a large Ruby or Javascript (npm) application with 100s of third party modules, every one of which is an attack vector ("supply chain attack" is the current buzzword), and  whose implementations vary widely in quality and usability.

I can certainly sympathize with that sentiment of wanting to avoid excessive dependencies, and I think in general that we do still try to include more in the Python standard library compared to many other ecosystems. AFAIK, there's zero desire for the Python ecosystem to slowly turn into the "dependency hell" that NPM has become, with almost every library relying on a chain of minor 3rd party utilities.

However, rather than including everything that *might* be useful, the goal has been to include features that have substantial practical utility for a decent volume of users (and library maintainers), as long as they fit with the existing theme of a module (or are widely needed enough to justify a separate module) for a couple of primary reasons:

1) Making the stdlib modules as digestible as reasonably possible
2) Maximizing the ratio of practical benefit to implementation and long-term maintenance cost, which is especially important since development is voluntarily driven

Also, I would consider "batteries included" to still be a general goal, but it's practically impossible for the stdlib to fully cover what everyone would consider to be "basic functionality" while still maintaining its present quality. It's even more complicated by the fact that "basic functionality" will differ significantly depending upon who you ask, based upon what they personally use Python for and their unique experiences.

So, we tend to lean towards minimalism, focusing on implementing and improving features that have the most or strongest concrete use cases for a large volume of users. There are of course exceptions to this, such as features that are highly useful for a relatively small volume of users, but are difficult to implement a proper or optimized version. That has to be determined on a case-by-case basis though.
msg367497 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-04-28 05:16
If you're interested in pursuing this, please follow Kyle's suggestion and move this to Python ideas.  The two frozencounter use cases presented are pretty exotic, something encountered rarely in a lifetime.  That is well below the threshold for growing the standard library.  We cater to the commonplace. It is for PyPI to cater to everything else :-)
msg367498 - (view) Author: paul rubin (phr) Date: 2020-04-28 05:29
Yeah I think the basic answer to this ticket is "Python doesn't really have multisets and a proposal to add them should go somewhere else".  Fair enough-- consider the request withdrawn from here.

Regarding minimalism vs completeness, regarding some feature X (say X is multisets), it's reasonable per minimalism to decide not to have X.  But if on weighing the use cases you decide to have X after all, I think it's best to implement X properly and completely with all the edge cases handled, rather than implement a half-baked subset.  That was something Java was historically very good at and Python wasn't.  I guess that is one for the theorists though.  Thanks everyone.
msg367500 - (view) Author: paul rubin (phr) Date: 2020-04-28 05:50
Totally tangential: Veky, your ordinal example would work ok in Haskell and you'd have omega work the same way as epsilon0.  Take a look at Herman Ruge Jervell's book "Proof Theory" for a really nice tree-based ordinal notation that goes much higher than epsilon0.  It would be cool if your student implemented it.  Raymond, I agree that ordinals are a very esoteric use of multisets ;).
msg367505 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2020-04-28 07:16
paul rubin wrote:
> Yeah I think the basic answer to this ticket is "Python doesn't really have multisets and a proposal to add them should go somewhere else".  Fair enough-- consider the request withdrawn from here.

Not necessarily, python-ideas is just more so the designated location for cultivating Python change proposals in their early stages. If you're interested enough in this being added to the standard library, you can present your case there using some of the guidelines I suggested earlier. Also, if others are interested in the feature and have specific use cases for it, that can significantly increase the odds of it being added. The python-ideas mailing list tends to get much more traffic than an individual open issue on bpo.

In the context of significant new feature proposals, bpo is typically used once it's been approved of (with a formal PEP for especially large changes), to further discuss the implementation details and for tracking related PRs.
msg368927 - (view) Author: paul rubin (phr) Date: 2020-05-15 05:19
Note: PEP 603 may essentially take care of this, if it is accepted.
History
Date User Action Args
2022-04-11 14:59:30adminsetgithub: 84591
2020-05-15 05:19:44phrsetmessages: + msg368927
2020-04-28 07:16:44aerossetmessages: + msg367505
2020-04-28 05:50:09phrsetmessages: + msg367500
2020-04-28 05:29:52phrsetmessages: + msg367498
2020-04-28 05:16:38rhettingersetstatus: open -> closed
resolution: postponed
messages: + msg367497

stage: resolved
2020-04-28 05:00:52aerossetnosy: - veky
messages: + msg367495
2020-04-28 04:12:29vekysetnosy: + veky
messages: + msg367492
2020-04-28 03:54:03phrsetmessages: + msg367491
2020-04-28 03:50:39phrsetmessages: + msg367489
2020-04-28 03:35:19phrsetmessages: + msg367488
2020-04-28 02:17:38rhettingersetmessages: + msg367482
2020-04-28 00:45:21aerossetnosy: + rhettinger

messages: + msg367476
versions: + Python 3.9
2020-04-28 00:42:25aerossetnosy: + aeros
messages: + msg367475
2020-04-27 23:59:59phrcreate