This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Expose frozenset._hash classmethod
Type: performance Stage: resolved
Components: Versions: Python 3.11
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: jab, rhettinger
Priority: normal Keywords:

Created on 2022-02-08 18:22 by jab, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (3)
msg412856 - (view) Author: Joshua Bronson (jab) * Date: 2022-02-08 18:22
collections.abc.Set provides a _hash() method that includes the following in its docstring:

"""
Note that we don't define __hash__: not all sets are hashable.
But if you define a hashable set type, its __hash__ should
call this function.
...
We match the algorithm used by the built-in frozenset type.
"""

Because Set._hash() is currently implemented in pure Python, users face having to make a potentially challenging decision between whether to trade off runtime efficiency vs. space efficiency:

>>> hash(frozenset(x))  # Should I use this?
>>> Set._hash(x)        # Or this?

The former requires O(n) memory to create the frozenset, merely to throw it immediately away, but on the other hand gets to use frozenset's __hash__ implementation, which is implemented in C.

The latter requires only O(1) memory, but does not get the performance benefit of using the C implementation of this algorithm.

Why not expose the C implementation via a frozenset._hash() classmethod, and change Set._hash() to merely call that?

Then it would be much clearer that using Set._hash() is always the right answer.
msg412873 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2022-02-08 22:24
> Why not expose the C implementation via a frozenset._hash()
> classmethod, and change Set._hash() to merely call that?

The frozenset.__hash__ method is tightly bound to the internals of real sets to take advantage of the hash values already being known for real sets.  So we can't just expose it to ABC sets.

Also, the ABC sets need to be kept in sync with an __eq__ method, so users really need to see what Set._hash is actually doing.

Lastly, pure python hashable sets based on the ABC are not common, so it doesn't warrant a C fast path.  If C fast paths were offered, the union, intersection, and difference methods would more important targets.

So, thanks for the suggestion but it isn't as straight-forward as exposing existing code and it might make it harder for implementer to stay synced with an __eq__ method.

Tangential thought:  If you do implement __hash__ with Set._hash, consider using a cached_property decorator.
msg412877 - (view) Author: Joshua Bronson (jab) * Date: 2022-02-08 22:57
Thanks for the explanation, Raymond.

Regarding:

> Lastly, pure python hashable sets based on the ABC are not common

This would have implications for other use cases though, that are perhaps more common.

See, for example, the following code: https://github.com/jab/bidict/blob/ae9d06/bidict/_frozenbidict.py#L36-L38

This example demonstrates an implementation of a hashable, immutable Mapping type, whose __hash__ implementation returns collections.abc.ItemsView(self)._hash(). Since there are several other libraries I know of that implement hashable/immutable mapping types as well, I thought this might be beneficial enough to users to warrant consideration.
History
Date User Action Args
2022-04-11 14:59:55adminsetgithub: 90842
2022-02-09 05:32:53rhettingersetassignee: rhettinger
2022-02-08 22:57:39jabsetmessages: + msg412877
2022-02-08 22:24:51rhettingersetstatus: open -> closed

type: performance
versions: + Python 3.11
nosy: + rhettinger

messages: + msg412873
resolution: rejected
stage: resolved
2022-02-08 18:22:32jabcreate