
Author harahu
Recipients docs@python, eric.araujo, eric.smith, ezio.melotti, harahu, mdk, rhettinger, willingc
Date 2021-11-18.09:40:23
Message-id <1637228424.55.0.0998426529045.issue45832@roundup.psfhosted.org>
In-reply-to
Content
I am realising that my not knowing about the hash invariance is likely a symptom of something I have in common with most Python users, but not with Python maintainers: having access to a powerful ecosystem, we mostly get our classes from third parties rather than implementing them ourselves. When I do define my own classes, I usually don't have to touch the `__hash__` or `__eq__` implementations, since I am either subclassing, making a plain dataclass, or leaning on `attrs` to help me out. I think it is telling that even the pandas core devs are able to mess this up, and it suggests to me that this invariance isn't emphasised enough.

Here's a go at specifying what I mean with a backlink:

"""
For sequence container types such as list, tuple, or collections.deque,
the expression `x in y` is equivalent to `any(x is e or x == e for e in y)`.
For containers that use hashing, such as dict, set, or frozenset,
the same equivalence holds, assuming the [hash invariance](https://docs.python.org/3/glossary.html#term-hashable) holds.
"""

I just derived this more or less directly from Hettinger's formulation. It could probably be made clearer.
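
To make the caveat concrete, here is a minimal sketch of what happens when the invariance is broken. `BrokenKey` and its `fake_hash` argument are hypothetical names, made up purely for illustration, and the set result reflects how CPython looks up entries by hash before comparing with `==`.

```python
class BrokenKey:
    """Hypothetical class: equal objects may report different hashes."""

    def __init__(self, value, fake_hash):
        self.value = value
        self._fake_hash = fake_hash

    def __eq__(self, other):
        if not isinstance(other, BrokenKey):
            return NotImplemented
        return self.value == other.value

    def __hash__(self):
        # Deliberately unrelated to `value`, so the invariance is violated.
        return self._fake_hash


a = BrokenKey("x", fake_hash=1)
b = BrokenKey("x", fake_hash=2)

objects_equal = a == b                                  # True
in_list = a in [b]                                      # True: falls back to ==
in_set = a in {b}                                       # False on CPython: hashes differ
any_form = any(a is e or a == e for e in {b})           # True: the documented equivalence
```

So for the list the documented equivalence holds, while for the set `x in y` and the `any(...)` expression disagree as soon as equal objects hash differently.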

I am realising that this (famous, it seems) hash invariance isn't defined in isolation anywhere, making it slightly hard to link to. Any better suggestions than the glossary entry for hashable, which includes the definition? To me, it seems that such a fundamental assumption/convention/requirement, which isn't automatically enforced, should be as easy as possible to point to.

In my search for the definition (prompted by Hettinger) I discovered more surprises, by the way.

Surprise 1:
https://docs.python.org/3/library/collections.abc.html?highlight=hashable#collections.abc.Hashable

> ABC for classes that provide the __hash__() method.

Having now discovered the mentioned invariance, I am surprised this isn't explicitly formulated (and implemented? haven't checked) as:

"""
ABC for classes that provide the __hash__() and __eq__() methods.
"""

I also think this docstring deserves a backlink to the invariance definition, given its importance, and how easy it is to shoot yourself in the foot. The current formulation of this docstring actually reflects what I (naively) assumed being hashable meant, suggesting this is the place in the docs I got my understanding of the term from.
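
For reference, a small sketch of how `collections.abc.Hashable` behaves in practice. `HashOnly` is a hypothetical class made up for illustration. As far as I can tell, the check only looks at `__hash__`; since every class inherits `__eq__` from `object`, additionally requiring `__eq__` would not filter anything out.

```python
from collections.abc import Hashable


class HashOnly:
    """Hypothetical class that defines __hash__ and nothing else."""

    def __hash__(self):
        return 0


hash_only_is_hashable = isinstance(HashOnly(), Hashable)   # True
inherits_eq = hasattr(HashOnly, "__eq__")                  # True, comes from object

list_is_hashable = isinstance([], Hashable)                # False, list.__hash__ is None

# The ABC check is structural only: a tuple passes even if hashing it fails.
nested = ([],)
tuple_claims_hashable = isinstance(nested, Hashable)       # True
try:
    hash(nested)
    tuple_actually_hashable = True
except TypeError:
    tuple_actually_hashable = False
```

Note that the check says nothing about whether `__hash__` and `__eq__` agree with each other, which is the part that is easy to get wrong.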

Surprise 2:
https://docs.python.org/3/reference/expressions.html?highlight=hashable#value-comparisons

> The `hash()` result should be consistent with equality. Objects that are equal should either have the same hash value, or be marked as unhashable.

I appreciate that this is mentioned in this section (I was hoping to find it). But it feels like a reiteration of the definition of the invariant, and could thus be replaced with a backlink, as suggested above. I'd much rather see the text real estate be used for a motivating statement (you don't want weird behaviour in sets and dicts), and a reminder of the importance of checking the `__hash__` implementation if you are modifying the `__eq__` implementation in, say, some subclass.
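
As an illustration of the kind of reminder I have in mind, here is a minimal sketch of a subclass that changes `__eq__` and adjusts `__hash__` to match. `Name` and `CaseInsensitiveName` are hypothetical classes made up for this example; the point is only that both methods are derived from the same normalised value.

```python
class Name:
    """Hypothetical base class: equality and hash both use `text`."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if type(other) is not type(self):
            return NotImplemented
        return self.text == other.text

    def __hash__(self):
        return hash(self.text)


class CaseInsensitiveName(Name):
    """Hypothetical subclass that loosens equality, so the hash must follow."""

    def __eq__(self, other):
        if type(other) is not type(self):
            return NotImplemented
        return self.text.lower() == other.text.lower()

    def __hash__(self):
        # Hash the same normalised value that __eq__ compares.
        return hash(self.text.lower())


a = CaseInsensitiveName("Guido")
b = CaseInsensitiveName("GUIDO")

names_equal = a == b                    # True
hashes_equal = hash(a) == hash(b)       # True, by construction
found_in_set = a in {b}                 # True
looked_up = {a: "found"}[b]             # "found"
```

Had the subclass kept `Name.__hash__` (for example via `__hash__ = Name.__hash__`), `a == b` would still be true, but set and dict lookups would most likely miss.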

Surprise 3:
https://docs.python.org/3/reference/datamodel.html#object.__eq__

> See the paragraph on __hash__() for some important notes on creating hashable objects which support custom comparison operations and are usable as dictionary keys.

Another case of the invariance being mentioned (I appreciate it), but in a way where it isn't directly evident that extreme care should be taken when modifying an __eq__ implementation. Perhaps another case where the invariance should be referred to by link, and the text should focus on the consequences of breaking it.

Surprise 4:
https://docs.python.org/3/reference/datamodel.html#object.__hash__

Another definition-in-passing of the invariance:

> The only required property is that objects which compare equal have the same hash value.

Also replaceable by backlink?

Thereafter follow descriptions of some (in hindsight very important) protection mechanisms.

> User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).

> A class that overrides __eq__() and does not define __hash__() will have its __hash__() implicitly set to None.

But yet again, without some motivating statement for why we care about the invariance, all of this seems, well, surprising and weird.
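
A short sketch of the two quoted mechanisms in action may make them less surprising. `Plain` and `EqOnly` are hypothetical classes made up for illustration.

```python
class Plain:
    """Defines neither __eq__ nor __hash__, so both come from object."""


class EqOnly:
    """Overrides __eq__ without defining __hash__."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, EqOnly):
            return NotImplemented
        return self.value == other.value


p, q = Plain(), Plain()
default_eq_is_identity = (p == p) and (p != q)   # True
plain_usable_in_set = p in {p}                   # True

# Overriding __eq__ alone implicitly sets __hash__ to None ...
hash_is_none = EqOnly.__hash__ is None           # True

# ... so instances are rejected by sets and dicts instead of misbehaving.
try:
    {EqOnly(1)}
    set_construction_failed = False
except TypeError:
    set_construction_failed = True
```

Seen this way, the implicit `None` is a guard: rather than silently inheriting an identity-based hash that no longer matches the new `__eq__`, the class becomes unhashable until `__hash__` is defined explicitly.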

Surprise 5:
https://docs.python.org/3/library/functions.html#hash

Perhaps another location where a backlink would be in order, although I am not sure in this case.
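
For what it's worth, the built-in numeric types are a nice demonstration of the invariance holding across types, which is what the `hash()` entry alludes to with its `1` and `1.0` remark. A minimal sketch (the variable names are my own):

```python
from fractions import Fraction

# Values of different types that all compare equal to each other.
ones = [1, 1.0, True, Fraction(1, 1)]

all_equal = all(a == b for a in ones for b in ones)    # True
all_same_hash = len({hash(x) for x in ones}) == 1      # True

# Because equality and hash agree, they collapse to one set element
# and are interchangeable as dict keys.
distinct_in_set = len(set(ones))                       # 1
looked_up = {1: "one"}[1.0]                            # "one"
```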
History
Date                 User     Action  Args
2021-11-18 09:40:24  harahu   set     recipients: + harahu, rhettinger, eric.smith, ezio.melotti, eric.araujo, docs@python, willingc, mdk
2021-11-18 09:40:24  harahu   set     messageid: <1637228424.55.0.0998426529045.issue45832@roundup.psfhosted.org>
2021-11-18 09:40:24  harahu   link    issue45832 messages
2021-11-18 09:40:23  harahu   create