This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: functools._HashedSeq implements __hash__ but not __eq__
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.10, Python 3.9
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: leopold.talirz, rhettinger, serhiy.storchaka
Priority: normal Keywords:

Created on 2021-09-01 16:54 by leopold.talirz, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (3)
msg400858 - (view) Author: Leopold Talirz (leopold.talirz) Date: 2021-09-01 16:54
Disclaimer: this is my first issue on the python bug tracker. Please feel free to close this issue and redirect this to a specific mailing list, if more appropriate.


The implementation of `functools._HashedSeq` [1] does not include an implementation of `__eq__` that takes advantage of the hash:
```
class _HashedSeq(list):
    """ This class guarantees that hash() will be called no more than once
        per element.  This is important because the lru_cache() will hash
        the key multiple times on a cache miss.
    """

    __slots__ = 'hashvalue'

    def __init__(self, tup, hash=hash):
        self[:] = tup
        self.hashvalue = hash(tup)

    def __hash__(self):
        return self.hashvalue
```
As far as I can tell, the `_HashedSeq` object is used as a key for looking up values in the python dictionary that holds the LRU cache, and this lookup mechanism relies on `__eq__` over `__hash__` (besides shortcuts for objects with the same id, etc.).

This can cause potentially expensive `__eq__` calls on the arguments of the cached function and I wonder whether this is intended?

Here is a short example code to demonstrate this:
```
from functools import _HashedSeq

class CompList(list):
    """Hashable list (please forgive)"""
    def __eq__(self, other):
        print("equality comparison")
        return super().__eq__(other)
    def __hash__(self):
        return hash(tuple(self))

args1=CompList((1,2,3))  # represents function arguments passed to lru_cache
args2=CompList((1,2,3))  # identical content but different object


hs1=_HashedSeq( (args1,))
hs2=_HashedSeq( (args2,))

hs1 == hs2  # True, prints "equality comparison"

d={}
d[hs1] = "cached"
d[hs2] # "cached", prints "equality comparison"

```

Adding the following to the implementation of `_HashedSeq` gets rid of the calls to `__eq__`:
```
    def __eq__(self, other):
        return self.hashvalue == other.hashvalue
```
Happy to open a PR for this.

I'm certainly a bit out of my depth here, so apologies if that is all intended behavior.


[1] https://github.com/python/cpython/blob/679cb4781ea370c3b3ce40d3334dc404d7e9d92b/Lib/functools.py#L432-L446
msg400897 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-09-02 05:50
1. self.hashvalue == other.hashvalue is not enough. Different tuples can have the same hash value, so you steel need to compare their context.

2. _HashedSeq is only used as a key in a dictionary. When you look up a key in dictionary, it compares hashes first. __eq__ is only called when hashes match.
msg400902 - (view) Author: Leopold Talirz (leopold.talirz) Date: 2021-09-02 08:32
Thanks a lot for the clarification, I get it now (I think).

> 2. _HashedSeq is only used as a key in a dictionary. When you look up a key in dictionary, it compares hashes first. __eq__ is only called when hashes match.

I was looking at the docs [1] and [2] and didn't come across this.
For me, this explanation of how __hash__ and __eq__ are used in the LRU cache would have been helpful to read in [1], but of course there is a tradeoff in how long to make docstrings.
If it is already explained somewhere else in the docs, linking to that place from [1] could be useful.

[1] https://docs.python.org/3/library/functools.html#functools.lru_cache
[2] https://docs.python.org/3/reference/datamodel.html#object.__hash__
History
Date User Action Args
2022-04-11 14:59:49adminsetgithub: 89243
2021-09-02 08:32:43leopold.talirzsetstatus: open -> closed
resolution: not a bug
messages: + msg400902

stage: resolved
2021-09-02 05:50:55serhiy.storchakasetnosy: + rhettinger, serhiy.storchaka
messages: + msg400897
2021-09-01 16:54:27leopold.talirzcreate