This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: functools.lru_cache does not guarantee cache hits for equal values
Type: Stage: resolved
Components: Library (Lib) Versions: Python 3.9
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: brilee, rhettinger
Priority: normal Keywords:

Created on 2021-08-24 13:36 by brilee, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (3)
msg400209 - (view) Author: Brian Lee (brilee) Date: 2021-08-24 13:36
This seems like unexpected behavior: Two keys that are equal and have equal hashes should yield cache hits, but they do not.


Python 3.9.6 (default, Aug 18 2021, 19:38:01) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import functools
>>> 
>>> import numpy as np
>>> 
>>> @functools.lru_cache(maxsize=None)
... def f(x):
...   return x
... 
>>> py_str = 'hello world'
>>> np_str = np.str_(py_str)
>>> 
>>> assert py_str == np_str
>>> assert hash(py_str) == hash(np_str)
>>> 
>>> assert f.cache_info().currsize == 0
>>> f(py_str)
'hello world'
>>> assert f.cache_info().currsize == 1
>>> f(np_str)
'hello world'
>>> assert f.cache_info().currsize == 2
>>> print(f.cache_info())
CacheInfo(hits=0, misses=2, maxsize=None, currsize=2)
msg400217 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-08-24 16:17
Thanks for the report but this is an allowed behavior and not a bug.

Per the docs¹:  If typed is set to true, function arguments of different types will be cached separately. For example, f(3) and f(3.0) will always be treated as distinct calls with distinct results. If typed is false, the implementation will usually but not always regard them as equivalent calls and only cache a single result. 

¹ https://docs.python.org/3.10/library/functools.html#module-functools

In this particular case, exact type matches for str and int have an alternate path that tends to save space.

The cost is that str or int equivalents are cached separately.  That doesn't tend to be a problem in practice because functions are mostly called with the same types over and over again.

Note, we also treat equivalent calling patterns as distinct, f(a=1, b=2), is cached separately from f(b=2, a=1).
msg400223 - (view) Author: Brian Lee (brilee) Date: 2021-08-24 18:22
Thanks for clarifying - I see now that the docs specifically call out the lack of guarantees here with "usually but not always regard them as equivalent".

I did want to specifically explain the context of my bug; 

1. NumPy's strings have some unexpected behavior because they have fixed-length strings (represented inline) and var-length strings (which are pointers to Python strings). Various arcana dictate which version you get, and wrappers like pandas.read_csv can also throw a wrench in the mix. It is quite easy for the nominal "string type" to change from under you, which is how I stumbled on this bug.

2. I was using functools.cache as a way to intern objects and short-circuit otherwise very expensive equality calculations by reducing them to pointer comparisons - hence my desire for exact cache hits when typed=False.

While I agree this is Working As Documented, it does not Work As Expected in my opinion. I would expect the stdlib optimized implementation to follow the same behavior as this naive implementation, which does consider "hello world" and np.str_("hello world") to be equivalent.

def cache(func):
  _cache = {}
  @functools.wraps(func)
  def wrapped(*args, **kwargs):
    cache_key = tuple(args) + tuple(kwargs.items())
    if cache_key not in _cache:
      _cache[cache_key] = func(*args, **kwargs)
    return _cache[cache_key]
  return wrapped
History
Date User Action Args
2022-04-11 14:59:49adminsetgithub: 89155
2021-08-24 18:22:17brileesetmessages: + msg400223
title: functools.lru_cache does not consider strings and numpy strings as equivalent -> functools.lru_cache does not guarantee cache hits for equal values
2021-08-24 16:17:53rhettingersetstatus: open -> closed

nosy: + rhettinger
messages: + msg400217

resolution: not a bug
stage: resolved
2021-08-24 13:36:59brileecreate