
classification
Title: cached_method similar to cached_property to cache with classes
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.11

process
Status: closed
Resolution: third party
Dependencies:
Superseder:
Assigned To:
Nosy List: AlexWaygood, eric.araujo, martenlienen, rhettinger, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2021-10-23 12:50 by martenlienen, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL       Status  Linked
PR 29191  closed  martenlienen, 2021-10-23 13:17
Messages (16)
msg404870 - Author: Marten Lienen (martenlienen) Date: 2021-10-23 12:50
There should be a `cached_method` equivalent to `cached_property` so that methods of long-lived objects can be cached easily.
msg404876 - Author: Alex Waygood (AlexWaygood) (Python triager) Date: 2021-10-23 13:48
Are you aware of the previously reported problems with `cached_property`? https://bugs.python.org/issue43468
msg404882 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2021-10-23 15:23
The simple implementation is:

from functools import cached_property, lru_cache, partial

def cached_method(func):
    return cached_property(lambda self: lru_cache()(partial(func, self)))
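
A usage sketch of this recipe (the Vector class here is illustrative, not from the message; it continues the definition above):

v = Vector := type("Vector", (), {})  # see full class below

class Vector:
    def __init__(self, *coords):
        self.coords = coords

    @cached_method
    def norm(self):
        # Computed once per instance; later calls hit the per-instance cache.
        return sum(c * c for c in self.coords) ** 0.5

v = Vector(3, 4)
print(v.norm())   # 5.0, computed
print(v.norm())   # 5.0, per-instance cache hit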
msg404885 - Author: Marten Lienen (martenlienen) Date: 2021-10-23 15:43
@AlexWaygood No, I was not aware of the problems. I have updated the PR by removing the lock entirely. Since the function has to be cacheable, it should be idempotent anyway, so that executing it multiple times in parallel does not make a program incorrect. This was also suggested as the a-priori best approach in the linked issue.
msg404886 - Author: Marten Lienen (martenlienen) Date: 2021-10-23 15:46
@serhiy.storchaka The simple implementation is very simple but does not allow overwriting the arguments to `lru_cache` and, probably more importantly, creates a circular reference on `self`, I believe.
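
The cycle is easy to observe (a sketch reusing the Vector example above): the instance's __dict__ holds the cached callable, which holds the partial, which holds the instance.

import gc
import weakref

v = Vector(3, 4)
v.norm()                  # stores lru_cache()(partial(func, v)) in v.__dict__
r = weakref.ref(v)
del v
print(r() is not None)    # True: the reference cycle keeps the instance alive
gc.collect()
print(r() is None)        # True: only the cyclic collector reclaims it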
msg404888 - Author: Marten Lienen (martenlienen) Date: 2021-10-23 15:53
An implementation based on cached_property (which therefore also inherits its locking problem) that does not create a circular reference and copies attributes (docstring etc.) from the decorated function would be as follows. It is based on the WeaklyBoundMethod from this PR; a sketch of that class follows the code below.

from functools import cached_property, lru_cache, update_wrapper

def cached_method_inner(func, lru_args):
    lru_args.setdefault("maxsize", None)

    def binder(self):
        # Bind func to self through a weak reference so that the
        # per-instance cache does not keep the instance alive.
        method = WeaklyBoundMethod(func.__get__(self))
        cache = lru_cache(**lru_args)
        cached = cache(method)
        update_wrapper(cached, func)
        return cached

    prop = cached_property(binder)
    update_wrapper(prop, func)
    return prop

def cached_method(func=None, /, **lru_args):
    if func is not None:
        return cached_method_inner(func, lru_args)
    else:
        def decorator(late_func):
            return cached_method_inner(late_func, lru_args)
        return decorator
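
The WeaklyBoundMethod from the PR is not reproduced in the message; a minimal sketch of what such a wrapper might look like (an illustrative assumption, not the PR's actual code):

import weakref

class WeaklyBoundMethod:
    """Callable holding the underlying function and only a weak
    reference to the instance, so caches that retain it do not
    keep the instance alive. (Illustrative sketch, not PR 29191.)"""

    def __init__(self, bound_method):
        self._func = bound_method.__func__
        self._self = weakref.ref(bound_method.__self__)

    def __call__(self, *args, **kwargs):
        instance = self._self()
        if instance is None:
            raise ReferenceError("the instance behind this method no longer exists")
        return self._func(instance, *args, **kwargs)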
msg404889 - Author: Marten Lienen (martenlienen) Date: 2021-10-23 15:54
However, then the user gets error messages mentioning cached_property when they are actually using cached_method.
msg404894 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2021-10-23 16:38
AFAICT, the only capability added by the PR is keeping a weak reference to the instance.

This doesn't improve the hit rate and will make the hit rate worse for instances that define __hash__.  In the PR's example, two vector instances with equal coordinates would not be recognized as being equivalent.  Note, dataclasses make it trivially easy to add custom __eq__ and __hash__ support.

Users also lose the ability to limit maximum memory utilization.  Managing the combined size of many caches, one per instance, is much more difficult than setting an automatic maxsize limit on a single lru_cache.

Users further lose the ability to clear the entire cache all at once.  This can be important in testing.

AFAICT, the only benefit is that short-lived instances will be freed earlier than if they waited to age out of an lru_cache.  This would only ever matter if the instances were large, short-lived, and would never have other equivalent instances that could create a cache hit.

Lastly, I don't like the example given in the docs for the PR.  We really want the class to define __hash__ and __eq__ methods.  This doesn't just improve the hit rate, it is also necessary for avoiding bugs.  If the coordinates get mutated, the cache has no way of knowing that its entry is invalid:
   
    v = Vector(10, 20, 30)
    print(v.norm())                 # computed and cached
    v.coordinates = (11, 22, 33)
    print(v.norm())                 # stale: returns the cached norm of (10, 20, 30)
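
For illustration (a sketch, not part of the original message), a frozen dataclass generates suitable __eq__ and __hash__ methods, lets equal instances share cache entries, and turns the mutation above into an error instead of a stale hit:

from dataclasses import dataclass
from functools import lru_cache

@dataclass(frozen=True)
class Vector:
    coordinates: tuple

    @lru_cache
    def norm(self):
        return sum(c * c for c in self.coordinates) ** 0.5

print(Vector((3, 4, 0)).norm())   # 5.0, computed and cached
print(Vector((3, 4, 0)).norm())   # cache hit: equal instances hash alike
# Vector((3, 4, 0)).coordinates = (1, 2, 3) would raise FrozenInstanceError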
msg404895 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2021-10-23 16:40
See also:  https://docs.python.org/3/faq/programming.html#how-do-i-cache-method-calls
msg404938 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2021-10-24 17:31
For comparison, here is a recipe that I was originally going to include in the FAQ entry but later decided against it.

It only had an advantage over @lru_cache with instances so large that we can't wait for them to age out of the cache.  It shouldn't be used if new, equivalent instances are likely to be created; otherwise, the hit rate would fall.  The class needs to be weak-referenceable, so __weakref__ needs to be a listed field when __slots__ are defined.  Also, @weak_lru is slower than @lru_cache.

Compared to @cached_method in the current PR, @weak_lru creates a single unified cache rather than many separate caches.  This gives lower space overhead, allows a collective maxsize to be specified, and gives central control over cache statistics and clearing.  If the instances support hashing and equality tests, the @weak_lru recipe increases the hit rate across instances that are equivalent but not identical.

That said, @cached_method is much faster than @weak_lru because it doesn't need to create a new ref() on every call and it doesn't need a pure python wrapper.

-----------------------------------------------------

import functools
import weakref

def weak_lru(maxsize=128, typed=False):
    'LRU Cache decorator that keeps a weak reference to "self"'

    def decorator(func):

        @functools.lru_cache(maxsize, typed)
        def _func(self_ref, /, *args, **kwargs):
            # Dereference the weak reference before calling the real method.
            return func(self_ref(), *args, **kwargs)

        @functools.wraps(func)
        def wrapper(self, /, *args, **kwargs):
            # weakref.ref objects are hashable (weakref.proxy objects are
            # not), which lru_cache requires of every argument.
            return _func(weakref.ref(self), *args, **kwargs)

        return wrapper

    return decorator
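
A hypothetical usage sketch of the recipe (the Station class and its method are illustrative, not from the message):

class Station:
    def __init__(self, station_id):
        self.station_id = station_id

    @weak_lru(maxsize=32)
    def reading(self, day):
        # Stand-in for an expensive per-(instance, day) computation.
        return (self.station_id, day)

s = Station(42)
print(s.reading(1))   # (42, 1), computed and cached
print(s.reading(1))   # (42, 1), served from the shared cache
del s                 # only a weak ref remains in the cache, so s can be freed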
msg404939 - Author: Marten Lienen (martenlienen) Date: 2021-10-24 18:35
Central control over the cache and its parameters is definitely a big plus. In my use case, the objects hold references to large blocks of GPU memory that should be freed as soon as possible. Additionally, I use cached_method to cache expensive hash calculations which the alternatives cannot do because they would run into bottomless recursion.

Caching methods looks to be significantly more complex than caching functions because there are multiple possible priorities, e.g. fast gc, performance, cross-instance cache hits, caching __hash__. Do you think, in the face of these ambiguities, that the stdlib should potentially just not cover this? Applying @lru_cache directly to a method already provides the shared, global-cache alternative, after all.
msg404940 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2021-10-24 20:36
> In my use case, the objects hold references to large blocks 
> of GPU memory that should be freed as soon as possible. 

That makes sense.  I can see why you reached for weak references.

Would it be a workable alternative to have an explicit close() method to free the underlying resources?  We do this with StringIO instances because relying on reference counting or GC is a bit fragile.


> Additionally, I use cached_method to cache expensive hash 
> calculations which the alternatives cannot do because they 
> would run into bottomless recursion.

This is a new concern.  Can you provide a minimal example that recurses infinitely with @lru_cache but not with @cached_method?  Offhand, I can't think of an example.


> Do you think, in the face of these ambiguities, that the
> stdlib should potentially just not cover this?

Perhaps this should be put on PyPI rather than in the standard library. 

The core problem is that having @cached_method in the standard library tells users that this is the recommended and preferred way to cache methods.  However, in most cases they would be worse off.  And it isn't easy to know when @cached_method would be a win or to assess the loss of other optimizations like __slots__ and key-sharing dicts.
msg404957 - Author: Marten Lienen (martenlienen) Date: 2021-10-25 09:58
The lru_cache can trigger infinite recursion if it is used to cache a hash computation because the cache look-up itself requires the hash.

from functools import lru_cache

class CachedHash:
    @lru_cache
    def __hash__(self):
        # Expensive calculation because we are caching a big matrix, for example.
        # The cache lookup key is (self,), so lru_cache hashes self,
        # which calls __hash__ again and recurses without limit.
        return 0

hashed = CachedHash()
hash(hashed) # => RecursionError: maximum recursion depth exceeded while calling a Python object

Even though this looks contrived, it is actually my use case. cached_property is not directly applicable because __hash__ needs to be a method and I wanted to avoid caching as an instance attribute because my class is a frozen dataclass. The dataclass thing also makes close() awkward because then I would have an outwardly resource-ful dataclass which is against the spirit of a dataclass.

I will think about putting this on pypi instead.
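
A sketch of the pattern in question (hedged: _expensive_hash is an illustrative name, and cached_method is the simple recipe from msg404882, repeated here for self-containment). Routing __hash__ through a per-instance cached helper avoids the recursion because the per-instance cache keys only on the call arguments, never on self:

from functools import cached_property, lru_cache, partial

def cached_method(func):
    return cached_property(lambda self: lru_cache()(partial(func, self)))

class CachedHash:
    def __hash__(self):
        return self._expensive_hash()

    @cached_method
    def _expensive_hash(self):
        return 0  # stand-in for an expensive computation

print(hash(CachedHash()))  # 0, with no RecursionError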
msg405013 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2021-10-26 00:36
> The lru_cache can trigger infinite recursion if it is used 
> to cache a hash computation because the cache look-up itself 
> requires the hash.

Yes, I see the problem.  Am not sure whether I should add a note to the docs for this.


> The dataclass thing also makes close() awkward because
> then I would have an outwardly resource-ful dataclass 
> which is against the spirit of a dataclass.

If you think of a dataclass as just a data holder like a mutable named tuple, I can see where the notion of the "spirit of dataclass" comes from.

However, if you think of it as a code generator for handling the boilerplate code in a more complex class, the only "spirit" is to do anything that any other class does.

For objects that hold resources, an explicit close() method is common and well supported (i.e. contextlib.closing and generator.close()).  It is a perfectly reasonable thing to do.

That said, it's a matter of taste.  Just do what works best for you.


> I will think about putting this on pypi instead.

If you do post it, let me know.  I'll add a link to it from the FAQ entry.
msg405378 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2021-10-30 17:51
The feature proposed here is not clear to me.

Is it about caching the method object on the instance to optimize the creation of the bound method, as the name suggests?  Or is it about caching the result of calling the method, which is consistent with the references to lru_cache and cached_property?
msg405416 - Author: Marten Lienen (martenlienen) Date: 2021-10-31 19:44
As suggested, I have extracted the code and tests into a package on PyPI: https://pypi.org/project/cached_method/
With this I will close this issue. "third party" sounds about right as the Resolution setting.

@eric.araujo This feature is about caching the result of a method call.
History
Date                 User              Action  Args
2022-04-11 14:59:51  admin             set     github: 89751
2021-10-31 19:44:09  martenlienen      set     status: open -> closed
                                               resolution: third party
                                               messages: + msg405416
                                               stage: patch review -> resolved
2021-10-30 17:51:49  eric.araujo       set     nosy: + eric.araujo
                                               messages: + msg405378
2021-10-26 00:36:34  rhettinger        set     messages: + msg405013
2021-10-25 09:58:39  martenlienen      set     messages: + msg404957
2021-10-24 20:36:23  rhettinger        set     messages: + msg404940
2021-10-24 18:35:37  martenlienen      set     messages: + msg404939
2021-10-24 17:31:57  rhettinger        set     messages: + msg404938
2021-10-23 16:40:32  rhettinger        set     messages: + msg404895
2021-10-23 16:38:08  rhettinger        set     nosy: + rhettinger
                                               messages: + msg404894
2021-10-23 15:54:08  martenlienen      set     messages: + msg404889
2021-10-23 15:53:25  martenlienen      set     messages: + msg404888
2021-10-23 15:46:06  martenlienen      set     messages: + msg404886
2021-10-23 15:43:41  martenlienen      set     messages: + msg404885
2021-10-23 15:23:44  serhiy.storchaka  set     nosy: + serhiy.storchaka
                                               messages: + msg404882
2021-10-23 13:48:43  AlexWaygood       set     nosy: + AlexWaygood
                                               messages: + msg404876
2021-10-23 13:17:36  martenlienen      set     keywords: + patch
                                               stage: patch review
                                               pull_requests: + pull_request27462
2021-10-23 12:50:45  martenlienen      create