Issue45588
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2021-10-23 12:50 by martenlienen, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Pull Requests

URL | Status | Linked | Edit
---|---|---|---|
PR 29191 | closed | martenlienen, 2021-10-23 13:17 | |
Messages (16)
msg404870 - (view) | Author: Marten Lienen (martenlienen) * | Date: 2021-10-23 12:50 | |
There should be a `cached_method` equivalent to `cached_property` so that methods of long-lived objects can be cached easily. |
|||
msg404876 - (view) | Author: Alex Waygood (AlexWaygood) * | Date: 2021-10-23 13:48 | |
Are you aware of the previously reported problems with `cached_property`? https://bugs.python.org/issue43468 |
|||
msg404882 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2021-10-23 15:23 | |
The simple implementation is:

```python
def cached_method(func):
    return cached_property(lambda self: lru_cache()(partial(func, self)))
```
|||
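Filled out with the imports it needs, the one-liner above runs as-is. A minimal sketch (the `Squarer` class is a hypothetical example, not from the issue):

```python
from functools import cached_property, lru_cache, partial

def cached_method(func):
    # On first attribute access, cached_property stores an lru_cache-wrapped
    # partial(func, self) in the instance __dict__, giving each instance its
    # own lazily created cache.
    return cached_property(lambda self: lru_cache()(partial(func, self)))

class Squarer:
    calls = 0  # counts real computations across all instances

    @cached_method
    def square(self, x):
        Squarer.calls += 1
        return x * x
```

Repeated calls with the same argument on the same instance hit the per-instance cache; a second instance builds a separate cache of its own.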
msg404885 - (view) | Author: Marten Lienen (martenlienen) * | Date: 2021-10-23 15:43 | |
@AlexWaygood No, I was not aware of the problems. I have updated the PR by removing the lock entirely. Since the function has to be cacheable, it should be idempotent anyway, so executing it multiple times in parallel does not make a program incorrect. This was also suggested as the a priori best approach in the linked issue. |
|||
msg404886 - (view) | Author: Marten Lienen (martenlienen) * | Date: 2021-10-23 15:46 | |
@serhiy.storchaka The simple implementation is very simple, but it does not allow overriding the arguments to `lru_cache` and, probably more importantly, I believe it creates a circular reference through `self`. |
|||
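The circular reference is real: the per-instance cache stored by `cached_property` holds a `partial` that holds `self`, so the instance can only be reclaimed by the cyclic garbage collector, not by reference counting alone. A demonstration (the `Obj` class is illustrative, not from the issue):

```python
import gc
import weakref
from functools import cached_property, lru_cache, partial

def cached_method(func):
    return cached_property(lambda self: lru_cache()(partial(func, self)))

class Obj:
    @cached_method
    def identity(self, x):
        return x

o = Obj()
o.identity(1)            # materializes the per-instance cache
r = weakref.ref(o)

gc.disable()             # make the demonstration deterministic
del o
# self -> __dict__ -> lru_cache wrapper -> partial -> self is a cycle,
# so the instance survives the del until the cyclic collector runs.
alive_after_del = r() is not None
gc.enable()
gc.collect()
collected = r() is None
```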
msg404888 - (view) | Author: Marten Lienen (martenlienen) * | Date: 2021-10-23 15:53 | |
An implementation based on cached_property (that therefore also inherits its locking problem) that does not create a circular reference and copies attributes (docs etc.) from the decorated function would be as follows (based on the WeaklyBoundMethod from this PR).

```python
def cached_method_inner(func, lru_args):
    lru_args.setdefault("maxsize", None)

    def binder(self):
        method = WeaklyBoundMethod(func.__get__(self))
        cache = lru_cache(**lru_args)
        cached = cache(method)
        update_wrapper(cached, func)
        return cached

    prop = cached_property(binder)
    update_wrapper(prop, func)
    return prop

def cached_method(func=None, /, **lru_args):
    if func is not None:
        return cached_method_inner(func, lru_args)
    else:
        def decorator(late_func):
            return cached_method_inner(late_func, lru_args)
        return decorator
```
|||
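`WeaklyBoundMethod` comes from the PR and is not shown in the message. One plausible sketch of such a helper (an assumption, not the PR's actual code):

```python
import weakref

class WeaklyBoundMethod:
    """Call a bound method while keeping only a weak reference to its instance."""

    def __init__(self, bound_method):
        # Split the bound method so that only the function is held strongly;
        # the instance is reachable only through a weak reference.
        self._func = bound_method.__func__
        self._self_ref = weakref.ref(bound_method.__self__)

    def __call__(self, *args, **kwargs):
        instance = self._self_ref()
        if instance is None:
            raise ReferenceError("the underlying instance no longer exists")
        return self._func(instance, *args, **kwargs)

class A:
    def double(self, x):
        return 2 * x
```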
msg404889 - (view) | Author: Marten Lienen (martenlienen) * | Date: 2021-10-23 15:54 | |
However, then the user gets error messages mentioning cached_property when they are actually using cached_method. |
|||
msg404894 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2021-10-23 16:38 | |
AFAICT, the only capability added by the PR is keeping a weak reference to the instance. This doesn't improve the hit rate and will make the hit rate worse for instances that define __hash__. In the PR's example, two vector instances with equal coordinates would not be recognized as being equivalent. Note, dataclasses make it trivially easy to add custom __eq__ and __hash__ support.

Users also lose the ability to limit maximum memory utilization. Managing the combined size of many caches, one per instance, is much more difficult than setting an automatic maxsize limit on a single lru_cache.

Users further lose the ability to clear the entire cache all at once. This can be important in testing.

AFAICT, the only benefit is that short-lived instances will be freed earlier than if they waited to age out of an lru_cache. This would only ever matter if the instances were large, short-lived, and would never have other equivalent instances that could create a cache hit.

Lastly, I don't like the example given in the docs for the PR. We really want the class to define __hash__ and __eq__ methods. This doesn't just improve the hit rate, it is also necessary for avoiding bugs. If the coordinates get mutated, the cache has no way of knowing that its entry is invalid:

```python
v = Vector(10, 20, 30)
print(v.norm())
v.coordinates = (11, 22, 33)
print(v.norm())  # serves the cached, now-stale norm
```
|||
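Raymond's preferred shape — a class with value-based `__eq__`/`__hash__` so that equal instances share cache entries — might look like this (the `Vector` below is a hypothetical reconstruction, not the PR's code):

```python
import math
from dataclasses import dataclass
from functools import lru_cache

@dataclass(frozen=True)  # frozen=True generates both __eq__ and __hash__
class Vector:
    coordinates: tuple

    @lru_cache(maxsize=None)  # one cache shared by all instances, keyed on self
    def norm(self):
        return math.hypot(*self.coordinates)
```

Because the instances hash by value, creating `Vector((3.0, 4.0))` twice still produces a single cache entry; the trade-off, as discussed above, is that cached instances stay alive until they age out of the shared cache.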
msg404895 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2021-10-23 16:40 | |
See also: https://docs.python.org/3/faq/programming.html#how-do-i-cache-method-calls |
|||
msg404938 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2021-10-24 17:31 | |
For comparison, here is a recipe that I was originally going to include in the FAQ entry but later decided against. It only had an advantage over @lru_cache with instances so large that we can't wait for them to age out of the cache. It shouldn't be used if new, equivalent instances are likely to be created; otherwise, the hit rate would fall. The class needs to be weak-referenceable, so __weakref__ needs to be a listed field when __slots__ are defined. Also, @weak_lru is slower than @lru_cache.

Compared to @cached_method in the current PR, @weak_lru creates a single unified cache rather than many separate caches. This gives lower space overhead, allows a collective maxsize to be specified, and gives central control over cache statistics and clearing. If the instances support hashing and equality tests, the @weak_lru recipe increases the hit rate across instances that are equivalent but not identical.

That said, @cached_method is much faster than @weak_lru because it doesn't need to create a new ref() on every call and it doesn't need a pure python wrapper.

```python
import functools
import weakref

def weak_lru(maxsize=128, typed=False):
    'LRU Cache decorator that keeps a weak reference to "self"'
    proxy = weakref.proxy

    def decorator(func):
        _func = functools.lru_cache(maxsize, typed)(func)

        @functools.wraps(func)
        def wrapper(self, /, *args, **kwargs):
            return _func(proxy(self), *args, **kwargs)

        return wrapper

    return decorator
```
|||
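One wrinkle worth noting: per the `weakref` documentation, proxy objects are not hashable regardless of the referent, so `lru_cache` cannot key on `proxy(self)` directly. A variant of the same idea keys the cache on a hashable `weakref.ref` and dereferences inside the cached helper (the `Matrix` class is a hypothetical usage example):

```python
import functools
import weakref

def weak_lru(maxsize=128, typed=False):
    """LRU cache for methods that holds only a weak reference to self."""
    def decorator(func):
        @functools.lru_cache(maxsize, typed)
        def _cached(self_ref, /, *args, **kwargs):
            # The cache key is the weakref.ref, which hashes and compares
            # like its referent, so the cache never keeps the instance alive.
            return func(self_ref(), *args, **kwargs)

        @functools.wraps(func)
        def wrapper(self, /, *args, **kwargs):
            return _cached(weakref.ref(self), *args, **kwargs)

        return wrapper
    return decorator

class Matrix:
    computations = 0

    def __init__(self, n):
        self.n = n

    @weak_lru(maxsize=64)
    def scaled_trace(self, scale):
        Matrix.computations += 1
        return self.n * scale
```

Because the cache holds only weak references, the instance is freed by reference counting alone as soon as the last strong reference goes away; the dead-ref cache entries simply age out.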
msg404939 - (view) | Author: Marten Lienen (martenlienen) * | Date: 2021-10-24 18:35 | |
Central control over the cache and its parameters is definitely a big plus. In my use case, the objects hold references to large blocks of GPU memory that should be freed as soon as possible. Additionally, I use cached_method to cache expensive hash calculations, which the alternatives cannot do because they would run into bottomless recursion.

Caching methods looks to be significantly more complex than caching functions because there are multiple possible priorities, e.g. fast gc, performance, cross-instance cache hits, caching __hash__. Do you think, in the face of these ambiguities, that the stdlib should potentially just not cover this? Though there is already the shared, global cache alternative of applying @lru_cache to a method.
|||
msg404940 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2021-10-24 20:36 | |
> In my use case, the objects hold references to large blocks
> of GPU memory that should be freed as soon as possible.

That makes sense. I can see why you reached for weak references. Would it be a workable alternative to have an explicit close() method to free the underlying resources? We do this with StringIO instances because relying on reference counting or GC is a bit fragile.

> Additionally, I use cached_method to cache expensive hash
> calculations which the alternatives cannot do because they
> would run into bottomless recursion.

This is a new concern. Can you provide a minimal example that recurses infinitely with @lru_cache but not with @cached_method? Offhand, I can't think of an example.

> Do you think, in the face of these ambiguities, that the
> stdlib should potentially just not cover this?

Perhaps this should be put on PyPI rather than in the standard library. The core problem is that having @cached_method in the standard library tells users that this is the recommended and preferred way to cache methods. However in most cases they would be worse off. And it isn't easy to know when @cached_method would be a win or to assess the loss of other optimizations like __slots__ and key-sharing dicts.
|||
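The explicit-close() pattern mentioned above composes with `contextlib.closing`, which is in the standard library. A minimal sketch with a hypothetical resource holder:

```python
from contextlib import closing

class GpuBuffer:
    """Hypothetical holder of an expensive resource with an explicit close()."""

    def __init__(self, nbytes):
        self.nbytes = nbytes
        self.closed = False

    def close(self):
        # Release the underlying resource deterministically, instead of
        # waiting for reference counting or the garbage collector.
        self.closed = True

with closing(GpuBuffer(1 << 20)) as buf:
    assert not buf.closed
# closing() guarantees close() ran, even if the block raised.
```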
msg404957 - (view) | Author: Marten Lienen (martenlienen) * | Date: 2021-10-25 09:58 | |
The lru_cache can trigger infinite recursion if it is used to cache a hash computation because the cache look-up itself requires the hash.

```python
from functools import lru_cache

class CachedHash:
    @lru_cache
    def __hash__(self):
        # Expensive calculation because we are caching a big matrix, for example.
        return 0

hashed = CachedHash()
hash(hashed)  # RecursionError: maximum recursion depth exceeded while calling a Python object
```

Even though this looks contrived, it is actually my use case. cached_property is not directly applicable because __hash__ needs to be a method, and I wanted to avoid caching as an instance attribute because my class is a frozen dataclass. The dataclass thing also makes close() awkward because then I would have an outwardly resource-ful dataclass, which is against the spirit of a dataclass.

I will think about putting this on PyPI instead.
|||
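For completeness, one way to cache a hash on a frozen dataclass without lru_cache is to stash it in the instance dict via `object.__setattr__`, the same escape hatch frozen dataclasses use in their own generated `__init__`. A hypothetical sketch, not from the issue:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CachedHash:
    data: tuple

    def __hash__(self):
        # dataclass() leaves an explicitly defined __hash__ untouched,
        # even with eq=True and frozen=True.
        cached = self.__dict__.get("_hash")
        if cached is None:
            cached = hash(self.data)  # stand-in for an expensive calculation
            # object.__setattr__ bypasses the frozen-instance check.
            object.__setattr__(self, "_hash", cached)
        return cached
```

There is no recursion here because the cache lives in the instance dict rather than in a hash-keyed mapping.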
msg405013 - (view) | Author: Raymond Hettinger (rhettinger) * | Date: 2021-10-26 00:36 | |
> The lru_cache can trigger infinite recursion if it is used
> to cache a hash computation because the cache look-up itself
> requires the hash.

Yes, I see the problem. Am not sure whether I should add a note to the docs for this.

> The dataclass thing also makes close() awkward because
> then I would have an outwardly resource-ful dataclass
> which is against the spirit of a dataclass.

If you think of a dataclass as just a data holder like a mutable named tuple, I can see where the notion of the "spirit of dataclass" comes from. However, if you think of it as a code generator for handling the boilerplate code in a more complex class, the only "spirit" is to do anything that any other class does. For objects that hold resources, an explicit close() method is common and well supported (i.e. contextlib.closing and generator.close()). It is a perfectly reasonable thing to do. That said, it's a matter of taste. Just do what works best for you.

> I will think about putting this on pypi instead.

If you do post it, let me know. I'll add a link to it from the FAQ entry.
|||
msg405378 - (view) | Author: Éric Araujo (eric.araujo) * | Date: 2021-10-30 17:51 | |
The feature proposed here is not clear to me. Is it about caching the method object on the instance to optimize the creation of the bound method, as the name suggests? Or is it about caching the result of calling the method, which would be consistent with the references to lru_cache and cached_property? |
|||
msg405416 - (view) | Author: Marten Lienen (martenlienen) * | Date: 2021-10-31 19:44 | |
As suggested, I have extracted the code and tests into a package on PyPI: https://pypi.org/project/cached_method/ With this I will close this issue. "third party" sounds about right as the Resolution setting. @eric.araujo This feature is about caching the result of a method call. |
History

Date | User | Action | Args
---|---|---|---|
2022-04-11 14:59:51 | admin | set | github: 89751 |
2021-10-31 19:44:09 | martenlienen | set | status: open -> closed; resolution: third party; messages: + msg405416; stage: patch review -> resolved |
2021-10-30 17:51:49 | eric.araujo | set | nosy: + eric.araujo; messages: + msg405378 |
2021-10-26 00:36:34 | rhettinger | set | messages: + msg405013 |
2021-10-25 09:58:39 | martenlienen | set | messages: + msg404957 |
2021-10-24 20:36:23 | rhettinger | set | messages: + msg404940 |
2021-10-24 18:35:37 | martenlienen | set | messages: + msg404939 |
2021-10-24 17:31:57 | rhettinger | set | messages: + msg404938 |
2021-10-23 16:40:32 | rhettinger | set | messages: + msg404895 |
2021-10-23 16:38:08 | rhettinger | set | nosy: + rhettinger; messages: + msg404894 |
2021-10-23 15:54:08 | martenlienen | set | messages: + msg404889 |
2021-10-23 15:53:25 | martenlienen | set | messages: + msg404888 |
2021-10-23 15:46:06 | martenlienen | set | messages: + msg404886 |
2021-10-23 15:43:41 | martenlienen | set | messages: + msg404885 |
2021-10-23 15:23:44 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg404882 |
2021-10-23 13:48:43 | AlexWaygood | set | nosy: + AlexWaygood; messages: + msg404876 |
2021-10-23 13:17:36 | martenlienen | set | keywords: + patch; stage: patch review; pull_requests: + pull_request27462 |
2021-10-23 12:50:45 | martenlienen | create |