classification
Title: functools.lru_cache keeps objects alive forever
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: ncoghlan, pitrou, rhettinger, serhiy.storchaka, thesheep
Priority: normal Keywords: patch

Created on 2013-12-02 10:27 by thesheep, last changed 2013-12-04 21:24 by rhettinger. This issue is now closed.

Files
File name Uploaded Description Edit
66c1c9f32567.diff serhiy.storchaka, 2013-12-03 20:05 review
Repositories containing patches
https://bitbucket.org/thesheep/cpython-lru_cache-weakref
Messages (13)
msg204995 - (view) Author: Radomir Dopieralski (thesheep) Date: 2013-12-02 10:27
Like most naïve "memoized" decorator implementations, lru_cache keeps references to all the argument values of the decorated function in its cache. That means that if we call such a decorated function with an object as a parameter, that object will be kept alive in memory forever -- that is, until the program ends. This is an obvious waste: once we no longer hold any other reference to that object, we can never call the function with the same parameter again, so the entry just wastes cache space.

This is a very common case when we decorate a method -- the first parameter is "self". One solution for this particular case is to use a dedicated "memoized_method" decorator, that stores the cache on the "self" object itself, so that it can be released together with the object.
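For illustration, such a memoized_method decorator could be sketched roughly like this (a hypothetical sketch, not the implementation referenced in this thread); the cache lives in an attribute on self, so it is garbage-collected together with the instance:

```python
import functools

def memoized_method(func):
    """Per-instance memoization: the cache is stored on ``self``, so it
    is freed together with the instance instead of pinning it forever."""
    cache_name = "_cache_" + func.__name__

    @functools.wraps(func)
    def wrapper(self, *args):
        # Lazily create the per-instance cache dict on first use.
        cache = getattr(self, cache_name, None)
        if cache is None:
            cache = {}
            setattr(self, cache_name, cache)
        if args not in cache:
            cache[args] = func(self, *args)
        return cache[args]

    return wrapper
```

Unlike a module-level cache, nothing outside the instance refers to the cached values, so deleting the instance releases both the object and its memoized results.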

A more general solution uses weakrefs where possible in the cache key, maybe even with an additional callback that removes the cache entry when any of its parameters is dead. Obviously it adds some overhead and makes the caching decorator even slower, but it can let us save a lot of memory, especially in long-running applications.
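The weakref approach could look roughly like this (a simplified sketch with hypothetical names, not the linked implementation; only positional, hashable arguments are handled): arguments that support weak references go into the cache key as weakrefs, and a callback evicts the whole entry when any of them is garbage-collected.

```python
import functools
import weakref

def weak_memoize(func):
    """Memoize with weakref-based cache keys where possible."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        holder = []  # lets the eviction callbacks see the finished key

        def make_key_part(arg):
            try:
                # Hold the argument weakly; when it is collected, drop
                # the whole cache entry via the callback.
                return weakref.ref(
                    arg, lambda _ref: holder and cache.pop(holder[0], None))
            except TypeError:
                # ints, strings, tuples, etc. cannot be weakly referenced;
                # fall back to a strong reference for those.
                return arg

        key = tuple(make_key_part(a) for a in args)
        holder.append(key)
        if key not in cache:
            cache[key] = func(*args)
        return cache[key]

    wrapper._cache = cache  # exposed for illustration only
    return wrapper
```

This works because live weak references compare and hash like their referents, so a fresh key built from the same arguments still hits the stored entry, while a dead argument triggers the callback and frees the entry.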

To better illustrate what I mean, here is an example of such an improved @memoized decorator that I wrote: https://review.openstack.org/#/c/54117/5/horizon/utils/memoized.py

It would be great to have an option to do something similar with lru_cache, and if there is an interest, I would like to work on that.
msg205120 - (view) Author: Radomir Dopieralski (thesheep) Date: 2013-12-03 15:29
I prepared a proof of concept solution at:

https://bitbucket.org/thesheep/cpython-lru_cache-weakref/commits/66c1c9f3256785552224ca177ed77a8312de6bb8
msg205146 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-03 20:30
Weak references make no sense in most cases, where the arguments disappear just after the function call. For the self argument of a method it makes more sense. Perhaps lru_cache could return not a function but a descriptor; that would allow special processing of the first argument.

Or a new decorator, lru_cache_method, could be added.
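The descriptor idea could be sketched roughly like this (hypothetical names; not part of any patch in this issue): binding through __get__ gives every instance its own lru_cache-wrapped function, and a WeakKeyDictionary lets each per-instance cache die with its instance.

```python
import functools
import weakref

class lru_cache_method:
    """Descriptor-based per-instance LRU cache (illustrative sketch).

    Assumes instances are hashable and weak-referenceable.
    """

    def __init__(self, maxsize=128):
        self.maxsize = maxsize

    def __call__(self, func):
        self.func = func
        # Weak keys: a per-instance cache disappears with its instance.
        self.caches = weakref.WeakKeyDictionary()
        return self

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self.func
        try:
            return self.caches[obj]
        except KeyError:
            pass
        # Hold the instance weakly inside the cached closure, so the
        # cache value does not keep its own weak-dict key alive.
        selfref = weakref.ref(obj)

        @functools.lru_cache(maxsize=self.maxsize)
        def cached(*args, **kwargs):
            return self.func(selfref(), *args, **kwargs)

        self.caches[obj] = cached
        return cached
```

Used as @lru_cache_method(maxsize=20) on a method, each instance gets its own bounded cache that is collected together with the instance.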
msg205147 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-03 20:47
Perhaps the following technique can be used to avoid prolonging the object's life:

def spam(self, *args, **kwargs):
    # On the first call, create a per-instance cached function.  The
    # inner function does not take self, so its cache holds no
    # reference to the instance.
    @lru_cache(maxsize=20)
    def spam(foo, bar=1, *, baz=None):
        ...
    # Shadow the method with an instance attribute; the cache is
    # freed together with the instance.
    self.spam = spam
    return self.spam(*args, **kwargs)
msg205148 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-12-03 20:50
I don't think this is an appropriate use of an LRU cache.  There are other ways to "freeze" a method return value (typically by storing the result in an instance).

Here's one way of doing it (taken from the source code for Itty https://pypi.python.org/pypi/itty/0.8.2 ):

class lazyproperty(object):
    """A property whose value is computed only once. """
    def __init__(self, function):
        self._function = function

    def __get__(self, obj, _=None):
        if obj is None:
            return self
        value = self._function(obj)
        # Cache the value as an instance attribute; it shadows this
        # non-data descriptor, so the function is computed only once.
        setattr(obj, self._function.__name__, value)
        return value

Here is how it is used:

class Request(object):
    """An object to wrap the environ bits in a friendlier way."""

    ...

    @lazyproperty
    def POST(self):
        return self.build_complex_dict()

    ...
msg205175 - (view) Author: Radomir Dopieralski (thesheep) Date: 2013-12-03 23:43
The method example is just the most common case where this problem is easily seen, but not the only one. We do use a @cached_property decorator on properties (similar to https://github.com/mitsuhiko/werkzeug/blob/master/werkzeug/utils.py#L35), and I've actually written a @memoized_method decorator for methods that does pretty much what Serhiy suggests (except it's kept in one decorator instead of being repeated all over the place, copy-pasta style). That is fine, and @cached_property is actually superior, since it avoids a lookup and a check once the value has been computed.

However, this still doesn't solve the problems that are encountered in practice in actual code, like here: https://github.com/openstack/horizon/blob/master/openstack_dashboard/api/neutron.py#L735

Here we have a normal function, not a method, that calls a remote API through HTTP (the call is relatively slow, so we definitely want to cache it for multiple invocations). The function takes a ``request`` parameter, because it needs it for authentication with the remote service. The problem we had is that this will keep every single request in memory, because it's referenced by the cache.

Somehow it feels wrong to store the cache as an attribute on an arbitrary argument of the function, like the request in this case, and it's easy to imagine a function that takes two such critical arguments.

This is the code that actually made me write the weakref version of the @memoized decorator that I linked initially, and I thought that it could also be useful to have that in Python's caching decorator as an option.

I can understand if you think that this is too much, and that in such tricky situations the programmer should either write their own caching, or rewrite the code to avoid a memory leak. But I am sure that most programmers will not even think about this caveat. I think that we should at least add a note to lru_cache's documentation warning about this scenario and advising them to write their own caching decorators.
msg205178 - (view) Author: Radomir Dopieralski (thesheep) Date: 2013-12-03 23:57
Actually, after looking closer, my @memoized_method decorator does something completely different from what Serhiy suggested. Still, it only solves the common case of methods, and does nothing if you pass your short-lived objects as parameters other than self.

Limiting the cache size is also not a solution in the practical example with the request that I linked to in the previous comment: we can't know in advance how many times per request the function will be called, and picking an arbitrary number feels wrong and may lead to unexpected behavior when other parts of the code change (like a sudden slowdown every N calls).
msg205208 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-12-04 09:32
> Limiting the cache size is also not a solution in the 
> practical example with request that I linked to in the
> previous comment, because we can't know in advance how
> many times per request the function is going to be called, 
> picking an arbitrary number feels wrong and may lead to 
> unexpected behaviors

This suggests that you don't really want an LRU cache which is specifically designed to limit the cache size by expelling the
least recently used entry.

At its heart, the cache decorator is all about mapping fixed inputs to fixed outputs.  The memory conservation comes from the replacement strategy and an option to clear the cache entirely.

The reason that my answer and Serhiy's answer don't fit your needs is that it isn't clear what you really want to do.  I think you should move this discussion to StackOverflow so others can help you tease-out your actual needs and suggest appropriate solutions.  Ideally, you should start with real use cases rather than focusing on hacking-up the LRU cache implementation.
msg205213 - (view) Author: Radomir Dopieralski (thesheep) Date: 2013-12-04 10:15
Thank you for your attention. I'm actually quite happy with the solution we have; it works well. That's actually why I thought it might be worthwhile to try to push it upstream to Python. I can totally understand why you don't want to add too much to the standard library -- after all, everything added there has to stay forever. So please consider this patch abandoned.

But I think it would still be worthwhile to add a note to lru_cache's documentation, saying something like:

"""
Warning! lru_cache keeps references to all the arguments for which it caches values, which prevents them from being freed when no other references remain. This can lead to memory leaks when an lru_cache-decorated function is called with many short-lived objects.
"""

I suppose you can come up with a nicer phrasing.
msg205218 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-12-04 12:15
On 4 December 2013 20:15, Radomir Dopieralski <report@bugs.python.org> wrote:
> But I think it would still be worthwhile to add a note to lru_cache's documentation, saying something like:
>
> """
> Warning! lru_cache keeps references to all the arguments for which it caches values, which prevents them from being freed when no other references remain. This can lead to memory leaks when an lru_cache-decorated function is called with many short-lived objects.
> """

Umm, that's part of the operational definition of a value-based cache
- it needs to keep things alive, so that if a different instance shows
up with the same value, it will still get a cache hit.

We're willing to put warnings in the docs for cases where it's easy to
inadvertently introduce a security vulnerability, but not for
situations like this where using a container inappropriately may cause
it to keep objects alive that you didn't intend to keep alive.
msg205220 - (view) Author: Radomir Dopieralski (thesheep) Date: 2013-12-04 13:49
> Umm, that's part of the operational definition of a value based cache
> - it needs to keep things alive, so that if a different instance shows
> up with the same value, it will still get a cache hit.

If it only kept the return value alive, that wouldn't be a problem; it's intuitively obvious that it has to do that in order to work. But what many people fail to notice is that it also keeps alive any arguments that were passed to the function. This is not intuitive, and as I demonstrated with my patch, not even necessary, so I think it might be worthwhile to at least mention this little implementation quirk.
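The behavior in question is easy to demonstrate with only the standard library (a small illustration, using hypothetical names):

```python
import functools
import gc
import weakref

@functools.lru_cache(maxsize=None)
def describe(obj):
    return type(obj).__name__

class Payload:
    pass

p = Payload()
watcher = weakref.ref(p)
describe(p)

del p
gc.collect()
# The cache key still references the argument, so the object survives:
assert watcher() is not None

describe.cache_clear()
gc.collect()
# Only after clearing the cache is the argument actually freed:
assert watcher() is None
```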
msg205223 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-12-04 15:05
Just adding a note in the documentation sounds enough.
msg205248 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-12-04 21:24
I'm not in favor of filling the docs with warnings like this.  It tends to make everything sound dangerous even when the tools are doing exactly what they are supposed to do.

Every container (except for the weakref containers) keeps its references alive.  That is how Python works.  The LRU cache is no more special in this regard than a dictionary, list, or set.   In addition, older entries get flushed out and freed as the LRU cache takes in newer entries.
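The eviction point can be seen directly: with a bounded cache, old entries, and the arguments they reference, are freed as newer calls push them out (a small illustration using only the standard library):

```python
import functools
import gc
import weakref

@functools.lru_cache(maxsize=2)
def describe(obj):
    return type(obj).__name__

class Payload:
    pass

first = Payload()
watcher = weakref.ref(first)
describe(first)
del first

# Two newer entries push the first one out of the size-2 cache,
# freeing the Payload instance its key referenced.
describe(Payload())
describe(Payload())
gc.collect()
assert watcher() is None
```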

[Radomir Dopieralski]
> So please consider this patch abandoned.

Marking this as closed.
History
Date User Action Args
2013-12-04 21:24:57 rhettinger set status: open -> closed
resolution: rejected
messages: + msg205248
2013-12-04 15:05:11 pitrou set nosy: + pitrou
messages: + msg205223
2013-12-04 13:49:00 thesheep set messages: + msg205220
2013-12-04 12:15:28 ncoghlan set messages: + msg205218
2013-12-04 10:15:20 thesheep set messages: + msg205213
2013-12-04 09:32:47 rhettinger set messages: + msg205208
2013-12-03 23:57:43 thesheep set messages: + msg205178
2013-12-03 23:43:02 thesheep set messages: + msg205175
2013-12-03 20:50:13 rhettinger set messages: + msg205148
2013-12-03 20:47:08 serhiy.storchaka set messages: + msg205147
2013-12-03 20:38:52 rhettinger set type: resource usage -> enhancement
versions: - Python 3.1, Python 2.7, Python 3.2, Python 3.3, Python 3.4
2013-12-03 20:30:04 serhiy.storchaka set messages: + msg205146
2013-12-03 20:28:49 rhettinger set assignee: rhettinger
2013-12-03 20:06:24 serhiy.storchaka set nosy: + rhettinger, ncoghlan
2013-12-03 20:05:14 serhiy.storchaka set files: + 66c1c9f32567.diff
keywords: + patch
2013-12-03 15:29:15 thesheep set hgrepos: + hgrepo215
messages: + msg205120
2013-12-02 18:58:32 pitrou set nosy: + serhiy.storchaka
2013-12-02 10:27:47 thesheep create