functools.lru_cache keeps objects alive forever #64058

thesheep · 2013-12-02T10:27:47Z

BPO	19859
Nosy	@rhettinger, @ncoghlan, @pitrou, @serhiy-storchaka
Files	66c1c9f32567.diff

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/rhettinger'
closed_at = <Date 2013-12-04.21:24:57.035>
created_at = <Date 2013-12-02.10:27:47.237>
labels = ['type-feature', 'library']
title = 'functools.lru_cache keeps objects alive forever'
updated_at = <Date 2013-12-04.21:24:57.033>
user = 'https://bugs.python.org/thesheep'

bugs.python.org fields:

activity = <Date 2013-12-04.21:24:57.033>
actor = 'rhettinger'
assignee = 'rhettinger'
closed = True
closed_date = <Date 2013-12-04.21:24:57.035>
closer = 'rhettinger'
components = ['Library (Lib)']
creation = <Date 2013-12-02.10:27:47.237>
creator = 'thesheep'
dependencies = []
files = ['32950']
hgrepos = ['215']
issue_num = 19859
keywords = ['patch']
message_count = 13.0
messages = ['204995', '205120', '205146', '205147', '205148', '205175', '205178', '205208', '205213', '205218', '205220', '205223', '205248']
nosy_count = 5.0
nosy_names = ['rhettinger', 'ncoghlan', 'pitrou', 'serhiy.storchaka', 'thesheep']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = None
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue19859'
versions = ['Python 3.5']

thesheep · 2013-12-02T10:27:46Z

As most naïve "memoized" decorator implementations, lru_cache keeps references to all the values of arguments of the decorated function in the cache. That means, that if we call such a decorated function with an object as a parameter, that object will be kept alive in memory forever -- that is, until the program ends. This is an obvious waste, since when we no longer have any other reference to that object, we are unable to call that function with the same parameter ever again, so it just wastes the cache space.

This is a very common case when we decorate a method -- the first parameter is "self". One solution for this particular case is to use a dedicated "memoized_method" decorator, that stores the cache on the "self" object itself, so that it can be released together with the object.

A more general solution uses weakrefs where possible in the cache key, maybe even with an additional callback that removes the cache entry when any of its parameters is dead. Obviously it adds some overhead and makes the caching decorator even slower, but it can let us save a lot of memory, especially in long-running applications.

To better illustrate what I mean, here is an example of such an improved @memoized decorator that I wrote: https://review.openstack.org/#/c/54117/5/horizon/utils/memoized.py

It would be great to have an option to do something similar with lru_cache, and if there is an interest, I would like to work on that.

thesheep · 2013-12-03T15:29:15Z

I prepared a proof of concept solution at:

https://bitbucket.org/thesheep/cpython-lru_cache-weakref/commits/66c1c9f3256785552224ca177ed77a8312de6bb8

serhiy-storchaka · 2013-12-03T20:30:05Z

Weak references make no sense in most cases when arguments disappear just after function call. For the self argument of a method it makes more sense. Perhaps lru_cache can return not a function, but a descriptor. This will allow implement special processing of first argument.

Or new decorator lru_cache_method should be added.

serhiy-storchaka · 2013-12-03T20:47:08Z

Perhaps following technique can be used to prevent object's life prolongation:

def spam(self, *args, **kwargs):
    @lru_cache(maxsize=20)
    def spam(foo, bar=1, *, baz=None):
        ...
    self.spam = spam
    return self.spam(*args, **kwargs)

rhettinger · 2013-12-03T20:50:13Z

I don't think this is an appropriate use of an LRU cache. There are other ways to "freeze" a method return value (typically by storing the result in an instance).

Here's one way of doing it (taken from the source code for Itty https://pypi.python.org/pypi/itty/0.8.2 ):

class lazyproperty(object):
    """A property whose value is computed only once. """
    def __init__(self, function):
        self._function = function

    def __get__(self, obj, _=None):
        if obj is None:
            return self
        value = self._function(obj)
        setattr(obj, self._function.func_name, value)
        return value

Here is how it is used:

class Request(object):
    """An object to wrap the environ bits in a friendlier way."""

...

    @lazyproperty
    def POST(self):
        return self.build_complex_dict()

...

thesheep · 2013-12-03T23:43:02Z

The method example is just the most common case where this problem can be easily seen, but not the only one. We indeed use the @cached_property decorator on properties (similar to https://github.com/mitsuhiko/werkzeug/blob/master/werkzeug/utils.py#L35), and I've actually written a @memoized_method decorator for methods, that do pretty much what Serhiy suggests (except it's not repeated all over the place, a la copy-pasta, but kept in one decorator). That is fine, and the @cached_property is actually superior, as it avoids a lookup and a check once the value has been calculated.

However, this still doesn't solve the problems that are encountered in practice in actual code, like here: https://github.com/openstack/horizon/blob/master/openstack_dashboard/api/neutron.py#L735

Here we have a normal function, not a method, that calls a remote API through HTTP (the call is relatively slow, so we definitely want to cache it for multiple invocations). The function takes a request parameter, because it needs it for authentication with the remote service. The problem we had is that this will keep every single request in memory, because it's referenced by the cache.

Somehow it feels wrong to store the cache on an arbitrary attribute of the function, like the request in this case, and it's easy to imagine a function that takes two such critical arguments.

This is the code that actually made me write the weakref version of the @memoized decorator that I linked initially, and I thought that it could also be useful to have that in Python's caching decorator as an option.

I can understand if you think that this is too much, and that in such tricky situations the programmer should either write their own caching, or rewrite the code to avoid a memory leak. But I am sure that most programmers will not even think about this caveat. I think that we should at least add a note to lru_cache's documentation warning about this scenario and advising them to write their own caching decorators.

thesheep · 2013-12-03T23:57:43Z

Actually, after looking closer, my @memoize_method decorator does something completely different than Serhiy suggested. Still it only solves the common case of methods, and does nothing if you pass your short-lived objects as other parameters than self.

Limiting the cache size is also not a solution in the practical example with request that I linked to in the previous comment, because we can't know in advance how many times per request the function is going to be called, picking an arbitrary number feels wrong and may lead to unexpected behaviors when other fragments of code change (like a sudden slowdown every N calls).

rhettinger · 2013-12-04T09:32:47Z

Limiting the cache size is also not a solution in the
practical example with request that I linked to in the
previous comment, because we can't know in advance how
many times per request the function is going to be called,
picking an arbitrary number feels wrong and may lead to
unexpected behaviors

This suggests that you don't really want an LRU cache which is specifically designed to limit the cache size by expelling the
least recently used entry.

At its heart, the cache decorator is all about mapping a fixed inputs to fixed outputs. The memory conservation comes from the replacement strategy and an option to clear the cache entirely.

The reason that my answer and Serhiy's answer don't fit your needs is that it isn't clear what you really want to do. I think you should move this discussion to StackOverflow so others can help you tease-out your actual needs and suggest appropriate solutions. Ideally, you should start with real use cases rather than focusing on hacking-up the LRU cache implementation.

thesheep · 2013-12-04T10:15:20Z

Thank you for your attention. I'm actually quite happy with the solution we have, it works well. That's actually I thought that it may be worthwhile to try and push it upstream to Python. I can totally understand why you don't want to add too much to the standard library, after all, everything you add there has to stay forever. So please consider this patch abandoned.

But I think it's would be still worthwhile to add a note to the lru_cache's documentation, saying something like:

"""
Warning! lru_cache will keep references to all the arguments for which it keeps cached values, which prevents them from being freed from memory when there are no other references. This can lead to memory leaks when you call a function with lru_cache on a lot of short-lived objects.
"""

I suppose you can come up with a nicer phrasing.

ncoghlan · 2013-12-04T12:15:29Z

On 4 December 2013 20:15, Radomir Dopieralski <report@bugs.python.org> wrote:

But I think it's would be still worthwhile to add a note to the lru_cache's documentation, saying something like:

"""
Warning! lru_cache will keep references to all the arguments for which it keeps cached values, which prevents them from being freed from memory when there are no other references. This can lead to memory leaks when you call a function with lru_cache on a lot of short-lived objects.
"""

Umm, that's part of the operational definition of a value based cache

it needs to keep things alive, so that if a different instance shows
up with the same value, it will still get a cache hit.

We're willing to put warnings in the docs for cases where it's easy to
inadvertently introduce a security vulnerability, but not for
situations like this where using a container inappropriately may cause
it to keep objects alive that you didn't intend to keep alive.

thesheep · 2013-12-04T13:49:00Z

Umm, that's part of the operational definition of a value based cache

it needs to keep things alive, so that if a different instance shows
up with the same value, it will still get a cache hit.

If it only kept the return value alive, that wouldn't be a problem, it's indeed intuitively obvious that it has to do that in order to work. But what many people miss to notice is that it also keeps any arguments that were passed to the function alive. This is not intuitive, and as I demonstrated with my patch, not even necessary, so I think it might be worthwhile to at least mention this little implementation quirk.

pitrou · 2013-12-04T15:05:11Z

Just adding a note in the documentation sounds enough.

rhettinger · 2013-12-04T21:24:57Z

I'm not in favor of filling the docs with warnings like this. It tends to make everything sound dangerous even when the tools are doing exactly what they are supposed to do.

Every container (except for the weakref containers) keeps their references alive. That is how Python works. The LRU cache is no more special in this regard than a dictionary, list, or set. In addition, the older entries get flushed-out and freed as the LRU cache gets newer entries.

[Radomir Dopieralski]

So please consider this patch abandoned.

Marking this as closed.

thesheep mannequin added performance Performance or resource usage stdlib Python modules in the Lib dir labels Dec 2, 2013

rhettinger self-assigned this Dec 3, 2013

rhettinger added type-feature A feature request or enhancement and removed performance Performance or resource usage labels Dec 3, 2013

rhettinger closed this as completed Dec 4, 2013

ezio-melotti transferred this issue from another repository Apr 10, 2022

VannTen mentioned this issue Aug 22, 2022

Reevaluate custom lru_cache decorator thoth-station/storages#2688

Open

ocefpaf mentioned this issue May 22, 2023

Implement lru_cache for get_variables_by_attributes Unidata/netcdf4-python#1253

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

functools.lru_cache keeps objects alive forever #64058

functools.lru_cache keeps objects alive forever #64058

thesheep mannequin commented Dec 2, 2013

thesheep mannequin commented Dec 2, 2013

thesheep mannequin commented Dec 3, 2013

serhiy-storchaka commented Dec 3, 2013

serhiy-storchaka commented Dec 3, 2013

rhettinger commented Dec 3, 2013

thesheep mannequin commented Dec 3, 2013

thesheep mannequin commented Dec 3, 2013

rhettinger commented Dec 4, 2013

thesheep mannequin commented Dec 4, 2013

ncoghlan commented Dec 4, 2013

thesheep mannequin commented Dec 4, 2013

pitrou commented Dec 4, 2013

rhettinger commented Dec 4, 2013

functools.lru_cache keeps objects alive forever #64058

functools.lru_cache keeps objects alive forever #64058

Comments

thesheep mannequin commented Dec 2, 2013

thesheep mannequin commented Dec 2, 2013

thesheep mannequin commented Dec 3, 2013

serhiy-storchaka commented Dec 3, 2013

serhiy-storchaka commented Dec 3, 2013

rhettinger commented Dec 3, 2013

thesheep mannequin commented Dec 3, 2013

thesheep mannequin commented Dec 3, 2013

rhettinger commented Dec 4, 2013

thesheep mannequin commented Dec 4, 2013

ncoghlan commented Dec 4, 2013

thesheep mannequin commented Dec 4, 2013

pitrou commented Dec 4, 2013

rhettinger commented Dec 4, 2013