classification
Title: Standardise (and publish?) cache handling in standard library
Type: enhancement Stage: needs patch
Components: Library (Lib) Versions: Python 3.2
process
Status: closed Resolution: fixed
Dependencies: 9567 Superseder:
Assigned To: rhettinger Nosy List: eric.araujo, flox, ncoghlan, r.david.murray, rhettinger
Priority: low Keywords:

Created on 2010-07-28 11:27 by ncoghlan, last changed 2010-08-22 08:51 by eric.araujo. This issue is now closed.

Messages (12)
msg111790 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-07-28 11:27
The standard library has several cache implementations (e.g. in re, fnmatch and ElementTree) with different cache size limiting strategies. These should be standardised and possibly even exposed for general use.

Refer to python-dev discussion:
http://mail.python.org/pipermail/python-dev/2010-July/102473.html
msg113273 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-08-08 15:44
With Raymond adding functools.lru_cache and functools.lfu_cache, it should be possible to use those for the various caches in the standard library.

My only point of concern is that the standard lib caches tend to allow dynamic modification of the max cache size, while the new cache decorators appear to be fixed at the size specified when defining the decorator.
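Of the two decorators mentioned, lru_cache is the one that ultimately remained (see msg114647). For reference, its basic shape looks like this; compile_pattern is a hypothetical stand-in for the kind of expensive translation helper that re and fnmatch cache internally:

```python
import functools

@functools.lru_cache(maxsize=128)
def compile_pattern(pat):
    # Hypothetical stand-in for the expensive translation step
    # that re and fnmatch cache internally.
    return pat.upper()

compile_pattern("a*")                # miss: computed and stored
compile_pattern("a*")                # hit: answered from the cache
print(compile_pattern.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)
```

The maxsize is fixed at decoration time, which is exactly the constraint raised above.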
msg113300 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-08-08 19:22
I will take a look at the various caches and see if some of their features can be consolidated.  It is okay if some need to have their own strategies or situation-specific approaches, but you're right that common features should be looked at to see if they make sense in the functools caches.

Am not sure that dynamically changing the maxsize is a generally useful feature (most are set to the largest size that makes sense and not revisited later), but I will take a look at whether that complicates the code (if it's simple and fast, it may be worth doing).
msg113374 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-08-09 04:08
Yeah, I'm not sure the dynamic resizing makes sense in general. I was just pointing it out as something supported by the existing caches that could complicate a consolidation effort.
msg113378 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-08-09 04:29
Applied the lru_cache() to fnmatch and re.
See r83874 and r83871.

I did find a simple way to dynamically resize the maxcache, but did not apply it yet.   Will look at more applications to see if it is really needed.

Nick, thanks for the great ideas.  These changes simplified the code where they were applied and resulted in a smarter caching strategy.
msg113382 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-08-09 04:52
On Mon, Aug 9, 2010 at 2:29 PM, Raymond Hettinger
<report@bugs.python.org> wrote:
> I did find a simple way to dynamically resize the maxcache, but did not apply it yet.   Will look at more applications to see if it is really needed.
>
> Nick, thanks for the great ideas.  These changes simplified the code where they were applied and resulted in a smarter caching strategy.

The reason I mentioned the dynamic sizing specifically was that the
discussions where we realised we had all these different caches
floating around had to do with tuning the caching strategy to a
particular application based on speed vs memory trade-offs. While we
can pick a number that is a reasonable default, a server deployment
may want to turn the dial towards faster response times with higher
memory consumption, while an embedded device may want to push the dial
the other way. The smarter caching strategies you added are likely to
help far more than the blunt hammer approach of increasing the cache
size, but the speed/memory trade-off in choosing that size is still a
question that has no universally correct answer.
msg113383 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-08-09 05:11
I was thinking about the problem of developers wanting a different cache size than that provided in standard lib modules. 

ISTM that now that we've offered caching abilities in functools, a developer can easily add another layer of caching around any API they are interested in.  For example, if someone is using thousands of recurring fnmatch patterns, they can write something like:

   @functools.lfu_cache(maxsize=10000)  # custom fat cache
   def fnmatchcase(*args):
       return fnmatch.fnmatchcase(*args)

IMO, this beats adding caching controls to lots of APIs that should be focused only on their problem domain.  IOW, it is probably not a good idea to add purge() and cache_resize() functions to multiple modules throughout the standard lib.

ISTM, we should just provide basic caching with reasonable space consumption (i.e. not huge) that gives improvements to common use cases (like I've done with the fnmatch and re modules) and let programmers with unusual cases add their own caching options rather than be tied to our choice of lru vs lfu or whatnot.
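Since lfu_cache was later removed (see msg114647), the same "custom fat cache" wrapping pattern works with lru_cache; the maxsize here is an illustrative application choice, not a recommended default:

```python
import fnmatch
import functools

# Hypothetical application-level "fat cache": a much larger maxsize
# than whatever default the module keeps internally.
@functools.lru_cache(maxsize=10000)
def fnmatchcase(name, pat):
    return fnmatch.fnmatchcase(name, pat)

print(fnmatchcase("script.py", "*.py"))   # True
print(fnmatchcase("script.txt", "*.py"))  # False
```

This keeps the tuning knob in application code, so the stdlib module itself needs no purge() or resize API.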
msg113385 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-08-09 05:51
On Mon, Aug 9, 2010 at 3:11 PM, Raymond Hettinger
<report@bugs.python.org> wrote:
> ISTM, we should just provide basic caching with reasonable space consumption (i.e. not huge) that gives improvements to common use cases (like I've done with the fnmatch and re modules) and let programmers with unusual cases add their own caching options rather than be tied to our choice of lru vs lfu or whatnot.

A very good point! Perhaps we should note that somewhere? I'm not sure
where though, unless we just mention it in the docs for the relevant
modules.

Going the other way (using a smaller, or no, cache), perhaps in
addition to the new hit/miss attributes, the cache decorators should
expose the original function to allow the cache to be bypassed?
msg113570 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-08-10 23:31
Great minds think alike.  I was just about to propose that functools.wraps add a standard attribute to point at the underlying function (on the theory that objects should be introspectable).  This would allow a standard way to get to the underlying unwrapped functions.
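The attribute proposed here did land as __wrapped__, set by functools.update_wrapper (which lru_cache uses). A minimal sketch of bypassing a cache through it; the function names are illustrative only:

```python
import functools

@functools.lru_cache(maxsize=None)
def expensive(x):
    # Pretend this is costly; the decorator memoizes it.
    return x * 2

# update_wrapper records the original function as __wrapped__,
# giving a standard, introspectable way to skip the cache.
uncached = expensive.__wrapped__
print(uncached(21))                     # 42, computed directly
print(expensive.cache_info().currsize)  # 0 -- the cache was never touched
```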
msg113580 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-08-11 02:38
After discussion with RDM on IRC, I’m opening a new report to track this feature request separately. (It’s also a dependency of this bug.)
msg113791 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-08-13 16:15
Have we had any luck getting this to play nicely with the buildbots yet? (I lost track of where the last checkin got to). The necessary Modules/Setup change to adjust when _collections is built should have propagated through by now.
msg114647 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-08-22 08:51
Raymond, out of curiosity, can you tell why you removed lfu_cache?
History
Date                 User         Action  Args
2010-08-22 08:51:32  eric.araujo  set     messages: + msg114647
2010-08-22 08:06:14  rhettinger   set     status: open -> closed; resolution: fixed
2010-08-13 16:15:11  ncoghlan     set     messages: + msg113791
2010-08-11 02:38:21  eric.araujo  set     assignee: r.david.murray -> rhettinger; dependencies: + Add attribute pointing to wrapped function in functools.update_wrapper; messages: + msg113580
2010-08-10 23:31:36  rhettinger   set     status: closed -> open; priority: normal -> low; assignee: rhettinger -> r.david.murray; nosy: + r.david.murray; messages: + msg113570; resolution: fixed -> (no value)
2010-08-09 05:51:02  ncoghlan     set     messages: + msg113385
2010-08-09 05:11:46  rhettinger   set     messages: + msg113383
2010-08-09 04:52:45  ncoghlan     set     messages: + msg113382
2010-08-09 04:29:10  rhettinger   set     status: open -> closed; resolution: fixed; messages: + msg113378
2010-08-09 04:10:06  eric.araujo  set     nosy: + eric.araujo
2010-08-09 04:08:58  ncoghlan     set     messages: + msg113374
2010-08-08 19:24:07  rhettinger   set     assignee: rhettinger
2010-08-08 19:22:59  rhettinger   set     messages: + msg113300; versions: + Python 3.2, - Python 3.3
2010-08-08 15:44:02  ncoghlan     set     messages: + msg113273
2010-08-04 09:48:02  pitrou       set     nosy: + rhettinger
2010-08-04 09:29:01  flox         set     nosy: + flox
2010-07-28 11:27:10  ncoghlan     create